website-crawler/readme.txt
2025-06-26 00:48:28 +02:00

webcrawler_app/
├── webcrawler.py ← the crawler logic (translated to English)
├── crawler_gui.py ← the Streamlit user interface
└── requirements.txt ← with all your dependencies
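python -m venv venv # Create a virtual environment
.\venv\Scripts\activate # Activate it (on Windows)
pip install -r requirements.txt # Install the dependencies
python -m pip install streamlit # Only needed if streamlit is not listed in requirements.txt
python -m streamlit run crawler_gui.py # Start the GUI
pyinstaller --noconfirm --onefile --windowed crawler_gui.py # Optional: build a single-file executable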
🗂️ Project Structure
webcrawler_project/
├── crawler_cli.py
├── crawler_gui.py
├── webcrawler.py
├── requirements.txt
├── Dockerfile
└── docker-compose.yml
- The gui service gives you a browser interface at http://localhost:8501
- The cli service runs the crawler once and exits, which makes it a good fit for automation
📦 How to Use It
- Build & run all services:
docker-compose up --build
- Want to run only the GUI?
docker-compose up gui
- Want to run only the CLI job?
docker-compose run --rm cli
- Stop services:
docker-compose down
📁 Saving Output Files
By default, output CSVs are saved inside the container. If you'd like to access them from your host system, update your cli service in docker-compose.yml like this:
volumes:
- ./output:/app/output
And update your crawler_cli.py to save files in output/filename.csv.
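A minimal sketch of what that change in crawler_cli.py could look like. The function name save_results and the idea that results arrive as a list of dicts are assumptions for illustration; the actual crawler code is not shown in this readme. The relative path output/ only lands in /app/output if the container's working directory is /app, which is also an assumption.

```python
import csv
from pathlib import Path

# Relative folder; maps to /app/output if the container's working directory is /app (assumed).
OUTPUT_DIR = Path("output")


def save_results(rows, filename, output_dir=OUTPUT_DIR):
    """Write a list of dicts to output_dir/filename as CSV and return the file path.

    Hypothetical helper: the real crawler_cli.py may structure its results differently.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target = output_dir / filename
    # Use the keys of the first row as the CSV header; no columns if there are no rows.
    fieldnames = list(rows[0].keys()) if rows else []
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return target
```

Calling save_results(rows, "filename.csv") would then write to output/filename.csv, which the volume mapping above makes visible in ./output on the host.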
🔧 Step 1: Prepare Your Project
Make sure your project folder contains:
crawler-app/
├── crawler_cli.py
├── crawler_gui.py
├── webcrawler.py
├── requirements.txt
├── Dockerfile
└── docker-compose.yml
If your folder matches this layout, you're all set.
🚀 Step 2: Zip and Upload to Portainer
- Compress your project folder into a .zip file on your local machine.
- Log into Portainer.
- Go to Stacks → Add Stack.
- Give your stack a name (e.g. webcrawler).
- In the Web editor, paste the contents of your docker-compose.yml.
- Scroll down to “Advanced container settings” and upload the zipped project using the “Upload resources” option.
- Click Deploy the stack.
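If you would rather script the compression step than zip the folder by hand, the sketch below uses Python's standard shutil.make_archive. The function name zip_project is an assumption for illustration, and crawler-app is simply the folder name used in Step 1.

```python
import shutil
from pathlib import Path


def zip_project(project_dir, archive_base):
    """Create <archive_base>.zip with project_dir as its top-level folder.

    Hypothetical helper; returns the path of the archive that was written.
    """
    project_dir = Path(project_dir).resolve()
    archive_path = shutil.make_archive(
        base_name=str(archive_base),        # archive path without the .zip suffix
        format="zip",
        root_dir=str(project_dir.parent),   # paths in the archive are relative to this
        base_dir=project_dir.name,          # only this folder is added
    )
    return Path(archive_path)
```

For example, zip_project("crawler-app", "crawler-app") would write crawler-app.zip next to the project folder.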
🖥️ Step 3: Access Your Services
- The GUI is reachable on port 8501, provided your docker-compose.yml publishes that port. Access it via http://your-server-ip:8501.
- The CLI crawler will run on startup and exit. You can re-run it anytime from the Containers view.
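To confirm that the GUI port is actually reachable after deployment, a quick TCP check can help. This is a rough sketch using only the standard library; the function name is_port_open is an assumption, and your-server-ip is the same placeholder as above. A successful connection only shows that something is listening on the port, not that the Streamlit app itself is healthy.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, is_port_open("your-server-ip", 8501) should return True once the gui container is up and its port is published.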
✅ Optional Enhancements
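- Mount a named volume for persistent CSV output (webcrawler_data must also be declared under the top-level volumes: key in docker-compose.yml):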
volumes:
- webcrawler_data:/app/output
- Add environment variables or schedules via Portainer's built-in options.
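Environment variables set in Portainer only have an effect if the crawler reads them. The sketch below shows one way crawler_cli.py could do that. The variable names CRAWLER_START_URL, CRAWLER_OUTPUT_DIR and CRAWLER_MAX_PAGES, their defaults, and the function load_settings are all hypothetical; use whatever settings your crawler really has.

```python
import os


def load_settings(environ=None):
    """Read crawler settings from environment variables, falling back to defaults.

    All variable names and defaults here are illustrative assumptions.
    """
    env = os.environ if environ is None else environ
    return {
        "start_url": env.get("CRAWLER_START_URL", "https://example.com"),
        "output_dir": env.get("CRAWLER_OUTPUT_DIR", "output"),
        "max_pages": int(env.get("CRAWLER_MAX_PAGES", "100")),
    }
```

Passing a plain dict instead of os.environ, as in load_settings({"CRAWLER_MAX_PAGES": "10"}), makes the function easy to try out without touching the real environment.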