RAG Docs

A Python application for crawling and scraping documentation sites using the Firecrawl API.

Features

Crawls websites and follows child links
Converts scraped content to markdown format
Saves documentation files with sanitized filenames
Handles duplicate filenames automatically
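
The last two features can be illustrated with a minimal sketch. The helper names sanitize_filename and unique_path, the allowed-character set, and the _1, _2 suffix scheme are assumptions made for illustration; the actual logic in firecrawlbasics.py may differ.

```python
import re
from pathlib import Path


def sanitize_filename(title: str, max_length: int = 100) -> str:
    """Turn a page title or URL into a filesystem-safe base name (hypothetical helper)."""
    # Collapse every run of characters outside [A-Za-z0-9._-] into one underscore,
    # then trim leading/trailing dots and underscores.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", title).strip("._")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned[:max_length] or "untitled"


def unique_path(folder: Path, base_name: str, suffix: str = ".md") -> Path:
    """Return a path in `folder` that does not exist yet (hypothetical helper).

    If base_name.md is taken, try base_name_1.md, base_name_2.md, and so on.
    """
    candidate = folder / f"{base_name}{suffix}"
    counter = 1
    while candidate.exists():
        candidate = folder / f"{base_name}_{counter}{suffix}"
        counter += 1
    return candidate
```

With these helpers, a page titled "Getting Started: Install/Setup?" would be saved as Getting_Started_Install_Setup.md, and a second page with the same title would be saved as Getting_Started_Install_Setup_1.md.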

Requirements

Python 3.x
Firecrawl API key

Installation

pip install firecrawl-py

Usage

Set your Firecrawl API key (recommended: use an environment variable rather than editing the script)
Update the url and max_pages variables in firecrawlbasics.py
Run the script:

python firecrawlbasics.py

Configuration

The script can be configured by modifying variables in firecrawlbasics.py:

url: The starting URL to crawl
max_pages: Maximum number of pages to crawl
output_folder: Folder to save markdown files
include_paths: Path filters for crawling
exclude_paths: Paths to exclude from crawling
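
As a rough sketch, the configuration block at the top of firecrawlbasics.py might look like the following. Only the variable names come from the list above; every value is a placeholder, and the pattern format for include_paths and exclude_paths is an assumption, so check the Firecrawl documentation for the syntax your version expects.

```python
# Placeholder values for illustration only; replace them with your own.
url = "https://docs.example.com"   # starting URL to crawl
max_pages = 50                     # upper bound on the number of pages crawled
output_folder = "docs_output"      # folder where markdown files are written
include_paths = ["/docs/.*"]       # assumed pattern format: only crawl matching paths
exclude_paths = ["/blog/.*"]       # assumed pattern format: skip matching paths

# Basic sanity checks before starting a crawl.
if not url.startswith(("http://", "https://")):
    raise ValueError("url must start with http:// or https://")
if max_pages <= 0:
    raise ValueError("max_pages must be a positive integer")
```

Starting with a small max_pages value is a cheap way to confirm the path filters behave as expected before running a full crawl.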

Security Note

⚠️ Important: Move your Firecrawl API key to an environment variable instead of hardcoding it in the script.
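
One way to do this, sketched with the standard library only. The variable name FIRECRAWL_API_KEY and the helper load_api_key are assumptions chosen for illustration; use whatever name suits your environment.

```python
import os


def load_api_key(env_var: str = "FIRECRAWL_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is missing.

    The default variable name is an assumption, not something the script requires.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the script."
        )
    return api_key
```

The script can then call load_api_key() in place of the hardcoded string, after the variable has been exported in the shell (for example, export FIRECRAWL_API_KEY=your-key on Linux or macOS).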

License

[Add your license here]