RAG Docs

A Python application for crawling and scraping documentation sites with the Firecrawl API.

Features

  • Crawls websites and follows child links
  • Converts scraped content to markdown format
  • Saves documentation files with sanitized filenames
  • Handles duplicate filenames automatically
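
The filename handling listed above could look roughly like the sketch below. It is not taken from firecrawlbasics.py: the helper names `sanitize_filename` and `unique_path`, the allowed character set, and the `_1`, `_2` suffix scheme are all assumptions chosen for illustration.

```python
import re
from pathlib import Path


def sanitize_filename(name: str, max_length: int = 100) -> str:
    """Turn an arbitrary page title or URL into a safe file stem (hypothetical helper)."""
    # Replace every run of characters outside [A-Za-z0-9._-] with one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("._")
    # Fall back to a fixed stem if nothing usable is left.
    return cleaned[:max_length] or "untitled"


def unique_path(folder: Path, stem: str, suffix: str = ".md") -> Path:
    """Return a path in `folder` that does not exist yet (hypothetical helper).

    On a name collision, _1, _2, ... is appended to the stem.
    """
    candidate = folder / f"{stem}{suffix}"
    counter = 1
    while candidate.exists():
        candidate = folder / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

With these helpers, two pages that both sanitize to `index` would be written to `index.md` and `index_1.md` rather than overwriting each other.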

Requirements

  • Python 3.x
  • Firecrawl API key

Installation

pip install firecrawl-py

Usage

  1. Set your Firecrawl API key, preferably through an environment variable
  2. Update the url and max_pages variables in firecrawlbasics.py
  3. Run the script:
python firecrawlbasics.py

Configuration

The script can be configured by modifying variables in firecrawlbasics.py:

  • url: The starting URL to crawl
  • max_pages: Maximum number of pages to crawl
  • output_folder: Folder to save markdown files
  • include_paths: Path filters for crawling
  • exclude_paths: Paths to exclude from crawling
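
As an illustration, the variables above might be set as follows. Every value here is a placeholder, not the script's actual default, and the pattern syntax shown for `include_paths` and `exclude_paths` is an assumption; check the Firecrawl documentation for the format it expects.

```python
# Hypothetical example values; the real defaults in firecrawlbasics.py may differ.
url = "https://docs.example.com"   # starting URL to crawl
max_pages = 50                     # maximum number of pages to crawl
output_folder = "docs_output"      # folder where markdown files are saved
include_paths = ["docs/.*"]        # assumed pattern syntax: only crawl matching paths
exclude_paths = ["blog/.*"]        # assumed pattern syntax: skip matching paths
```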

Security Note

⚠️ Important: Move your Firecrawl API key to an environment variable instead of hardcoding it in the script.
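
One way to follow this advice is to read the key from the environment at startup and stop early if it is missing. The sketch below uses only the standard library; the variable name `FIRECRAWL_API_KEY` and the helper `load_api_key` are assumptions, not names taken from the script.

```python
import os


def load_api_key(var_name: str = "FIRECRAWL_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is missing."""
    # var_name is an assumed name; use whatever variable you actually export.
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the script."
        )
    return api_key
```

The returned value can then be passed to the Firecrawl client in place of the hardcoded string, so the key never appears in the repository.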

License

[Add your license here]