SA-RAG Architecture

Overview

SA-RAG (Semantic-Accelerated Retrieval Augmentation) is a high-performance RAG system designed for enterprise knowledge bases. It combines Rust's performance with Python's flexibility to provide a complete RAG solution.

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Python Layer                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │   RAG    │  │  Client  │  │   LLM    │  │Embedding │   │
│  │ Pipeline │  │   API    │  │ Service  │  │ Service  │   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘   │
│       │             │              │              │          │
│       └─────────────┴──────────────┴──────────────┘          │
│                         │                                     │
│                    Orchestrator                               │
└─────────────────────────┼─────────────────────────────────────┘
                          │ PyO3
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                      Rust Core                               │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │  Parser  │  │ Indexer  │  │  Graph   │  │  Memory  │   │
│  │          │  │          │  │          │  │          │   │
│  │ Semantic │  │ Vector   │  │ Adjacency│  │ Ebbinghaus│  │
│  │  Nodes   │  │  (HNSW)  │  │   List   │  │   Curve  │   │
│  │          │  │  + BM25  │  │          │  │          │   │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │           Differential Indexing                      │  │
│  │         (Document Version Management)                │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Components

Rust Core

1. Semantic Parser (`parser.rs`)

Purpose: Automatically segment documents into hierarchical semantic nodes.

Algorithm:

Identifies headings (Markdown style #, ##, or numbered sections)
Splits text into paragraphs, sentences, and chunks
Builds parent-child relationships
Creates SemanticNode structures with metadata

Key Features:

Supports both English and Chinese headings
Maintains document hierarchy
Preserves text spans for efficient retrieval

2. Vector Index (`indexer/vector.rs`)

Purpose: High-performance approximate nearest neighbor search.

Algorithm: HNSW (Hierarchical Navigable Small World)

Features:

Multi-layer graph structure
Greedy search from top layer
Dynamic insertion with neighbor selection
Cosine similarity for vector comparison

Parameters:

m: Maximum connections per layer (default: 16)
m_max: Controls layer sparsity (default: 64)
ef_construction: Search width during insertion

3. BM25 Index (`indexer/bm25.rs`)

Purpose: Full-text search with relevance scoring.

Algorithm: BM25 ranking function

Features:

Tokenization and stemming (English)
Inverted index for fast lookup
IDF (Inverse Document Frequency) calculation
TF (Term Frequency) weighting

Formula:

BM25(q, d) = Σ IDF(qi) * (f(qi, d) * (k1 + 1)) / (f(qi, d) + k1 * (1 - b + b * |d| / avgdl))

4. Hybrid Index (`indexer/hybrid.rs`)

Purpose: Combine vector and BM25 search results.

Algorithm: Reciprocal Rank Fusion (RRF)

Formula:

RRF(d) = Σ 1 / (k + rank_i(d))

Features:

Configurable weights for vector and BM25
Automatic score normalization
Top-k result selection

5. Knowledge Graph (`graph/graph.rs`)

Purpose: Store and query semantic relationships.

Structure: Adjacency list representation

Edge Types:

ParentChild: Hierarchical relationships
SemanticSimilarity: Similarity-based connections
Mention: Reference relationships
NextChunk: Sequential relationships
Reference: Cross-references
Synonym: Synonym relationships

Features:

Weighted edges (0.0 - 1.0)
Edge metadata support
Bidirectional edge support
Graph statistics

6. Graph Expansion (`graph/expansion.rs`)

Purpose: Expand retrieval results through graph traversal.

Strategies:

Basic Expansion: Breadth-first expansion
Weighted Expansion: Filter by edge weight threshold
Type-aware Expansion: Prioritize specific edge types
Diverse Expansion: Control expansion diversity
Smart Expansion: Combined strategy with decay

Algorithm:

BFS traversal from seed nodes
Score decay with depth
Type-based filtering
Weight threshold filtering

7. Long-term Memory (`memory.rs`)

Purpose: Store and retrieve long-term memories with temporal decay.

Algorithm: Ebbinghaus Forgetting Curve

Formula:

retention = exp(-decay_rate * time_passed / importance_factor)
score = importance * retention * relevance

Features:

Importance-weighted decay
Access frequency boost (spaced repetition)
Keyword and vector-based retrieval
Automatic memory pruning

8. Differential Indexing (`diff.rs`)

Purpose: Incremental document updates without full re-indexing.

Algorithm: Simplified Myers diff algorithm

Features:

Document version management
Changed segment identification
Partial embedding regeneration
Version history tracking

Python Layer

1. RAG Pipeline (`rag.py`)

Purpose: Orchestrate the complete RAG workflow.

Workflow:

Document indexing with embedding generation
Query rewriting
Hybrid retrieval (vector + BM25)
Graph expansion (optional)
Memory retrieval (optional)
Result fusion
Answer generation with LLM

2. LLM Service (`llm.py`)

Purpose: Interface with various LLM providers.

Supported Providers:

OpenAI (GPT models)
DeepSeek
Local models (via transformers)
Mock mode (for testing)

Features:

Chat completion
RAG-enhanced generation
Context management
Temperature control

3. Embedding Service (`embedding.py`)

Purpose: Generate text embeddings.

Supported Providers:

OpenAI embeddings
DeepSeek embeddings
Local models (sentence-transformers)
Mock mode (deterministic)

Features:

Batch embedding generation
Vector normalization
Cosine similarity calculation

4. Orchestrator (`orchestrator.py`)

Purpose: Coordinate query processing and result fusion.

Features:

Query rewriting (rule-based or LLM-based)
Retrieval strategy planning
Result fusion (RRF, weighted, max)
Graph expansion decision making

5. Client API (`client.py`)

Purpose: High-level API for end users.

Features:

Simplified interface
Statistics tracking
Error handling
Configuration management

Data Flow

Indexing Flow

Document Text
    │
    ▼
Python: RAG.index_documents()
    │
    ▼
Rust: RustCoreEngine.index_documents()
    │
    ├─► Parser: Parse into SemanticNodes
    │       │
    │       ├─► Create nodes with hierarchy
    │       └─► Build parent-child relationships
    │
    ├─► Graph: Add nodes and edges
    │       │
    │       └─► Create ParentChild edges
    │
    ├─► BM25: Index text content
    │       │
    │       └─► Build inverted index
    │
    └─► Python: Generate embeddings
            │
            └─► Rust: Update vector index (HNSW)

Retrieval Flow

User Query
    │
    ▼
Python: RAG.ask()
    │
    ├─► Orchestrator: Rewrite query
    │
    ├─► Embedding: Generate query embedding
    │
    ├─► Rust: Hybrid search
    │       │
    │       ├─► Vector search (HNSW)
    │       └─► BM25 search
    │
    ├─► Rust: Graph expansion (optional)
    │       │
    │       └─► Expand seed nodes
    │
    ├─► Rust: Memory retrieval (optional)
    │
    ├─► Orchestrator: Fuse results (RRF)
    │
    └─► LLM: Generate answer
            │
            └─► Return to user

Performance Characteristics

Indexing

Throughput: ~1000 documents/second (with embeddings)
Memory: O(n) where n is number of nodes
Disk: Optional persistence (not implemented)

Retrieval

Vector Search: O(log n) with HNSW
BM25 Search: O(m) where m is query term count
Graph Expansion: O(k * d) where k is seed nodes, d is depth
Total Latency: <50ms for typical queries

Memory

Storage: O(n) for n memories
Retrieval: O(n) with pruning optimization
Decay Calculation: O(1) per memory

Concurrency

Rust Core: Thread-safe with Arc<Mutex<>> for shared state
Python Layer: GIL-bound (single-threaded Python execution)
Future: Consider async/await for I/O-bound operations

Extensibility

Adding New Retrieval Methods

Implement in Rust core
Expose via PyO3
Integrate in HybridIndex
Add to orchestrator

Adding New LLM Providers

Implement in LLMService
Add provider-specific logic
Update configuration

Adding New Graph Expansion Strategies

Implement in graph/expansion.rs
Add to GraphExpansion struct
Expose via PyO3

Future Improvements

Persistence: Add disk-based storage for indexes
Distributed: Support distributed indexing and retrieval
Async: Async/await support for Python layer
Caching: Add result caching for common queries
Monitoring: Add metrics and observability
Optimization: Further optimize HNSW parameters
Multi-language: Better support for non-English text

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SA-RAG Architecture

Overview

System Architecture

Components

Rust Core

1. Semantic Parser (`parser.rs`)

2. Vector Index (`indexer/vector.rs`)

3. BM25 Index (`indexer/bm25.rs`)

4. Hybrid Index (`indexer/hybrid.rs`)

5. Knowledge Graph (`graph/graph.rs`)

6. Graph Expansion (`graph/expansion.rs`)

7. Long-term Memory (`memory.rs`)

8. Differential Indexing (`diff.rs`)

Python Layer

1. RAG Pipeline (`rag.py`)

2. LLM Service (`llm.py`)

3. Embedding Service (`embedding.py`)

4. Orchestrator (`orchestrator.py`)

5. Client API (`client.py`)

Data Flow

Indexing Flow

Retrieval Flow

Performance Characteristics

Indexing

Retrieval

Memory

Concurrency

Extensibility

Adding New Retrieval Methods

Adding New LLM Providers

Adding New Graph Expansion Strategies

Future Improvements

FilesExpand file tree

architecture.md

Latest commit

History

architecture.md

File metadata and controls

SA-RAG Architecture

Overview

System Architecture

Components

Rust Core

1. Semantic Parser (parser.rs)

2. Vector Index (indexer/vector.rs)

3. BM25 Index (indexer/bm25.rs)

4. Hybrid Index (indexer/hybrid.rs)

5. Knowledge Graph (graph/graph.rs)

6. Graph Expansion (graph/expansion.rs)

7. Long-term Memory (memory.rs)

8. Differential Indexing (diff.rs)

Python Layer

1. RAG Pipeline (rag.py)

2. LLM Service (llm.py)

3. Embedding Service (embedding.py)

4. Orchestrator (orchestrator.py)

5. Client API (client.py)

Data Flow

Indexing Flow

Retrieval Flow

Performance Characteristics

Indexing

Retrieval

Memory

Concurrency

Extensibility

Adding New Retrieval Methods

Adding New LLM Providers

Adding New Graph Expansion Strategies

Future Improvements

1. Semantic Parser (`parser.rs`)

2. Vector Index (`indexer/vector.rs`)

3. BM25 Index (`indexer/bm25.rs`)

4. Hybrid Index (`indexer/hybrid.rs`)

5. Knowledge Graph (`graph/graph.rs`)

6. Graph Expansion (`graph/expansion.rs`)

7. Long-term Memory (`memory.rs`)

8. Differential Indexing (`diff.rs`)

1. RAG Pipeline (`rag.py`)

2. LLM Service (`llm.py`)

3. Embedding Service (`embedding.py`)

4. Orchestrator (`orchestrator.py`)

5. Client API (`client.py`)