A semantic code intelligence server designed to enhance Claude Code's context window efficiency through intelligent code analysis and advanced relationship traversal. Provides 80-90% token reduction and reduces follow-up queries by 85%.
Cortex V2.1 addresses Claude Code's primary limitation: 50-70% token waste due to manual, text-based code discovery. Our system provides:
- 80-90% token reduction through semantic understanding
- Multi-hop relationship traversal for complete context discovery
- MCP server architecture for seamless Claude Code integration
- Adaptive context modes balancing structure vs flexibility
Hybrid Local + Cloudflare architecture:
Local Machine Cloudflare
βββββββββββββββββββββββββββββ ββββββββββββββββββββββββββ
File Watcher (chokidar) Worker (cortex-embedder)
β β
Chunker (tree-sitter) βembedβ CF AI (BGE-small-en-v1.5)
β β
Delta detection βββββββββ vector[]
β
HTTP upsert βstoreβ Vectorize (shared index)
β
Semantic search query βsearchβ Vectorize (repoHash filter)
βββββββββ top-k results
Local pipeline: Git operations, chunking, change detection, MCP server, Claude integration
Cloudflare: BGE embedding generation (CF AI), vector storage and similarity search (Vectorize)
Storage: chunk-metadata.json (chunks without embeddings, ~10Γ smaller than old index.json)
- π― Advanced Relationship Traversal: Multi-hop relationship discovery with complete context in single queries
- π ONNX Runtime Stability: External Node.js processes with complete isolation and 10x parallelism
- π» Local Resource Management: Global thresholds for ProcessPoolEmbedder (CPU: 69%/49%, Memory: 78%/69%)
- π Cloud Strategy Separation: CloudflareAI uses API controls (circuit breakers, rate limiting) vs local resource monitoring
- π Signal Cascade System: Reliable parent-child process cleanup with zero orphaned processes
- π Auto-sync Intelligence: Eliminates manual storage commands with intelligent conflict resolution
- π Unified Timestamped Logging: Consistent ISO timestamp formatting across all components with standardized key=value format for improved debugging and readability
- π― Guarded MMR Context Window Optimization: Production-ready Maximal Marginal Relevance system with 95%+ critical set coverage, intelligent diversity balancing, and comprehensive token budget management for optimal context window utilization
- β‘ Workload-Aware Process Growth: Intelligent process scaling based on actual chunk count - prevents unnecessary resource usage for small workloads (β€400 chunks use single process, >400 chunks scale up)
- π¦ Intelligent Embedding Cache: 95-98% performance improvement with content-hash based caching and dual storage
- π§ͺ Embedding Strategy Selection: Auto-selection framework choosing optimal strategy based on dataset size and system resources
- ποΈ Clear Stage-Based Startup: Simplified 3-stage startup system with clear break-line delimiters for enhanced readability and debugging
- π§ Enhanced Error Handling: Improved TypeScript compatibility and robust error reporting throughout the system
- π― File-Content Hash Delta Detection: Fast file-level change detection with SHA256 hashing - eliminates false positives and achieves 7x faster startup times
- π Centralized Storage Architecture: Global constants and utilities for consistent storage path management with complete file paths in logs
- β° Unified Timestamped Logging: Consistent ISO timestamp formatting across all components with standardized logging utilities
- π‘οΈ Intelligent Pre-Rebuild Backup System: Automatic validation and backup of valuable embedding data before destructive operations - only backs up valid data (chunk count > 0), skips empty/corrupt storage
- π Smart File Watching: Real-time code intelligence updates with semantic change detection β IMPLEMENTED
- ποΈ Dual-Mode File Tracking: Git-tracked files processed directly, untracked files via intelligent staging β IMPLEMENTED
- π Smart Dependency Chains: Context window optimization with automatic inclusion of complete function call graphs and dependency context β IMPLEMENTED
- π― Enhanced Predictive Hysteresis: Two-step lookahead with trend dampening prevents resource threshold oscillation - CPU 20% gap (69%β49%), Memory 9% gap (78%β69%) with automatic validation β IMPLEMENTED
cortexyoung/
βββ src/ # Unified source code
β βββ types.ts # Shared types and interfaces
β βββ git-scanner.ts # Git repository scanning
β βββ chunker.ts # Smart code chunking
β βββ embedder.ts # BGE embedding generation
β βββ vector-store.ts # Vector storage and retrieval
β βββ indexer.ts # Main indexing logic
β βββ searcher.ts # Semantic search implementation
β βββ mcp-handlers.ts # MCP request handlers
β βββ mcp-tools.ts # MCP tool definitions
β βββ server.ts # MCP server implementation
β βββ index.ts # CLI entry point
β βββ semantic-watcher.ts # Real-time file watching system
β βββ staging-manager.ts # Dual-mode file tracking manager
β βββ context-invalidator.ts # Chunk invalidation for real-time updates
β βββ process-pool-embedder.ts # CPU + memory adaptive embedding
β βββ guarded-mmr-selector.ts # Maximal Marginal Relevance optimization
β βββ smart-dependency-chain.ts # Smart dependency chain traversal for context optimization
β βββ relationship-traversal-engine.ts # Advanced relationship analysis and graph traversal
β βββ logging-utils.ts # Unified timestamped logging
β βββ storage-constants.ts # Centralized storage path management
β βββ unified-storage-coordinator.ts # Auto-sync dual storage management
βββ dist/ # Compiled JavaScript output
βββ .fastembed_cache/ # Local ML model cache
βββ docs/ # Documentation
Phase 5: Cloudflare Vectorize Migration π§
- Add
[[vectorize]]binding towrangler.toml - Create shared
cortex-vectorsVectorize index (384-dim cosine) - Extend
cloudflare-worker.jswith/upsert,/search,/delete,/embedAndUpsertendpoints - Implement
src/cloudflare-vector-store.ts(replacesPersistentVectorStore) - Backend switching via
CORTEX_VECTOR_BACKEND=cloudflareenv var - Migration script for existing local embeddings β Vectorize
- Eliminate
index.json.gzandembedding-cache.json.gz
See VECTORIZE-MIGRATION-PLAN.md for full implementation plan.
Phase 1: Foundation β
- TypeScript monorepo setup
- Core types and interfaces
- Basic MCP server structure
- Development tooling
Phase 2: Core Implementation β
- Git repository scanner
- AST-based chunking (with fallbacks)
- fastembed-js integration (BGE-small-en-v1.5)
- Vector storage with real embeddings
- Working demo with pure Node.js architecture
- 408 chunks processed with real semantic embeddings
Phase 3: Claude Code Integration β
- Enhanced semantic tools (semantic_search, contextual_read, code_intelligence)
- Token budget management with adaptive context
- Context package formatting with structured groups
- Performance optimization (sub-100ms response times)
- MCP server fully operational on port 8765
- All curl tests passing with real embeddings
- Claude Code integration ready
Phase 4: Advanced Features β
- Smart File Watching System - Real-time semantic file watching fully implemented
- Dual-Mode File Tracking - Separate processing paths: direct indexing for git-tracked files, staging for untracked files
- Context Invalidator - Intelligent chunk management for real-time updates
- Staging Manager - File staging system with size/type filtering
- Real-time Status Tool - MCP tool for monitoring context freshness
- Cross-platform Compatibility - Works on Windows/macOS/Linux via chokidar
# Install dependencies
npm install
# Run demo (downloads BGE model on first run)
npm run demoFirst run: Downloads ~200MB BGE-small-en-v1.5 model to .fastembed_cache/
Subsequent runs: Uses cached model for instant startup
v2.1.9 - Enhanced Predictive Hysteresis System π
- π― Two-Step Predictive Lookahead: Enhanced prediction algorithm with weighted trend analysis using last 4 data points instead of 3
- π‘οΈ Trend Dampening: Aggressive trends (>5%) dampened by 25% to prevent over-prediction and oscillation
- βοΈ Robust Hysteresis Validation: Automatic validation of hysteresis gaps during initialization with warnings for gaps <10%
- π Dual-Constraint Logic: Enhanced STOP/RESUME conditions - STOP on current OR step1 prediction, RESUME only when current AND step2 prediction both safe
- π Enhanced Logging: Detailed hysteresis status tracking with gap information, constraint states (CONSTRAINED/FREE), and prediction values
- ποΈ Optimized Thresholds: CPU hysteresis 20% gap (69%β49%) excellent, Memory hysteresis 9% gap (78%β69%) with oscillation warning
- π§ Intelligent Step2 Dampening: Second prediction step dampened by 20% for additional stability in resource forecasting
- β‘ Production Validation: Tested with 11 processes, 85ms graceful shutdown, prevents oscillation between resource thresholds
v2.1.8 - Centralized Storage Architecture & Compression π
- ποΈ Centralized Storage Constants: New
storage-constants.tsmodule for unified storage path management across all components - π¦ Intelligent File Compression: Automatic compression for large storage files (>10MB) with configurable thresholds and .gz extension support
- π§ Enhanced Storage Utilities: Comprehensive path generation utilities for both local (.cortex) and global (~/.claude/cortex-embeddings) storage
- π― Repository Hash Consistency: Standardized repository identification using
repoName-16charsformat for reliable cross-session storage - β‘ Improved Storage Coordination: Enhanced unified storage coordinator with better error handling and compression support
- π§Ή Code Consolidation: Eliminated duplicate storage path logic across persistent stores and caching systems
- π Complete Path Management: Centralized handling of metadata, relationships, deltas, and embedding cache paths
- π Backward Compatibility: Maintains existing storage structure while adding new compression and organization features
v2.1.7 - Environment Variable Configuration Overhaul π
- π§ Centralized Environment Configuration: New
env-config.tsutility for type-safe configuration management - π·οΈ CORTEX_ Prefix Support: Added support for prefixed environment variables (
CORTEX_PORT,CORTEX_LOG_FILE, etc.) to prevent conflicts β οΈ Backward Compatibility: Maintains support for unprefixed variables with deprecation warnings- π Accurate Documentation: Updated README to reflect all actually implemented environment variables (25+ variables documented)
- π‘οΈ Type Safety: Implemented TypeScript interfaces and validation for all configuration options
- π§Ή Code Cleanup: Replaced direct
process.envaccess with centralizedcortexConfigobject - π Migration Path: Clear upgrade path from unprefixed to prefixed environment variables
- π Dual MCP Integration: Complete installation instructions for both Claude Code (HTTP) and Amazon Q CLI (stdio)
- π Global Configuration: User-level and global scope setup for seamless cross-project availability
v2.1.6 - Performance & Concurrency Optimizations π
- β‘ Parallel Operations: Vector store initialization now uses Promise.all for concurrent directory creation and file operations
- π Smart Health Checks: New
quickHealthCheck()method avoids expensive validation when index is healthy - π Concurrent Processing: Relationship building now runs in parallel with embedding generation for faster indexing
- π Streaming Embeddings: Large datasets (>100 chunks) use streaming generation with batched storage for memory efficiency
- π§ Intelligent Startup: Quick health checks before detailed analysis, reducing startup time for healthy indexes
- π§ Enhanced Logging: Improved argument handling in logging utilities with better error reporting
- πΎ Background Sync: Storage synchronization operations now run in background to avoid blocking startup
v2.1.5 - Complete Timestamp Coverage
- π Fixed missing ISO timestamps: All startup logs now have consistent
[YYYY-MM-DDTHH:mm:ss.sssZ]timestamps - π Enhanced logging consistency: Updated 5 core files (index.ts, startup-stages.ts, hierarchical-stages.ts, git-scanner.ts, persistent-relationship-store.ts)
- π§ Improved debugging experience: Complete timestamp coverage for relationship graph loading, git operations, and storage sync
- β‘ Maintained performance: Zero impact on startup time while adding comprehensive timestamp tracking
- π§Ή Unified logging architecture: All console fallbacks now use timestamped logging utilities
v2.1.4 - Simplified ProcessPool Architecture
- β‘ Removed redundant original strategy: ProcessPool with 1 process handles all workload sizes efficiently
- π§ Fixed 400-chunk batching: No adaptive sizing - always use optimal batch size for BGE-small-en-v1.5
- π§Ή Streamlined strategy selection: Auto-selection now chooses between cached (<500) and process-pool (β₯500)
- π Improved workload management: Only scale processes when chunk count justifies multiple processes
- π Legacy compatibility: Original strategy gracefully redirects to cached with deprecation warning
- π Accurate cleanup messaging: No more misleading "process pool cleanup" when no processes were created
v2.1.3 - Intelligent Embedding Cache & Strategy Selection
- π¦ Intelligent Embedding Cache: 95-98% performance improvement with content-hash based caching
- AST-stable chunk boundaries for optimal cache hit rates
- Dual storage system (local + global) with automatic synchronization
- Real-time hit rate tracking and performance monitoring
- Content invalidation using SHA-256 hashing for collision resistance
- π§ͺ Simplified Strategy Selection Framework: Streamlined auto-selection with ProcessPool backend
< 500 chunks: Cached strategy (intelligent caching + ProcessPool, starts with 1 process)β₯ 500 chunks: ProcessPool strategy (scales to multiple processes)- Original strategy deprecated: ProcessPool with 1 process handles all workload sizes efficiently
- Fixed 400-chunk batching: All strategies use optimal batch size for BGE-small-en-v1.5 model
- Environment variable overrides:
EMBEDDING_STRATEGY=auto|cached|process-pool(originalβcached)
- Performance Results: Single function edit 228s β 0.4s, Feature addition 228s β 2s, File refactoring 228s β 6.5s
v2.1.2 - Incremental Indexing Logic Fixes
- Fixed overly aggressive automatic full indexing that incorrectly discarded valid embeddings
- Corrected delta analysis to handle NEW, MODIFIED, and DELETED files independently
- Removed problematic automatic mode switching based on flawed percentage thresholds
- Ensured incremental mode is always used except for: first time, user explicit request, or complete corruption
- Fixed file hash population bug that prevented proper delta calculation
- Improved embedding preservation - system now keeps 57.9% valid embeddings instead of reprocessing everything
- Simplified file hash architecture - removed unnecessary validation and misleading warnings
- File hashes are rebuilt on startup by design (fast & always accurate, no persistence complexity)
v2.1.1 - Indexing Robustness
- Fixed ENOENT errors when processing deleted files during incremental indexing
- Enhanced Git scanner to properly filter out deleted files in both full and incremental scans
- Improved error handling and logging for better debugging experience
- Ensured consistent file existence validation across all scanning modes
npm run build- Compile TypeScript to JavaScriptnpm run startup- Start server with health checksnpm run shutdown- Clean shutdown with process cleanupnpm run health- HTTP-based health check
npm run server- Start MCP server with real-time watching (default)npm run server -- --no-watch- Start MCP server in static mode onlynpm run start:full- Full indexing modenpm run start:cloudflare- Cloud-based embedder (no local CPU/memory)
npm run demo- Run indexing demo with real embeddingsnpm run test:cpu-memory- Test CPU + memory adaptive scalingnpm run test:cleanup- Test process cleanupnpm run benchmark- Performance benchmarking
npm run server- Real-time enabled by defaultnpm run server -- --no-watch- Disable real-time (static mode only)DISABLE_REAL_TIME=true npm run server- Alternative: disable via environmentnode test-semantic-watching.js- Run comprehensive validation testsnode test-realtime-search.ts- Test dual-mode search functionality
npm run storage:status- Complete status reportnpm run storage:validate- Consistency checknpm run cache:clear-all- Clear all storage (nuclear option)
npm run cache:stats- Shows embedding cache statistics and hit ratesnpm run cache:validate- Validates cache integrity and consistencynpm run cache:clear- Clears both vector and embedding caches
Core Configuration:
PORT- Server port (default: 8765)LOG_FILE- Custom log file path (default: logs/cortex-server.log)DEBUG- Enable debug logging (set to 'true')
Advanced Configuration:
DISABLE_REAL_TIME- Disable real-time file watching (set to 'true')ENABLE_NEW_LOGGING- Enable enhanced logging system (set to 'true')INDEX_MODE- Force indexing mode ('full' | 'incremental' | 'reindex')FORCE_REBUILD- Force complete rebuild (set to 'true')
Embedding & Processing:
EMBEDDING_STRATEGY- Embedding strategy ('auto' | 'cached' | 'process-pool')EMBEDDING_BATCH_SIZE- Batch size for embedding generationEMBEDDING_PROCESS_COUNT- Number of processes for embeddingEMBEDDING_TIMEOUT_MS- Timeout for embedding operationsEMBEDDER_TYPE- Embedder type ('local' | 'cloudflare')
MMR & Search:
CORTEX_MMR_ENABLED- Enable MMR optimization (default: true)CORTEX_MMR_LAMBDA- MMR diversity parameter (0.0-1.0)CORTEX_MMR_TOKEN_BUDGET- Token budget for context windowCORTEX_MMR_DIVERSITY_METRIC- Diversity metric for MMR
Git & Telemetry:
CORTEX_INCLUDE_UNTRACKED- Include untracked files (set to 'true')CORTEX_TELEMETRY_ENABLED- Enable telemetry (default: true)CORTEX_TELEMETRY_SAMPLE_RATE- Telemetry sampling rate (0.0-1.0)CORTEX_TELEMETRY_ANONYMIZATION- Anonymization levelCORTEX_TELEMETRY_RETENTION_DAYS- Data retention period
Note: Environment variables without
CORTEX_prefix may conflict with other applications. Future versions will migrate to prefixed variables for better isolation.
All server activity is logged to both console and file:
- Default location:
logs/cortex-server.log - Custom location: Set
LOG_FILEenvironment variable - Format: JSON structured logs with timestamps
- π― 80-90% token reduction through semantic understanding
- π― Sub-100ms query response times achieved
- π― 408 code chunks indexed with real embeddings
- π― 384-dimensional semantic embeddings via BGE-small-en-v1.5
- π― Pure Node.js - no external dependencies
- π― MCP server ready for production Claude Code integration
- π― Zero orphaned processes guaranteed through signal cascade system
- π― Real-time CPU + memory monitoring every 15 seconds
- π― Intelligent embedding cache with 95-98% performance improvement
- π― Enhanced predictive hysteresis with 20% CPU gap and 9% memory gap preventing oscillation
- π― Real-time updates: File changes processed within seconds of semantic changes
- π― Semantic filtering: Only processes changes that affect code understanding
- π― Minimal overhead: Single dependency (chokidar), leverages existing infrastructure
- π― Cross-platform: Windows/macOS/Linux compatibility through chokidar
Based on architectural review, we've incorporated:
- β Memory-based approach: Semantic embeddings over syntax trees
- β Multi-hop retrieval: Relationship traversal vs flat KNN
β οΈ Adaptive orchestration: Balanced approach vs purely minimal
Our system provides 80-90% token reduction through semantic understanding and multi-hop relationship traversal, addressing Claude Code's primary limitation of 50-70% token waste due to manual, text-based code discovery.
Cortex provides semantic tools via HTTP MCP server for Claude Code:
semantic_searchβ Enhanced code search with vector embeddingscontextual_readβ File reading with semantic context awarenesscode_intelligenceβ High-level semantic codebase analysisrelationship_analysisβ Advanced code relationship discoverytrace_execution_pathβ Function call graph traversalfind_code_patternsβ Pattern-based code discoveryreal_time_statusβ Live context freshness monitoring
Production Results:
- β Sub-100ms response times achieved
- β 6000+ chunks indexed and searchable
- β Real BGE embeddings working in production
- β MCP server operational on port 8765
- β Claude Code integration (HTTP transport)
- Install the MCP server globally:
# Start the Cortex server
cd /path/to/cortexyoung
npm run server
# In another terminal, add to Claude Code (user-level/global)
claude mcp add cortex http://localhost:8765/mcp --transport http --scope user- Verify installation:
claude mcp list
claude mcp get cortex- Use in Claude Code:
/mcp cortex semantic_search query="your search"
/mcp cortex contextual_read path="some/file.ts"
/mcp cortex code_intelligenceStatus: Not currently supported. The Cortex server uses HTTP transport only, while Amazon Q CLI requires stdio transport for MCP servers.
Future Enhancement: Stdio transport support could be added in a future version to enable Amazon Q CLI integration.
For Claude Code (HTTP):
- Type: HTTP MCP Server
- URL:
http://localhost:8765/mcp - Scope: User-level (available in all projects)
- Transport: HTTP
Server Management:
# Start server manually
npm run server
# Or use background process
nohup npm run server > cortex-server.log 2>&1 &
# Health check
curl http://localhost:8765/healthWe welcome contributions to Cortex! Please follow these guidelines:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Ensure all tests pass
- Submit a pull request
For major changes, please open an issue first to discuss what you would like to change.
This project is proprietary and not licensed for public use. All rights reserved.