87 changes: 87 additions & 0 deletions drivers/sonic/README.md
@@ -0,0 +1,87 @@
# Sonic — In-Memory DAWGS Graph Driver

Sonic is an in-memory graph database driver for DAWGS. It implements the same `graph.Database` interface as the `pg` driver, giving BloodHound users a zero-infrastructure option with no Postgres required.

The name `sonic` reflects its speed: there is no network round trip, no disk I/O, and no MVCC overhead.

## Architecture

All data lives in Go maps protected by a single `sync.RWMutex`. Adjacency indexes (`outEdges`, `inEdges`) map each node ID to the IDs of its outgoing and incoming edges, so a node's edge list is found with one average-case O(1) map lookup. IDs are assigned via `atomic.Uint64`.
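
The layout described above can be sketched roughly as follows. This is a minimal illustration, not the driver's actual code: `Store`, `Edge`, `ID`, and every method name here are hypothetical stand-ins, and nodes carry no kinds or properties.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ID, Edge, and Store are hypothetical stand-ins, not the driver's real types.
type ID uint64

type Edge struct {
	ID, Start, End ID
	Kind           string
}

type Store struct {
	mu       sync.RWMutex
	nextID   atomic.Uint64
	nodes    map[ID]struct{}
	edges    map[ID]*Edge
	outEdges map[ID][]ID // node ID -> IDs of edges leaving it
	inEdges  map[ID][]ID // node ID -> IDs of edges entering it
}

func NewStore() *Store {
	return &Store{
		nodes:    map[ID]struct{}{},
		edges:    map[ID]*Edge{},
		outEdges: map[ID][]ID{},
		inEdges:  map[ID][]ID{},
	}
}

// AddNode allocates an ID atomically, then takes the write lock to store it.
func (s *Store) AddNode() ID {
	id := ID(s.nextID.Add(1))
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nodes[id] = struct{}{}
	return id
}

// AddEdge records the edge and updates both adjacency indexes.
func (s *Store) AddEdge(start, end ID, kind string) ID {
	id := ID(s.nextID.Add(1))
	s.mu.Lock()
	defer s.mu.Unlock()
	s.edges[id] = &Edge{ID: id, Start: start, End: end, Kind: kind}
	s.outEdges[start] = append(s.outEdges[start], id)
	s.inEdges[end] = append(s.inEdges[end], id)
	return id
}

// Neighbors returns the end nodes of all edges leaving n, under a read lock.
func (s *Store) Neighbors(n ID) []ID {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]ID, 0, len(s.outEdges[n]))
	for _, eid := range s.outEdges[n] {
		out = append(out, s.edges[eid].End)
	}
	return out
}

func main() {
	s := NewStore()
	a, b, c := s.AddNode(), s.AddNode(), s.AddNode()
	s.AddEdge(a, b, "MemberOf")
	s.AddEdge(a, c, "AdminTo")
	fmt.Println(s.Neighbors(a)) // prints [2 3]
}
```

Readers share the lock while writers take it exclusively, which is the coarse-locking trade-off listed under Constraints.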

The driver registers itself as `"sonic"` via `dawgs.Register()` in `init()`, following the same pattern as the `pg` driver.

## Files

| File | Purpose |
|------|---------|
| `sonic.go` | `Database` struct, constructor, `graph.Database` interface |
| `transaction.go` | `graph.Transaction` — node/edge CRUD, Cypher dispatch |
| `batch.go` | `graph.Batch` — bulk CRUD with upsert support |
| `queries.go` | `graph.NodeQuery` / `graph.RelationshipQuery` — filtering, fetching, shortest paths |
| `eval.go` | Cypher AST filter evaluation, comparison operators, type coercion |
| `pathfinding.go` | BFS shortest-path algorithm with constraint extraction |
| `cypher.go` | Cypher AST walker — MATCH, WITH, RETURN, WHERE, variable-length paths |
| `execute.go` | `sonicResult` — result set iteration, scanning, value mapping |

## What Works

### Graph Operations (via `graph.*` interfaces)

- **CRUD**: CreateNode, UpdateNode, DeleteNode, CreateRelationship, CreateRelationshipByIDs, UpdateRelationship, DeleteRelationship
- **Node queries**: Filter, Filterf, First, Count, Fetch, FetchIDs, FetchKinds, Delete, Update, Query
- **Relationship queries**: Filter, Filterf, First, Count, Fetch, FetchIDs, FetchKinds, FetchDirection, FetchTriples, FetchAllShortestPaths, Delete, Update, Query
- **Batch upserts**: UpdateNodeBy, UpdateRelationshipBy (identity-based match/create/update)
- **Schema**: AssertSchema, SetDefaultGraph, FetchKinds
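
The identity-based upsert behind `UpdateNodeBy` can be illustrated with a simplified sketch. The types and names below (`Node`, `Store`, `Upsert`, `sameIdentity`) are assumptions made for this example and are not the DAWGS API; locking is omitted. The string-based value comparison mirrors what `matchesIdentity` in `batch.go` does.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for graph.Node; the real type differs.
type Node struct {
	ID    uint64
	Kinds []string
	Props map[string]any
}

// Store is a hypothetical, single-goroutine node table (locking omitted).
type Store struct {
	nextID uint64
	nodes  map[uint64]*Node
}

func hasKind(n *Node, kind string) bool {
	for _, k := range n.Kinds {
		if k == kind {
			return true
		}
	}
	return false
}

// sameIdentity reports whether every identity key is present in both
// property maps with equal string representations.
func sameIdentity(a, b map[string]any, keys []string) bool {
	if len(keys) == 0 {
		return false // no identity keys means nothing can match
	}
	for _, k := range keys {
		av, aok := a[k]
		bv, bok := b[k]
		if !aok || !bok || fmt.Sprint(av) != fmt.Sprint(bv) {
			return false
		}
	}
	return true
}

// Upsert merges candidate into a matching node, or stores it as a new node.
// The boolean result is true when a new node was created.
func (s *Store) Upsert(candidate *Node, identityKind string, keys []string) (*Node, bool) {
	for _, existing := range s.nodes {
		if !hasKind(existing, identityKind) || !sameIdentity(existing.Props, candidate.Props, keys) {
			continue
		}
		// Match: add any missing kinds and overwrite properties.
		for _, k := range candidate.Kinds {
			if !hasKind(existing, k) {
				existing.Kinds = append(existing.Kinds, k)
			}
		}
		for k, v := range candidate.Props {
			existing.Props[k] = v
		}
		return existing, false
	}
	// No match: assign a fresh ID and insert.
	s.nextID++
	candidate.ID = s.nextID
	s.nodes[candidate.ID] = candidate
	return candidate, true
}

func main() {
	s := &Store{nodes: map[uint64]*Node{}}
	keys := []string{"objectid"}

	_, created := s.Upsert(&Node{Kinds: []string{"User"}, Props: map[string]any{"objectid": "S-1", "name": "alice"}}, "User", keys)
	fmt.Println(created) // prints true

	n, created := s.Upsert(&Node{Kinds: []string{"User", "Base"}, Props: map[string]any{"objectid": "S-1", "enabled": true}}, "User", keys)
	fmt.Println(created, len(s.nodes), n.Kinds, n.Props["name"]) // prints false 1 [User Base] alice
}
```

Note that the match is a linear scan over all nodes, so each upsert costs time proportional to the number of stored nodes.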

### Filter Evaluation

The driver evaluates the Cypher AST that DAWGS query builders produce:

- **Logical**: Conjunction (AND), Disjunction (OR), Negation (NOT), Parenthetical
- **Comparisons**: `=`, `!=`, `<`, `>`, `<=`, `>=`, `IN`, `CONTAINS`, `STARTS WITH`, `ENDS WITH`, `IS NULL`, `IS NOT NULL`
- **Kind matching**: node kinds, edge kinds, start/end node kinds
- **Functions**: `id()`, `type()`, `toLower()`, `toUpper()`, `labels()`, `keys()`
- **Property resolution**: node/edge properties, start/end node properties via `query.EdgeStartSymbol`/`query.EdgeEndSymbol`
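
To give a feel for how a filter tree of this shape can be evaluated against a property bag, here is a toy evaluator. It is a sketch under simplifying assumptions and not the code in `eval.go`: `Expr`, `And`, `Or`, `Not`, `Compare`, and `Props` are hypothetical types, only a handful of operators are covered, and values are assumed to be comparable scalars.

```go
package main

import (
	"fmt"
	"strings"
)

// Props is a hypothetical property bag; Expr is a hypothetical filter node.
type Props map[string]any

type Expr interface{ Eval(p Props) bool }

type And struct{ Left, Right Expr }
type Or struct{ Left, Right Expr }
type Not struct{ Inner Expr }

// Compare applies Op to the property named Key and the literal Value.
type Compare struct {
	Key   string
	Op    string // "=", "!=", "CONTAINS", "STARTS WITH", "ENDS WITH", "IS NULL"
	Value any
}

func (e And) Eval(p Props) bool { return e.Left.Eval(p) && e.Right.Eval(p) }
func (e Or) Eval(p Props) bool  { return e.Left.Eval(p) || e.Right.Eval(p) }
func (e Not) Eval(p Props) bool { return !e.Inner.Eval(p) }

func (e Compare) Eval(p Props) bool {
	got, present := p[e.Key]
	if e.Op == "IS NULL" {
		return !present || got == nil
	}
	if !present || got == nil {
		return false // simplification: a missing property fails every other comparison
	}
	switch e.Op {
	case "=":
		return got == e.Value // assumes comparable scalar values
	case "!=":
		return got != e.Value
	}
	// The string operators only apply when both sides are strings.
	gs, gok := got.(string)
	vs, vok := e.Value.(string)
	if !gok || !vok {
		return false
	}
	switch e.Op {
	case "CONTAINS":
		return strings.Contains(gs, vs)
	case "STARTS WITH":
		return strings.HasPrefix(gs, vs)
	case "ENDS WITH":
		return strings.HasSuffix(gs, vs)
	}
	return false
}

func main() {
	user := Props{"name": "ADMIN@CORP.LOCAL", "enabled": true}
	filter := And{
		Left:  Compare{Key: "name", Op: "ENDS WITH", Value: "@CORP.LOCAL"},
		Right: Not{Inner: Compare{Key: "enabled", Op: "=", Value: false}},
	}
	fmt.Println(filter.Eval(user)) // prints true
}
```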

### Cypher Execution

Raw Cypher strings are parsed and executed via an AST walker:

- MATCH / OPTIONAL MATCH with node and relationship patterns
- WHERE clause filtering with full expression evaluation
- WITH (scope barriers, projection, aggregation aliases)
- RETURN (*, named projections)
- ORDER BY, LIMIT, SKIP, DISTINCT
- `allShortestPaths()` pattern
- Variable-length relationship patterns (`[*]`, `[*1..3]`)
- Multi-part queries (multiple MATCH/WITH chains)
- Parameter substitution

### Pathfinding

BFS shortest-path implementation that:
- Finds **all** equally-short paths between start and end nodes
- Respects edge kind constraints
- Supports multiple start/end nodes simultaneously
- Uses bidirectional parent tracking for path reconstruction
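
One common way to find all equally short paths is a level-by-level BFS that records every parent reaching a node at its minimum distance, then walks the parent links back from the end node. The sketch below shows that general technique for a single start and end node; it is not the code in `pathfinding.go`, and the `AllShortestPaths` function and integer adjacency map are assumptions made for illustration. Edge-kind constraints and multiple start/end nodes are left out.

```go
package main

import "fmt"

// AllShortestPaths returns every shortest path from start to end in a
// directed graph given as an adjacency map. It assumes each adjacency list
// contains no duplicate entries.
func AllShortestPaths(adj map[int][]int, start, end int) [][]int {
	dist := map[int]int{start: 0}
	parents := map[int][]int{} // node -> all predecessors on some shortest path
	frontier := []int{start}

	for len(frontier) > 0 {
		if _, found := dist[end]; found {
			break // the level containing end is complete; deeper levels are longer
		}
		var next []int
		for _, u := range frontier {
			for _, v := range adj[u] {
				d, seen := dist[v]
				switch {
				case !seen:
					dist[v] = dist[u] + 1
					parents[v] = append(parents[v], u)
					next = append(next, v)
				case d == dist[u]+1:
					// v was already reached at this depth via another parent.
					parents[v] = append(parents[v], u)
				}
			}
		}
		frontier = next
	}

	if _, found := dist[end]; !found {
		return nil
	}

	// Walk parent links backwards from end, building each path front-first.
	var paths [][]int
	var walk func(node int, suffix []int)
	walk = func(node int, suffix []int) {
		path := append([]int{node}, suffix...)
		if node == start {
			paths = append(paths, path)
			return
		}
		for _, p := range parents[node] {
			walk(p, path)
		}
	}
	walk(end, nil)
	return paths
}

func main() {
	// Diamond: two equally short routes from 1 to 4, plus a longer detour.
	adj := map[int][]int{1: {2, 3}, 2: {4}, 3: {4, 5}, 5: {4}}
	fmt.Println(AllShortestPaths(adj, 1, 4)) // prints [[1 2 4] [1 3 4]]
}
```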

## What's Not Supported

- **Cypher write operations**: CREATE, DELETE, SET, REMOVE, MERGE return errors. Use the `graph.Transaction` or `graph.Batch` interfaces for writes.
- **UNWIND, quantifiers, filter expressions** in Cypher
- **Aggregation functions** (count, collect, sum, avg, min, max): these are stubs that evaluate to nil in Cypher evaluation
- **OrderBy, Offset, Limit** on `nodeQuery`/`relQuery`: the calls are accepted but have no effect
- **Persistence** — data lives only in memory, lost on process exit (by design)
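
Because ordering and paging on the query builders are no-ops, a caller that needs them could apply them to fetched IDs itself. The helper below is a hypothetical caller-side workaround, not part of the driver; `page` and its parameters are names invented for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// page returns ids sorted ascending with offset and limit applied.
// The input slice is left unmodified.
func page(ids []uint64, offset, limit int) []uint64 {
	if offset < 0 {
		offset = 0
	}
	if limit < 0 {
		limit = 0
	}
	sorted := append([]uint64(nil), ids...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	if offset >= len(sorted) {
		return nil
	}
	end := offset + limit
	if end > len(sorted) {
		end = len(sorted)
	}
	return sorted[offset:end]
}

func main() {
	fetched := []uint64{42, 7, 19, 3} // e.g. IDs returned in map-iteration order
	fmt.Println(page(fetched, 1, 2))  // prints [7 19]
}
```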

## Constraints

- **No persistence** — data is lost when the process exits. By design for the initial version.
- **Coarse locking**: a single `sync.RWMutex` guards the whole database rather than individual nodes or edges, so every write blocks all other readers and writers. Fine for single-user BHE.
- **Non-deterministic ordering** — map iteration means query results may come back in different orders than Postgres.
- **Binding limit** — Cypher execution caps at 100,000 intermediate bindings.
- **Variable-length path depth** — capped at 50 hops with cycle prevention.
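
A depth cap combined with cycle prevention can be pictured as a bounded depth-first expansion. The sketch below assumes node-based cycle prevention (no node repeats within a path), which may differ from what the driver actually does; `Expand` and its parameters are hypothetical, and the driver's 50-hop cap would correspond to `maxHops`.

```go
package main

import "fmt"

// Expand returns every path from start that uses between minHops and maxHops
// edges and never revisits a node (an assumed form of cycle prevention).
func Expand(adj map[int][]int, start, minHops, maxHops int) [][]int {
	var paths [][]int
	onPath := map[int]bool{start: true}

	var walk func(path []int)
	walk = func(path []int) {
		hops := len(path) - 1
		if hops >= minHops {
			// Copy before storing: path's backing array is reused by siblings.
			paths = append(paths, append([]int(nil), path...))
		}
		if hops >= maxHops {
			return // depth cap reached
		}
		for _, next := range adj[path[len(path)-1]] {
			if onPath[next] {
				continue // would form a cycle
			}
			onPath[next] = true
			walk(append(path, next))
			onPath[next] = false
		}
	}
	walk([]int{start})
	return paths
}

func main() {
	// 1 -> 2 -> 3 -> 1 is a cycle; the expansion stops before closing it.
	adj := map[int][]int{1: {2}, 2: {3, 1}, 3: {1}}
	fmt.Println(Expand(adj, 1, 1, 3)) // prints [[1 2] [1 2 3]]
}
```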

## Tests

- **Unit tests** (`sonic_test.go`): CRUD, property filters, shortest paths, Cypher queries (kind filtering, negation, multi-part, variable-length paths, anonymous nodes, concurrent access)
- **Integration tests** (`integration_test.go`): node/relationship operations, attack path finding, batch upserts, parallel fetches against a realistic graph topology
205 changes: 205 additions & 0 deletions drivers/sonic/batch.go
@@ -0,0 +1,205 @@
package sonic

import (
	"context"
	"fmt"

	"github.com/specterops/dawgs/graph"
)

type batch struct {
	db  *Database
	ctx context.Context
}

func (b *batch) WithGraph(graphSchema graph.Graph) graph.Batch {
	return b
}

func (b *batch) CreateNode(node *graph.Node) error {
	id := b.db.newID()
	node.ID = id

	b.db.mu.Lock()
	defer b.db.mu.Unlock()

	b.db.nodes[id] = node
	return nil
}

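func (b *batch) DeleteNode(id graph.ID) error {
	b.db.mu.Lock()
	defer b.db.mu.Unlock()

	// Remove every edge touching the node. Each edge ID must also be dropped
	// from the adjacency index of the node at the other end; otherwise that
	// node keeps a dangling reference to a deleted edge.
	for _, edgeID := range b.db.outEdges[id] {
		if rel, ok := b.db.edges[edgeID]; ok {
			b.db.inEdges[rel.EndID] = removeID(b.db.inEdges[rel.EndID], edgeID)
		}
		delete(b.db.edges, edgeID)
	}
	for _, edgeID := range b.db.inEdges[id] {
		if rel, ok := b.db.edges[edgeID]; ok {
			b.db.outEdges[rel.StartID] = removeID(b.db.outEdges[rel.StartID], edgeID)
		}
		delete(b.db.edges, edgeID)
	}
	delete(b.db.outEdges, id)
	delete(b.db.inEdges, id)
	delete(b.db.nodes, id)
	return nil
}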

func (b *batch) Nodes() graph.NodeQuery {
	return &nodeQuery{db: b.db}
}

func (b *batch) Relationships() graph.RelationshipQuery {
	return &relQuery{db: b.db}
}

func (b *batch) UpdateNodeBy(update graph.NodeUpdate) error {
	b.db.mu.Lock()
	defer b.db.mu.Unlock()

	// Try to find an existing node that matches the identity criteria
	for _, existing := range b.db.nodes {
		if !existing.Kinds.ContainsOneOf(update.IdentityKind) {
			continue
		}

		if matchesIdentity(existing.Properties, update.Node.Properties, update.IdentityProperties) {
			// Update existing node: merge kinds and properties
			for _, kind := range update.Node.Kinds {
				if !existing.Kinds.ContainsOneOf(kind) {
					existing.Kinds = append(existing.Kinds, kind)
				}
			}
			if update.Node.Properties != nil {
				for key, val := range update.Node.Properties.Map {
					existing.Properties.Set(key, val)
				}
			}
			return nil
		}
	}

	// No match: create a new node (inline to avoid double-lock)
	id := b.db.newID()
	update.Node.ID = id
	b.db.nodes[id] = update.Node
	return nil
}

func (b *batch) CreateRelationship(relationship *graph.Relationship) error {
	id := b.db.newID()
	relationship.ID = id

	b.db.mu.Lock()
	defer b.db.mu.Unlock()

	b.db.edges[id] = relationship
	b.db.outEdges[relationship.StartID] = append(b.db.outEdges[relationship.StartID], id)
	b.db.inEdges[relationship.EndID] = append(b.db.inEdges[relationship.EndID], id)
	return nil
}

func (b *batch) CreateRelationshipByIDs(startNodeID, endNodeID graph.ID, kind graph.Kind, properties *graph.Properties) error {
	b.db.mu.Lock()
	defer b.db.mu.Unlock()

	if _, ok := b.db.nodes[startNodeID]; !ok {
		return fmt.Errorf("start node %d not found", startNodeID)
	}
	if _, ok := b.db.nodes[endNodeID]; !ok {
		return fmt.Errorf("end node %d not found", endNodeID)
	}

	id := b.db.newID()
	rel := &graph.Relationship{
		ID:         id,
		StartID:    startNodeID,
		EndID:      endNodeID,
		Kind:       kind,
		Properties: properties,
	}

	b.db.edges[id] = rel
	b.db.outEdges[startNodeID] = append(b.db.outEdges[startNodeID], id)
	b.db.inEdges[endNodeID] = append(b.db.inEdges[endNodeID], id)
	return nil
}

func (b *batch) DeleteRelationship(id graph.ID) error {
	b.db.mu.Lock()
	defer b.db.mu.Unlock()

	rel, ok := b.db.edges[id]
	if !ok {
		return fmt.Errorf("relationship %d not found", id)
	}

	delete(b.db.edges, id)

	// Clean up adjacency indexes
	b.db.outEdges[rel.StartID] = removeID(b.db.outEdges[rel.StartID], id)
	b.db.inEdges[rel.EndID] = removeID(b.db.inEdges[rel.EndID], id)
	return nil
}

func (b *batch) UpdateRelationshipBy(update graph.RelationshipUpdate) error {
	b.db.mu.Lock()
	defer b.db.mu.Unlock()

	rel := update.Relationship

	// Try to find an existing relationship that matches
	for _, existing := range b.db.edges {
		if existing.Kind != rel.Kind {
			continue
		}
		if existing.StartID != rel.StartID || existing.EndID != rel.EndID {
			continue
		}
		if matchesIdentity(existing.Properties, rel.Properties, update.IdentityProperties) {
			// Update existing relationship properties
			if rel.Properties != nil {
				for key, val := range rel.Properties.Map {
					existing.Properties.Set(key, val)
				}
			}
			return nil
		}
	}

	// No match: create a new relationship (inline to avoid double-lock)
	id := b.db.newID()
	rel.ID = id
	b.db.edges[id] = rel
	b.db.outEdges[rel.StartID] = append(b.db.outEdges[rel.StartID], id)
	b.db.inEdges[rel.EndID] = append(b.db.inEdges[rel.EndID], id)
	return nil
}

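// Commit is a no-op: each batch operation above writes directly to the
// in-memory maps, so there is nothing left to flush.
func (b *batch) Commit() error {
	return nil
}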

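// matchesIdentity reports whether existing and candidate agree on every
// identity key. An empty key list never matches, and a nil value on either
// side counts as a mismatch. Values are compared by their fmt.Sprint output,
// so values of different types with the same string form (for example the
// integer 1 and the string "1") are treated as equal.
func matchesIdentity(existing, candidate *graph.Properties, identityKeys []string) bool {
	if len(identityKeys) == 0 {
		return false
	}
	for _, key := range identityKeys {
		existingVal := existing.Get(key).Any()
		candidateVal := candidate.Get(key).Any()
		if existingVal == nil || candidateVal == nil {
			return false
		}
		if fmt.Sprint(existingVal) != fmt.Sprint(candidateVal) {
			return false
		}
	}
	return true
}

// removeID removes the first occurrence of target from ids. It reuses the
// backing array of ids, so callers must use the returned slice.
func removeID(ids []graph.ID, target graph.ID) []graph.ID {
	for i, id := range ids {
		if id == target {
			return append(ids[:i], ids[i+1:]...)
		}
	}
	return ids
}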