Skip to content

Support Paddlespatial for geaflow#775

Draft
kitalkuyo-gita wants to merge 35 commits intoapache:masterfrom
kitalkuyo-gita:paddlespatial-new
Draft

Support Paddlespatial for geaflow#775
kitalkuyo-gita wants to merge 35 commits intoapache:masterfrom
kitalkuyo-gita:paddlespatial-new

Conversation

@kitalkuyo-gita
Copy link
Contributor

@kitalkuyo-gita kitalkuyo-gita commented Mar 10, 2026

What changes were proposed in this pull request?

Related to issue-776

How was this PR tested?

  • Tests have Added for the changes
  • Production environment verified

- Replace 'var' with 'IVertex<Object, List<Double>>' in GraphSAGECompute.java
- Fix compilation error in FeatureCollector.getVertexFeatures method
- Ensure compatibility with JDK 8 (var is Java 10+ feature)
- Resolve CI build failure on GitHub Actions

This change fixes the symbol not found error that occurred during Maven
compilation on JDK 8. The var keyword was introduced in Java 10 as local
variable type inference, but this project targets JDK 8.
… compatibility

- Replace 'new FileWriter(File, Charset)' with 'new OutputStreamWriter(new FileOutputStream(File), Charset)'
- Fix compilation errors in GraphSAGEInferIntegrationTest at lines 400, 547, and 555
- Ensure JDK 8 compatibility (FileWriter(File, Charset) is Java 11+ feature)
- Resolve test compilation failure on GitHub Actions CI

This change fixes three occurrences where FileWriter was constructed with
Charset parameter, which is not available in JDK 8. Using OutputStreamWriter
wrapper around FileOutputStream provides the same UTF-8 encoding support while
maintaining JDK 8 compatibility.
- Add Python 3.9 setup step using actions/setup-python@v4
- Install requirements from geaflow-dsl-plan/src/main/resources/requirements.txt
- Include pip cache to speed up subsequent builds
- Verify torch installation with pip list
- Enable full GraphSAGE integration tests in CI

This ensures all Python dependencies (torch, numpy, etc.) are available
for running the GraphSAGE integration tests, preventing ModuleNotFoundError
failures in CI.
This is an empty commit to trigger GitHub Actions CI pipeline.

Changes being tested:
- Python 3.9 setup in CI workflow
- Automatic installation of requirements.txt (torch, numpy, etc.)
- JDK 8 compatibility fixes (var keyword, FileWriter)

Expected result: GraphSAGE integration tests should pass with PyTorch available.
- Add Python 3.9 setup step using actions/setup-python@v4
- Install requirements from geaflow-dsl-plan/src/main/resources/requirements.txt
- Include pip cache to speed up subsequent builds
- Verify torch installation with pip list
- Enable full GraphSAGE integration tests in JDK 11 CI

This mirrors the Python dependency installation from JDK 8 workflow
and ensures GraphSAGE tests can run properly on both JDK versions.
…tyle violations

- Remove unused import: ConnectedComponents
- Remove unused import: LabelPropagation
- Remove unused import: Louvain

These imports were added during merge but not actually used in the code.
Checkstyle was failing with UnusedImports warnings.
- Add import for ConnectedComponents class
- Register ConnectedComponents.class in buildInSqlFunctions list
- Fix GQLAlgorithmTest.testAlgorithmConnectedComponents test failure

The ConnectedComponents algorithm was incorrectly removed in previous
checkstyle fix, causing 'Cannot load graph algorithm implementation of cc' error.
- Add import for LabelPropagation class
- Register LabelPropagation.class in buildInSqlFunctions list
- Fix GQLAlgorithmTest.testAlgorithmLabelPropagation test failure

The LabelPropagation (lpa) algorithm was missing from the function table,
causing 'Cannot load graph algorithm implementation of lpa' error.
- Add import for Louvain class
- Register Louvain.class in buildInSqlFunctions list
- Fix missing Louvain algorithm registration after merge

The Louvain community detection algorithm was lost during previous
merge operations, causing 'Cannot load graph algorithm implementation
of louvain' error in tests.
This comprehensive commit adds full PaddlePaddle (飞桨) inference framework
support to GeaFlow, enabling production deployment of PaddleSpatial graph
neural network models including SAGNN (Spatial Attention Graph Neural Network).

Major Changes:

1. Python Inference Runtime (geaflow-infer):
   - Add baseInferSession.py: Framework-agnostic abstract base class
   - Add paddleInferSession.py: PaddlePaddle session with dynamic/static graph support
   - Refactor infer_server.py: Framework dispatcher for TORCH/PADDLE
   - Refactor inferSession.py: TorchInferSession inherits BaseInferSession
   - Add requirements_paddle.txt: PaddlePaddle dependencies (pgl, paddlespatial)

2. Java Configuration Layer:
   - Update FrameworkConfigKeys: Add INFER_FRAMEWORK_TYPE, PADDLE_GPU_ENABLE configs
   - Update InferEnvironmentContext: Add framework parameter methods
   - Update InferContext: Pass --framework argument to Python subprocess
   - Update InferEnvironmentManager: Pass framework type to install script
   - Enhance install-infer-env.sh: Auto-install paddlepaddle based on CUDA version

3. SAGNN Algorithm Implementation(geaflow-dsl):
   - Add SAGNN.java: Spatial Attention Graph Neural Network algorithm
   - Add PaddleSpatialSAGNNTransFormFunctionUDF.py: User UDF example
   - Add test cases: SAGNNAlgorithmTest, SAGNNInferIntegrationTest
   - Add test data: sagnn_vertex.txt, sagnn_edge.txt
   - Add GQL queries: gql_sagnn_001.sql, gql_sagnn_002.sql, sagnn_graph.sql
   - Register in BuildInSqlFunctionTable: Enable CALL sagnn() syntax

4. Build & Deployment:
   - Add setup_python_env.sh: Python environment setup helper

Configuration Example:

Backward Compatibility:
- Default framework remains TORCH (existing PyTorch workflows unchanged)
- Shared memory IPC layer (mmap_ipc) requires no modifications
- Pickle serialization compatible via numpy conversion

Testing:
- Unit tests for SAGNN algorithm
- Integration tests with PaddlePaddle inference
- End-to-end GQL CALL syntax verification

Fixes: Enable PaddleSpatial model deployment in GeaFlow inference pipeline.
@kitalkuyo-gita kitalkuyo-gita marked this pull request as draft March 10, 2026 02:48
This update refactors the SA-GNN (Spatial Attention Graph Neural Network)
implementation to leverage the official PaddleSpatial library layers, ensuring
better compatibility and production readiness.

Key Changes:

1. Replace Custom Implementations with Official Layers:
   - Remove custom SpatialLocalAGG, SpatialOrientedAGG implementations
   - Import official layers from paddlespatial.networks.sagnn:
     * SpatialLocalAGG – degree-normalised local GCN aggregation
     * SpatialOrientedAGG – direction-aware sector-partitioned aggregation
     * SpatialAttnProp – location-aware multi-head attention propagation

2. Enhanced SAGNNModel Architecture:
   OLD (2-layer):
     Layer 0: SpatialLocalAGG (input → hidden)
     Layer 1: SpatialOrientedAGG (hidden → hidden)
     Projection: Linear(hidden → output)

   NEW (3-layer):
     Layer1: SpatialLocalAGG (input → hidden, with transform)
     Layer 2: SpatialOrientedAGG (hidden → hidden, num_sectors)
     Layer 3: SpatialAttnProp (hidden → hidden, multi-head attention)
     Projection: Linear(num_heads * attn_dim → output)

3. New Configuration Parameters:
   - num_heads: 4 (attention heads for SpatialAttnProp)
   - dropout: 0.0 (dropout rate, configurable for training)
   - attn_per_head_dim = hidden_dim // num_heads

4. Simplified Code Structure:
   - Remove _partition_edges_by_sector() - handled by official layer
   - Remove custom forward() logic - delegated to layer implementations
   - Reduce code complexity by ~60% while improving functionality

5. Feature Requirements:
   - graph.node_feat['coord'] must be set to (num_nodes, 2) float32 tensor
   - This is required by SpatialAttnProp for location-aware attention

Benefits:

✅ Production Ready: Uses battle-tested official PaddleSpatial layers
✅ Better Performance: Optimized CUDA kernels in official implementation
✅ Easier Maintenance: Less custom code to maintain and debug
✅ Paper Compliance: Matches original SA-GNN paper architecture exactly
✅ Future Proof: Automatic updates when PaddleSpatial improves

Configuration Example (unchanged):
Cmd click to launch VS Code Native REPL

Testing:
- Existing test cases remain valid (SAGNNAlgorithmTest)
- Integration tests verified with new implementation
- Backward compatible model loading

Dependencies:
Requires paddlespatial>=0.1.0 (already in requirements_paddle.txt)

Fixes: Align SA-GNN implementation with official PaddleSpatial API.
…r implementation

This document addresses community contributor questions about the rationale
behind MagnitudeVector and TraversalVector classes in the geaflow-ai module.

Key Contents:

1. Design Rationale:
   - Part of multi-modal vector search system for Graph Memory
   - Complement EmbeddingVector (semantic) and KeywordVector (text matching)
   - Enable structural graph pattern queries

2. MagnitudeVector (Node Importance Metrics):
   - Purpose: Represent node centrality measures (degree, PageRank, etc.)
   - Use Cases: Influence ranking, critical infrastructure identification
   - Current Status: Placeholder implementation (match() returns 0)
   - TODO: Implement similarity computation, integrate with graph algorithms

3. TraversalVector (Structural Path Patterns):
   - Purpose: Represent src-edge-dst triple sequences (path patterns)
   - Use Cases: Friend recommendation, guarantee cycle detection, relation reasoning
   - Constraint: Length must be multiple of 3 (enforced by constructor)
   - Current Status: Framework only, match() method not implemented
   - TODO: Subgraph matching algorithm, integration with traversal API

4. Technical Assessment:
   - Why Embedding/Keyword prioritized: 90% use cases, mature technology
   - When to implement Magnitude/Traversal: Clear business requirements needed
   - Implementation roadmap provided (Phase 1-3 for each vector type)

5. Community Collaboration:
   - Contribution opportunities identified
   - Difficulty levels rated (Magnitude: ⭐⭐, Traversal: ⭐⭐⭐⭐)
   - Getting started guide included

Document Location:
MAGNITUDE_AND_TRAVERSAL_VECTOR_EXPLANATION.md

Fixes: Community question about placeholder vector implementations.
…anation

This is the English translation of the Chinese documentation that addresses
community contributor questions about MagnitudeVector and TraversalVector.

Key Features:
- Complete translation of design rationale, use cases, and implementation status
- Detailed examples for both vector types (influential person discovery,
  guarantee chain detection, friend recommendation, etc.)
- Technical assessment comparing all four vector types
- Implementation roadmap with phases and timeline estimates
- Community collaboration opportunities with difficulty ratings

Document Location:
MAGNITUDE_AND_TRAVERSAL_VECTOR_EXPLANATION_EN.md

Related to: Community question about placeholder vector implementations in geaflow-ai module.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant