Benchmark Results: Rust SQL Parser Comparison #2215

@LucaCappelletti94

Description

Hi! I've created an open-source benchmark comparing Rust SQL parsers for PostgreSQL workloads and wanted to share the results with you.

Benchmark Results

Methodology

  • Framework: Criterion.rs v0.8 with flat sampling mode, 50 samples, 3-second measurement time
  • Workload: Parsing batches of 1-1000 SQL statements concatenated into a single string
  • Datasets:
    • SELECT: 4,505 queries from Spider (Yale) + Gretel AI
    • INSERT: 992 queries from Gretel AI
    • UPDATE: 983 queries from Gretel AI
    • DELETE: 933 queries from Gretel AI
  • Environment: AMD Ryzen Threadripper PRO 5975WX, Ubuntu 24.04, Rust 2021 edition
  • Dialect: All parsers configured for PostgreSQL
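
The batching setup above can be sketched in plain Rust. This is illustrative only: the real harness uses Criterion's `bench_function` with flat sampling, and the names here (`build_batch`, the sample corpus) are assumptions, not the benchmark's actual code.

```rust
use std::time::Instant;

// Sketch of the batching methodology: N statements are concatenated into one
// semicolon-separated string, and the parser under test is timed on the whole
// batch rather than per statement.
fn build_batch(statements: &[&str], n: usize) -> String {
    statements
        .iter()
        .cycle() // repeat the corpus if n exceeds its length
        .take(n)
        .map(|s| format!("{s};"))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let corpus = ["SELECT 1", "SELECT a FROM t", "SELECT b FROM u WHERE b > 0"];
    let batch = build_batch(&corpus, 500);
    assert_eq!(batch.matches(';').count(), 500);

    // A real run would invoke the parser here, inside a Criterion closure;
    // Instant is a stand-in for Criterion's own timing.
    let start = Instant::now();
    let _statements = batch.split(';').filter(|s| !s.trim().is_empty()).count();
    println!("batched 500 statements in {:?}", start.elapsed());
}
```

Timing the concatenated batch amortizes per-call overhead, which is why the results below are reported per 500-statement batch rather than per query.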

Results for sqlparser-rs

sqlparser-rs performs excellently in this benchmark:

  • 1.5-2x faster than FFI-based parsers (pg_query.rs, pg_parse)
  • 100% compatibility with all test queries in our corpus
  • Best balance of speed, correctness, and multi-dialect support

| Statement Type | Parse time (500 statements) |
|----------------|-----------------------------|
| SELECT         | 5.68 ms                     |
| INSERT         | 4.90 ms                     |
| UPDATE         | 3.20 ms                     |
| DELETE         | 2.93 ms                     |

Observations

  1. The pure Rust implementation avoids FFI overhead, showing a consistent performance advantage
  2. The recursive descent parser handles complex queries (CTEs, window functions, nested subqueries) efficiently
  3. Fuzz testing gives confidence in a level of robustness that some other parsers lack
  4. Performance could improve further by making most string fields generic over a type parameter S accepting both String and &str, reducing the cloning that currently happens when constructing both tokens and statements
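
The generic-string idea in observation 4 can be sketched with `std::borrow::Cow`, which lets a tokenizer borrow slices of the input on the fast path and allocate only when a transformation is required. `Token` and `tokenize` here are hypothetical illustrations, not sqlparser-rs types.

```rust
use std::borrow::Cow;

// Hypothetical token type: Cow lets the tokenizer borrow slices of the input
// when no transformation is needed, and allocate an owned String only when it
// must (e.g. collapsing a doubled quote inside a literal).
#[derive(Debug, PartialEq)]
struct Token<'a> {
    text: Cow<'a, str>,
}

fn tokenize(input: &str) -> Vec<Token<'_>> {
    input
        .split_whitespace()
        .map(|word| {
            if word.contains("''") {
                // Escaped content forces an owned, transformed copy.
                Token { text: Cow::Owned(word.replace("''", "'")) }
            } else {
                // Fast path: zero-copy borrow from the source string.
                Token { text: Cow::Borrowed(word) }
            }
        })
        .collect()
}

fn main() {
    let sql = "SELECT name FROM users";
    let tokens = tokenize(sql);
    // Every token borrows from `sql`; no per-token allocation occurred.
    assert!(tokens.iter().all(|t| matches!(&t.text, Cow::Borrowed(_))));
    println!("{} tokens, zero copies", tokens.len());
}
```

A generic `S` on the AST types would extend the same trade-off to statements: owned `String` when the AST must outlive the input, borrowed `&str` when it need not.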

Full Benchmark Repository

https://github.com/LucaCappelletti94/sql_ast_benchmark

The repository includes:

  • Complete benchmark code
  • All SQL test datasets
  • Reproducible methodology
  • Detailed README with analysis

Feedback Request

I'd appreciate any feedback on the benchmark methodology, and suggestions for improvements I should make:

  1. Are there any parser configuration options that could improve performance?
  2. Are there specific query patterns I should include in the test corpus?
  3. Is there anything about the benchmark setup that might not represent real-world usage?

Thank you for maintaining such an excellent library!
