Conversation
@@ -0,0 +1,18 @@
#!/bin/bash

# The latest official release of Quickwit is too old; many tantivy queries are unsupported.
So what stops us from using the latest and greatest Docker builds?
Quickwit hasn't released a new version in a long time, and many people actually use a nightly build. I used a prebuilt binary here to avoid running Docker, but we can request a new binary release from the Quickwit team before merging this PR.
EDIT: Using Docker is fine, see the starrocks and singlestore submissions in this repository.
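If we go the Docker route like the starrocks and singlestore submissions, a minimal sketch could look like the `docker-compose.yml` below. The `edge` tag (Quickwit's nightly image), the host port, and the volume layout are assumptions, not something from this PR; pinning an image digest would make the benchmark reproducible.

```yaml
# Hypothetical docker-compose.yml for running a nightly Quickwit build.
# "edge" is assumed to be the nightly tag; pin a digest for reproducibility.
services:
  quickwit:
    image: quickwit/quickwit:edge
    command: run
    ports:
      - "127.0.0.1:7280:7280"   # REST API
    volumes:
      - ./qwdata:/quickwit/qwdata
```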
It is fine to add
Some more debugging would be nice, but we can mark Q2 again as
I don't really understand what that means. Is performance slower than it could be?
That's good. As per the benchmark rules, as little tuning as possible should be applied (i.e., databases should run with their default settings).

@cometkim I'm interested in merging this - thanks for the PR. It seems more work is needed; please ping me when this is ready.
Notes:
Quickwit hasn't released a new version in a long time, and many people actually use a nightly build. I used a prebuilt binary here to avoid running Docker, but we can request a new binary release from the Quickwit team before merging this PR.
Quickwit does not support Q5. Testing this would require additional features, such as ElasticSearch's `bucket_script`.
The result for Q2 appears to be inconsistent with other engines. It's unclear whether this is a bug, precision loss, or data corruption.
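For reference, the kind of Elasticsearch pipeline aggregation that `bucket_script` enables looks roughly like the sketch below; the index fields (`EventTime`, `UserID`) and the derived metric are hypothetical placeholders, not the actual Q5 query.

```json
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": { "field": "EventTime", "calendar_interval": "day" },
      "aggs": {
        "total_hits": { "value_count": { "field": "EventTime" } },
        "unique_users": { "cardinality": { "field": "UserID" } },
        "hits_per_user": {
          "bucket_script": {
            "buckets_path": { "hits": "total_hits", "users": "unique_users" },
            "script": "params.hits / params.users"
          }
        }
      }
    }
  }
}
```

`bucket_script` computes a per-bucket value from sibling aggregations, which is what Quickwit's aggregation API currently lacks.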
Quickwit's `terms` aggregation does not support unlimited buckets. There is no explicit "return all" option in aggregations, and even if I specify an arbitrarily large number, the maximum is limited by the searcher's `aggregation_bucket_limit` setting.
I haven't tuned the settings for the instance size.
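If raising that cap were ever acceptable (the benchmark rules discourage tuning, but it may matter for correctness checks), the knob lives in the `searcher` section of the Quickwit node config. A minimal sketch, where the value is only an illustrative example:

```yaml
# Quickwit node config (e.g. quickwit.yaml).
# Raises the per-request aggregation bucket cap above its default;
# 1000000 is a hypothetical value, not a recommendation.
searcher:
  aggregation_bucket_limit: 1000000
```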
This may differ significantly from actual production results; it's more like benchmarking Tantivy. Since Quickwit is typically configured with S3 and a Postgres metastore, I suspect there will be additional overhead from other components and networking.
There were several errors while loading the 1000m data, but they weren't logged, so I don't know the exact cause. I need to run this at least once more to check the data quality.