filtering performance question #27

cariaso · 2020-12-13T08:13:54Z

More of a performance question than a bug report, but I don't see a more suitable location.

I understand that GROQ has capabilities JMESPath lacks, but since the two share some similarities, I'm comparing them.

I need to filter some JSON data. For this test I'm working with 10k records; here is an example showing two of them. The real data has a few more fields, some with a bit of nested structure, but my queries don't touch any of that.

[{
  "base__uid": 1664200,
  "base__chrom": "chr6",
  "base__pos": 1312763,
  "base__ref_base": "T"
},{
  "base__uid": 1669279,
  "base__chrom": "chr6",
  "base__pos": 4116028,
  "base__ref_base": "G"
}]

In JMESPath this query:

[?base__chrom=='chr6']|[?base__pos>=`1000000`]|[?base__pos<=`5000000`]

takes 7ms, and it seems to be equivalent to this GROQ query:

*[base__chrom=='chr6' && base__pos>= 1000000 && base__pos <= 5000000]

which takes 5662ms.

So far my queries aren't particularly complex, but regardless of their complexity, runtime for groq seems linear and dominated by the number of records in my input. At 100k records it always takes ~30s, at 10k records ~3s, and so on.

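I'm filtering with this code:

    import { parse, evaluate } from "groq-js";
    const groqtree = parse(query); // `query` (hypothetical name) holds the GROQ string above
    const value = await evaluate(groqtree, { dataset: allrec });
    const accepted = await value.get();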
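One way to check the linear-scaling observation is to time the same filter at several dataset sizes. This is a minimal sketch, assuming Node.js 16+ (for the global `performance`); `makeRecords` and `timeBySize` are hypothetical helpers, and the synthetic records only mimic the shape of the example above.

```javascript
// Hypothetical helpers for measuring how a filter's runtime scales with
// dataset size. Assumes Node.js 16+ or a browser (global `performance`).

// Build n synthetic records shaped like the example (values are made up).
function makeRecords(n) {
  const records = [];
  for (let i = 0; i < n; i++) {
    records.push({
      base__uid: i,
      base__chrom: i % 2 === 0 ? "chr6" : "chr7",
      base__pos: (i * 7919) % 10000000,
      base__ref_base: "ACGT"[i % 4],
    });
  }
  return records;
}

// Await filterFn(records) once per size; report elapsed ms and match count.
async function timeBySize(filterFn, sizes) {
  const results = [];
  for (const n of sizes) {
    const records = makeRecords(n);
    const start = performance.now();
    const matched = await filterFn(records);
    results.push({ n, ms: performance.now() - start, matched: matched.length });
  }
  return results;
}
```

With groq-js, parse once outside the timed region (`const tree = parse(query)`) and pass `async (dataset) => (await evaluate(tree, { dataset })).get()` as `filterFn` with sizes such as `[1000, 10000, 100000]`. If `ms` grows roughly tenfold each time `n` does, the evaluation is a linear scan.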

Do these numbers seem reasonable? Is there anything I should dig into to get better performance from groq-js?


judofyr · 2020-12-16T12:42:49Z

> So far my queries aren't particularly complex, but regardless of their complexity, runtime for groq seems linear and dominated by the number of records in my input.

This is expected. groq-js is a naive implementation and doesn't index the documents in any way to speed up queries. Right now there's no performance gain from using groq-js over just calling .filter(…) in JavaScript.
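For reference, here is the GROQ filter from the question written as a plain JavaScript `.filter()` call: the same visit-every-record scan described above. `filterRegion` is a hypothetical helper name, and the sample records are the two from the original post.

```javascript
// The question's GROQ filter as a plain .filter(): one linear pass over all
// records, with no index. filterRegion is a hypothetical helper name.
function filterRegion(records, chrom, start, end) {
  return records.filter(
    (r) => r.base__chrom === chrom && r.base__pos >= start && r.base__pos <= end
  );
}

// The two example records from the question.
const sample = [
  { base__uid: 1664200, base__chrom: "chr6", base__pos: 1312763, base__ref_base: "T" },
  { base__uid: 1669279, base__chrom: "chr6", base__pos: 4116028, base__ref_base: "G" },
];

const accepted = filterRegion(sample, "chr6", 1000000, 5000000);
console.log(accepted.map((r) => r.base__uid)); // [ 1664200, 1669279 ]
```

Both approaches do O(n) work per query, so neither can avoid touching every record. The large gap between the reported timings therefore comes from per-record overhead, not from a difference in algorithm.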

> which takes 5662ms.

As for the performance itself: no specific work has been done yet to make it efficient. There may be a lot of low-hanging fruit that could speed it up.

scottrippey · 2022-12-25T20:07:58Z

I'm running into similar performance issues. I'm trying to build a reusable GROQ testing library, and some queries across 1500 entries take 15+ seconds, even though the same queries are perfectly fast in production.
I assume the lack of indexing essentially prevents this from running any faster. Do you think there's any chance groq-js could add support for indexing?
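To make the indexing question concrete, here is a hypothetical sketch of what an index could buy for the range query from the original question. It is not groq-js code or a proposed groq-js API, and `buildIndex`, `firstIndex`, and `queryRange` are made-up names. Records are grouped by `base__chrom` and sorted by `base__pos` once; after that, each range lookup is two binary searches plus a slice instead of a full scan.

```javascript
// Hypothetical index for range queries on (base__chrom, base__pos).
// Not part of groq-js; for illustration only.
function buildIndex(records) {
  const byChrom = new Map();
  for (const r of records) {
    if (!byChrom.has(r.base__chrom)) byChrom.set(r.base__chrom, []);
    byChrom.get(r.base__chrom).push(r);
  }
  // Sort each chromosome's records by position once, up front.
  for (const list of byChrom.values()) {
    list.sort((a, b) => a.base__pos - b.base__pos);
  }
  return byChrom;
}

// Binary search: first index i where pred(list[i]) is true, assuming pred is
// false for a prefix of the list and true for the rest (list.length if never).
function firstIndex(list, pred) {
  let lo = 0;
  let hi = list.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (pred(list[mid])) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Records on `chrom` with start <= base__pos <= end, in position order.
function queryRange(index, chrom, start, end) {
  const list = index.get(chrom) ?? [];
  const from = firstIndex(list, (r) => r.base__pos >= start);
  const to = firstIndex(list, (r) => r.base__pos > end);
  return list.slice(from, to);
}
```

Usage would look like `queryRange(buildIndex(allrec), "chr6", 1000000, 5000000)`. Building the index costs O(n log n) once, and each query then costs O(log n + k) for k matches instead of O(n). This only pays off when the same data is queried many times, as in a test suite.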
