Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse query condition gives error when using a variable #524

Open
sam-haberman opened this issue Feb 27, 2023 · 8 comments
Open

Parse query condition gives error when using a variable #524

sam-haberman opened this issue Feb 27, 2023 · 8 comments

Comments

@sam-haberman
Copy link

I am creating a R Shiny App where I want users to be able to define the query ranges for the TileDB array that I have. In the code snippet below I have a slider that the user can adjust to change the query. This query is successful if I used the first option, but if I use the hslider variable I get an error.

q1 <- parse_query_condition("A-Count"> 6, use_int64 = TRUE)
hslider = 6
q1 <- parse_query_condition("A-Count"> hslider, use_int64 = TRUE)

(Error: [TileDB::QueryCondition] Error: Value node condition value size mismatch: 8 != 7).

I was able to successfully use the tiledb_query_condition_init function with a variable like the one below, but I want to include queries that have more than two conditions in it and I don't see any way to chain more than 2, using the tiledb_query_condition_combine) function.

qc <- tiledb_query_condition_init("total_count", 10, "INT64", "GT")

Is there any other way to combine multiple queries (more than 2) using the query_condition function or to use a variable in the parse_query_condition function?

@eddelbuettel
Copy link
Member

eddelbuettel commented Feb 27, 2023

Thanks for opening the issue. Your best option is to supplying the actual array to the parse_query_condition helper so it can look up the variable type from the name you supplied. An example from the unit tests:

arr <- tiledb_array(uri)
qc1 <- parse_query_condition(island %in% c("Dream", "Biscoe"), arr)
res1 <- tiledb_array(uri, query_condition = qc1[]

This works with all types of column, and it most helpful when the the simple guess based on the comparison value goes wrong: seeing an int does not lead us to an int64 column, and also gets an error with double.

Combining mulitple queries works by nesting. Say you have queries A, B, C. You can combine A and B into a query D, and thereafter combine C and D into a query E. When you submit E, it's combined effect is evluated in the query context. The combine queries you can use the corresponding function, it supports boolean 'AND' and 'OR".

You can also write it out. An example from the unit tests:

qc <- parse_query_condition(species == "Adelie" || species == "Chinstrap" && year >= 2009)

(This query gets by without array information because text values imply ASCII, and an int value implies INT. Supplying the array is more precise and sometimes needed.)

@sam-haberman
Copy link
Author

Thanks for the super quick response, it is much appreciated. I just have a question about the forum that you mentioned. This is the first thing I have reported to my knowledge unless creating the issue here posts to the forum as well? The only forum I am aware of is https://forum.tiledb.com/.

@sam-haberman
Copy link
Author

I am able to successfully use the nesting of multiple queries and the tiledb_query_condition_init function, but I still experience an issue with the parse_query condition function, even when including the array as suggested.

Following the example you posted above, I have

hslider = as.integer(60000)
arr <- tiledb_array(uri)
qc1 <- parse_query_condition("A-Count" > hslider, use_int64 = TRUE,arr)

This creates a tiledb_query_condition object, but it shares the warning seen below. Upon the next line of code it does not take the query condition into account at all and returns the entire array without taking into consideration the query condition supplied in qc1.
res1 <- tiledb_array(uri, query_condition = qc1)[]

Warning message:
In stopifnot(Argument 'qc' with query condition object required = is(qc, :
NAs introduced by coercion

@eddelbuettel
Copy link
Member

I had myself confused -- I saw your post first in our internal slack where we echo github issues. So there was no earlier post and I edited that out, my bad for any confusion. That said, our community slack is good and has a channel for tiledb-r. You may like it, and this link from the bottom of our homepage should get you started.

You should be able to get both muliplle conditions combined, and conditons created on the fly based on variable names. The parse_query_conditions() functions expects an R object of type expression so you may want to create a standard textstring first and then convert it and then use a function such as str2lang() to convert it. If you get lost I can probably mock something based on the popular penguins example.

@sam-haberman
Copy link
Author

I just joined the slack and really appreciate you taking your time to assist me with this. I would be grateful if you could make an example like you suggest as I am indeed lost and not having any success using parse or str2lang()with parse_query_condition().

When using those functions and passing along the expression into parse_query_condition(), I get the following errors, even without using a variable for my value. Likely I am missing something.


AB = str2lang("A-Count > 60000")
qc <- parse_query_condition(AB, use_int64 = TRUE,arr)
Error in .makeExpr(e, debug) : Unexpected symbol in expression: AB

qc <- parse_query_condition(str2lang("A-Count > 60000"), use_int64 = TRUE,arr)
Error in .makeExpr(e, debug) : 
  Unexpected token in expression: str2lang("A-Count > 60000")

@eddelbuettel
Copy link
Member

eddelbuettel commented Feb 28, 2023

Below is a full example running on my standard example data set from the palmerpenguins package (written with one call to fromDataFrame(), no extra options). It contains the critically important step of calling the str2lang which I had omitted. I will add something like this to the docs.

library(tiledb)            
uri <- "/tmp/tiledb/penguins"       
# as needed:    fromDataFrame(palmerpenguins::penguins, uri)
arr <- tiledb_array(uri)  
cols <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g")   

set.seed(42)  # reproduce
col <- sample(cols, 1) 

txt <- paste(col, "> 50") # arbitrary, replaced by your shiny app    
cat("Setting '", txt, "'\n", sep="")  
parsed <- do.call(what = parse_query_condition,   args = list(expr = str2lang(txt), ta = arr)) 
arr2 <- tiledb_array(uri, return_as="data.table", query_condition=parsed)[]  
summary(arr2[,4:7]) 

When I run this I get

 > summary(arr2[,4:7]) 
  bill_length_mm bill_depth_mm  flipper_length_mm  body_mass_g
  Min.   :50.1   Min.   :14.2   Min.   :181       Min.   :3250 
  1st Qu.:50.6   1st Qu.:16.0   1st Qu.:198       1st Qu.:3775                
  Median :51.3   Median :18.1   Median :206       Median :4375      
  Mean   :51.9   Mean   :17.8   Mean   :210       Mean   :4579   
  3rd Qu.:52.2   3rd Qu.:19.2   3rd Qu.:223       3rd Qu.:5500      
  Max.   :59.6   Max.   :20.8   Max.   :231       Max.   :6050
> cat("Setting '", txt, "'\n", sep="")      
 Setting 'bill_length_mm > 50'  
 > 

showing the 'larger than 50' condition held on the (randomly) picked column 'bill_length_mm'.

@sam-haberman
Copy link
Author

Thank you so much. I just used your example and was able to successfully get it to work.

@eddelbuettel
Copy link
Member

Perfect! I will leave the issue open to ensure I add the conversion at the package level -- may as well make it work from a character variable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants