Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ST_EstimatedExtent for quick bounds calc #1220

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

sharkAndshark
Copy link
Collaborator

@sharkAndshark sharkAndshark commented Feb 29, 2024

Try to fix #1206

  • Use St_EstimatedExtent
  • Update test cases

.await?
.query_one(&format!(
let sql = if is_quick {
format!("SELECT ST_EstimatedExtent('{schema}', '{table}', '{geometry_column}') as bounds")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why in quotes? these values are not strings, they are escaped PostgreSQL identifiers (schema/table/column names)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per postgis doc, the arguments type of ST_EstimatedExtent is text.

box2d ST_EstimatedExtent(text schema_name, text table_name, text geocolumn_name);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, thx, did not realize that. Should the parameters be pre-escaped though? We would need to test this on the MixedCase. Apparently there was a bug 10 years ago -- https://trac.osgeo.org/postgis/ticket/2834 that temporarily used the escaped version, but then it was reverted it seems

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a little bit test.

SELECT ST_EstimatedExtent('public','Project_Copy1','geom')
-- BOX(399529.5 3384786.5,412743.5 3394297)
SELECT ST_EstimatedExtent('public','"Project_Copy1"','geom')
-- syntax error

I will add a test on MixedCase later.

@nyurik
Copy link
Member

nyurik commented Feb 29, 2024

I added a test for weird table names - #1222 -- please review, and maybe add a test for a similar function too, probably in the same namespace.

@nyurik
Copy link
Member

nyurik commented Feb 29, 2024

Also note this code: https://github.com/postgis/postgis/blob/bccab952f00696d5a48e9eb495e827825eda23a2/postgis/gserialized_estimate.c#L2315 - postgis adds double quotes around passed in schema & table names. I really hope that function returns pre-escaped value without double quotes, or else it will fail spectacularly

let sql = if is_quick {
let schema = escape_literal(schema);
let table = escape_literal(table);
let geometry_column = escape_literal(geometry_column);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nyurik any help on this? I think we should call escape_literal before ST_EstimatedExtent, but the tests still fails.
[2024-03-13T04:33:08Z ERROR martin::pg::builder] Failed to create a source: Postgres error while querying table bounds: db error: ERROR: invalid name syntax

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the SQL is : SELECT ST_EstimatedExtent('"Quotes'' and Space.Dot.', '. Points" ''quote', '. "Geom"') as bounds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh, one of those painful cases I guess - I created a bug https://trac.osgeo.org/postgis/ticket/5697

So to move forward, I think we will have to do a workaround with the following logic:

  • if schema, table, and geom column is "simple" (matches this regex for all 3 values: ^[a-zA-Z_][a-zA-Z_0-9]*$), pass these values "as is", but make sure to pass them as parameters (second param in .query_one())
  • if not, use the existing query

@nyurik
Copy link
Member

nyurik commented Mar 16, 2024

I have managed to get it to work in some cases, but still some issues remain. The trick I think is to remove the " around schema and table, but keep the geometry column as a string value, so something like this, but there is still something wrong

    let params: Vec<&(dyn ToSql + Sync)>;
    let sql: String;

    // Both queries require identifier-escaped format for schema and table
    let schema = escape_identifier(schema);
    let table = escape_identifier(table);
    let mut schema: &str = schema.as_str();
    let mut table: &str = table.as_str();

    if is_quick {
        // Remove first and last quote (but only one on each side)
        schema = &schema[1..schema.len() - 1];
        table = &table[1..table.len() - 1];
        params = vec![&schema, &table, &geometry_column];
        sql = "SELECT ST_EstimatedExtent($1, $2, $3) as bounds".to_string();
    } else {
        params = vec![];
        let geometry_column = escape_identifier(geometry_column);
        sql = format!(
            r#"
WITH real_bounds AS (SELECT ST_SetSRID(ST_Extent({geometry_column}), {srid}) AS rb FROM {schema}.{table})
SELECT ST_Transform(
            CASE
                WHEN (SELECT ST_GeometryType(rb) FROM real_bounds LIMIT 1) = 'ST_Point'
                THEN ST_SetSRID(ST_Extent(ST_Expand({geometry_column}, 1)), {srid})
                ELSE (SELECT * FROM real_bounds)
            END,
            4326
        ) AS bounds
FROM {schema}.{table};
                "#
        );
    };

@nyurik
Copy link
Member

nyurik commented Mar 17, 2024

I got it to work in https://github.com/sharkAndshark/martin/pull/103/files -- once merged into your branch, it will appear here as well. Let me know what you think

fix ST_EstimatedExtent computation
@sharkAndshark sharkAndshark marked this pull request as ready for review May 6, 2024 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

For --auto-bounds quick, use ST_EstimatedExtent instead of ST_Extent for quicker results
2 participants