Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

We need better documentation for the partitioned tables in gis_core #801

Open
radumas opened this issue Dec 20, 2023 · 6 comments · May be fixed by #845
Open

We need better documentation for the partitioned tables in gis_core #801

radumas opened this issue Dec 20, 2023 · 6 comments · May be fixed by #845
Assignees
Labels

Comments

@radumas
Copy link
Member

radumas commented Dec 20, 2023

How should people be using them?
Can we have a simple way of accessing the latest data? A Mat View maybe?

@radumas radumas added the gis label Dec 20, 2023
@gabrielwol
Copy link
Collaborator

Here's a quick potential solution I came up with using information_schema and execute_format.

CREATE OR REPLACE FUNCTION gwolofs.centreline_latest()
RETURNS TABLE (
    version_date date,
    centreline_id integer,
    linear_name_id integer,
    linear_name_full text,
    linear_name_full_legal text,
    address_l text,
    address_r text,
    parity_l text,
    parity_r text,
    lo_num_l integer,
    hi_num_l integer,
    lo_num_r integer,
    hi_num_r integer,
    begin_addr_point_id_l integer,
    end_addr_point_id_l integer,
    begin_addr_point_id_r integer,
    end_addr_point_id_r integer,
    begin_addr_l integer,
    end_addr_l integer,
    begin_addr_r integer,
    end_addr_r integer,
    linear_name text,
    linear_name_type text,
    linear_name_dir text,
    linear_name_desc text,
    linear_name_label text,
    from_intersection_id integer,
    to_intersection_id integer,
    oneway_dir_code integer,
    oneway_dir_code_desc text,
    feature_code integer,
    feature_code_desc text,
    jurisdiction text,
    centreline_status text,
    shape_length numeric,
    objectid integer,
    shape_len numeric,
    mi_prinx integer,
    low_num_odd integer,
    high_num_odd integer,
    low_num_even integer,
    high_num_even integer,
    geom geometry
)
LANGUAGE plpgsql AS
$func$

DECLARE
    partition_name TEXT := (
        SELECT table_name
        FROM information_schema.tables
        WHERE
            table_schema = 'gis_core'
            AND table_name LIKE 'centreline_20%'
        ORDER BY table_name DESC
        LIMIT 1
    );

BEGIN
   RETURN QUERY
EXECUTE FORMAT(
        $$SELECT version_date, centreline_id, linear_name_id, linear_name_full, linear_name_full_legal, address_l, address_r, parity_l, parity_r, lo_num_l, hi_num_l, lo_num_r, hi_num_r, begin_addr_point_id_l, end_addr_point_id_l, begin_addr_point_id_r, end_addr_point_id_r, begin_addr_l, end_addr_l, begin_addr_r, end_addr_r, linear_name, linear_name_type, linear_name_dir, linear_name_desc, linear_name_label, from_intersection_id, to_intersection_id, oneway_dir_code, oneway_dir_code_desc, feature_code, feature_code_desc, jurisdiction, centreline_status, shape_length, objectid, shape_len, mi_prinx, low_num_odd, high_num_odd, low_num_even, high_num_even, geom
          FROM gis_core.%s
        $$,
          partition_name);
END
$func$;

--usage example
SELECT version_date, geom
FROM gwolofs.centreline_latest()
WHERE linear_name_full = 'Lake Shore Blvd W';

--same performance when selecting from partition directly:
SELECT version_date, geom
FROM gis_Core.centreline_20231001
WHERE linear_name_full = 'Lake Shore Blvd W';

@Nate-Wessel
Copy link
Contributor

Wondering if something like this is a simpler solution to the problem of finding the latest data.

CREATE OR REPLACE VIEW gis_core.intersection_latest AS

SELECT *
FROM gis_core.intersection
WHERE version_date = (SELECT MAX(version_date) FROM gis_core.intersection)

@gabrielwol a function might be quicker, but personally I would probably never think to look for data in a function. I would look at the tables and views. The solution above only takes about a fifth of a second to scan the table.

@gabrielwol
Copy link
Collaborator

I stand corrected, your view seems to be just as fast as the function I created and much less complex!

@Nate-Wessel Nate-Wessel self-assigned this Jan 26, 2024
@Nate-Wessel Nate-Wessel linked a pull request Jan 26, 2024 that will close this issue
@Nate-Wessel
Copy link
Contributor

I've created a branch for this and will slowly work on it in parallel with my HIN work. Please feel free to review, contribute or correct anything in there. A lot of my understanding of the centreline data at this point is educated guesswork.

@chmnata
Copy link
Collaborator

chmnata commented Jan 29, 2024

I created a new directional routing function that uses the max(version_date) centreline table in gis_core last week, noting this here so I will rmb to add it in the doc :meow_dio:

@chmnata
Copy link
Collaborator

chmnata commented Feb 15, 2024

Was curious to see the performance between a view and a mat view with more data, so I did a test on what our centreline table would look like in the year 2️⃣ 0️⃣ 2️⃣ 7️⃣

Both view and table are created using max(version_date):

SELECT *
FROM natalie.centreline_chonk 
WHERE version_date = (SELECT MAX(version_date) FROM gis_core.centreline)

Selecting and joining

Mat view is a bit faster

-- 155 msec.
select * from natalie.centreline_latest_mat
inner join miovision_api.centreline_miovision using (centreline_id)
-- 298 msec.
select * from natalie.centreline_latest
inner join miovision_api.centreline_miovision using (centreline_id)

Routing

Created routing function based on each view and mat view, the performance of routing 15 sets of OD:
Mat view is significantly faster.

with from_to(node_s, node_e) AS (
values 
	(13457965,13465931),(13465876,13467505),(13457965,13467253),
	(13467253,13465931),(13457965,13467124),(13467124,13465931),(13457965,13467505),
	(13467040,13465931),(13467124,13467040),(13467505,13465931),(13467505,13465931),
	(13465931,13465931),(13467253,13457965),(13467505,13457965),(13467253,13465931))

select * from from_to 
left join lateral  (select * from natalie.get_centreline_btwn_intersections(node_s, node_e))a ON true

The view returned in

-- natalie chonk view
16 secs 51 msec.

The mat view returned in

-- natalie chonk mat
6 secs 412 msec.

Simple spatial join

Mat view is just a little bit faster than a view.

-- 377 msec.
SELECT area_name, count(1) 
FROM natalie.centreline_latest_mat cent
inner join gis.neighbourhood nei on ST_intersects(cent.geom, nei.geom)
group by area_name
-- 457 msec.	
SELECT area_name, count(1) 
FROM natalie.centreline_latest cent
inner join gis.neighbourhood nei on ST_intersects(cent.geom, nei.geom)
group by area_name

Would expect a little more difference in performance when doing spatial join, especially with an index but apparently it doesn't super matter. However, since there is a significant improvement in efficiency when routing, I would suggest a mat view instead of a function or a view.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants