Fix sorting bug for named entities #394

pmahindrakar-oss · 2022-04-07T14:26:38Z

Signed-off-by: Prafulla Mahindrakar [email protected]

TL;DR

The column used in ordering should be part of the group by clause aswell and hence have added the created_at column too.
This wasn't happening for created_at column which was absent.

Only the record sets cannot be orderd by state since those dont exist workflow table.

With the fix :

flytectl get workflow -p flytesnacks -d development --filter.fieldSelector "state=0" --filter.sortBy "created_at" 
 --------------- ------------- ------------------------------------------------------------- ------------------------ ------- 
| PROJECT (46)  | DOMAIN      | NAME                                                        | DESCRIPTION            | STATE |
 --------------- ------------- ------------------------------------------------------------- ------------------------ ------- 
| flytesnacks   | development | core.containerization.use_secrets.my_secret_workflow        |                        |       |
 --------------- ------------- ------------------------------------------------------------- ------------------------ -------

Type

Bug Fix
Feature
Plugin

Are all requirements met?

Complete description

How did you fix the bug, make the feature etc. Link to any design docs etc

Tracking Issue

fixes flyteorg/flyte#2327

Follow-up issue

NA

Signed-off-by: Prafulla Mahindrakar <[email protected]>

codecov · 2022-04-07T14:41:53Z

Codecov Report

Merging #394 (47bb2ec) into master (d9b1c03) will increase coverage by 0.57%.
The diff coverage is 100.00%.

❗ Current head 47bb2ec differs from pull request most recent head 0e3915c. Consider uploading reports for the commit 0e3915c to get more accurate results

@@            Coverage Diff             @@
##           master     #394      +/-   ##
==========================================
+ Coverage   60.80%   61.38%   +0.57%     
==========================================
  Files         151      154       +3     
  Lines       10799    11088     +289     
==========================================
+ Hits         6566     6806     +240     
- Misses       3529     3578      +49     
  Partials      704      704

Flag	Coverage Δ
unittests	`60.35% <100.00%> (+0.57%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
pkg/repositories/gormimpl/common.go	`60.71% <ø> (ø)`
pkg/repositories/gormimpl/named_entity_repo.go	`82.00% <100.00%> (+11.00%)`	⬆️
pkg/repositories/gormimpl/node_execution_repo.go	`67.66% <0.00%> (-5.92%)`	⬇️
pkg/manager/impl/execution_manager.go	`73.96% <0.00%> (-0.18%)`	⬇️
pkg/repositories/config/migrations.go	`2.65% <0.00%> (-0.15%)`	⬇️
pkg/rpc/adminservice/base.go	`4.12% <0.00%> (-0.05%)`	⬇️
pkg/manager/impl/task_execution_manager.go	`71.75% <0.00%> (ø)`
...c/notifications/implementations/event_publisher.go	`94.28% <0.00%> (ø)`
...cloudevent/implementations/cloudevent_publisher.go	`86.44% <0.00%> (ø)`
pkg/async/cloudevent/factory.go	`84.00% <0.00%> (ø)`
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d9b1c03...0e3915c. Read the comment docs.

Signed-off-by: Prafulla Mahindrakar <[email protected]>

katrogan

can the same issue occur in other group by queries like ListWorkflowIds, ListLaunchPlanIds, etc - should we fix those too?

katrogan · 2022-04-07T16:05:42Z

pkg/repositories/gormimpl/named_entity_repo.go


 // Apply consistent sort ordering.
 if input.SortParameter != nil {
+ identifierGroupByWithOrderKey := fmt.Sprintf("%s, %s, %s, %s", Project, Domain, Name, input.SortParameter.GetSortKey())


should we check that the sort parameter isn't already in (Project, Domain, Name)?

Have removed this functionality of dynamically adding the column and instead made it part of the identifierGroupBy which takes care of fixing ListWorkflowIds , ListLaunchPlanIds, ListTaskIdentifiers

Signed-off-by: Prafulla Mahindrakar <[email protected]>

katrogan · 2022-04-08T14:05:59Z

pkg/repositories/gormimpl/common.go

@@ -30,7 +31,7 @@ const taskTableName = "tasks"
 const limit = "limit"
 const filters = "filters"

-var identifierGroupBy = fmt.Sprintf("%s, %s, %s", Project, Domain, Name)
+var identifierGroupBy = fmt.Sprintf("%s, %s, %s, %s", Project, Domain, Name, CreatedAt)


so we always group by created at now? what if someone queries by updated_at or some other field?

Facing issue with group by . Let me dig deeper.

I am planning to remove the group by instead do the following queries which should be equivalent

explain SELECT Distinct entities.project, entities.domain, entities.name, '2' AS resource_type, named_entity_metadata.description, named_entity_metadata.state FROM "named_entity_metadata" RIGHT JOIN ( select distinct project,domain,name from ( SELECT project,domain,name, created_at FROM "tasks" WHERE "domain" = 'development' AND "project" = 'admintests' order by created_at ) t limit 2 offset 2) AS entities ON named_entity_metadata.resource_type = 2 AND named_entity_metadata.project = entities.project AND named_entity_metadata.domain = entities.domain AND named_entity_metadata.name = entities.name WHERE COALESCE(named_entity_metadata.state, 0) = '0' ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Unique (cost=9.82..9.84 rows=1 width=581) -> Sort (cost=9.82..9.83 rows=1 width=581) Sort Key: tasks.project, tasks.domain, tasks.name, named_entity_metadata.description, named_entity_metadata.state -> Nested Loop Left Join (cost=1.76..9.81 rows=1 width=581) Join Filter: ((named_entity_metadata.project = tasks.project) AND (named_entity_metadata.domain = tasks.domain) AND (named_entity_metadata.name = tasks.name)) Filter: (COALESCE(named_entity_metadata.state, 0) = 0) -> Limit (cost=1.61..1.62 rows=1 width=29) -> HashAggregate (cost=1.59..1.61 rows=2 width=29) Group Key: tasks.project, tasks.domain, tasks.name -> Sort (cost=1.41..1.44 rows=9 width=37) Sort Key: tasks.created_at -> Seq Scan on tasks (cost=0.00..1.27 rows=9 width=37) Filter: ((domain = 'development'::text) AND (project = 'admintests'::text)) -> Index Scan using named_entity_metadata_type_project_domain_name_idx on named_entity_metadata (cost=0.14..8.16 rows=1 width=616) Index Cond: (resource_type = 2) (15 rows)

Also the query plan look more efficient

One with group by

explain SELECT entities.project, entities.domain, entities.name, '2' AS resource_type, named_entity_metadata.description, named_entity_metadata.state FROM "named_entity_metadata" RIGHT JOIN (SELECT project,domain,name FROM "workflows" WHERE "domain" = 'development' AND "project" = 'admintests' GROUP BY project, domain, name, created_at ORDER BY created_at LIMIT 2 offset 2) AS entities ON named_entity_metadata.resource_type = 2 AND named_entity_metadata.project = entities.project AND named_entity_metadata.domain = entities.domain AND named_entity_metadata.name = entities.name WHERE COALESCE(named_entity_metadata.state, 0) = '0' GROUP BY "created_at",entities.project, entities.domain, entities.name, named_entity_metadata.description, named_entity_metadata.state ORDER BY created_at desc; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Group (cost=9.70..9.72 rows=1 width=589) Group Key: named_entity_metadata.created_at, workflows.project, workflows.domain, workflows.name, named_entity_metadata.description, named_entity_metadata.state -> Sort (cost=9.70..9.71 rows=1 width=557) Sort Key: named_entity_metadata.created_at DESC, workflows.project, workflows.domain, workflows.name, named_entity_metadata.description, named_entity_metadata.state -> Nested Loop Left Join (cost=1.58..9.69 rows=1 width=557) Join Filter: ((named_entity_metadata.project = workflows.project) AND (named_entity_metadata.domain = workflows.domain) AND (named_entity_metadata.name = workflows.name)) Filter: (COALESCE(named_entity_metadata.state, 0) = 0) -> Limit (cost=1.44..1.46 rows=2 width=37) -> Group (cost=1.41..1.53 rows=9 width=37) Group Key: workflows.created_at, workflows.project, workflows.domain, workflows.name -> Sort (cost=1.41..1.44 rows=9 width=37) Sort Key: workflows.created_at, workflows.name -> Seq Scan on workflows (cost=0.00..1.27 rows=9 width=37) Filter: ((domain = 'development'::text) AND (project = 'admintests'::text)) -> Materialize (cost=0.14..8.17 rows=1 width=624) -> Index Scan using named_entity_metadata_type_project_domain_name_idx on named_entity_metadata (cost=0.14..8.16 rows=1 width=624) Index Cond: (resource_type = 2)

We dont need an additional sort for the same key in both inner and outer sql queries

just a note we still need something equivalent to the WHERE COALESCE(named_entity_metadata.state, 0) = '0' clause to filter by named entity state

re:

RIGHT JOIN ( select distinct project,domain,name from ( SELECT project,domain,name, created_at FROM "tasks" WHERE "domain" = 'development' AND "project" = 'admintests' order by created_at ) t limit 2 offset 2) AS entities ON named_entity_metadata.resource_type = 2

why do we need to do the nested select? can't we right join directly on the result of SELECT DISTINCT project, domain, name, created_at

Fix sorting bug for named entities

19c0934

Signed-off-by: Prafulla Mahindrakar <[email protected]>

pmahindrakar-oss requested review from katrogan, wild-endeavor, anandswaminathan, EngHabu and kumare3 as code owners April 7, 2022 14:26

lint fixes

7462000

Signed-off-by: Prafulla Mahindrakar <[email protected]>

katrogan reviewed Apr 7, 2022

View reviewed changes

pmahindrakar-oss added 3 commits April 8, 2022 18:16

Fixed filtering

b0cde31

Signed-off-by: Prafulla Mahindrakar <[email protected]>

removed unused

47bb2ec

Signed-off-by: Prafulla Mahindrakar <[email protected]>

using distinct to filter results

0e3915c

Signed-off-by: Prafulla Mahindrakar <[email protected]>

katrogan reviewed Apr 8, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix sorting bug for named entities #394

Fix sorting bug for named entities #394

pmahindrakar-oss commented Apr 7, 2022 •

edited

codecov bot commented Apr 7, 2022 •

edited

katrogan left a comment

katrogan Apr 7, 2022

pmahindrakar-oss Apr 8, 2022 •

edited

katrogan Apr 8, 2022

pmahindrakar-oss Apr 8, 2022 •

edited

katrogan Apr 8, 2022

Fix sorting bug for named entities #394

Are you sure you want to change the base?

Fix sorting bug for named entities #394

Conversation

pmahindrakar-oss commented Apr 7, 2022 • edited

TL;DR

Type

Are all requirements met?

Complete description

Tracking Issue

Follow-up issue

codecov bot commented Apr 7, 2022 • edited

Codecov Report

katrogan left a comment

Choose a reason for hiding this comment

katrogan Apr 7, 2022

Choose a reason for hiding this comment

pmahindrakar-oss Apr 8, 2022 • edited

Choose a reason for hiding this comment

katrogan Apr 8, 2022

Choose a reason for hiding this comment

pmahindrakar-oss Apr 8, 2022 • edited

Choose a reason for hiding this comment

katrogan Apr 8, 2022

Choose a reason for hiding this comment

pmahindrakar-oss commented Apr 7, 2022 •

edited

codecov bot commented Apr 7, 2022 •

edited

pmahindrakar-oss Apr 8, 2022 •

edited

pmahindrakar-oss Apr 8, 2022 •

edited