Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Fix sorting bug for named entities #394

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

pmahindrakar-oss
Copy link
Contributor

@pmahindrakar-oss pmahindrakar-oss commented Apr 7, 2022

Signed-off-by: Prafulla Mahindrakar [email protected]

TL;DR

The column used in ordering should be part of the group by clause aswell and hence have added the created_at column too.
This wasn't happening for created_at column which was absent.

Only the record sets cannot be orderd by state since those dont exist workflow table.

With the fix :

flytectl get workflow -p flytesnacks -d development --filter.fieldSelector "state=0" --filter.sortBy "created_at" 
 --------------- ------------- ------------------------------------------------------------- ------------------------ ------- 
| PROJECT (46)  | DOMAIN      | NAME                                                        | DESCRIPTION            | STATE |
 --------------- ------------- ------------------------------------------------------------- ------------------------ ------- 
| flytesnacks   | development | core.containerization.use_secrets.my_secret_workflow        |                        |       |
 --------------- ------------- ------------------------------------------------------------- ------------------------ ------- 

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

How did you fix the bug, make the feature etc. Link to any design docs etc

Tracking Issue

fixes flyteorg/flyte#2327

Follow-up issue

NA

Signed-off-by: Prafulla Mahindrakar <[email protected]>
@codecov
Copy link

codecov bot commented Apr 7, 2022

Codecov Report

Merging #394 (47bb2ec) into master (d9b1c03) will increase coverage by 0.57%.
The diff coverage is 100.00%.

❗ Current head 47bb2ec differs from pull request most recent head 0e3915c. Consider uploading reports for the commit 0e3915c to get more accurate results

@@            Coverage Diff             @@
##           master     #394      +/-   ##
==========================================
+ Coverage   60.80%   61.38%   +0.57%     
==========================================
  Files         151      154       +3     
  Lines       10799    11088     +289     
==========================================
+ Hits         6566     6806     +240     
- Misses       3529     3578      +49     
  Partials      704      704              
Flag Coverage Δ
unittests 60.35% <100.00%> (+0.57%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/repositories/gormimpl/common.go 60.71% <ø> (ø)
pkg/repositories/gormimpl/named_entity_repo.go 82.00% <100.00%> (+11.00%) ⬆️
pkg/repositories/gormimpl/node_execution_repo.go 67.66% <0.00%> (-5.92%) ⬇️
pkg/manager/impl/execution_manager.go 73.96% <0.00%> (-0.18%) ⬇️
pkg/repositories/config/migrations.go 2.65% <0.00%> (-0.15%) ⬇️
pkg/rpc/adminservice/base.go 4.12% <0.00%> (-0.05%) ⬇️
pkg/manager/impl/task_execution_manager.go 71.75% <0.00%> (ø)
...c/notifications/implementations/event_publisher.go 94.28% <0.00%> (ø)
...cloudevent/implementations/cloudevent_publisher.go 86.44% <0.00%> (ø)
pkg/async/cloudevent/factory.go 84.00% <0.00%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d9b1c03...0e3915c. Read the comment docs.

Signed-off-by: Prafulla Mahindrakar <[email protected]>
Copy link
Contributor

@katrogan katrogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the same issue occur in other group by queries like ListWorkflowIds, ListLaunchPlanIds, etc - should we fix those too?


// Apply consistent sort ordering.
if input.SortParameter != nil {
identifierGroupByWithOrderKey := fmt.Sprintf("%s, %s, %s, %s", Project, Domain, Name, input.SortParameter.GetSortKey())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check that the sort parameter isn't already in (Project, Domain, Name)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have removed this functionality of dynamically adding the column and instead made it part of the identifierGroupBy which takes care of fixing ListWorkflowIds , ListLaunchPlanIds, ListTaskIdentifiers

Signed-off-by: Prafulla Mahindrakar <[email protected]>
Signed-off-by: Prafulla Mahindrakar <[email protected]>
Signed-off-by: Prafulla Mahindrakar <[email protected]>
@@ -30,7 +31,7 @@ const taskTableName = "tasks"
const limit = "limit"
const filters = "filters"

var identifierGroupBy = fmt.Sprintf("%s, %s, %s", Project, Domain, Name)
var identifierGroupBy = fmt.Sprintf("%s, %s, %s, %s", Project, Domain, Name, CreatedAt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we always group by created at now? what if someone queries by updated_at or some other field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Facing issue with group by . Let me dig deeper.

I am planning to remove the group by instead do the following queries which should be equivalent

 explain SELECT 
Distinct 
entities.project,
entities.domain,
entities.name,
'2' AS resource_type,
named_entity_metadata.description,
named_entity_metadata.state 

FROM "named_entity_metadata" 
RIGHT JOIN (
select distinct project,domain,name 
from (
        SELECT  project,domain,name, created_at 
        FROM "tasks" 
        WHERE "domain" = 'development' AND "project" = 'admintests' order by created_at
     ) t limit 2 offset 2) AS entities ON named_entity_metadata.resource_type = 2 

AND named_entity_metadata.project = entities.project
AND named_entity_metadata.domain = entities.domain 
AND named_entity_metadata.name = entities.name 

WHERE COALESCE(named_entity_metadata.state, 0) = '0' ;
                                                                                  QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=9.82..9.84 rows=1 width=581)
   ->  Sort  (cost=9.82..9.83 rows=1 width=581)
         Sort Key: tasks.project, tasks.domain, tasks.name, named_entity_metadata.description, named_entity_metadata.state
         ->  Nested Loop Left Join  (cost=1.76..9.81 rows=1 width=581)
               Join Filter: ((named_entity_metadata.project = tasks.project) AND (named_entity_metadata.domain = tasks.domain) AND (named_entity_metadata.name = tasks.name))
               Filter: (COALESCE(named_entity_metadata.state, 0) = 0)
               ->  Limit  (cost=1.61..1.62 rows=1 width=29)
                     ->  HashAggregate  (cost=1.59..1.61 rows=2 width=29)
                           Group Key: tasks.project, tasks.domain, tasks.name
                           ->  Sort  (cost=1.41..1.44 rows=9 width=37)
                                 Sort Key: tasks.created_at
                                 ->  Seq Scan on tasks  (cost=0.00..1.27 rows=9 width=37)
                                       Filter: ((domain = 'development'::text) AND (project = 'admintests'::text))
               ->  Index Scan using named_entity_metadata_type_project_domain_name_idx on named_entity_metadata  (cost=0.14..8.16 rows=1 width=616)
                     Index Cond: (resource_type = 2)
(15 rows)

Also the query plan look more efficient

One with group by

explain SELECT 
entities.project,
entities.domain,
entities.name,
'2' AS resource_type,
named_entity_metadata.description,
named_entity_metadata.state 

FROM "named_entity_metadata" 
RIGHT JOIN (SELECT project,domain,name FROM "workflows" WHERE "domain" = 'development' AND "project" = 'admintests' GROUP BY project, domain, name, created_at ORDER BY created_at LIMIT 2 offset 2) AS entities ON named_entity_metadata.resource_type = 2 
AND named_entity_metadata.project = entities.project
 AND named_entity_metadata.domain = entities.domain 
AND named_entity_metadata.name = entities.name 

WHERE COALESCE(named_entity_metadata.state, 0) = '0' 

GROUP BY "created_at",entities.project, entities.domain, entities.name, named_entity_metadata.description, named_entity_metadata.state 

ORDER BY created_at desc;
                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Group  (cost=9.70..9.72 rows=1 width=589)
   Group Key: named_entity_metadata.created_at, workflows.project, workflows.domain, workflows.name, named_entity_metadata.description, named_entity_metadata.state
   ->  Sort  (cost=9.70..9.71 rows=1 width=557)
         Sort Key: named_entity_metadata.created_at DESC, workflows.project, workflows.domain, workflows.name, named_entity_metadata.description, named_entity_metadata.state
         ->  Nested Loop Left Join  (cost=1.58..9.69 rows=1 width=557)
               Join Filter: ((named_entity_metadata.project = workflows.project) AND (named_entity_metadata.domain = workflows.domain) AND (named_entity_metadata.name = workflows.name))
               Filter: (COALESCE(named_entity_metadata.state, 0) = 0)
               ->  Limit  (cost=1.44..1.46 rows=2 width=37)
                     ->  Group  (cost=1.41..1.53 rows=9 width=37)
                           Group Key: workflows.created_at, workflows.project, workflows.domain, workflows.name
                           ->  Sort  (cost=1.41..1.44 rows=9 width=37)
                                 Sort Key: workflows.created_at, workflows.name
                                 ->  Seq Scan on workflows  (cost=0.00..1.27 rows=9 width=37)
                                       Filter: ((domain = 'development'::text) AND (project = 'admintests'::text))
               ->  Materialize  (cost=0.14..8.17 rows=1 width=624)
                     ->  Index Scan using named_entity_metadata_type_project_domain_name_idx on named_entity_metadata  (cost=0.14..8.16 rows=1 width=624)
                           Index Cond: (resource_type = 2)

We dont need an additional sort for the same key in both inner and outer sql queries

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a note we still need something equivalent to the WHERE COALESCE(named_entity_metadata.state, 0) = '0' clause to filter by named entity state

re:

RIGHT JOIN (
select distinct project,domain,name 
from (
        SELECT  project,domain,name, created_at 
        FROM "tasks" 
        WHERE "domain" = 'development' AND "project" = 'admintests' order by created_at
     ) t limit 2 offset 2) AS entities ON named_entity_metadata.resource_type = 2 

why do we need to do the nested select? can't we right join directly on the result of SELECT DISTINCT project, domain, name, created_at

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants