Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support filtering by extended approval conditions #935

Open
4 tasks
morucci opened this issue Sep 2, 2022 · 3 comments
Open
4 tasks

Support filtering by extended approval conditions #935

morucci opened this issue Sep 2, 2022 · 3 comments

Comments

@morucci
Copy link
Collaborator

morucci commented Sep 2, 2022

Problem statement

The self-merged metrics is based on provider data model and in some cases does not reflect what a team expect.
Indeed a change that match state == self_merged is a change where <created_by> == <merged_by>.

For instance, here are some team workflows for which the self-merged metrics is not relevant:

  1. A change author merges his own change when another human has approved the change. However the change author can self-approve and merge his own change.
  2. A bot merges a change when the change get specific approval conditions. However, a change author can set his own approval and then trigger the bot to get is own change merged.

The current data model and the query language does not provide enough flexibility to allow filtering for those changes.

Proposed solution

For the first workflow, a self-merged change is a change that match state:self_merged and author in approval_authors['APPROVED'] and not <reviewer_group> - author in approval_authors['APPROVED']

For the second use case, a self-merged change is a change that match state:merged and author in approval_authors[Workflow+1] and not <reviewer_group> - author in approval_authors['Code-Review+2']

Required changes

  • add the approval_authors field to changes and change's events. This field must be a "Map String [String]" which reflect the current approval state of a change.
  • Support for the "in" operator in the query language. The operator allows to filter for term in a EL doc array field.
  • Support for the "author" keyword in the query language. This keyword might translate to an EL query using a script https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html
  • Support for the "-" operator. The operator remove an entity from a monocle group.
@TristanCacqueray
Copy link
Contributor

Here are some early comments:

  • The in operator is already mentioned in the spec: https://github.com/change-metrics/monocle/blob/master/doc/query-language.md#extension , however it was meant to work with a literal list. How is the approval_authors going to be evaluated, e.g. do we need a elasticsearch script to match against such value?

  • The - operator seems extra difficult, and I wonder if we can express the proposed solutions without it. For example, could the first workflow be expressed as state:self_merged and approval_authors['APPROVED']:[author] , in other words, the author is the only member of the APPROVED list.

  • For bots, perhaps they should be defined in the workspace configuration so that we can automatically ignore them in such case.

  • And instead of changing the query, could we change the self_merged status to take these workflows into account by default?, e.g. when the author is the only member of every approval list.

@TristanCacqueray
Copy link
Contributor

I guess the main question is: how are the new operators going to be translated to an elastic query?

@morucci
Copy link
Collaborator Author

morucci commented Sep 2, 2022

Here are some early comments:

Actually, after looking more at the EL doc, I'm not sure that's possible to filter a doc based on a script relying on a document's field. There is that thing: expression language https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-expression.html but limitations (at the end of the page) are pretty heavy.

  • The - operator seems extra difficult, and I wonder if we can express the proposed solutions without it. For example, could the first workflow be expressed as state:self_merged and approval_authors['APPROVED']:[author] , in other words, the author is the only member of the APPROVED list.

Well maybe, but if a bot votes 'APPROVED' then the only member assumption will be false. However I like the syntax approval_authors['APPROVED']:[author] to match an array content. The terms_set query seems to be well adapted to that use case https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-set-query.html

  • For bots, perhaps they should be defined in the workspace configuration so that we can automatically ignore them in such case.

Yes that could a be solution. However as approval_authors is going to be populated by crawlers then we'd need to perform a database update when that list of bots change.

Or perhaps when we build the EL query we expand approval_authors['APPROVED']:[author] to a terms_set query combined with negative term queries to exclude each bot of the bot list. <- probably this coupled with the exclude bot feature might be the right thing to do ?

  • And instead of changing the query, could we change the self_merged status to take these workflows into account by default?, e.g. when the author is the only member of every approval list.

That might be a solution too, but self_merged is computed by crawlers and any change to the bot list will require a db update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants