Investigate high Change Failure Rate #6989

Closed
2 of 3 tasks
dms1981 opened this issue May 13, 2024 · 8 comments
@dms1981
Contributor

dms1981 commented May 13, 2024

User Story

As a Product Owner
I want to invest time in investigating our failing changes
So that we improve the quality of our Change Failure Rate (CFR)

Value / Purpose

A high CFR isn't necessarily indicative of degraded services or poor quality changes; by investing time in looking at our commonly-failing changes we can make this metric more meaningful.

Useful Contacts

No response

Additional Information

No response

Proposal / Unknowns

  • How do we retrieve the failing changes across our repositories?
  • If remediation is complex, raise a bugfix issue.
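One possible way to approach the first question is GitHub's "list workflow runs for a repository" REST endpoint, filtered to failed runs. The following is a minimal sketch, not an agreed approach: the function names, the `since` date and the `runs_by_repo` structure are assumptions made for illustration, and only the first page of results is fetched.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

API = "https://api.github.com"


def failed_runs_url(owner, repo, since, per_page=100):
    """Build the list-workflow-runs URL for failed runs created on or after `since` (YYYY-MM-DD)."""
    query = urllib.parse.urlencode({
        "status": "failure",        # only runs that concluded in failure
        "created": f">={since}",    # GitHub search-style date filter
        "per_page": per_page,
    })
    return f"{API}/repos/{owner}/{repo}/actions/runs?{query}"


def fetch_failed_runs(owner, repo, since, token):
    """Fetch the first page of failed runs (pagination is not handled in this sketch)."""
    request = urllib.request.Request(
        failed_runs_url(owner, repo, since),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["workflow_runs"]


def count_by_workflow(runs_by_repo):
    """Count failed runs per (repository, workflow name), most frequent first."""
    counts = Counter()
    for repo, runs in runs_by_repo.items():
        for run in runs:
            counts[(repo, run["name"])] += 1
    return counts.most_common()
```

Calling `fetch_failed_runs` for each repository and passing the results to `count_by_workflow` would give a ranked list of the workflows that fail most often, which is one way to identify the "commonly failing changes".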

Definition of Done

  • Commonly failing changes identified
  • Easy remediations applied
  • Complex remediations turned into new issues
@dms1981 dms1981 changed the title <title> Investigate high Change Failure Rate May 13, 2024
@SteveLinden SteveLinden self-assigned this May 15, 2024
@SteveLinden
Contributor

To start on this I will look at issues on the team channel that indicate failures. This may require raising additional issues, as the calls are often different. In some cases an iterative approach may be needed: fix one issue, find a different one, fix that, find yet another, and so on.

Unlikely to start on this until either later tomorrow or Friday (milk monitor duties could impact this).

@dms1981 dms1981 reopened this May 15, 2024
@SteveLinden
Contributor

One example is the daily Terraform Static Code Analysis, which fails every day.
A fix is in place for one issue (trivy wants the KMS key for S3 buckets), which may cure it, but there are also a couple of checkov failures we need to examine. These appear to be complaining about no key in a call for a routine. They will be looked at assuming the fix for the trivy issue works.

There is a link on the modernisation platform workflow status which may provide additional information

@SteveLinden
Contributor

The change made to the static code analysis workflow has worked, and it no longer produces errors.

@SteveLinden
Contributor

No other obvious regular errors are popping up.
I will have a root around the Actions on GitHub to see if there are any on our main repos.

@SimonPPledger
Contributor

@SteveLinden this is the code that checks for failures https://github.com/ministryofjustice/dora-the-explora/blob/main/cfr.py
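For context, a change failure rate can be thought of as the share of completed runs that ended in failure. The sketch below illustrates that idea only; it is not taken from `cfr.py`, and the function name, the `ignored_actors` parameter and the choice to count only `success` and `failure` conclusions are all assumptions. The dictionary keys (`status`, `conclusion`, `actor.login`) follow the shape of workflow run objects returned by the GitHub REST API.

```python
def change_failure_rate(runs, ignored_actors=()):
    """Return failed / (failed + successful) for completed runs, or None if there are none.

    `runs` is a list of workflow run dictionaries. Runs triggered by any login
    in `ignored_actors` are left out (a hypothetical option, for example to
    exclude automated dependency updates).
    """
    considered = [
        run for run in runs
        if run.get("status") == "completed"
        and run.get("conclusion") in ("success", "failure")
        and (run.get("actor") or {}).get("login") not in ignored_actors
    ]
    if not considered:
        return None
    failures = sum(1 for run in considered if run["conclusion"] == "failure")
    return failures / len(considered)
```

Cancelled or skipped runs are excluded here so that they count neither for nor against the rate; whether that is the right call depends on how the team defines a "change".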

@SteveLinden
Contributor

SteveLinden commented May 22, 2024

On a regular basis the dependabot changes fail on the Go tests. There are various reasons (mainly Error: No valid credential sources found) but due to the infrequency of these failures I have decided to ignore them here.

Working through and I'll list those that need checks here....

Secure Code Analysis on modernisation-platform-terraform-ecs-cluster - 12 checkov failures

Secure Code Analysis on https://github.com/ministryofjustice/modernisation-platform-terraform-s3-bucket/actions/runs/9184139276/job/25255974159 - unknown cause - sent to team

Sent to the team: modernisation-platform-security is regularly failing because it relies on a feature that is not supported in this repository. An example is here:
https://github.com/ministryofjustice/modernisation-platform-security/actions/runs/9193342497/job/25284134825

@SteveLinden
Contributor

Apart from the issues on the Go tests, which can be alleviated by connecting, closing the PR, re-opening it and then running the job, there do not appear to be many issues other than those listed above.

One or two of these have been fixed.

@SteveLinden
Contributor

Some issues were highlighted when closing this, and they are listed below.

Following our discussions I did a check of the repo (thanks @Aaron). I have been through that list and have the following:
3 are archived and can be removed:

  1. modernisation-platform-terraform-ecs
  2. modernisation-platform-infrastructure-test
  3. modernisation-platform-terraform-trusted-advisor

There were also 5 routines that showed errors. The first one was 4 days ago but the others are more recent.

  1. modernisation-platform-cp-network-test (possible personally identifiable data)
  2. modernisation-platform-terraform-ecs-cluster (7 or 8 checkov errors)
  3. modernisation-platform-terraform-member-vpc (1 checkov error - CKV2_AWS_67)
  4. modernisation-platform-terraform-s3-bucket (tflint error - seen elsewhere so possibly easy to fix)
  5. modernisation-platform-terraform-bastion-linux (1 checkov error - CKV2_AWS_64)
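Counts like the ones above can be tallied from a job log rather than by eye. The following is a minimal sketch, assuming the log contains checkov's `Check: <ID>: "<description>"` lines; the sample log, the function name and the regular expression are illustrative assumptions, not output from the runs listed here.

```python
import re
from collections import Counter

# Matches checkov-style IDs such as CKV_AWS_18 or CKV2_AWS_67.
CHECK_ID = re.compile(r"\bCKV2?_[A-Z0-9]+_\d+\b")


def tally_check_ids(log_text):
    """Count how often each checkov check ID appears in a block of log text."""
    return Counter(CHECK_ID.findall(log_text))


# Hypothetical log excerpt, with descriptions and resources left as placeholders.
sample_log = """
Check: CKV2_AWS_67: "<description>"
    FAILED for resource: <resource-a>
Check: CKV2_AWS_64: "<description>"
    FAILED for resource: <resource-b>
Check: CKV2_AWS_67: "<description>"
    FAILED for resource: <resource-c>
"""

for check_id, count in tally_check_ids(sample_log).most_common():
    print(check_id, count)
```

Grouping by check ID makes it easier to see whether several repositories fail on the same check, in which case one fix may cover all of them.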

I have asked the team for opinions on raising calls for the 5 above, so additional issues may be raised.

Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

3 participants