[COST-4744] Azure network data processing SQL #5056

Merged
merged 12 commits into from
May 23, 2024
@cgoodfred cgoodfred commented Apr 22, 2024

Jira Ticket

COST-4744

Description

This change adds OCP on Azure network processing. It does a few things:

  • Identifies network records in the Azure bill that are associated with a specific Virtual Machine that can be tied to an OCP node
  • Separates the usage and cost for these records into distinct rows per day, one for inbound traffic and one for outbound traffic, when we aggregate the azure_openshift_daily records
  • Filters out the networking records when grouping by namespace, because these values cannot be attributed to a specific namespace/project (hence the Network unattributed project!)
  • Performs a new insert into the project daily summary table for the networking records, grouped by OCP node
  • Back-populates these records into the OCPUsage table, adding a data transfer direction to the group by, which has 3 options: IN, OUT, and NULL
  • Requires a Trino migration in stage/prod, and locally if you already have Trino Azure tables created:
python koku/manage.py migrate_trino_tables --add-columns "[{\"table\": \"azure_openshift_daily_resource_matched_temp\", \"column\": \"data_transfer_direction\", \"datatype\": \"varchar\"},{\"table\": \"reporting_ocpazurecostlineitem_project_daily_summary\", \"column\": \"data_transfer_direction\", \"datatype\": \"varchar\"}]"
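
To make the first few bullets above concrete, here is a minimal Python sketch of the direction split. It is not the SQL in this PR. The row keys (`day`, `node`, `usage`, `cost`) and the sample values are hypothetical; the only things taken from this description are the lowercased `datatransferdirection` key in `additionalinfo` and the three direction values IN, OUT, and NULL.

```python
import json
from collections import defaultdict


def transfer_direction(additional_info):
    """Return 'IN', 'OUT', or None for a raw additionalinfo JSON string."""
    if not additional_info:
        return None
    try:
        # Lowercase the whole document first so key casing does not matter.
        info = json.loads(additional_info.lower())
    except json.JSONDecodeError:
        return None
    if not isinstance(info, dict):
        return None
    direction = info.get("datatransferdirection")
    return direction.upper() if direction in ("in", "out") else None


def split_network_rows(rows):
    """Sum usage and cost per (day, node, direction); direction may be None."""
    totals = defaultdict(lambda: {"usage": 0.0, "cost": 0.0})
    for row in rows:
        key = (row["day"], row["node"], transfer_direction(row.get("additionalinfo")))
        totals[key]["usage"] += row["usage"]
        totals[key]["cost"] += row["cost"]
    return dict(totals)


# Hypothetical sample rows for one node on one day.
rows = [
    {"day": "2024-03-01", "node": "azure_compute1", "usage": 5.0, "cost": 0.05,
     "additionalinfo": '{"DataTransferDirection": "In"}'},
    {"day": "2024-03-01", "node": "azure_compute1", "usage": 7.5, "cost": 0.075,
     "additionalinfo": '{"DataTransferDirection": "Out"}'},
    {"day": "2024-03-01", "node": "azure_compute1", "usage": 24.0, "cost": 1.0,
     "additionalinfo": None},
]
totals = split_network_rows(rows)
for key, value in totals.items():
    print(key, value)
```

Each day ends up with at most three rows per node: one IN, one OUT, and one with no direction for everything that is not data transfer.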

Testing

  1. Using nise > 4.4.17, create Azure data that has DTGenerators defined for the same resource ID as an OpenShift node. Something like:
---
generators:
  - VMGenerator:
      start_date: 2024-03-01
      service_name: Virtual Machines
      meter_id: 55555555-4444-3333-2222-111111111128
      resource_location: "US East"
      instance_id: '/subscriptions/99999999-9999-9999-9999-999999999999/resourceGroups/koku-99hqd-rg/providers/Microsoft.Compute/virtualMachines/azure_compute1'
      tags:
        version: Mars
        dashed-key-on-azure: dashed-value
  - DTGenerator: 
      start_date: 2024-03-01
      meter_id: 55555555-4444-3333-2222-111111111128
      data_direction: "in"
      resource_location: "US East"
      instance_id: '/subscriptions/99999999-9999-9999-9999-999999999999/resourceGroups/koku-99hqd-rg/providers/Microsoft.Compute/virtualMachines/azure_compute1'
      usage_quantity: 5
      resource_rate: 0.01
      tags:
        version: Mars
        dashed-key-on-azure: dashed-value
  - DTGenerator: 
      start_date: 2024-03-01
      meter_id: 55555555-4444-3333-2222-111111111128
      data_direction: "out"
      resource_location: "US East"
      instance_id: '/subscriptions/99999999-9999-9999-9999-999999999999/resourceGroups/koku-99hqd-rg/providers/Microsoft.Compute/virtualMachines/azure_compute1'
      usage_quantity: 7.5
      resource_rate: 0.01
      tags:
        version: Mars
        dashed-key-on-azure: dashed-value
  2. Create a source and load the OCP data
  3. Create a source and load the Azure data you just created
  4. Let summary run, then check the OCP and OCP on Azure database records and verify that the network records are visible and distinct, with infrastructure_data_in_gigabytes or infrastructure_data_out_gigabytes filled in for each day and each Network unattributed project.
  5. Run a few SQL queries to verify that the costs before and after OCPAzure summary line up.
  • docker exec -it trino trino --server localhost:8080 --catalog hive --schema org1234567 --user admin --debug
trino:org1234567> SELECT SUM(costinbillingcurrency) FROM azure_openshift_daily;
       _col0        
--------------------
 34.280750420817206 
trino:org1234567> SELECT SUM(pretax_cost) FROM reporting_ocpazurecostlineitem_project_daily_summary;
       _col0        
--------------------
 34.280750422000004 

trino:org1234567> SELECT SUM(costinbillingcurrency) FROM azure_openshift_daily WHERE json_exists(lower(additionalinfo), 'strict $.datatransferdirection');
       _col0        
--------------------
 14.128931721012743 
trino:org1234567> SELECT SUM(pretax_cost) FROM reporting_ocpazurecostlineitem_project_daily_summary WHERE data_transfer_direction IS NOT NULL;
       _col0        
--------------------
 14.128931721999995 
  6. Since this change is summary focused, check some OCP endpoints, such as api/cost-management/v1/reports/openshift/costs/?group_by[project]=*, to verify that the costs and values present make sense.
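
The before and after sums in the SQL check agree only to about nine decimal places, so an exact equality check would fail on floating point noise. A small sketch of a tolerance-based comparison follows, using the four values from the queries; the helper name `totals_match` and the `1e-9` relative tolerance are assumptions for illustration, not something this PR defines.

```python
import math


def totals_match(before, after, rel_tol=1e-9):
    """True when two summed costs agree within a relative tolerance."""
    return math.isclose(before, after, rel_tol=rel_tol)


# (sum from azure_openshift_daily, sum from the project daily summary table)
checks = {
    "all costs": (34.280750420817206, 34.280750422000004),
    "network costs": (14.128931721012743, 14.128931721999995),
}

results = {label: totals_match(before, after) for label, (before, after) in checks.items()}
for label, ok in results.items():
    print(f"{label}: {'match' if ok else 'MISMATCH'}")
```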

Release Notes

  • proposed release note
  • Should add an example with numbers below before this is marked done
* [COST-4744](https://issues.redhat.com/browse/COST-4744) This PR will **result in a numbers change when looking at the OpenShift endpoints, or the Azure filtered by OpenShift endpoints, when grouped by project**, as long as the OpenShift costs are coming from an Azure cloud source.
* Previously, the networking cost of the node was distributed among the projects on the node; now those networking costs are moved into a separate NEW project called `Network unattributed`.
* Example with numbers: 

- I have a node called `compute_1`, and this node has 2 projects, `projectA` and `projectB`, that each use 50% of the cluster, leaving 0 unallocated costs.
- When I look at the costs for this node grouped by project today, `projectA` costs $15 and `projectB` costs $5, for a total of $20.
- Of that $20, I know that $5 is networking costs.
- After this change there will be 3 projects with costs for this node: `projectA`, `projectB`, and `Network unattributed`.
- The cost for `projectA` would now be $12.50, `projectB` would now be $2.50, and `Network unattributed` would be $5 (each project sheds its $2.50 half of the networking cost).
- The new `Network unattributed` project holds the networking costs that can be specifically tied to this node but not broken down at the project level.
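
The arithmetic in the example can be sketched in a few lines of Python. This assumes the networking cost had previously been distributed in proportion to each project's usage share (50% each here), which is how the numbers above work out; the function name `split_out_network` is hypothetical and this is not the summary SQL itself.

```python
def split_out_network(project_costs, network_cost, usage_share):
    """Move a node's networking cost out of its projects into 'Network unattributed'."""
    # Each project gives back the slice of networking cost it had been carrying.
    adjusted = {
        project: cost - network_cost * usage_share[project]
        for project, cost in project_costs.items()
    }
    adjusted["Network unattributed"] = network_cost
    return adjusted


before = {"projectA": 15.0, "projectB": 5.0}
after = split_out_network(before, network_cost=5.0,
                          usage_share={"projectA": 0.5, "projectB": 0.5})
print(after)
# The node total is unchanged; only the per-project breakdown moves.
print(sum(before.values()), sum(after.values()))
```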


codecov bot commented Apr 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.2%. Comparing base (c21e5e5) to head (c9cab3b).

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #5056   +/-   ##
=====================================
  Coverage   94.2%   94.2%           
=====================================
  Files        378     378           
  Lines      31622   31622           
  Branches    3756    3756           
=====================================
+ Hits       29777   29779    +2     
+ Misses      1177    1174    -3     
- Partials     668     669    +1     

@cgoodfred cgoodfred self-assigned this May 15, 2024
@cgoodfred cgoodfred added the azure-smoke-tests pr_check will build the image and run azure + ocp on azure smoke tests label May 15, 2024
@cgoodfred
Contributor Author

/retest

@cgoodfred cgoodfred marked this pull request as ready for review May 15, 2024 13:52
@cgoodfred cgoodfred requested review from a team as code owners May 15, 2024 13:52
@cgoodfred
Contributor Author

/retest

@cgoodfred
Contributor Author

/retest

@cgoodfred cgoodfred merged commit ff7e502 into main May 23, 2024
13 checks passed
@cgoodfred cgoodfred deleted the COST-4744-azure-net branch May 23, 2024 14:22
Labels
azure-smoke-tests pr_check will build the image and run azure + ocp on azure smoke tests smokes-required
Projects
None yet
4 participants