Fix bugs in allocate_gen_fuel #3690

grgmiller · 2024-06-22T22:34:47Z

Overview

This PR fixes several issues that we identified in the `analysis.allocate_gen_fuel` module and had already fixed in our fork of pudl for OGE. We are now trying to get rid of our dependency on the pudl code, so we want to migrate all of our changes to pudl so that we can directly use the output table from this module.

This is part of work we are tracking in OGE here: singularity-energy/open-grid-emissions#369

First, this PR addresses an issue where some retiring generators were being incorrectly identified and dropped: singularity-energy#1

This PR fixes a bug in the generation and fuel allocation process where data for report months after the reported retirement date was dropped for generators that retire mid-year. For example, both generators at plant 50937 have a retirement date of "2022-09-01", and the previous behavior was to drop all data after September, even though this plant continued to report fuel consumption after September. This fix keeps all report dates through the end of the current year to avoid dropping this data.
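A minimal pandas sketch of the change in filtering logic, assuming a monthly table with datetime columns named `report_date` and `generator_retirement_date`. The helper name `keep_through_retirement_year` is hypothetical and this is not the actual pudl code: instead of dropping every month after the retirement date, it keeps rows through the end of the retirement year.

```python
import pandas as pd


def keep_through_retirement_year(df: pd.DataFrame) -> pd.DataFrame:
    """Keep monthly rows through the end of each generator's retirement year.

    Hypothetical helper for illustration; assumes `report_date` and
    `generator_retirement_date` are datetime64 columns.
    """
    retirement = df["generator_retirement_date"]
    # Roll each retirement date forward to Dec 31 of the same year,
    # e.g. 2022-09-01 -> 2022-12-31 (NaT stays NaT).
    end_of_retirement_year = retirement + pd.offsets.YearEnd(0)
    # Generators without a retirement date are always kept; otherwise keep
    # every report month up to the end of the retirement year.
    keep = retirement.isna() | (df["report_date"] <= end_of_retirement_year)
    return df[keep]
```

With a retirement date of 2022-09-01, the October through December 2022 rows are retained, whereas a filter of `report_date <= generator_retirement_date` would drop them.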

Second, this addresses issues with duplicate generators, as described in this PR: singularity-energy#3

When running the `pudl.analysis.allocate_gen_fuel` pipeline for 2016 and 2017, we were getting a `TypeError` in `group_duplicate_keys()`, because this function was trying to `groupby().sum()` non-numeric columns like `generator_retirement_date`.
`group_duplicate_keys()` will only work if we drop all datetime and boolean columns before calling it, and if we consider carefully whether we want to sum any of the `frac` columns. This PR, however, does not touch this function; instead, it fixes the issue upstream.
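To illustrate the constraint on `group_duplicate_keys()`, here is a minimal sketch of a hypothetical helper (not the pudl function) that collapses duplicate keys by summing only non-boolean numeric columns, so datetime and boolean columns never reach `sum()`. It does not settle whether `frac` columns should be summed; those would still need to be excluded or recomputed separately.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def sum_duplicate_keys(df: pd.DataFrame, keys: list[str]) -> pd.DataFrame:
    """Collapse rows that share `keys`, summing only summable columns.

    Hypothetical helper for illustration. Datetime, boolean, and other
    non-numeric columns are left out, because summing them is either
    meaningless or raises a TypeError.
    """
    value_cols = [
        col
        for col in df.columns
        if col not in keys
        and is_numeric_dtype(df[col])
        # pandas treats bool as numeric, so exclude it explicitly.
        and not is_bool_dtype(df[col])
    ]
    # Key columns come back as regular columns (as_index=False).
    return df.groupby(keys, as_index=False)[value_cols].sum()
```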
We were only running into this issue with group_duplicate_keys() because there were duplicate keys in the dataframe, so this PR addresses the root cause of where duplicate keys were getting introduced in the first place.
It turns out that when the `gen_assoc` table is created with `associate_generator_tables()`, one of the steps is `remove_inactive_generators()`, which removes certain generators by creating six different dataframes of generators based on their operating status: `existing`, `retiring_generators`, `retired_plants`, `proposed_generators`, `proposed_plants`, and `unassociated_plants`. These six dataframes are then concatenated. Previously, our assumption was that these six dataframes were non-overlapping. However, that is not always the case.
For example, in 2016, plant 56846 generator GTG1 ended up in both proposed_generators and proposed_plants, which was causing it to be duplicated.
We fix this by simply adding `.drop_duplicates()` after these six dataframes are concatenated. This fixes the issue we were experiencing in 2016 and 2017.
For now, we will leave `group_duplicate_keys()` alone even though it does not work on these column types. It effectively acts as an error check for duplicate keys, since it will raise a `TypeError` like the one we saw for 2016 and 2017.
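A toy sketch of how overlapping subsets produce duplicates when concatenated, and how `.drop_duplicates()` removes them. The frames below are hypothetical stand-ins for two of the six subsets built in `remove_inactive_generators()`; their columns, the `report_date` value, and the second plant (99999 / CT1) are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for two of the six subsets; the same generator
# (plant 56846, GTG1) lands in both, as described above.
proposed_generators = pd.DataFrame(
    {
        "plant_id_eia": [56846],
        "generator_id": ["GTG1"],
        "report_date": pd.to_datetime(["2016-01-01"]),
    }
)
proposed_plants = pd.DataFrame(
    {
        "plant_id_eia": [56846, 99999],
        "generator_id": ["GTG1", "CT1"],
        "report_date": pd.to_datetime(["2016-01-01", "2016-01-01"]),
    }
)

# Concatenating overlapping subsets duplicates the 56846 / GTG1 row.
combined = pd.concat([proposed_generators, proposed_plants], ignore_index=True)

# The fix: drop rows that are identical in every column after concatenating.
deduped = combined.drop_duplicates(ignore_index=True)
```

Note that `drop_duplicates()` without a `subset` only removes rows that match in every column, so a generator that appeared in two subsets with differing values in some other column would still be duplicated on its key columns.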
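Rather than relying on the `TypeError` from `group_duplicate_keys()` as an implicit signal, an explicit check could fail with a clearer message. A minimal sketch, where the helper name `assert_unique_keys` and the key columns are hypothetical:

```python
import pandas as pd


def assert_unique_keys(df: pd.DataFrame, keys: list[str]) -> None:
    """Raise a ValueError listing any key combinations that appear more than once.

    Hypothetical helper for illustration; `keys` would be the table's
    primary key columns (e.g. plant, generator, and report date).
    """
    # keep=False marks every member of a duplicated group, not just repeats.
    dupes = df[df.duplicated(subset=keys, keep=False)]
    if not dupes.empty:
        raise ValueError(
            f"Found {len(dupes)} rows with duplicate keys {keys}:\n"
            f"{dupes[keys].drop_duplicates().to_string(index=False)}"
        )
```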

Testing

We have successfully run this after importing pudl and running it in the OGE pipeline. However, we had previously been testing this with an older release of pudl (v2023.12.01).

I'm on a Windows machine, and there aren't great instructions for setting up the pudl dev environment on Windows. This is a pretty small code change, so I'm hoping that someone who already has the dev environment set up can help test it.

To-do list


If updating analyses or data processing functions: make sure to update or write data validation tests (e.g., `test_minmax_rows()`)

Update the [release notes](../docs/release_notes.rst): reference the PR and related issues.

Ensure docs build, unit & integration tests, and test coverage pass locally with `make pytest-coverage` (otherwise the merge queue may reject your PR)

Review the PR yourself and call out any questions or issues you have

For minor ETL changes or data additions, once `make pytest-coverage` passes, make sure you have a fresh full PUDL DB downloaded locally, materialize new/changed assets and all their downstream assets and [run relevant data validation tests](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#data-validation) using `pytest` and `--live-dbs`.

For significant ETL, data coverage or analysis changes, once `make pytest-coverage` passes, ensure the full ETL runs locally and [run data validation tests](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#data-validation) using `make pytest-validate` (a ~10 hour run). If you can't run this locally, run the `build-deploy-pudl` GitHub Action (or ask someone with permissions to). Then, check the logs on the `#pudl-deployments` Slack channel or `gs://builds.catalyst.coop`.


grgmiller · 2024-06-23T00:11:35Z

@cmgosnell not sure who would be best to review this so added you for now.

grgmiller added 2 commits June 22, 2024 18:26

Fix bugs in allocate_gen_fuel

b2b4f9f

fix spelling

5f7f653

grgmiller requested a review from cmgosnell June 23, 2024 00:11

zaneselvans added the analysis and community labels Jun 25, 2024
