Bug 509 #2333

Open
wants to merge 40 commits into
base: main

Changes from 33 commits
c48584e
Remove plant_name_eia in pudl.transform.eia923.fuel_receipts_costs()
knordback Feb 14, 2023
13a0dc1
Don't discard plant_name_eia from pudl.transform.eia923.generation_fu…
knordback Feb 17, 2023
b4a3c28
Don't drop operator_name/utility_name_eia in pudl.transform.eia923.ge…
knordback Feb 25, 2023
35b6c60
Move newly added code to what seems like a more sensible location
knordback Feb 27, 2023
d6021ac
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 27, 2023
adc4503
Extract the core of AbstractTableTransformer.enforce_schema() into Re…
knordback Feb 28, 2023
2bcd803
Merge branch 'bug-509' of https://github.com/catalyst-cooperative/pud…
knordback Feb 28, 2023
c44949b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 28, 2023
731baef
Use Resource.endorse_schema() to clean up non-matching fields in Data…
knordback Mar 1, 2023
eef6cbf
Merge branch 'bug-509' of https://github.com/catalyst-cooperative/pud…
knordback Mar 1, 2023
fbe7b21
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 1, 2023
e2768f9
Move DataFrame cleaning to the end of transform()
knordback Mar 1, 2023
46d7e66
Merge branch 'bug-509' of https://github.com/catalyst-cooperative/pud…
knordback Mar 1, 2023
508ba29
Call enforce_schema() on entities DataFrames as well
knordback Mar 2, 2023
126102e
Merge branch 'dev' into bug-509
knordback Mar 2, 2023
e489ed4
Merge branch 'dev' into bug-509
zaneselvans Mar 4, 2023
dc5068e
Don't drop combined_heat_power in boiler_fuel()
knordback Mar 17, 2023
fdada61
One more incremental change in boiler_fuel()
knordback Mar 20, 2023
40ef5f4
Several more fields not dropped
knordback Mar 20, 2023
dd9982a
Merge in substantial changes from dev.
zaneselvans Mar 21, 2023
bb60591
More un-dropping of fields
knordback Mar 21, 2023
fc099cc
Merge branch 'dev' into bug-509
knordback Apr 12, 2023
c91edf1
Merge branch 'dev' into bug-509
knordback May 15, 2023
2efd152
Remove code made obsolete in merge from dev
knordback May 16, 2023
d464394
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 16, 2023
3675eba
Don't drop total_fuel_consumption_quantity in clean_boiler_fuel_eia923()
knordback May 18, 2023
cb74321
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 18, 2023
54dd4cc
More field non-dropping
knordback May 19, 2023
761ba7b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 19, 2023
67105a1
Don't drop fields in clean_generation_eia923()
knordback May 22, 2023
ee0ae79
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 22, 2023
d962d4f
Bug 509 work in clean_fuel_receipts_costs_eia923()
knordback May 23, 2023
1d466f1
Don't drop fields in clean_generation_fuel_eia923()
knordback May 24, 2023
15cb0c6
Remove a useless function call
knordback May 24, 2023
97ff719
Merge branch 'dev' into bug-509
knordback May 24, 2023
a4f6234
Merge branch 'dev' into bug-509
knordback Jun 1, 2023
968267c
Merge branch 'dev' into bug-509
knordback Jun 8, 2023
a6f7f9b
Merge branch 'dev' into bug-509
zaneselvans Jun 10, 2023
5d3a0e8
Merge branch 'dev' into bug-509
knordback Jun 14, 2023
eaa7615
Merge branch 'dev' into bug-509
knordback Jun 30, 2023
6 changes: 3 additions & 3 deletions src/pudl/package_data/eia923/column_maps/boiler_fuel.csv
@@ -1,9 +1,9 @@
year_index,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
plant_id_eia,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id
- combined_heat_power,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_and_power_plant,combined_heat_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant
+ associated_combined_heat_power,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_and_power_plant,combined_heat_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant
plant_name_eia,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name
- operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name
- operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id
+ utility_name_eia,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name
+ utility_id_eia,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id
plant_state,state,state,state,state,plant_state,state,plant_state,plant_state,plant_state,plant_state,plant_state,plant_state,plant_state,plant_state
census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region
nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region
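
Note on the column maps: each row appears to pair a PUDL column name (first field) with the raw spreadsheet header used in each reporting year. The sketch below shows how a map in that layout could be applied to one year of raw data. It is only an illustration: `rename_for_year`, the inline two-year excerpt, and the sample values are hypothetical and are not taken from PUDL's extraction code.

```python
import io

import pandas as pd

# Hypothetical two-year excerpt in the same layout as the column map above:
# the first field is the PUDL name, the remaining fields are raw names per year.
COLUMN_MAP_CSV = """year_index,2008,2012
plant_id_eia,plant_id,plant_id
utility_name_eia,operator_name,operator_name
utility_id_eia,operator_id,operator_id
plant_state,state,plant_state
"""


def rename_for_year(raw: pd.DataFrame, column_map: pd.DataFrame, year: int) -> pd.DataFrame:
    """Rename one year's raw columns to PUDL names (illustrative helper)."""
    # Invert the map for the chosen year: raw header -> PUDL name.
    year_map = column_map[str(year)].dropna()
    raw_to_pudl = {raw_name: pudl_name for pudl_name, raw_name in year_map.items()}
    return raw.rename(columns=raw_to_pudl)


column_map = pd.read_csv(io.StringIO(COLUMN_MAP_CSV), index_col="year_index")
raw_2008 = pd.DataFrame(
    {
        "plant_id": [3],
        "operator_name": ["Acme Power"],  # made-up sample value
        "operator_id": [195],
        "state": ["AL"],
    }
)
renamed = rename_for_year(raw_2008, column_map, 2008)
```

Under this reading, changing the first field of a row (for example `operator_id` to `utility_id_eia`) changes the name every downstream transform sees, which is why the drop lists in `eia923.py` change in the same PR.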
@@ -21,8 +21,8 @@ ash_content_pct,average_ash_content,average_ash_content,average_ash_content,aver
mercury_content_ppm,,,,,average_mercury_content,,average_mercury_content,average_mercury_content,average_mercury_content,average_mercury_content,average_mercury_content,average_mercury_content,average_mercury_content,average_mercury_content
fuel_cost_per_mmbtu,fuel_cost,fuel_cost,fuel_cost,fuel_cost,fuel_cost,fuel_cost,fuel_cost,fuel_cost,fuel_cost,fuel_cost,fuel_cost,fuel_cost,fuel_cost,fuel_cost
regulated,regulated,regulated,regulated,regulated,regulated,regulated,regulated,regulated,regulated,regulated,regulated,regulated,regulated,regulated
- operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name
- operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id
+ utility_name_eia,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name
+ utility_id_eia,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id
reporting_frequency_code,respondent_frequency,respondent_frequency,respondent_frequency,respondent_frequency,reporting_frequency,respondent_frequency,reporting_frequency,reporting_frequency,reporting_frequency,reporting_frequency,reporting_frequency,reporting_frequency,reporting_frequency,reporting_frequency
primary_transportation_mode_code,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode,primary_transportation_mode
secondary_transportation_mode_code,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode,secondary_transportation_mode
4 changes: 2 additions & 2 deletions src/pudl/package_data/eia923/column_maps/generation_fuel.csv
@@ -3,8 +3,8 @@ plant_id_eia,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plan
combined_heat_power,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_and_power_plant,combined_heat_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant
nuclear_unit_id,nuclear_unit_i_d,nuclear_unit_i_d,nuclear_unit_i_d,nuclear_unit_i_d,nuclear_unit_i_d,nuclear_unit_i_d,nuclear_unit_i_d,nuclear_unit_i_d,nuclear_unit_i_d,nuclear_unit_i_d,nuclear_unit_id,nuclear_unit_id,nuclear_unit_id,nuclear_unit_id,nuclear_unit_id,nuclear_unit_id,nuclear_unit_id,nuclear_unit_id,nuclear_unit_id,nuclear_unit_id,nuclear_unit_id
plant_name_eia,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name
- operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name
- operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id
+ utility_name_eia,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name
+ utility_id_eia,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id
plant_state,state,state,state,state,state,state,state,state,state,state,state,plant_state,state,plant_state,plant_state,plant_state,plant_state,plant_state,plant_state,plant_state,plant_state
census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region
nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region
4 changes: 2 additions & 2 deletions src/pudl/package_data/eia923/column_maps/generator.csv
@@ -2,8 +2,8 @@ year_index,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
plant_id_eia,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id
combined_heat_power,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_power_plant,combined_heat_and_power_plant,combined_heat_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant,combined_heat_and_power_plant
plant_name_eia,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name,plant_name
- operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name
- operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id
+ utility_name_eia,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name,operator_name
+ utility_id_eia,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id,operator_id
plant_state,state,state,state,state,plant_state,state,plant_state,plant_state,plant_state,plant_state,plant_state,plant_state,plant_state,plant_state
census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region,census_region
nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region,nerc_region
49 changes: 10 additions & 39 deletions src/pudl/transform/eia923.py
@@ -618,21 +618,12 @@ def clean_generation_fuel_eia923(raw_generation_fuel_eia923: pd.DataFrame):

# Drop fields we're not inserting into the generation_fuel_eia923 table.
cols_to_drop = [
"combined_heat_power",
"plant_name_eia",
"operator_name",
"operator_id",
"plant_state",
"census_region",
"nerc_region",
"naics_code",
"fuel_unit",
"total_fuel_consumption_quantity",
"electric_fuel_consumption_quantity",
"total_fuel_consumption_mmbtu",
"elec_fuel_consumption_mmbtu",
"net_generation_megawatthours",
"early_release",
]
gen_fuel.drop(cols_to_drop, axis=1, inplace=True)

@@ -764,7 +755,15 @@ def _aggregate_duplicate_boiler_fuel_keys(boiler_fuel_df: pd.DataFrame) -> pd.Da
quantity_cols
+ relative_cols
+ key_cols
- + ["prime_mover_code", "sector_id_eia", "sector_name_eia"]
+ + [
+ "prime_mover_code",
+ "sector_id_eia",
+ "sector_name_eia",
+ "total_fuel_consumption_quantity",
+ "balancing_authority_code_eia",
+ "early_release",
+ "reporting_frequency_code",
+ ]
)
actual_cols = set(boiler_fuel_df.columns)
difference = actual_cols.symmetric_difference(expected_cols)
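
The hunk above widens `expected_cols` so the symmetric-difference check still passes once the extra columns are no longer dropped upstream. A minimal sketch of that kind of check, using a toy DataFrame; `check_columns` and the sample column sets are hypothetical and only mirror the pattern visible in the diff:

```python
from __future__ import annotations

import pandas as pd


def check_columns(df: pd.DataFrame, expected_cols: set[str]) -> set[str]:
    """Return columns that are unexpected or missing (illustrative helper)."""
    actual_cols = set(df.columns)
    # symmetric_difference == (actual - expected) | (expected - actual)
    return actual_cols.symmetric_difference(expected_cols)


# Toy frame in which "early_release" is now retained instead of dropped.
df = pd.DataFrame({"plant_id_eia": [1], "early_release": [False]})

stale_diff = check_columns(df, {"plant_id_eia"})
updated_diff = check_columns(df, {"plant_id_eia", "early_release"})
```

With the stale expectation the difference is non-empty (`early_release` is flagged); with the updated expectation it is empty.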
Expand Down Expand Up @@ -830,19 +829,7 @@ def clean_boiler_fuel_eia923(raw_boiler_fuel_eia923: pd.DataFrame) -> pd.DataFra
# Need to stop dropping fields that contain harvestable entity attributes.
# See https://github.com/catalyst-cooperative/pudl/issues/509
cols_to_drop = [
"combined_heat_power",
"plant_name_eia",
"operator_name",
"operator_id",
"plant_state",
"census_region",
"nerc_region",
"naics_code",
"fuel_unit",
"total_fuel_consumption_quantity",
"balancing_authority_code_eia",
"early_release",
"reporting_frequency_code",
"data_maturity",
]
bf_df.drop(cols_to_drop, axis=1, inplace=True)
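
One reason the drop lists have to change together with the column maps: by default `DataFrame.drop` raises `KeyError` when asked to drop a label that is not present. A small sketch with made-up data (the frame and its values are hypothetical, not PUDL output):

```python
import pandas as pd

# Toy frame after a rename: the column arrives as "utility_id_eia".
df = pd.DataFrame({"utility_id_eia": [195], "data_maturity": ["final"]})

# "operator_id" is a stale label here, so the default errors="raise" applies.
try:
    df.drop(["operator_id", "data_maturity"], axis=1)
    raised = False
except KeyError:
    raised = True

# Dropping only labels that exist succeeds and keeps the harvestable column.
trimmed = df.drop(["data_maturity"], axis=1)
```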
Expand Down Expand Up @@ -948,18 +935,7 @@ def clean_generation_eia923(raw_generator_eia923: pd.DataFrame) -> pd.DataFrame:
gen_df = (
raw_generator_eia923.dropna(subset=["generator_id"])
.drop(
- [
- "combined_heat_power",
- "plant_name_eia",
- "operator_name",
- "operator_id",
- "plant_state",
- "census_region",
- "nerc_region",
- "naics_code",
- "net_generation_mwh_year_to_date",
- "early_release",
- ],
+ [],

Review comment (Member): We should just remove this .drop() altogether.

axis="columns",
)
.pipe(_yearly_to_monthly_records)
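
On the review comment above: dropping an empty list of labels leaves the columns unchanged, so the call can be removed without changing the result. A quick check on a toy frame (the data below are hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "generator_id": ["1", None],
        "net_generation_mwh": [10.0, 5.0],
    }
)

# Chain as it appears in the diff, with the now-empty drop list.
with_empty_drop = df.dropna(subset=["generator_id"]).drop([], axis="columns")

# Same chain with the .drop() call removed, as the reviewer suggests.
without_drop = df.dropna(subset=["generator_id"])
```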
@@ -1101,18 +1077,13 @@ def clean_fuel_receipts_costs_eia923(
# Drop fields we're not inserting into the fuel_receipts_costs_eia923
# table.
cols_to_drop = [
"plant_name_eia",
"plant_state",
"operator_name",
"operator_id",
"mine_id_msha",
"mine_type_code",
"state",
"county_id_fips",
"state_id_fips",
"mine_name",
"regulated",
"early_release",
]

cmi_df = (