The Archive job does not do online archive correctly for both GDAS and GFS cycles #2673

Open
emilyhcliu opened this issue Jun 10, 2024 · 4 comments · May be fixed by #2687
Labels: bug (Something isn't working)


@emilyhcliu
Contributor

emilyhcliu commented Jun 10, 2024

What is wrong?

I am running two experiments using two different global-workflow versions:
Exp1 - uses hash 59cdc0e (last updated on April 5, 2024)
Exp2 - uses hash acf3aaa (last updated on June 6, 2024)

In both runs (Exp1 and Exp2), the pgb files were copied from RUNDIR to ROTDIR under the product directory for the GDAS and GFS cycles without any problems. Exp1 had no online archive problem. However, the online archive job for Exp2 is missing files for both the GDAS and GFS runs.

The archive job has two parts: one is the HPSS archive, and the other is the online archive.
There were no problems with the HPSS archive. However, the online archive job has issues:

  1. For the GDAS cycle: gsistat and pgbanl files were not archived; other files were OK
  2. For the GFS cycle: No files were archived.

The archive job (develop version) runs exglobal_archive.py with arcdir.yaml as input.
PR #2621, which is related to the archive job, was merged on June 1.
arcdir.yaml.j2 was refactored, which may be related to the online archive problem reported in this issue.

What should have happened?

For GDAS and GFS cycles, both analysis and forecast pgb files should be archived on disk (online archive) along with gsistat files.

What machines are impacted?

All or N/A

Steps to reproduce

  1. Check out the latest global-workflow from the develop branch.
  2. Configure it to run both GDAS and GFS for one cycle.

Additional information

My Exp2 run:

HOMEgfs: /scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-thompson-enkffix
EXPDIR: /scratch1/NCEPDEV/da/Emily.Liu/para/v17/v17allskyens
ROTDIR: /scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17allskyens
ARCDIR: /scratch1/NCEPDEV/da/Emily.Liu/archive/v17allskyens

Related log files:
/scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17allskyens/logs/2023040300/gdasarch.log
/scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17allskyens/logs/2023040300/gfsarch.log

Do you have a proposed solution?

Debug exglobal_archive.py and its related scripts and yaml files (e.g. arcdir.yaml.j2)
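
While debugging, it may help to confirm exactly which files the online archive is missing. The following is a minimal sketch only, not part of global-workflow: the function name `find_missing`, the placeholder ARCDIR path, and the example file names are all assumptions made for illustration, so substitute the names your experiment actually expects.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing(arcdir: Path, expected_names: Iterable[str]) -> List[str]:
    """Return, in sorted order, the expected file names that are absent from arcdir."""
    return sorted(
        name for name in expected_names if not (arcdir / name).is_file()
    )


if __name__ == "__main__":
    # Placeholder values (assumptions): replace with your own ARCDIR and
    # the file names you expect the online archive to contain.
    arcdir = Path("/path/to/ARCDIR")
    expected = ["gsistat.example", "pgbanl.example"]
    for name in find_missing(arcdir, expected):
        print(f"missing from online archive: {name}")
```

Running this after the gdasarch and gfsarch jobs complete gives a quick list of what did not arrive in ARCDIR, which can then be compared against the entries in arcdir.yaml.j2.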

@emilyhcliu
Contributor Author

Tagging @azadeh-gh for awareness.

@DavidHuber-NOAA
Contributor

Thanks for letting me know about this, @emilyhcliu. I will take a look today and see what's going on.

@CatherineThomas-NOAA CatherineThomas-NOAA removed their assignment Jun 11, 2024
@emilyhcliu
Contributor Author

@DavidHuber-NOAA Do you have a timeline for fixing the online archive issue?
We have three experiments running with the latest global workflow, which includes the archive refactoring work merged on June 1. Knowing the timeline for fixing the issue will help us decide whether to wait for the fix or rebuild the global workflow with an earlier version (before June 1) for the experiments.
Thanks!

WalterKolczynski-NOAA added a commit to WalterKolczynski-NOAA/global-workflow that referenced this issue Jun 12, 2024
The metplus job isn't running correctly (it "succeeds" but has a
ton of silent errors), so turning it off until it works. May be as
simple as the archive not working properly for gfs output, but there
may be more underneath, as these haven't been validated in a while.

Refs: NOAA-EMC#2673
@DavidHuber-NOAA
Contributor

@emilyhcliu I expect to have a fix in by mid next week at the latest. I did some exploratory work yesterday and have an idea of the root cause, but there's still some more debugging work to do.

DavidHuber-NOAA added a commit to DavidHuber-NOAA/global-workflow that referenced this issue Jun 13, 2024
DavidHuber-NOAA added a commit to DavidHuber-NOAA/global-workflow that referenced this issue Jun 13, 2024
@DavidHuber-NOAA DavidHuber-NOAA linked a pull request Jun 13, 2024 that will close this issue
5 tasks
WalterKolczynski-NOAA added a commit to WalterKolczynski-NOAA/global-workflow that referenced this issue Jun 14, 2024
The metplus job isn't running correctly (it "succeeds" but has a
ton of silent errors), so turning it off until it works. May be as
simple as the archive not working properly for gfs output, but there
may be more underneath, as these haven't been validated in a while.

Refs: NOAA-EMC#2673
5 participants