Using EFSOI with GDAS

AndrewEichmann-NOAA edited this page Jul 27, 2022 · 19 revisions

Current Status

last edited July 26, 2022

The EFSOI code has been merged into the GSI-utils develop branch, the scripts and j-jobs have been merged into the global-workflow develop branch, and most of the remaining necessary files have preliminary approval to be merged into global-workflow develop. As of this writing, EFSOI can be run on Orion using the development fork, which has been merged with global-workflow develop as of July 22, 2022 (ffcd5b) and GSI-utils develop as of July 19 (322cc7b).

Quick Start

Code, building, and experiment setup

To use EFSOI, clone the forked global-workflow repository, check out the EFSOI branch, and then run the usual sequence of scripts. This branch of global-workflow clones a specific hash of a development fork of GSI-utils that contains the necessary fix file and some scripts for analyzing the EFSOI output. At the time of this writing, EFSOI works on Orion and previously worked on WCOSS.

To set up the current global-workflow repository with the latest EFSOI development:

git clone --recursive https://github.com/AndrewEichmann-NOAA/global-workflow.git

cd global-workflow/

git checkout 92a4489

and then check out, build, and link global-workflow as usual.

Run workflow/setup_expt.py as usual for a cycling experiment. For testing, a 20-member ensemble suffices; 80-member ensembles are used for experiments. GFS need not be run separately.

In config.base in your expdir, set

export DO_EFSOI="YES"

Also, per the global-workflow instructions for cycling experiments, set the following:

imp_physics from 8 (Thompson) to 11 (GFDL)

CCPP_SUITE to FV3_GFS_v16 (or another suite that uses GFDL)
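Before generating the XML, it can help to confirm that the settings above actually made it into config.base. The following is a minimal sketch and not part of global-workflow: the helper name read_exports and the sample text are hypothetical, and it only understands plain `export NAME=value` lines, whereas a real config.base also contains shell logic and variable references that this ignores.

```python
import re

# Matches simple lines such as: export DO_EFSOI="YES"
EXPORT_RE = re.compile(r'^\s*export\s+(\w+)=(.*?)\s*$')


def read_exports(text):
    """Return {name: value} for plain `export NAME=value` lines (last one wins)."""
    settings = {}
    for line in text.splitlines():
        # Drop trailing comments; assumes values themselves contain no '#'.
        line = line.split("#", 1)[0]
        match = EXPORT_RE.match(line)
        if match:
            name, value = match.groups()
            settings[name] = value.strip('"\'')
    return settings


# Hypothetical excerpt of a config.base, for illustration only.
sample = '''
export DO_EFSOI="YES"   # enable the EFSOI tasks
export imp_physics=11
export CCPP_SUITE="FV3_GFS_v16"
'''

settings = read_exports(sample)
expected = {"DO_EFSOI": "YES", "imp_physics": "11", "CCPP_SUITE": "FV3_GFS_v16"}
mismatches = {k: settings.get(k) for k, v in expected.items() if settings.get(k) != v}
print(mismatches or "settings look consistent")
```

If another GFDL-based suite is used, the expected CCPP_SUITE value would need to be changed accordingly.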

Then run workflow/setup_xml.py from the EFSOI build of global-workflow. This will set up the workflow with the extra EFSOI tasks. The experiment can be started as usual.

What Should Happen

During the first complete cycle, the gdaseupdfsoi task (the ensemble update with settings specific to EFSOI) will run with the same priority as gdaseupd, leading to a parallel set of EFSOI-specific gdas tasks ending with post-processing. In this cycle, the first 30-hour ensemble forecast (metatask gdasefmnfsoi) is generated, but the gdasefsoi task will never run, since there is no 30-hour forecast from a previous cycle for it to use. During the second complete cycle, the forecast will be made again for 30 hours, and post-processed to generate 24-hour and 30-hour ensemble means. The gdasefsoi task in this cycle will sit idle until the cycle 24 hours later is active and has created the verifying analysis. The gdasefsoi task from the second complete cycle then runs, using the 24-hour forecast from that cycle and the 30-hour forecast from the previous cycle, creating the final observation sensitivity (osense) file. This is placed in the osense directory in COMROT. The process is repeated for the following cycles for the length of the experiment.
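The timing described above can be sketched as follows. This is only an illustration of the bookkeeping, assuming 6-hourly cycling; the function name efsoi_inputs and the example date are hypothetical and do not come from global-workflow.

```python
from datetime import datetime, timedelta


def efsoi_inputs(t0):
    """Times involved in the EFSOI calculation for the cycle at t0."""
    return {
        # 24-hour ensemble forecast initialized in this cycle
        "fcst24_init": t0,
        # 30-hour ensemble forecast initialized in the previous cycle
        "fcst30_init": t0 - timedelta(hours=6),
        # analysis that both forecasts are verified against
        "verifying_analysis": t0 + timedelta(hours=24),
    }


t0 = datetime(2022, 7, 1, 0)  # arbitrary example cycle
times = efsoi_inputs(t0)

# Both forecasts are valid at the time of the verifying analysis.
valid24 = times["fcst24_init"] + timedelta(hours=24)
valid30 = times["fcst30_init"] + timedelta(hours=30)
print(times["verifying_analysis"], valid24 == valid30 == times["verifying_analysis"])
```

This is why the gdasefsoi task for a cycle cannot complete until the cycle 24 hours later has produced its analysis, and why the EFSOI-specific forecast data must stay on disk in the meantime.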

One result of this process is that the EFSOI-specific data, stored in efsoigdas directories with a structure similar to that of enkfgdas directories, has to be kept on disk for a longer time than the other data files, and can eat up space. Likewise the osense files are several hundred MB for each cycle. These osense files can be analyzed with scripts in sorc/gsi_utils.fd/src/EFSOI_Utilities/scripts.

From Theory to Practice

Background

Ensemble Forecast Sensitivity to Observation Impact (EFSOI) is based on a method developed in Langland and Baker (2004) that uses a model adjoint and the Kalman gain to determine the positive or negative impact of individual assimilated observations on the error of a forecast relative to a verifying analysis. The state vector plot below illustrates the concept.

[Figure: state vector plot. Plot by Rahul Mahajan]

The forecast background Xb and analysis Xa of a given cycle are used to initialize the forecasts Xbf and Xaf, respectively. These forecasts are then compared to a verifying analysis Xt to obtain the respective errors of the two forecasts. The difference in the errors is traced back to the individual assimilated observations using the following equation:


Kalnay et al. (2012) developed a method to use ensemble forecasts and observation error covariance in lieu of an adjoint and Kalman gain:
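In the notation of Kalnay et al. (2012), which is assumed here and may differ from the symbols used elsewhere on this page, the ensemble formulation is commonly written as:

```latex
\Delta e^2
  = e_{t|0}^{T} C \, e_{t|0} - e_{t|-6}^{T} C \, e_{t|-6}
  \approx \frac{1}{K-1} \, \delta y_0^{T} R^{-1} Y_0^{a} X_{t|0}^{fT} C
    \left( e_{t|0} + e_{t|-6} \right)
```

Here K is the ensemble size, \delta y_0 is the innovation (observation minus background in observation space), R is the observation error covariance, Y_0^a holds the analysis ensemble perturbations in observation space, X_{t|0}^f holds the forecast ensemble perturbations valid at the verification time, C defines the error norm, and e_{t|0} and e_{t|-6} are the errors of the forecasts initialized from the analysis and from the previous cycle, respectively. Covariance localization, which is needed in practice, is omitted from this form.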

References:

Kalnay, E., Ota, Y., Miyoshi, T. and Liu, J., 2012. A simpler formulation of forecast sensitivity to observations: application to ensemble Kalman filters. Tellus, 64A, 18462.

Ota, Y., Derber, J., Kalnay, E. and Miyoshi, T., 2013. Ensemble-based observation impact estimates using the NCEP GFS. Tellus, 65A, 20038.

EFSOI in GDAS and global-workflow

In more concrete terms within GDAS and global-workflow, the variables in the EFSOI equation are represented as follows:

where the green terms are stored in the initial "osense file" generated during the EFSOI-specific ensemble update task (gdaseupdfsoi) for a given cycle t0, and the values used for the forecast perturbation (the red term) are in 24-hour ensemble member forecasts initialized with the analysis at t0, also generated by gdaseupdfsoi. The forecast errors (the blue terms) are calculated using the 24-hour forecast ensemble mean, the 30-hour forecast ensemble mean from the cycle t0-6hr (which is functionally the same as a 24-hour forecast initialized with the background at t0), and the verifying analysis from t0+24hr. Both the 24-hour and 30-hour ensemble forecasts are run at the same time for t0 with the metatask gdasefmnfsoi: the 24-hour forecast is for the EFSOI calculation for cycle t0, and the 30-hour forecast is for that of cycle t0+6hr. The ensemble means are generated with gdasepmnfsoi. Note that the 24-hour forecasts used are specific to the global model; regional models may use shorter forecasts for the same purpose.
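As a rough illustration of how these pieces combine, the sketch below computes per-observation impacts on a toy problem following the ensemble formulation of Kalnay et al. (2012). It is not the EFSOI executable's algorithm: it assumes a diagonal observation error covariance, uses an identity norm instead of the energy norms, omits localization entirely, and all names and numbers are hypothetical.

```python
def efsoi_impacts(dy, oberrvar, Ya, Xf, e_a, e_b):
    """Toy per-observation impact estimates.

    dy       : innovations (observation minus background), length p
    oberrvar : observation error variances (diagonal of R), length p
    Ya       : analysis perturbations in observation space, p rows x K members
    Xf       : forecast perturbations, n rows x K members
    e_a, e_b : errors of the forecasts from the analysis and from the
               previous cycle against the verifying analysis, length n
    """
    n_members = len(Ya[0])
    n_state = len(Xf)

    # Sum of the two forecast errors (identity norm assumed).
    err_sum = [e_a[i] + e_b[i] for i in range(n_state)]

    # Project the error sum onto each member's forecast perturbation.
    w = [sum(Xf[i][k] * err_sum[i] for i in range(n_state))
         for k in range(n_members)]

    impacts = []
    for j, innovation in enumerate(dy):
        proj = sum(Ya[j][k] * w[k] for k in range(n_members))
        impacts.append(innovation * proj / (oberrvar[j] * (n_members - 1)))
    return impacts


# Two observations, two members, one state variable (made-up values).
dy = [2.0, -1.0]
oberrvar = [4.0, 1.0]
Ya = [[1.0, -1.0], [0.5, -0.5]]
Xf = [[0.5, -0.5]]
e_a = [1.0]
e_b = [3.0]

impacts = efsoi_impacts(dy, oberrvar, Ya, Xf, e_a, e_b)
print(impacts)
```

Because the estimated quantity is the error of the forecast from the analysis minus that of the forecast from the background, a negative value corresponds to an observation that reduced the forecast error.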

Running EFSOI

Tools for Analysis

Developer Notes

Location of EFSOI-relevant code

The code to run EFSOI is spread out over three repositories: GSI-utils for the EFSOI-exclusive Fortran code and Python scripts for analysis, GSI for libraries in enkf and gsi, and global-workflow for scripts to run within a cycling experiment.

GSI-utils

Everything in this repository is under src/EFSOI_Utilities/, with the Fortran in src/EFSOI_Utilities/src and Python scripts in src/EFSOI_Utilities/scripts. Under src/EFSOI_Utilities/fix is a version of the file global_anavinfo.l127.txt from the GSI fix directory that is identical except for an entry for the EFSOI executable. At the time of this writing the location of this file is assigned to ANAVINFO in the config.efso in the experiment directory, though it should be merged with the regular fix file.

The files under src are as follows:

  • efsoi.f90
  • efsoi_main.f90
  • gridio_efsoi.f90
  • loadbal_efsoi.f90
  • loc_advection.f90
  • scatter_chunks_efsoi.f90
  • statevec_efsoi.f90

The files ending in _efsoi.f90 were originally derived from similar files in the EnKF code in GSI, since for various reasons those could not be used as-is as libraries. Otherwise, effort has been made to reduce code duplication, and certain modules are linked from the EnKF and GSI code. As such, the GSI-utils build needs to be told their locations as gsi_ROOT and enkf_ROOT, as described in the GSI-utils INSTALL.md. This is done automatically in the global-workflow build, which is probably the easiest context in which to do development.

GSI

The EFSOI code links a number of modules in the GSI code as libraries, generally for parameter setting, file I/O, MPI handling, and the like. Of particular interest is enkf_obs_sensitivity.f90, which contains the code for reading and writing the osense file. This can be helpful for understanding the contents of the file, and for modifying it if necessary. It does not appear to be otherwise used by the enkf executable, so changes to it should not affect anything else. Bear in mind that the format has to match between the writing and reading subroutines.

global-workflow

The global-workflow repository contains the configuration files and scripts necessary to run the tasks needed to complete the EFSOI algorithm. Each of the EFSOI-specific tasks (eupdfsoi, ecenfsoi, esfcfsoi, efcsfsoi, eposfsoi, and efsoi) has its own rocoto script:

  • jobs/rocoto/eupdfsoi.sh
  • jobs/rocoto/esfcfsoi.sh
  • jobs/rocoto/ecenfsoi.sh
  • jobs/rocoto/efcsfsoi.sh
  • jobs/rocoto/eposfsoi.sh
  • jobs/rocoto/efsoi.sh

...which in turn calls its respective j-job:

  • jobs/JGDAS_EFSOI_UPDATE
  • jobs/JGDAS_EFSOI_ECEN
  • jobs/JGDAS_EFSOI_SFC
  • jobs/JGDAS_EFSOI_FCST
  • jobs/JGDAS_EFSOI_POST
  • jobs/JGDAS_EFSOI

For the tasks ecenfsoi, esfcfsoi, efcsfsoi, and eposfsoi, the j-jobs call the scripts used for the corresponding regular tasks, after setting variables specific to EFSOI tasks. The EFSOI tasks eupdfsoi and efsoi have their own run scripts:

  • scripts/exgdas_efsoi_update.sh
  • scripts/exgdas_efsoi.sh

The eupdfsoi script scripts/exgdas_efsoi_update.sh is fairly similar to scripts/exgdas_enkf_update.sh, and scripts/exgdas_efsoi.sh is a stripped down version of the same. There is also an EFSOI-specific block in ush/forecast_predet.sh, downstream of the ensemble forecast script.

Each EFSOI-specific task also has its own config file in parm/config/ that gets copied into the experiment directory, as well as entries in parm/config/config.resources and the machine-specific settings in the env directory.

There are also blocks in jobs/rocoto/earc.sh, config/config.earc, and ush/hpssarch_gen.sh to handle EFSOI-specific archiving.

Finally, there are sections in the workflow setup scripts workflow/applications.py, workflow/rocoto/workflow_tasks.py, and workflow/rocoto/workflow_xml.py that handle setting up the EFSOI workflow if DO_EFSOI="YES" is detected in config.base when setting up an experiment.

The osense file

The osense file is output by both the EnKF executable during the ensemble update task and the EFSOI executable during the efsoi task. The update outputs the statistical information for each assimilated observation required to perform the EFSOI calculation; the EFSOI executable then reads it in and overwrites it with the same information plus the observation sensitivities.

The following tables document the contents of the osense file. The first is for the single header with variables common to all observations, and the second describes the record for each observation. The conventional and ozone observations are generally handled separately from the satellite observations, though the record format is the same. The variable names are as they are used in the EnKF/EFSOI code, and the same names are used by convention in the accompanying Python scripts.

| Type | Variable Name | Description |
|---|---|---|
| real(r_single) | obfit_prior | Observation fit to the first guess |
| real(r_single) | obsprd_prior | Spread of observation prior |
| real(r_single) | ensmean_obnobc | Ensemble mean first guess (no bias correction) |
| real(r_single) | ensmean_ob | Ensemble mean first guess (bias corrected) |
| real(r_single) | ob | Observation value |
| real(r_single) | oberrvar | Observation error variance |
| real(r_single) | lon | Longitude |
| real(r_single) | lat | Latitude |
| real(r_single) | pres | Pressure |
| real(r_single) | time | Observation time |
| real(r_single) | oberrvar_orig | Original error variance |
| integer(i_kind) | stattype | Observation type |
| character(len=20) | obtype | Observation element / Satellite name |
| integer(i_kind) | indxsat | Satellite index (channel) set to zero |
| real(r_single) | osense_kin | Observation sensitivity (kinetic energy) [J/kg] |
| real(r_single) | osense_dry | Observation sensitivity (dry total energy) [J/kg] |
| real(r_single) | osense_moist | Observation sensitivity (moist total energy) [J/kg] |
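For analysis in Python, the per-observation record above might be mirrored in memory along these lines. This is a sketch under stated assumptions: it says nothing about the binary layout of the file (see enkf_obs_sensitivity.f90 for that), the class and function names are hypothetical, and the example values and obtype strings are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class OsenseRecord:
    """In-memory mirror of one observation record (field names from the table)."""
    obfit_prior: float = 0.0
    obsprd_prior: float = 0.0
    ensmean_obnobc: float = 0.0
    ensmean_ob: float = 0.0
    ob: float = 0.0
    oberrvar: float = 0.0
    lon: float = 0.0
    lat: float = 0.0
    pres: float = 0.0
    time: float = 0.0
    oberrvar_orig: float = 0.0
    stattype: int = 0
    obtype: str = ""
    indxsat: int = 0
    osense_kin: float = 0.0
    osense_dry: float = 0.0
    osense_moist: float = 0.0


def total_impact_by_obtype(records, field="osense_moist"):
    """Sum one of the sensitivity fields over all records of each obtype."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec.obtype.strip()] += getattr(rec, field)
    return dict(totals)


# Made-up records for illustration only.
records = [
    OsenseRecord(obtype="t", osense_moist=-0.5),
    OsenseRecord(obtype="t", osense_moist=-0.25),
    OsenseRecord(obtype="amsua_n19", osense_moist=0.125),
]

totals = total_impact_by_obtype(records)
print(totals)
```

Stripping obtype guards against the blank padding that a fixed-length Fortran character field would typically carry.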