LoadRunner

Introduction

LoadRunner calculates the "flux" or "load" of dissolved elements carried in streams and rivers. This could be for a single USGS water testing site, or for an entire watershed. The element could be a single USGS constituent code, or a list of codes that measure the same quantity by different means.

For example, LoadRunner was built to answer questions like "How much carbon does the Ohio River watershed carry?" - a major component of the carbon sequestered by the watershed.

At the core of LoadRunner is the USGS LOADEST program - a batch program requiring specially prepared input files. LoadRunner's job is to:

  1. Take the source data in its native form - USGS water quality and flow data files (or hand-crafted files),
  2. Massage it to create LOADEST input files,
  3. Run LOADEST for you, then
  4. Present the results in a convenient format for further analysis.

The rest of this page goes into more detail on the options and operation of LoadRunner, organized by the area of the LoadRunner screen they appear in.


The LoadRunner startup screen.
Before clicking the Run button, you have to supply the two Required Input data files.
Further Options allow you to customize the run. LoadRunner remembers your selections from run to run.
After a Run, two new areas - Output and Results - will appear.

Required Input - Data Files

LoadRunner can do nothing without:
  1. Water quality data - the concentration of each element dissolved in the streamflow, and
  2. Flow data - how much water the stream carried that day.

LoadRunner extracts the quantities of interest from these huge files, massages them into LOADEST format, and runs LOADEST on them.
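
To make the two inputs concrete: a daily load is a concentration multiplied by the volume of water that passed the site that day. The sketch below assumes concentration in mg/L and flow in cubic feet per second; those units are an assumption for illustration, not something this page specifies, and the function name is hypothetical rather than part of LoadRunner or LOADEST.

```python
# Illustrative only: the units (mg/L and ft^3/s) are assumptions,
# not settings taken from LoadRunner or LOADEST.
LITERS_PER_CUBIC_FOOT = 28.316846592
SECONDS_PER_DAY = 86_400
KG_PER_MG = 1e-6


def daily_load_kg(concentration_mg_per_l, flow_cfs):
    """Mass carried past the site in one day, in kilograms."""
    liters_per_day = flow_cfs * LITERS_PER_CUBIC_FOOT * SECONDS_PER_DAY
    return concentration_mg_per_l * liters_per_day * KG_PER_MG
```

Under these assumptions, 10 mg/L at a steady 1,000 cubic feet per second works out to roughly 24,466 kg per day.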

The current version of LoadRunner assumes you've already fetched or prepared this data for all the sites you're interested in. (You can put a whole list of sites into one file of each type.) A future version may automate retrieving the data for you. For now, you fetch the data yourself and save it onto your local computer, from:

Quality file - USGS Water Quality Samples for the Nation

Flow file - USGS Surface-Water Daily Data for the Nation

OR: You can massage your own data (flow, quality, or both) into a format LoadRunner will accept.

Note that the water quality files hold discrete samples. The flow data is (almost) daily, over many decades. The quality data is far sparser, sometimes going years at a stretch with no samples at all. Based on what little sample data is available, LOADEST constructs mathematical models fitting that data to a curve, then selects the best fit from among those models. It then applies that load model to calculate load as a function of date and flow, and reports the results.
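
The calibrate-then-estimate idea can be sketched with a deliberately simplified stand-in: a straight line fitted to ln(load) against ln(flow) by ordinary least squares. This is not LOADEST's actual procedure, which compares several candidate models and also uses the date; the LOADEST manual is the authority on that. The function names here are hypothetical.

```python
import math


def fit_log_linear(flows, loads):
    """Least-squares fit of ln(load) = intercept + slope * ln(flow).

    A simplified stand-in for model calibration, not LOADEST's method.
    """
    xs = [math.log(q) for q in flows]
    ys = [math.log(v) for v in loads]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_load(intercept, slope, flow):
    """Apply the fitted curve to a flow value that had no sample."""
    return math.exp(intercept + slope * math.log(flow))
```

The sparse samples play the role of `flows` and `loads` in the fit; the near-daily flow record then supplies the `flow` values passed to `estimate_load`.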

LOADEST has a nice fat manual on that mathematical challenge.

But for LoadRunner's purposes, note that the "calibration data" - the information we give LOADEST to build its models, LOADEST's "calib.inp" - all comes from the quality data file, including the stream flow to match with each sample measurement. For the calibration data, LoadRunner only uses data from the flow file when the water quality file provides no flow measurement for the day of the sample measurement - a rare event.

The flow file is used as input for calculating loads after the models have been calibrated - LOADEST's "est.inp" file. If days are missing from the raw data, LoadRunner pads in the missing days with averaged data, so that you can get good-enough estimates for stream loads for full months and years.
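
One way to pad a daily series is sketched below. This page does not spell out how padded values are computed, so the sketch assumes linear interpolation between the nearest recorded days on either side of a gap; the function name and the padded flag are hypothetical.

```python
from datetime import date, timedelta


def pad_missing_days(daily_flow):
    """Fill gaps in a {date: flow} mapping.

    Returns a list of (day, flow, padded) tuples in date order, where
    padded is True for days that were not in the raw data.
    Assumption: gaps are filled by linear interpolation.
    """
    days = sorted(daily_flow)
    filled = []
    for start, end in zip(days, days[1:]):
        filled.append((start, daily_flow[start], False))
        gap = (end - start).days
        for step in range(1, gap):
            fraction = step / gap
            value = daily_flow[start] + fraction * (daily_flow[end] - daily_flow[start])
            filled.append((start + timedelta(days=step), value, True))
    if days:
        # The last recorded day is not covered by the pairwise loop.
        filled.append((days[-1], daily_flow[days[-1]], False))
    return filled
```

Keeping the padded flag alongside each value makes it possible to report how many days were filled in, and to mark them in later output.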

The USGS does a wonderful job on data quality control, with regular maintenance and corrections. But note that any database that large and complex, built by so many people over so many years, is bound to have quirks and errors that have gone unnoticed. LoadRunner reports any problem it recognizes, in the pursuit of its own modest goals. If you disagree with what LoadRunner decided to do about the problem, you can go back to the original data files and edit them.

Options

Recall that the bulk of LoadRunner's job is to automate running LOADEST for you. Most of these options have to do with how it massages the raw data into posing a LOADEST question.

The most critical option is the list of ElemNum - what you want LoadRunner to calculate loads of. (This is only an "option" because LoadRunner was built to automate alkalinity load estimation.) Note that unit conversions are also entered here, if any are needed.

Other options in order are:

The Run Directory and Output

For each run (click of the Run button), LoadRunner creates a directory with three subdirectories:

  1. inputs - Copies of the raw data you gave LoadRunner
  2. loadest - Both inputs and outputs of the LOADEST runs, in a subdirectory for each USGS site included in the run. You can examine the calibration and estimation files and the raw results of the LOADEST program here.
  3. outputs - LoadRunner's restatement of LOADEST's results, data issues found, etc.

After a run completes, an "Outputs" group of buttons is available on the right of the LoadRunner screen, allowing you to click several of these items to open directly, or the whole run directory for your rummaging convenience.

Typically LOADEST is running a bunch of sites, since we are most interested in running whole watersheds. After each site is completed, the Output area buttons give you quick access to the result files for that site. Once all the sites have completed running, these per-site results are appended into files with allsite_ prepended to the file name. Note that xxx_model.txt summary files are only available site-by-site - there is no "allsite" composite form of that file.
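
The appending step can be pictured roughly as follows. This is a sketch under assumptions, not LoadRunner's code: it supposes each per-site result file starts with a single header line that should appear only once in the combined file, and both function names are hypothetical.

```python
from pathlib import Path


def merge_site_lines(per_site_lines):
    """Combine per-site tables, keeping the header from the first only.

    Assumption: every table's first line is a header row.
    """
    merged = []
    for index, lines in enumerate(per_site_lines):
        merged.extend(lines if index == 0 else lines[1:])
    return merged


def write_allsite_file(site_paths, allsite_path):
    """Read each per-site file and write one combined file."""
    per_site = [Path(p).read_text(encoding="utf-8").splitlines() for p in site_paths]
    combined = merge_site_lines(per_site)
    Path(allsite_path).write_text("\n".join(combined) + "\n", encoding="utf-8")
```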

More on the contents of these files in the Results section below.

Results

This text area at the bottom of the screen reports LoadRunner's progress on each site as it runs. Its contents are also written to the xxx_runlog.txt file - the first button in the Outputs area. The messages here include:

  1. The run date/time and run directory name used for all LoadRunner files associated with the current job.
  2. The arguments used for the batch LoadestRunner program underlying the LoadRunner user interface.
  3. Each site number as its input is read, then processed.
  4. Any data issues where info bits were discarded during reading.
    1. Concentrations of 0.0 are discarded - even if true, LOADEST will crash, and in our experience, it isn't true.
    2. Currently, concentrations below the detection limit are discarded, because they also cause LOADEST to crash sometimes. We've contacted the LOADEST author about that.
    3. Some lines just don't parse - a value of "N" in a numeric field, etc.
    4. Values discarded for having failed the sigma test.
    Note that unlike the data issues below, this is the end of the line for those info bits. After the "Processing < sitenum >" message, further data warnings are on data that was included in the main calculations.
  5. If any gap years were omitted, the dates for those.
  6. How many calibration measurements were used, and a caveat summary on that data:
    1. The number of measurements the USGS marked as estimated data.
    2. How many days used flow data from the flow file instead of the quality file for calibration.
    3. Measurements the USGS marked as below the detection limit (currently disabled - we ignore those because including them crashed LOADEST sometimes).
    Calibration measurements with these "data issues" are clearly marked on the "calib.inp" file for this site/run (open the directory button under Output and rummage under the "loadest" subdirectory for the site number of interest's LOADEST files.)

    Calibration "data issues" are not marked on the daily flux output file. Because they feed into the model itself, they really affect all days.

    If there aren't enough calibration measurements available, this site's run stops. LOADEST needs a minimum of 12 calibration points, or it can't estimate stream loads.

  7. How many days of flow data were used, and a caveat summary on that data:
    1. The number of measurements the USGS marked as estimated data.
    2. How many days LoadRunner padded with interpolated flow values, because there was no usable USGS data.
    3. How many days LoadRunner supplied averaged flow values, because the USGS data had multiple entries.
    4. How many days LoadRunner replaced a flow value between zero and 1.0 with an interpolated value.
    Flow measurements with these "data issues" are clearly marked on the "est.inp" file for this site/run (open the directory button under Output and rummage under the "loadest" subdirectory for the site number of interest's LOADEST files.)

    Flow data issues are marked in the "Caveat" column on the daily xxx_flux.txt output files. This column doesn't appear in the xxx_monthflux.txt and xxx_annualflux.txt composited files.

  8. Run completion time.
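
The read-time discards in item 4 and the minimum-calibration check in item 6 can be sketched as below. The input shape (a text value plus a below-detection-limit flag per sample) and all names are hypothetical, chosen for illustration. The sigma test is left out because this page does not define it.

```python
MIN_CALIBRATION_POINTS = 12  # the minimum LOADEST needs, as stated above


def screen_samples(raw_rows):
    """Apply the discard rules to (value_text, below_limit) pairs.

    Returns (kept_values, discard_messages). Hypothetical input shape.
    """
    kept, discarded = [], []
    for row_number, (text, below_limit) in enumerate(raw_rows, start=1):
        try:
            value = float(text)
        except ValueError:
            discarded.append(f"row {row_number}: cannot parse {text!r}")
            continue
        if value == 0.0:
            discarded.append(f"row {row_number}: zero concentration")
        elif below_limit:
            discarded.append(f"row {row_number}: below detection limit")
        else:
            kept.append(value)
    return kept, discarded


def enough_for_calibration(kept):
    """True when there are enough samples to attempt a LOADEST run."""
    return len(kept) >= MIN_CALIBRATION_POINTS
```

Collecting a message for every discarded row, rather than dropping it silently, mirrors the way the run log reports each issue.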

The Output Files

There are a number of output files written by LoadRunner. (xxx_ is whatever nickname you filled in for the Elem field under Options. sss_ is either a USGS site number or allsite, for the results of all sites appended together.)


Ginger Booth for Peter Raymond, January, 2008