LoadRunner's purpose is to calculate "flux" or "loads" of dissolved elements, carried in streams and rivers. This could be for a single USGS water testing site, or for an entire watershed. The element could be a single USGS constituent code, or a list of codes that measure the 'same' quantity by different means.
For example, the kind of question LoadRunner was built to answer was, "how much carbon does the Ohio River watershed carry" - a major component of the carbon sequestered by the watershed.
At the core of LoadRunner is the USGS LOADEST program - a batch program requiring specially prepared input files. LoadRunner's job is to prepare those input files, run LOADEST, and collect the results.
The rest of this page goes into more detail on the options and operation of LoadRunner, organized by the area of the LoadRunner screen they appear in.
LoadRunner extracts the quantities of interest from these huge files, massages them into LOADEST format, and runs LOADEST on the result.
The current version of LoadRunner assumes you've already fetched or prepared this data for all the sites you're interested in. (You can put a whole list of sites into one file of each type.) A future version may automate retrieving the data for you. For now, you fetch data, and save it onto your local computer, from:
Quality file - USGS Water Quality Samples for the Nation
Flow file - USGS Surface-Water Daily Data for the Nation
OR : You can massage your own data (flow, quality, or both) into a format LoadRunner will accept.
Note that water quality files are samples. The flow data is (almost) daily, over many decades; the quality data is far sparser, sometimes missing years at a stretch with no samples at all. From what little sample data is available, LOADEST constructs mathematical models fitting that data to a curve, selects the best fit among those models, then applies that load model to calculate load as a function of date and flow, and reports the results.
LOADEST has a nice fat manual on that mathematical challenge.
But for LoadRunner's purposes, note that the "calibration data" - the information we give LOADEST to build its models, LOADEST's "calib.inp" - all comes from the quality data file, including the stream flow to match with each sample measurement. For the calibration data, LoadRunner only uses data from the flow file when the water quality file provides no flow measurement for the day of the sample measurement - a rare event.
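The flow-matching rule above can be sketched as a small function. This is a hypothetical helper for illustration only - the names are invented, and LoadRunner's actual column handling differs:

```python
def calibration_flow(sample_flow, daily_flow):
    """Flow paired with a calibration sample: prefer the flow recorded
    in the water quality file itself, falling back to the daily flow
    file only when the sample carries no flow measurement (a rare
    event). Hypothetical helper; not LoadRunner's actual code."""
    return sample_flow if sample_flow is not None else daily_flow

print(calibration_flow(142.0, 150.0))  # -> 142.0  (sample's own flow wins)
print(calibration_flow(None, 150.0))   # -> 150.0  (rare fallback to the flow file)
```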
The flow file is used as input for calculating loads after the models have been calibrated - LOADEST's "est.inp" file. If days are missing from the raw data, LoadRunner pads in the missing days with averaged data, so that you can get good-enough estimates for stream loads for full months and years.
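The padding step can be sketched roughly as follows. The manual only says gaps are padded with "averaged data", so this sketch assumes the mean of the two flanking measurements; LoadRunner's actual averaging may differ:

```python
from datetime import date, timedelta

def pad_missing_days(flows):
    """Fill date gaps in a daily flow record so the estimation interval
    has full daily coverage. Assumption: each missing day gets the mean
    of the two measurements flanking the gap."""
    days = sorted(flows)
    padded = dict(flows)
    for a, b in zip(days, days[1:]):
        fill = (flows[a] + flows[b]) / 2
        d = a + timedelta(days=1)
        while d < b:          # walk across the gap, if any
            padded[d] = fill
            d += timedelta(days=1)
    return padded

# A two-day gap (Jan 2-3) is padded with the flanking average, 15.0.
flows = {date(2000, 1, 1): 10.0, date(2000, 1, 4): 20.0}
padded = pad_missing_days(flows)
```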
The USGS does a wonderful job on data quality control, with regular maintenance and corrections. But note that any database that large and complex, built by so many people over so many years, is bound to have quirks and errors that have gone unnoticed. LoadRunner reports any problem it recognizes, in the pursuit of its own modest goals. If you disagree with what LoadRunner decided to do about the problem, you can go back to the original data files and edit them.
The most critical option is the list of ElemNum - what you want LoadRunner to calculate loads of. (This is only an "option" because LoadRunner was built to automate alkalinity load estimation.) Note that unit conversions are also entered here, if any are needed.
Other options in order are:
Alternatively, you might want to give it a particular LOADEST model number to use, and skip trying the others. You might want to do this, for instance, if you studied the Susquehanna River watershed in careful detail and found Model 4 works best. Then, for consistency, you could re-run the watershed with that one fixed model.
Note that for each site, LoadRunner spawns a new LOADEST run. So with Model 0 selected, each and every site is going to have its own personal best model selected. Which you may or may not want.
Normally this is set high (Sigma 3.0, or 3 standard deviations), and catches data outliers that suffer from someone entering too many zeroes. Note that it's normally no use at all for catching data values that are too small.
Sigma testing is not applied to non-detect values. So if you set the < detect option to use 1/2, LoadRunner knows which values were properly marked as non-detects, and doesn't apply the sigma test.
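The sigma test amounts to a deviation-from-the-mean check. Here is a minimal sketch (hypothetical function, not LoadRunner's code) that also shows why the test rarely catches too-small values:

```python
from statistics import mean, stdev

def sigma_outliers(values, sigma=3.0):
    """Flag values deviating more than `sigma` standard deviations from
    the mean - the 'extra zero' typos the option is meant to catch."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > sigma * s]

# A 5.0 mistyped as 50.0 lands far outside 3 standard deviations and is
# flagged; a too-small 0.5 stays well inside them and slips through -
# which is why the test is "no use at all" for undersized values.
data = [5.0] * 20 + [50.0, 0.5]
print(sigma_outliers(data))  # -> [50.0]
```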
Note that this only affects the begin and end dates of the load estimation. Whatever those dates are, LoadRunner pads the interval with any missing flow dates as necessary, to give you full daily coverage within the interval.
The only case in which you would set SplitSite to false is when you cannot match quality and flow data by USGS site number. For instance, perhaps the testing site moved a mile downstream and got a new site number in 1965, so you want to take two site numbers, quality and flow data, and munge them all together. But if this were part of a larger project, a better approach would be to massage the USGS site numbers in the source data files.
Note that if you prepare your own data and don't provide a site_no column, the site name "NoSiteGiven" is attributed to the data. (This is true for flow or quality data.) So if, for example, you have hand-crafted qwdata with no site names, and a USGS dv file that you've decided matches that data, you would set SplitSite to false, so that "NoSiteGiven" gets matched to data with a different site name. However, if both were hand-crafted files, neither with a site_no column, their site names would match ("NoSiteGiven"), and the SplitSite setting wouldn't matter.
Note that FlowGap is talking about flow data gaps, of days. The other gap options are talking about calibration data gaps, of years.
The simplest choices are "Any" and "None". "Any" means just go from beginning year, to end year, of the data, without checking for gaps. "None" means to skip any years that have no sample measurements.
The percentage number settings govern what to do in between. Say we have data from 1953 to 2005 - 53 years inclusive. If you say the maximum gap is 5%, in this run that would be three years (5% of 53 is 2.65, rounded to the nearest whole number). So a run of four consecutive years missing data would not be estimated, but gaps of three or fewer years would run.
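The arithmetic behind the percentage setting is simple; a sketch (hypothetical function name) of the worked example above:

```python
def max_gap_years(first_year, last_year, pct):
    """Longest run of sample-less years tolerated: the given percentage
    of the inclusive year span, rounded to the nearest whole number."""
    span = last_year - first_year + 1
    return round(span * pct / 100)

# 1953-2005 is 53 years inclusive; 5% of 53 is 2.65, which rounds to 3.
print(max_gap_years(1953, 2005, 5))  # -> 3
```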
Note that the run is not chopped up into separate groups of years run separately. So the LOADEST model fitting is done on the same data, whether gaps are suppressed or not, and that one model fitting is used for all years. This may not be quite what you want, if the load carrying behavior of the stream has changed substantially over the years - you might want the 'early years' estimated separately from the 'recent years'. But to do that, you need to chop up the input files by hand - LoadRunner has no options to help.
Example: If preGapMin is 3, and there are only 2 calibration points before the first gap, those 2 points get dropped. Likewise for the last group of points after the last gap. This is only applied once. So if the points go 2-gap-1-gap-20-gap-1-gap-2, only the first 2 and last 2 would get dropped by preGapMin = postGapMin = 3. The next inward groups of 1 point don't get dropped.
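The once-only edge trimming in that example can be sketched over the sizes of the calibration-point groups (hypothetical helper, for illustration):

```python
def trim_edge_groups(group_sizes, pre_min, post_min):
    """Drop the first group if smaller than pre_min and the last group
    if smaller than post_min - applied once only, never to inner
    groups, matching the 2-gap-1-gap-20-gap-1-gap-2 example."""
    trimmed = list(group_sizes)
    if trimmed and trimmed[0] < pre_min:
        trimmed = trimmed[1:]
    if trimmed and trimmed[-1] < post_min:
        trimmed = trimmed[:-1]
    return trimmed

# Only the outermost undersized groups are dropped; the inner 1s survive.
print(trim_edge_groups([2, 1, 20, 1, 2], 3, 3))  # -> [1, 20, 1]
```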
For example, say in 1955 there was a value of "< .5" and in 1995, a value of "< .1". With the < detect - use 1/2 option, LoadRunner sends these to LOADEST as ".25" and ".05". Neither value is "censored" in LOADEST's worldview, but both are marked with "<" as a caveat in the calibration file, for ease of human inspection. The normal data issues report gives counts of these values.
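The halving rule in that example can be sketched like so (a hypothetical helper, not LoadRunner's actual parser):

```python
def halve_nondetect(value):
    """With the '< detect - use 1/2' option, a reading like '< .5'
    becomes half the detection limit, carrying a '<' caveat mark for
    human inspection (it is NOT sent to LOADEST as censored data)."""
    text = value.strip()
    if text.startswith("<"):
        return float(text[1:]) / 2, "<"
    return float(text), ""

print(halve_nondetect("< .5"))  # -> (0.25, '<')
print(halve_nondetect("< .1"))  # -> (0.05, '<')
```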
However, if you use the "pass to LOADEST" option, the values are unchanged and marked as censored for LOADEST. This is the correct thing to do if you're dealing with an element with many non-detects, such as pesticides. Unfortunately, we've encountered runs where LOADEST hung completely on such data. In every case, the data was actually bad, but there was no safe way for LoadRunner to divine this. So if you do use this option, and LOADEST hangs, you'll need to study the data carefully, and possibly consult LOADEST support. In a multi-site run, you could use one of the other options just to explore the dataset, fix the problems, and then re-run with "pass to LOADEST".
Please see the LOADEST site section on "Publications that Discuss Detection Limits and Censored Data" here for further discussion of when and why detection limits are an issue.
Note that the sigma test is not applied to non-detects, provided they were properly marked in the USGS quality data file. (Values of "0" are not properly marked, and are simply dropped.)
After a run completes, an "Outputs" group of buttons is available on the right of the LoadRunner screen, letting you open several of these items directly, or open the whole run directory for your rummaging convenience.
Typically a run covers a bunch of sites, since we are most interested in running whole watersheds. After each site is completed, the Output area buttons give you quick access to the result files for that site. Once all the sites have finished running, these per-site results are appended into files with allsite_ prepended to the file name. Note that the xxx_model.txt summary files are only available site-by-site - there is no "allsite" composite form of that file.
More on the contents of these files in the Results section below.
Calibration "data issues" are not marked on the daily flux output file. They really affect all days.
If there aren't enough calibration measurements available, this site's run stops. LOADEST needs a minimum of 12 calibration points, or it can't estimate stream loads.
Flow data issues are marked in the "Caveat" column on the daily xxx_flux.txt output files. This column doesn't appear in the xxx_monthflux.txt and xxx_annualflux.txt composited files.
The Output Files
There are a number of output files written by LoadRunner. (xxx_ is whatever nickname you filled in for the Elem field under Options. sss_ is either a USGS site number or allsite, for the results of all sites appended together.)
The allsite_ version is not an appended version, but a table of site, model, statistics, and the model coefficients, one line per site.
Days with quality data issues are marked with a code in the "Caveat" column.
Days with flow data issues are marked with a code in the "Caveat" column. This column doesn't appear in the other ..._flux.txt files.