
Database Setup

Overview
Instructions for Deriving Flow Direction Grids
Instructions for Setting Up and Checking Regression Database Delivered by MTU


Overview

The database required to support threshold runoff calculations nationwide is complex.  The two most significant components of the database that were not readily available at the start of this project (January 1999) were projected flow direction grids for each RFC and a digital database of terrain and climatic characteristics required to support USGS flood frequency calculations.  Flow direction grids were derived using DEM and RF1 data provided by NOHRSC, and data layers required for USGS flood frequency calculations were compiled at Michigan Technological University.  A significant amount of work is required to (1) derive usable and accurate flow direction grids, and (2) check the database delivered by Michigan Tech against USGS Water Resources Investigations Report 94-4002 and make the modifications necessary to conform with the specifications required by the Avenue programs developed at HRL.  The threshold runoff Avenue programs are designed to be fairly general, which requires a database that conforms to strict specifications and must be carefully checked.  Some of the tasks are tedious but definitely worthwhile.

Deriving Flow Direction Grids

The flow direction pre-processing and some of the reasoning behind the revisions to earlier plans are summarized here, followed by more detailed instructions on how to execute the pre-processing procedure.

Summary of pre-processing steps for importing, projecting, and stream carving:

1.  Convert NOHRSC river line files (modified RF1), elevation raster, and slope raster files into Arc/Info format.
2.  For elevation and slope grids, reassign values of 32768 as NODATA values.
3.  Obtain outline of RFC boundary from NOHRSC.
4.  Project elevation grid, slope grid, river lines, and RFC boundary into the Albers projection.
5.  Buffer the projected RFC boundary file.
6.  Use the buffered RFC boundary to create a mask and clip out portions of the elevation and slope grids to analyze.  This reduces processing times for later steps and helps to ensure that the vector streams cross over the edge of the elevation grid to the border areas with NODATA (a requirement for stream carving).
7.  Fill the masked elevation grid.
8.  Rasterize the RF1 rivers.
9.  Carve stream cells into the land surface.
10.  Introduce a slight elevation gradient in flat areas to force flow directions in the correct direction.  This involves running an AML which calls a C program (flatareas5) and several Arc/Info Grid functions.
11.  Fill sinks again.
12.  Compute flow direction and flow accumulation.
13.  Use semi-automated procedures to correct the flow direction grid as explained below.

A variant of processing steps 1-12 was initially run at Michigan Technological University for all RFCs.  As time has passed, several reasons to refine this procedure have been discovered; therefore, some of these steps must be re-run.  Repeating these processing steps should not be a huge burden on resources because they are automated and the 15 arc-second grids are relatively small in the DEM world.  The computational time for processing steps 1-12 is on the order of 12 hours (depending on many factors), but the most computationally expensive processes can be run overnight.
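
For reference, the core of steps 7, 11, and 12 above amounts to a few Grid commands along the following lines.  This is only a minimal sketch: the grid names elevclip, elevfill, fdir, and facc are placeholders, and in practice this work is automated by the AMLs described below.

Grid: fill elevclip elevfill sink        /* fill spurious sinks in the clipped elevation grid
Grid: fdir = flowdirection(elevfill)     /* eight-direction (D8) flow direction codes
Grid: facc = flowaccumulation(fdir)      /* number of upstream cells draining to each cell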

Several reasons for refining this procedure revealed themselves while working with and checking the data at HRL.

-  In the original procedure (Spring 1999), the buffer length selected to clip grids was 30 cell widths (12000 m) outside the RFC boundary.  This buffer is not long enough at the edge of one portion of ABRFC because it does not allow the processing routines to account for the effects of the closest RF1 stream outside of the RFC basin.  The presence or absence of this particular river segment just outside ABRFC influences the assignment of flow directions on a relatively flat area near the RFC basin divide (this may be an artifact of a poor DEM or of the filling process).  To avoid this problem, a larger buffer of 50000 m was selected for re-processing.

-  Some of the NOHRSC DEMs use more than one code to denote data outside the study area.  The automatic routines used in the first round of processing assumed that the code 32768 was the only "NODATA" code used.  The code 65436 was also used as a NODATA code in MARFC.  This means restarting from step 2 is required for some RFCs (see the Grid sketch following this list).

-  It is worthwhile to refine the stream carving method so that, instead of setting stream cells to 0 as in the original method, land surface cells are raised relative to river cells (see the Grid sketch following this list).  The current differential used to raise the land surface cells is 100 m.  Setting river cells to 0 may cause more serious problems than using a constant elevation differential in situations where two headwater arcs that drain to different basins originate in neighboring DEM cells.  Also, some land cells in coastal areas are rounded down to zero elevation, so it became difficult to distinguish these cells from river cells using the old method.  All ocean (or large lake) cells have the value NODATA.

-  For RFCs that border on large bodies of water (e.g., oceans or the Great Lakes), it may be necessary to edit (lengthen) the river arcs along the coast.  This is a manual procedure (explained below), but it goes pretty quickly.
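
To illustrate the last two points (reclassifying additional NODATA codes and the revised carving method), a minimal Grid sketch is given below.  The names elevimp (imported elevation grid), elevfill (filled elevation grid), and rivgrid (rasterized RF1 streams, NODATA off-stream) are placeholders, and the two NODATA codes shown are only the ones encountered so far; check each DEM for others.

Grid: elevnd = setnull((elevimp == 32768) or (elevimp == 65436), elevimp)   /* reclassify out-of-area codes to NODATA
Grid: elevcarv = con(isnull(rivgrid), elevfill + 100, elevfill)             /* raise land cells 100 m above stream cells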

Latest Procedures (as of Dec. 2, 1999)
Note: These procedures require Arc/Info 7.0 (or later), ArcView 3.1, and Spatial Analyst 1.1 or later.

1.  Extract the data for an RFC from the CD-ROM delivered by Michigan Tech.  Keep data for different RFCs in separate directories.

2.  Run prepro1.aml, following the instructions in the header of the prepro1.aml file.  This AML may take many hours to complete since it includes use of the "fill" Grid command.  The buffering step can also be expensive for coastal areas.  One interactive step is required before the filling occurs: a Grid graphics window will display a mask of the watershed, and the user is required to select the smallest box that contains the entire area.  After this step, it may be desirable to let the AML run overnight (especially for large RFCs).  Prepro1.aml uses the project command and requires the projection file geoalb83.prj to exist in the same directory.
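
The geoalb83.prj file delivered with the AMLs should be used as-is.  For orientation only, an Arc/Info projection file of this kind generally has the form sketched below, where the Albers parameter lines are, in order, the two standard parallels, the central meridian, the latitude of origin, and the false easting and northing.  The parameter values shown are only the common conterminous-U.S. Albers defaults and are an assumption; the values in the actual geoalb83.prj may differ.

input
projection geographic
units dd
datum nad83
parameters
output
projection albers
units meters
datum nad83
parameters
29 30 0.0
45 30 0.0
-96 0 0.0
23 0 0.0
0.0
0.0
end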

3.  If the RFC being processed borders large bodies of water (e.g., oceans or the Great Lakes), it will likely be necessary to edit the river arcs along the coast.  This is a manual procedure, but it goes pretty quickly.  The steps to do this are as follows:

Load the elevpmf Grid, the projected RF1 Theme (rf1p), and the projected RFC boundary Theme (rfcbndp) into an ArcView View.  These Themes were created by prepro1.aml.

Load the ThreshR extension.

All of the "rf1p" arcs point upstream rather than downstream.  It is useful to flip the direction of the arcs for step 6 below.  To do this, make "rf1p" the active Theme and run ThreshR-Utility --> Flip Arcs.  This program will ask you to save the flipped arcs as a new shapefile.  Name the new file "rf1pf.shp".  Attribute information is transferred to rf1pf.shp with the exception of some standard Arc/Info coverage attributes that are no longer needed in ArcView.  For this attribute transfer to work, the seventh field in the Arc/Info "rf1p" file should be the Arc/Info User-ID field.  This was the case for the ABRFC data.  If this is not the case, then the ThreshR-Utility --> Flip Arcs script needs to be changed.

Note:  I don't know with 100% certainty that all arcs will be pointing upstream for all RFCs, so please check this before running this program.  An easy way to check is to change the Legend Symbol to a line with arrows rather than the default single line.  If you zoom in, you will see more arrows.

4.  For coastal areas, extend the length of rf1pf lines when necessary:

This is done most easily using ArcView.  Modify the legend for the elevation grid (elevpmf) so that zero-value cells and cells greater than zero each have a distinctive color, and make NODATA cells transparent.  The goal is to lengthen any rf1pf arcs that do not quite make it to the ocean.  This is important for stream carving, especially if a portion of the landscape is assigned zero elevation.  Add the RFC boundary Theme available from the NOHRSC web site, because this provides a clearer definition of the coastline than the DEM and gives some guidance regarding the appropriate path along which a river arc should be extended.  Sometimes it is also helpful to have an atlas handy to orient yourself.

Pan and zoom along the coast to make the required edits.  It is much easier to add new arcs to rf1pf.shp than it is to modify existing arcs.  See the ArcView documentation on how to edit an arc Shapefile.

5.  Run prepro2.aml, following the instructions in the header of the prepro2.aml file.  This AML calls a C program, flatareas5.c, which must be compiled.

6.  While prepro2.aml is running, ArcView scripts can be run to identify potential problem points in the derived flow direction grid.  Potential problem points occur when two separate stream arcs come within one 400 m grid cell of one another at a point that is far from a junction.  This happens quite frequently when using the RF1 data in conjunction with a 15 arc-second DEM.  The steps to do this are as follows:

- Load the most recent ThreshR Extension file into ArcView
- Add the rf1pf.shp theme that has its arcs pointing correctly downstream.
- Make a copy of the flipped and edited stream coverage (rf1pf.shp) by selecting Theme --> Convert to Shape File (Add this shapefile to the View). Make sure that either no arcs or all arcs are selected so that the entire file will be copied.
- Run ThreshR-Utility --> ID multi-arc Reaches

The goal here is to eliminate reaches that are composed of multiple arcs.  To do this, a new attribute is added to rf1pf.shp that contains a unique value for each reach.  (It turns out that this program works for reaches with two arcs but may not work for reaches with three arcs, so the run may need to be repeated.)  This attribute is used later with the Avenue "Summarize" request to merge all multi-arc reaches.  The ID Multi-arc Reaches program adds an attribute called "merge" to rf1pf.shp.  There is a record in rf1pf.shp for each arc.  For any river reach that is composed of multiple arcs, the merge attribute of each arc in that reach is assigned a common integer value shared with the other arcs that make up that reach.  For single-arc reaches, the merge attribute is 0.  (Took 51 minutes for MARFC.)

The program prompts the user to identify the river theme and a copy of the river theme.  Note that the river theme is modified by this program. The original flipped coverage should have fewer records in its attribute table than the copy when the program is completed (unless all stream segments were originally only single arcs).

- Add a new integer field called "merge1" to the rf1pf.shp attribute table.  Select all records with a "merge" value of zero and use the Calculate button to assign arcinf_id to the selected records.  Then toggle the selection and assign "merge" plus the maximum value of arcinf_id to the selected records.  Finally, run ThreshR --> Merge Multi-arc Reaches to merge lines with a common "merge1" id value.

-  Run ThreshR-Utility --> Identify Potential Problems (time estimate for ABRFC: 14 min).

This program asks the user to:
   -  identify the river theme (mgline1.shp)
   -  identify the field containing a unique integer identifier (select the field "merge1")
   -  define how many cells should be used to define the neighborhood of the outlet, where we are not concerned about cells from different segments being too close to one another.  I used a default value of 13 cells; the number should be odd if you want a square neighborhood.
   -  identify the mask grid (rfcmask)

The program creates a grid with ones in the stream and zeros on the land surface (i.e., "rivone1"; this grid is useful for display purposes in Step 7) and a point shapefile (i.e., "probzo1.shp") that identifies the locations of potential problems.  It is not known for sure whether there are any problems at these points until step 7.
Rivers with a relatively low drainage area threshold.

7.  Manually check and edit the flow direction grid derived by prepro2.aml.

This is the point at which manual checking and editing of the flow direction grid is required.  There are "trick" methods for editing grids in ArcView, but I have found these methods relatively slow and tedious.  I have also found it difficult to do editing quickly when using Arc/Info GridTools manually.  Therefore, I have written an AML (editgrid.aml) that makes editing the flow direction grids relatively quick and painless.  The feature that makes this AML so nice is that the "griddirection" command is used to draw arrows showing the flow directions.  This makes it easy to see where flow direction grid edits may be needed.

Before using editgrid.aml, the shapefile probzo1.shp must be converted to an Arc/Info point coverage, and a grid containing the base-2 logarithm of the flow direction grid must be created (this is needed to use the griddirection command -- see the Arc/Info Help for more explanation).

Arc: shapearc probzo1 probzo1
Grid: fdb2 = log2 ( fdmod )
Grid: fdb2i = int ( fdb2 )    /* convert to integer grid and delete old grid
Grid:  kill fdb2 all

The header of editgrid.aml must be changed to be consistent with the input file names being used.  The procedure for editing grid values may appear tedious at first, but it becomes fairly easy once you get the hang of it.  When the editgrid.aml program is started, the user is prompted to type a record number to start with.  This means a record number in the problem points coverage.  This option is helpful if you want to check a portion of the points, take a break and come back to check the rest.

Before running editgrid.aml, type the command
Grid: gridedit edit fdb2i
For some reason, this command does not always work reliably when executed within the AML.

For each potential problem point, gridded streams are displayed in white, flow direction arrows are displayed in red, and digitized (RF1) streams are displayed in blue.  The potential problem points are labelled and numbered in green.  The display automatically defaults to an extent that is 10 cells on each side of the problem point in question.  Since this might turn out to be too close to understand the overall flow picture, the user is first prompted with the question:

"Resize window? (9 moves to edit options):"

The user may type a number representing the number of cells to include on each side of the problem cell in the display extent (20 is interesting, 50 is getting a bit big).  Typing 9 moves the user to the next option:

"Enter edit value (9 when finished):"

The edit values to consider are 0 = E, 1 = SE, 2 = S, 3 = SW, 4 = W, 5 = NW, 6 = N, 7 = NE (these are the base-2 logarithms of the standard Arc/Info flow direction codes 1, 2, 4, 8, 16, 32, 64, and 128).

If you quit the program normally, your changes will be saved, and the arrows you see on the screen will reflect your changes when you rerun the program.  If you exit out of the AML due to a mistyped keystroke and a subsequent error message, this is no big deal.  Just restart the AML and start with the last record you had previously worked on.  You should be able to scroll up in your terminal window to find the last record in the problem points shapefile that you looked at before exiting.

If you exit out of the AML due to an error, you may get the following message when attempting to restart the AML:
"Cursor OUT_CUR already declared"
AML MESSAGE - Stopping execution of AML file due to ERROR condition
To correct this problem, type the following and then restart editgrid.aml.

Grid: cursor out_cur remove

Sometimes a cluster of problem points will exist in the same area.  Flow direction corrections only need to be made once for these areas.  Part of the reason that this program is so inflexible is that AML is very quirky and it sometimes takes a while to debug.  Avenue is far superior.

(Good example of the types of problems encountered -- MARFC, point 40
Why problems -- MARFC, 499   9)

Now that a good flow direction grid has been created, convert fdb2i back into the 1, 2, 4, 8, 16, ..., 128 codes that Arc/Info understands.

Grid: fdr = exp2 ( fdb2i )
Grid: fd = int ( fdr )
Grid: kill fdr all
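
As an optional sanity check (not part of the original procedure), the Grid describe command reports the minimum and maximum values of the result; for a valid flow direction grid these should be 1 and 128.

Grid: describe fd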



Setting Up and Checking Regression Database Delivered by MTU

For the ThreshR scripts to work in all RFCs and to make different types of computations depending on the location of interest, strict ThreshR database specifications must be met.  Michigan Tech has collected and supplied nearly all of the required data sources; however, numerous checks and modifications must be made to this database in order for the ThreshR software to run correctly.  This is a setup cost that is necessary due to the complexity of the database and the fact that the database specifications have evolved over time.  There may have been some way to reduce this with more hands-on interaction with the contractor.  Up until now, I have not had time to document some of the more tedious issues that I have identified.  The checks and modifications described here must be made for each RFC.

Tasks required to complete ThreshR setup:

- Compute flow accumulation from the edited flow direction grid (the output file should be named fa); see the command sketch following this task list.
- Compute flow length from the edited flow direction grid (the output file should be named fld).
-  Delineate the RFC boundary (call this rfcbound.shp) using ArcView and clicking on an outlet point.

- Create a point file of HRAP center points called hrappts.shp.  This file should cover the entire RFC.  The extension coord.avx can be used to create this file.  With a View active, click on HRAP --> Create HRAP Center points and choose to create a new point coverage in geographic coordinates based on HRAP extent.  Specifying the HRAP extent requires the user to specify the lower left HRAPX and HRAPY and the number of columns and rows.  To determine the appropriate geographic extent for an RFC, the program xmrgtoasc.c can be run on an existing gridded threshold runoff file.  The location of the gridded runoff files for ABRFC on the HRL-NHDRs is /nonawips/usr/apps/ffg/files/abrfc/grro.  More information on coord.avx and xmrgtoasc.c is available separately.

- Rename "elevpm" (created by prepro1.aml) to "dem" using the Arc RENAME command.

-  Make sure the slope grid is available (it was not delivered with the ThreshR database in Dec. 1999).  If not, derive a slope Grid using the ArcView Spatial Analyst function Surface --> Derive Slope.  Make sure the Analysis --> Properties extent and grid size are set to the same as "dem" before doing this.  The Derive Slope function returns slope in degrees.  To get the slope in percent, use Analysis --> Map Calculator to take the tangent of the slope grid.  The ArcView "Tan" function assumes input units are radians, so degrees must be converted to radians before applying the Tan function.  The name required for the slope grid by the ThreshR software is "slopeff".  Determine the file source name for the output of the tangent Map Calculation by choosing Theme --> Properties.  Copy the source Grid to "slopeff" using File --> Manage Data Sources.

-  (Keep this for now.)  Make sure that all source file names conform to the 8.3 naming convention.  Although neither UNIX systems nor modern PC operating systems (e.g. Windows 98) are limited to this filename length, it is desirable to store data on CD-ROM media that can be read from multiple platforms.  In my experience, extra characters in file names that do not conform to the 8.3 naming convention become garbled on our HP UNIX systems.  Thus, to make the data files more universally usable, all data files used by ThreshR should conform to the DOS 8.3 naming convention.

-  The statekey.shp and regions.shp files were not delivered with all of the correct fields.  Look at the files from ABRFC or MBRFC as models to add the correct fields to the attribute tables for these Shapefiles.  The regions.shp attribute table should contain reg_name (character, width 30), reg_num (integer, width 2), and abbr (character, width 2), and the ulimit field in regequat.txt should be a character field.  When the Region shapes and State shapes for different regions were combined, some polygon slivers were created at the intersection of shape boundaries.  To clean up the database, these slivers should be merged with the appropriate larger neighbors.  Take a look at the AML mergereg.aml under the CBRFC directory (see also aaa.mergethemes) to see how this can be easily done.  To retrieve data from tape, see the file /fs/gis/threshr/cbrfc/tar.txt for an example.

-  Revise parcode.txt and regequat.txt using the files from ABRFC and MBRFC as models.  These files need to be converted to dBase format because dBase files can be edited more easily within ArcView.  The original choice of .txt files was made because dBase files have limitations on field names; however, this can be alleviated in the future by using aliases within ArcView.  Field names in the original parcode.txt and regequat.txt table designs may need to be shortened.  It is probably easiest to make the field name changes first using a text editor (e.g. Nedit) and then convert the files to dBase format so that they can be edited directly within ArcView.  In the new version of ThreshR, these files must be named "parcode.dbf" and "regequat.dbf".

-  Check all of the numbers in regequat.dbf against USGS Report 94-4002, and be sure to check the "Errata Sheet."   The region numbers in regions.shp should be checked to see if they are consistent with the region numbers in regequat.dbf.

-  A critical field in parcode.dbf is the "conversion" field.  This field tells the program whether a units conversion and/or adjustment factor on the source data is required to calculate peak flows correctly.  The codes "a", "s", "m", and "d" indicate add, subtract, multiply, and divide, respectively.  A number indicating the additive or multiplicative factor must follow each of these codes, separated by an underscore.  Multiple operations can be applied to the same parameter by separating the operations with underscores.  Example: the conversion required when computing mean basin elevation in New Mexico (ELV_NM) is "m_3.281_d_1000", which means multiply the parameter value by 3.281 and divide by 1000.  This converts mean basin elevation from meters to thousands of feet.  The conversion codes are interpreted and applied by the script "threshr.qxparams".  A checker using USGS Report 94-4002 must be careful because subtractions are sometimes reported in the text but not reflected in the printed equations, and sometimes they are reported in the printed equations but not stated in the text.

-  An important field in regequat.dbf is the "special" field.  The "special" field contains the code 0 if the corresponding regression equations conform to the standard format (see Functionality description).  For a particular state or region with a unique form of the regression equation, the special function code must be a unique integer not only within that RFC but across the entire United States.

-  For specialized data layers in individual states in vector format, convert all of the Arc/Info Coverages into Shapefiles and eliminate unnecessary fields.  This is done because there is no need to keep two types of vector data in the ThreshR database.  Make sure that there is a field in the Shapefile that contains the parameter code that is represented by that layer.  For example, in Kansas, there is a polygon Shapefile that specifies soil permeability.  The code for this attribute in parcode.dbf is SLPM_KS, so the field name containing the actual permeability values must have this name.  For the case of Minnesota, where the parameter of interest is percent streams and ponds, the parameter code is STRP_MN.  The field named strp_mn in the streams and ponds shapefile strp_mn.shp contains the value 100 indicating that any area inside these polygons is 100 percent water.
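
As referenced in the first two tasks above, the flow accumulation and flow length grids can be produced at the Grid prompt roughly as follows.  This is only a sketch, assuming the edited flow direction grid is named fd; whether the UPSTREAM or DOWNSTREAM option is appropriate for fld depends on how the ThreshR scripts use it, so check before running.

Grid: fa = flowaccumulation(fd)             /* number of upstream cells draining to each cell
Grid: fld = flowlength(fd, #, downstream)   /* flow distance along the flow path (upstream is the alternative)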

-  Several problems requiring fixes were found in setting up the database for ABRFC and MBRFC.  To give some more examples of things to look for, the problems found are listed here.

parcode.txt was missing parameters for Wyoming and Montana.

The "params" field in the statekey shapefile was missing for MBRFC.

BASL was listed as a parameter for Wyoming in regequat.txt, but this is incorrect because there is another BASL parameter that has different units.

Some parameter names in parcode.dbf were found to be inconsistent with parameter names in regequat.dbf.

Some of the GIS data filenames were inconsistent with filenames listed in parcode.dbf (Example: ro_mn and mar_mn).

Some information in USGS 94-4002 was not interpreted correctly.  (Example, South Dakota Sand Hills:  "For streams in the Sand Hills area, use 0.4 times the discharges given by the Western Region.")  Thus, equations for the Sand Hills region can be included in the regequat.dbf table by applying the 0.4 factor to the Western Region equations.

Precipitation data for Montana has the wrong units.
 

CBRFC

Multiplied forest percent in Idaho (frp_id) by 100.

Precipitation data for California have the wrong units -- should be in inches.  Currently assuming this data set is from PRISM, which gives the units in hundredths of mm.

The exponent for the NM Northwest Plateau Q50 equation is incorrect.  Also the NM Southwest Mountain region, Q5.
 

Notes on regression equations for individual states:
 
Iowa:  The Grid "stral_ia" is incorrect.  As a temporary solution, baseline runs for Q2 are made using the regression equations from Carpenter and Georgakakos, 1993 (p. 20).  A line file of first-order streams in Iowa is needed.

Colorado:  Have not yet included the equations for the Eastern Colorado Plains.

Texas:  The playa area in the panhandle where regression equations do not apply is included in regions 3 and 4 on the region map provided in the regression database.

Nebraska:  Using the database available, no distinction can be made between total drainage area and contributing drainage area.  Solving this problem may have to wait until NED comes out.

Wyoming:  Region names are totally wrong.
 
 