National Weather Service United States Department of Commerce
78th Annual AMS Meeting
Phoenix, Arizona
January 1998


RECENT DATABASE DEVELOPMENTS AT THE NATIONAL WEATHER SERVICE OFFICE OF HYDROLOGY


Jon Roe
Geoff Bonnin
Mark Glaudemans
Charles Gobs
Paul Tilles
Office of Hydrology
NOAA/National Weather Service
1325 East-West Highway
Silver Spring, Maryland 20910

1.   INTRODUCTION

     The National Weather Service (NWS) Office of Hydrology has initiated an effort to unify all operational
and reference data used by all hydrologic applications under one common database, the Integrated Hydrologic
Forecast System (IHFS) Database or IHFS_DB.  The intent is that the IHFS_DB provides a common database
framework for hydrologic applications at all Weather Forecast Offices (WFOs) and all River Forecast Centers
(RFCs).  This common framework will allow field offices and the Office of Hydrology to develop hydrologic
applications in a coordinated way that promotes application and information sharing across offices.  The
framework also promotes incorporation of field-developed applications into the national baseline delivered by
the Automated Weather Interactive Processing System (AWIPS).

     The first released version of the IHFS_DB is the result of the merge of two independent but overlapping
predecessor relational databases, the RFC_DB and the WHFS_DB.  The RFC_DB evolved from work in the
early 1980s at RFCs to store operational data in a commercial database.  The WHFS_DB is the relational
database that supported the WFO Hydrologic Forecast System (WHFS) at WFOs.  The WHFS_DB was
developed on an independent path in the early 1990s.  Since the early 1980s at RFCs and since 1994 at
WFOs, versions of the predecessor relational databases have been in continuous operational use.  The
evolution of this technology has therefore been heavily influenced and proven by the functional needs of
operational forecasters and by the reliability and performance needs of the operational environment.

     The IHFS_DB provides an enterprise rather than an application-specific approach to data management.
The merge of the predecessor databases has set the stage for additional significant change such as integrated
support for hydrologic modeling activities that today rely on technology from the mainframe batch processing
era.  The IHFS_DB will ultimately support the full range of functionality required to conduct the hydrologic
mission of the NWS at all NWS field sites.

2.   IHFS DATA MANAGEMENT VISION

     The Office of Hydrology has the vision of unifying all operational and reference data used by all hydrologic
applications under one common database.  IHFS_DB will be the single logical repository at the site (forecast
office) level.   Each element of functionality at a site will use this repository and the knowledge of what data
is needed will reside exclusively within the function.  This avoids the problems of staging data in many
redundant storage areas.  It encapsulates the knowledge of data needs within the domain of each element of
functionality and also provides a mechanism for integration between the diverse elements of functionality
provided by the site forecasting system (Figure 1).  Furthermore, the IHFS_DB will provide a common
database framework for hydrologic applications at all WFOs and all RFCs.  This common framework will allow
both field offices and the Office of Hydrology to develop hydrologic applications for the national baseline and
for unique local use in a coordinated way that promotes application and information sharing across offices.



                 Figure 1.  The Site Forecasting System Uses a Single Logical Site Level Database




3.   THE IHFS DATABASE

     The first released version of the IHFS_DB is the result of the merge of two independent but overlapping
predecessor relational databases, the RFC_DB and the WHFS_DB, which are discussed in Sections 4 and 5.

     Delivery of services by RFCs and WFOs is an ongoing, mission critical priority for the NWS.  The support
provided by the Office of Hydrology recognizes this priority by ensuring that system improvements are
delivered in a way that does not disrupt ongoing delivery of services, and by ensuring that the life-cycle costs
of application software are properly considered.  Based on these considerations, we have chosen to implement
our long term data management vision in a series of shorter duration steps rather than taking a larger amount
of time to produce a single "big-bang" delivery.  The results of the first group of steps (referred to as "the
merge") are discussed in Section 6, and in Section 7, we provide a look ahead to the future.

4.   THE RFC DATABASE

4.1  History of Evolution

     By early 1983, data encoded in the newly developed Standard Hydrologic Exchange Format (SHEF) were
passing across NWS communications circuits, and for the first time, being automatically decoded and posted
to the RFC Gateway systems at RFCs.  The next step was to provide a mechanism for automated transfer of
the data to the NWS River Forecast System (NWSRFS).

     RFC Gateway systems were based on a software system (DATACOL) developed at the California-Nevada
RFC and modified at the Missouri Basin RFC.  DATACOL ran on local minicomputers whereas NWSRFS was
run in batch mode at the National Oceanic and Atmospheric Administration (NOAA) Central Computing Facility
(NCCF) using Remote Job Entry Systems.  Recognition of the problems associated with the separation of data
from the forecast system and the differences between RFC Gateway and NCCF computing environments led
to the initiation of a series of projects to bring the data and the forecast systems together.

     The first steps in this direction were made with the development of a database using a commercial
relational database product on RFC-based mini-computers and the porting of NWSRFS to the mini-computer
environment.  By 1985, the system was being used operationally, and data were arriving at the RFC, being
posted in the relational database tables for review by forecasters and being transferred to NWSRFS using
automated procedures.  The initial system was replicated and implemented at several RFCs so that by the late
1980s a considerable body of operational experience had been developed and the system had been enhanced
and tuned by its users.

     By 1992, the system had been transformed both functionally and in terms of computing environment. 
SQL compliant relational database management products had been adopted and the computing platform was
established as a site network of UNIX-based workstations.  The major functional transformations involved
significant enhancements and additions to the physical database tables.  These enhancements were made
to support the Weather Surveillance Radar - 1988 Doppler (WSR-88D) Precipitation Processing Subsystem
and more automated and operationally effective transfer of data to NWSRFS.  These systems were again
replicated and used for operations at most RFCs.

4.2  Lessons Learned From Operational Experience

     Since the early 1980s at RFCs, versions of RFC_DB have been in continuous operational use.  The
evolution of this technology has therefore been molded by and proven in the operational environment.  The
development has responded to the reliability and performance needs of real forecasters.  While significant
portions of the system were being maintained and enhanced by the Office of Hydrology, individual RFCs were
also making their own enhancements and sharing them with each other.

     This evolution amounts to approximately 15 years of testing and improvement in the real-world,
operational environment of a range of diverse RFCs.  Throughout that period, RFCs have relied on these
systems and have continued to deliver critical services (including during such extreme events as the Great
Flood of 1993).  A number of lessons have been learned from this experience:

-    Commercial relational database products can be used to provide the reliability and performance necessary
     for operational forecasting.

-    With the appropriate mix of skills, scientific software with a FORTRAN legacy can be effectively wedded
     with more recent programming languages, tools, environments, and architectures.

-    Evolution of database structures by addition of functionality at the physical level without a guiding data
     architecture framework leads to designs that become more complex and difficult to maintain and enhance
     over time.  This is a natural phenomenon for software systems as well, not just physical data structures.

-    To provide an effective data management environment over the long term, data needs must be considered
     from the point of view of the enterprise and the various elements of functionality performed within the
     enterprise, rather than viewing the data needs in a "stove-pipe" manner from the point of view of
     independent applications or a narrow range of functionality.

5.   THE WHFS DATABASE

     The second of the two relational databases that evolved to form the IHFS_DB is the WHFS_DB.  Work
on the WHFS_DB commenced in 1993.  Because of its later design, which built upon lessons learned from
the RFC_DB, many trademark conceptual aspects of the WHFS_DB have endured into the design of the
IHFS_DB, as is discussed in Section 6.

5.1  History of Evolution

     The WHFS_DB is the relational database that supported the WHFS at WFOs.  The beginnings of the
WHFS_DB can be traced to the development of the Service Hydrologist Information Management System
(SHIMS) database and user interface software (Office of Hydrology, 1997).  SHIMS was initiated in 1986 by
NWS Central Region staff, using Rbase 5000 database software.  By 1990, the SHIMS database definition
had stabilized and the database was ported to the Paradox database software.  Deployment of SHIMS at all
NWS WFOs began in 1992, when it was decided that it would serve as the basis for the initial AWIPS
hydrologic database at WFOs.

     SHIMS was established to provide an automated system for storing and managing the information used
by the Service Hydrologist and NWS forecasters.  The Service Hydrologist, or in some offices the hydrologic
focal point, is responsible for the hydrologic program at WFOs, which work together with the RFCs to provide
the operational hydrologic services of the NWS.  SHIMS data consists of information contained in NWS Form
E-19, which is the official form describing a hydrometeorological data collection and/or forecast location, and
Form B-44, which contains location-specific information pertinent to the cooperative observer program.  The
SHIMS databases were populated at each site from the information on the standard E-19 and B-44 forms, using
a custom Paradox user interface running on personal computers.

     The E-19 information comprises most of the SHIMS data definition, and consists of reference and
historical information of a static nature.  The SHIMS database does not contain operational data such as
current river stage and precipitation reports, but rather it supports operations by containing location reference
information needed in times of weather events and for general day-to-day inquiries.

     In late 1993, with the advent of the WHFS project, attention turned to the relational database that would
serve as the database for all application software.  The importance of a well-structured database became
critical as it was decided that all operational and reference data would be stored in the relational database. 
The first incarnation of WHFS_DB relied heavily on the SHIMS design for the static reference data.  However,
a fresh approach incorporating lessons learned from RFC_DB was used for non-static data.

     The RFC_DB identified operational data as either observed or forecast data.  The WHFS_DB used this
same concept but instead of having application pre-processors extract data from the tables associated with
these two categories, it used an additional level of data storage by denormalizing all observed and forecast
data to a set of additional tables stratified by physical element.  Therefore, there was a table for observed stage
values, a table for observed precipitation, and tables for other physical elements.  The same was true for
forecast data.  This resulted in a disaggregation of the data for a given station as it was now spread out in
multiple tables.  However, it significantly improved database performance.  The WHFS applications are
typically very interactive and require a fast response time after the arrival of data.
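
     The stratification by physical element described above can be sketched roughly as follows.  This is
only an illustration under stated assumptions: it uses Python with an in-memory SQLite database for
convenience, and the table names (obs_all, obs_height, obs_precip), the column names, the station
identifiers, and the element codes are hypothetical placeholders, not the actual WHFS_DB definitions.

```python
import sqlite3

# Hypothetical mapping from a physical element code to its per-element table.
ELEMENT_TABLES = {"HG": "obs_height", "PP": "obs_precip"}


def build_db():
    """Create one combined observation table plus one table per physical element."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs_all (lid TEXT, pe TEXT, obstime TEXT, value REAL)")
    for table in ELEMENT_TABLES.values():
        conn.execute(f"CREATE TABLE {table} (lid TEXT, obstime TEXT, value REAL)")
    return conn


def post_observation(conn, lid, pe, obstime, value):
    """Post to the combined table, then copy the row to its element-specific table."""
    conn.execute("INSERT INTO obs_all VALUES (?, ?, ?, ?)", (lid, pe, obstime, value))
    table = ELEMENT_TABLES.get(pe)
    if table is not None:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (lid, obstime, value))


conn = build_db()
post_observation(conn, "STN01", "HG", "1997-08-01 12:00", 4.2)
post_observation(conn, "STN01", "PP", "1997-08-01 12:00", 0.15)
post_observation(conn, "STN02", "HG", "1997-08-01 12:00", 7.9)

# A stage display now reads only the stage table, not every observation.
stage_rows = conn.execute("SELECT lid, value FROM obs_height ORDER BY lid").fetchall()
```

     In this sketch the data for one station end up spread over several tables, which mirrors the
disaggregation noted above, while an interactive stage query touches only the stage table.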

     The division of the operational data by physical element is one of the basic concepts of the WHFS_DB,
and is present in the IHFS_DB, as discussed later.

5.2  Lessons Learned From Operational Experience

     Since 1994, versions of WHFS_DB have been in continuous operational use at WFOs, and as with the
RFC_DB, a number of lessons have been learned in the course of taking the database from the blackboard
to the operations floor of a weather office:

-    Designs resulting from perfectly logical interpretations of the database attributes resulted in unacceptable
     performance in some cases.  In particular, the division of observed data by physical element resulted in
     very large and heavily used tables for two primary parameters - river stage and precipitation.  Queries for
     multiple stations' data from these tables were slow.  For stage data, a redundant table containing only the
     most recent data for a location was added.  For precipitation, a pre-processor that populated a
     denormalized table of precipitation accumulation for given durations was implemented.  These two
     changes greatly improved performance.

-    The other observed physical elements and the forecast data do not comprise sufficiently large volumes
     for performance to be an issue.

-    There is overlap between long and short term forecast time series of river stage for a location - both have
     the same creation date but different issuance times.  Additional processing was added to merge them into
     one virtual time series by abutting the time series in sequence, based on their issuance times.

-    Tables listing valid values (i.e., valid lists), such as lists of WFO identifiers, counties, states, etc., can be
     used efficiently and effectively to provide referential integrity via foreign key relationships.
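
     The two performance remedies in the first lesson above (a most-recent-value table for stage and
pre-computed precipitation accumulations) can be illustrated with a minimal Python sketch.  The function
names, the station identifiers, the assumption that precipitation reports are incremental amounts, and the
choice of a window that excludes its start time and includes its end time are all hypothetical; the paper
does not specify how the WHFS pre-processor was implemented.

```python
from datetime import datetime, timedelta


def latest_by_location(stage_reports):
    """Return {location: (obstime, value)}, keeping only the newest report per location."""
    latest = {}
    for lid, obstime, value in stage_reports:
        if lid not in latest or obstime > latest[lid][0]:
            latest[lid] = (obstime, value)
    return latest


def accumulate_precip(precip_reports, end_time, duration_hours):
    """Sum incremental precipitation per location over (end_time - duration, end_time]."""
    start = end_time - timedelta(hours=duration_hours)
    totals = {}
    for lid, obstime, amount in precip_reports:
        if start < obstime <= end_time:
            totals[lid] = totals.get(lid, 0.0) + amount
    return totals


t0 = datetime(1997, 8, 1, 12, 0)
stage_reports = [
    ("STN01", t0 - timedelta(hours=2), 4.0),
    ("STN01", t0, 4.6),
    ("STN02", t0 - timedelta(hours=1), 7.9),
]
precip_reports = [
    ("STN01", t0 - timedelta(hours=7), 0.50),  # falls outside a 6-hour window
    ("STN01", t0 - timedelta(hours=3), 0.25),
    ("STN01", t0, 0.25),
]

latest = latest_by_location(stage_reports)
six_hour = accumulate_precip(precip_reports, t0, 6)
```

     Both results are small, derived structures that an interactive application can read without scanning
the full observation history.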

6.   THE DATABASE MERGE

     In early 1996 the Office of Hydrology decided to pursue the consolidation of the two separate but
overlapping relational databases discussed earlier in Sections 4 and 5, the RFC_DB and the WHFS_DB.  It
was decided to treat this consolidation as the first concrete step toward the creation of the IHFS database that
would unify all data structures that serve all hydrologic applications supported and developed by the Office
of Hydrology.

6.1  Overlapping Databases

     The overlap between the two relational databases was concentrated around the storage of dynamic
operational data (i.e., point station observations and forecasts) and not station reference data.  The only
significant overlap of reference data was a table in both databases that described locations of reporting
stations (e.g., river stage stations and precipitation gages).  Therefore, most of the work of merging the two
databases occurred in refining the storage mechanism for operational data in a way that is useful for both RFC
and WFO applications.  When the database merge was proposed, the WHFS_DB contained about 95 tables
and the RFC_DB contained about 25 tables.  There were 5 or 6 key tables that were directly overlapping
between the two databases.  Also, the RFC_DB database contained denormalized application-centric tables
of operational data that did not overlap in structure with the WHFS_DB.  The RFC_DB was augmented with
several classes of host files (i.e., files maintained by an operating system external to a relational database)
that contained data that were excellent candidates for inclusion into the merged relational database.

6.2  Merge Approach

     The proposed merge of the two relational databases was approached from the perspective of using the
WHFS_DB as the baseline and extending it to satisfy all of the requirements previously satisfied by the
RFC_DB.  This approach, rather than using the RFC_DB as the baseline or starting from a clean sheet of
paper, was chosen for several reasons.  First, the method used by the WHFS_DB to store operational point
data was less application-centric and more data-centric, therefore easier to extend for future data
requirements.  Second, the WHFS_DB had the advantage of possessing a more recent design that took
advantage of lessons learned from the experience of using the RFC_DB in the field.  Third, the WHFS_DB
contained a much broader base of reference data entities and attributes that had evolved from the earlier
SHIMS database design.  Lastly, the years of successful operational experience gained from field deployments
of both the RFC_DB at RFCs and the WHFS_DB at WFOs negated the need to start from scratch.  Although
the WHFS_DB was used as the baseline for the merge, its structures were not assumed to be inviolate.
Those structures were scrutinized from an architectural viewpoint at the same time that the
RFC_DB's requirements were inserted into the design of the merged database.

     Prior to the merge of the two predecessor relational databases, their designs were captured via reverse
engineering into a Computer-Aided Software Engineering (CASE) tool that supported entity-relationship data
modeling according to Chen (1977).  The reverse engineering resulted in a series of entity-relationship
diagrams (ERDs) and a comprehensive data dictionary that described all entities (i.e., tables and files) and
all attributes (i.e., columns and fields).  The data modeling required to merge the two databases proceeded
from this initial set of CASE tool information.  Today, the design (i.e., ERDs and data dictionary) of the merged
IHFS_DB is maintained and extended with the CASE tool (Office of Hydrology, 1997a).

6.3  Merge Results

     Each RFC_DB entity (data table or file) was examined and compared against existing structures within
the WHFS_DB.  For tables that clearly overlapped in function and form between the two databases (e.g., the
station location table and the radar location table), a single common table structure was devised that
accommodated WFO functionality and RFC functionality.  The RFC_DB included several host data files that
contained reference information (e.g., application control parameters).  These files were re-implemented as
relational database tables.  RFC_DB grids (WSR-88D radar products and precipitation analysis grids) were
brought into the merged database by defining relational tables to hold grid attributes, one of which points to
a host file that holds the grid data values.  Most of the application-centric structures that held operational point
data for the RFC_DB were eliminated in favor of more data-centric structures already in place in the
WHFS_DB.

     Figure 2 illustrates the principal processes and data stores of the merged database.  Encoded data
messages are decoded and posted to the two main tables for observations (i.e., ObsValue) and forecasts (i.e.,
FcstValue).  Data from unknown stations (i.e., not in the Location table) are posted to the UnkStnValue table. 
The user can choose to store the encoded messages in the TextProduct table.  Observations and forecasts
are then denormalized via database triggers and stored procedures into a series of tables stratified by physical
element type.  The IngestFilter table controls which physical parameters are allowed to be propagated into the
denormalized parameter-specific data tables for each station.  The operational data denormalization does not
result in completely duplicated data since data are typically stored for about 30 days in the parameter-specific
tables and only for about one day in the two main tables.
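
     The posting flow described above can be sketched with a database trigger.  The table names Location,
IngestFilter, ObsValue, and UnkStnValue come from the text; everything else (the column names, the Height
table, the element code, the station identifiers, and the use of Python with SQLite) is an assumption made
for illustration and does not reflect the actual IHFS_DB schema or database product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Location     (lid TEXT PRIMARY KEY);
CREATE TABLE IngestFilter (lid TEXT, pe TEXT, PRIMARY KEY (lid, pe));
CREATE TABLE ObsValue     (lid TEXT, pe TEXT, obstime TEXT, value REAL);
CREATE TABLE UnkStnValue  (lid TEXT, pe TEXT, obstime TEXT, value REAL);
CREATE TABLE Height       (lid TEXT, obstime TEXT, value REAL);  -- hypothetical element table

-- Propagate a posted observation to the element-specific table only when
-- the IngestFilter lists that station/element pair.
CREATE TRIGGER obs_to_height AFTER INSERT ON ObsValue
WHEN NEW.pe = 'HG' AND EXISTS (
    SELECT 1 FROM IngestFilter f WHERE f.lid = NEW.lid AND f.pe = NEW.pe)
BEGIN
    INSERT INTO Height VALUES (NEW.lid, NEW.obstime, NEW.value);
END;

INSERT INTO Location VALUES ('STN01'), ('STN02');
INSERT INTO IngestFilter VALUES ('STN01', 'HG');
""")


def post(lid, pe, obstime, value):
    """Post a decoded observation; unknown stations go to UnkStnValue."""
    known = conn.execute("SELECT 1 FROM Location WHERE lid = ?", (lid,)).fetchone()
    table = "ObsValue" if known else "UnkStnValue"
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", (lid, pe, obstime, value))


def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


post("STN01", "HG", "1997-08-01 12:00", 4.2)  # known station, passes the filter
post("STN02", "HG", "1997-08-01 12:00", 7.9)  # known station, not in the filter
post("STN99", "HG", "1997-08-01 12:00", 1.1)  # station not in Location
```

     In this sketch the first report reaches both ObsValue and Height, the second stays in ObsValue only,
and the third is held in UnkStnValue.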


                 Figure 2.  Principal Processes And Flows of the Merged Database

     For the operational data tables of the target merged database, we took this opportunity to recast all of
the date/time attributes so that the change to the year 2000 will be properly handled by the database.  The
structure in the WHFS_DB that held data ingest control parameters was eliminated in favor of the more robust
and flexible corresponding structure found in the RFC_DB.

     The use of foreign key relationships to maintain referential integrity was continued into the design of
the merged database.  Most data attributes that take on values from specific valid lists (e.g., physical
element code) are connected to lookup tables via foreign keys to make sure that invalid values are not
permitted for those data attributes.  Figure 3 shows an example of how values in columns of the LocRangeCheck
table are being controlled by foreign keys which point to valid lists of physical element name codes, physical
element durations, and station location identifiers in other tables.

                 Figure 3.  Example of the Use of Foreign Keys and Valid Lists
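
     As a rough illustration of valid lists enforced through foreign keys, the sketch below defines a
LocRangeCheck table whose location, physical element, and duration columns each reference a lookup table.
Only the names LocRangeCheck and Location appear in the text; the lookup table names ValidPe and ValidDur,
all column names, the sample values, and the use of Python with SQLite are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Location (lid TEXT PRIMARY KEY);
CREATE TABLE ValidPe  (pe TEXT PRIMARY KEY);       -- hypothetical valid list of element codes
CREATE TABLE ValidDur (dur INTEGER PRIMARY KEY);   -- hypothetical valid list of durations
CREATE TABLE LocRangeCheck (
    lid TEXT NOT NULL REFERENCES Location (lid),
    pe  TEXT NOT NULL REFERENCES ValidPe (pe),
    dur INTEGER NOT NULL REFERENCES ValidDur (dur),
    min_value REAL,
    max_value REAL,
    PRIMARY KEY (lid, pe, dur)
);
INSERT INTO Location VALUES ('STN01');
INSERT INTO ValidPe  VALUES ('HG');
INSERT INTO ValidDur VALUES (0);
""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO LocRangeCheck VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(("STN01", "HG", 0, 0.0, 50.0))  # every code is in a valid list
rejected = try_insert(("STN01", "XX", 0, 0.0, 50.0))  # 'XX' is not a listed element code
```

     The database itself refuses the second row, so each application does not need its own validation
logic for these codes.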

6.4  Implementation Strategy

     Since the RFC_DB and the WHFS_DB are in operational use at RFCs and WFOs respectively, a strategy was
needed to achieve the unified database in a reasonable amount of time while minimizing disruption to field
operations.  In accordance with the merge approach described above, the field impact was seen to be far more
extensive at RFCs than at WFOs, partly because the physical changes were greater for RFC_DB than for WHFS_DB,
and partly because the RFCs have developed many local applications over the years that depended on the
structure of the RFC_DB.

     A plan was developed to divide the task of migrating the RFC_DB structures into the unified database into
several "themes" that could be incrementally released to the RFCs.  Initially four themes were identified,
with a schedule that spanned a little less than a year.  The merge was actually accomplished in three themes
that took a bit more than a year (from early summer 1996 to mid-summer 1997).

     By August 1997 a single unified relational database (i.e., IHFS_DB Version 1.0) was completed and
delivered to the NWS AWIPS Program for deployment to field offices with AWIPS Build 3.1.  It is now being
beta tested at a couple of field offices prior to its general release through AWIPS this fall and winter.
Along with deployment of the new IHFS database, we have provided software to convert existing data at WFOs
(from the old WHFS_DB) and RFCs (from the old RFC_DB) to the new database.

     By the end of September 1997, the Missouri Basin RFC had successfully adjusted and tested the critical
operational core of their local applications with the new IHFS_DB.  The other RFCs still have work to do to
adjust their local applications, but some of that work has already been done due to the phased changeover
from the RFC_DB to the IHFS_DB.  At WFOs, migration from the old database to the new database requires
minimal effort because they have few or no local applications developed against the old database.

7.   IHFS DATABASE FUTURE GROWTH

     With the delivery of IHFS_DB V1.0, we have established a single, common set of database structures and
procedures for those hydrologic systems relying on relational technology.  We are positioned to move forward
from this common baseline, by extending it in the areas discussed below.

7.1  Functional Extension

     IHFS_DB V1.0 supports all of the hydrologic functionality supported by WHFS_DB and RFC_DB.  However,
there still remains a significant body of hydrologic modeling code supported by custom built host file data
repositories developed in the mainframe era.  In addition, new functionality must be added to the existing
operational systems to improve our mission performance and make effective use of the organizational,
procedural, scientific, and technological changes being introduced as part of NWS modernization.

     NWSRFS had its genesis as a hydrologic modeling system developed on batch mainframe systems in the
1970s.  It used FORTRAN for managing data in random access, indexed file systems designed specifically for
the purpose.  While the software architecture of the NWSRFS data system was well thought out at the time, it
suffers from two major problems.  First, it uses custom software that is relatively expensive to maintain and
whose functionality is available today as part of more cost-effective commercial data management systems.
Second, the physical structure was designed specifically for speed in a narrow domain of hydrologic modeling.
Not only are the operational requirements expanding beyond the purely modeling domain, but improvements are
being made in modeling science.  The old physical design of the NWSRFS data system is constraining our ability
to add new science and new functionality to the forecasting system.

7.2  Software Engineering Issues

     The application code that relies on relational technology is not well insulated from the details of the
physical implementation of the database.  This places us in the position of having to modify application code
each time we make modifications to the physical structure of the database.  This is not so with the older
NWSRFS code where a concerted effort was made to provide separation between application and data management
code.

     We plan to develop an insulating layer that hides the physical details of IHFS_DB from the application
layer.  This layer will allow applications to refer to data in the database based on natural data attributes.
We will be able to treat the data as "natural data objects" rather than as artifacts of a particular physical
implementation.  Once this layer is developed, we plan to insert it into NWSRFS, thereby providing the
mechanism to convert NWSRFS from reliance on the old host file data repositories to reliance on IHFS_DB.

7.3  Data Engineering Issues

     The evolution of RFC_DB and WHFS_DB involved development at the physical level of database design.  While
some architectural consideration made its way into the design, it was based more on the skill and experience
of the developers and less on a deliberate approach to software engineering and data engineering in
particular.  The merge allowed a greater level of architectural consideration; however, it was focused more on
producing a single common baseline than on a rigorous architectural approach.

     We are currently reexamining the data domain of the NWS hydrologic enterprise.  We have developed an
initial conceptual model of the data and are extending it to cover additional territory (referred to in
Section 7.1 above).  An object-oriented approach for the analysis is being used, with plans to follow through
to physical design using object-oriented engineering.  We also expect that we will move the physical tool from
purely relational technology to commercially available object-relational technology.

8.   SUMMARY

     The release of the IHFS_DB, resulting from the merge of two independent but overlapping predecessor
relational databases, was the first step in a longer term effort to unify all operational and reference data
used by all hydrologic applications, under one common database design.  The development of IHFS_DB is rooted
in extensive operational experience and feedback from forecasters.  The stage has been set for integrating
support for hydrologic modeling activities that today rely on technology from the mainframe batch processing
era.  The IHFS_DB will ultimately support the full range of functionality required for conduct of the entire
hydrologic mission of the NWS at all NWS field sites.

9.   REFERENCES

Chen, Peter, 1977: The Entity-Relationship Approach to Logical Database Design.  Q E D Publishing Co.

Office of Hydrology, 1997: The Complete SHIMS Manual, Version 4.11.  National Weather Service (Office of
     Hydrology, WFO Louisville, WFO Indianapolis, and Southern Region Hydrologic Services Division), April
     1997.

Office of Hydrology, 1997a: IHFS_DB Version 1.0 Database Design Model (maintained in CASE tool).  National
     Weather Service Office of Hydrology, August 1997.

