OHD/HL - papers: AMS98-3

RECENT DATABASE DEVELOPMENTS AT THE NATIONAL WEATHER SERVICE OFFICE OF HYDROLOGY

Jon Roe

Geoff Bonnin

Mark Glaudemans

Charles Gobs

Paul Tilles

Office of Hydrology
NOAA/National Weather Service
1325 East-West Highway
Silver Spring, Maryland 20910

1. INTRODUCTION The National Weather Service (NWS) Office of Hydrology has initiated an effort to unify all operational and reference data used by all hydrologic applications under one common database, the Integrated Hydrologic Forecast System (IHFS) Database or IHFS_DB. The intent is that the IHFS_DB provides a common database framework for hydrologic applications at all Weather Forecast Offices (WFOs) and all River Forecast Centers (RFCs). This common framework will allow field offices and the Office of Hydrology to develop hydrologic applications in a coordinated way that promotes application and information sharing across offices. The framework also promotes incorporation of field-developed applications into the national baseline delivered by the Automated Weather Interactive Processing System (AWIPS). The first released version of the IHFS_DB is the result of the merge of two independent but overlapping predecessor relational databases, the RFC_DB and the WHFS_DB. The RFC_DB evolved from work in the early 1980s at RFCs to store operational data in a commercial database. The WHFS_DB is the relational database that supported the WFO Hydrologic Forecast System (WHFS) at WFOs. The WHFS_DB was developed on an independent path in the early 1990s. Since the early 1980s at RFCs and since 1994 at WFOs, versions of the predecessor relational databases have been in continuous operational use. The evolution of this technology has therefore been heavily influenced and proven by the functional needs of operational forecasters and by the reliability and performance needs of the operational environment. The IHFS_DB provides an enterprise rather than an application specific approach to data management. The merge of the predecessor databases has set the stage for additional significant change such as integrated support for hydrologic modeling activities that today rely on technology from the mainframe batch processing era. The IHFS_DB will ultimately support the full range of functionality required to conduct the hydrologic mission of the NWS at all NWS field sites. 2. IHFS DATA MANAGEMENT VISION The Office of Hydrology has the vision of unifying all operational and reference data used by all hydrologic applications under one common database. IHFS_DB will be the single logical repository at the site (forecast office) level. Each element of functionality at a site will use this repository and the knowledge of what data is needed will reside exclusively within the function. This avoids the problems of staging data in many redundant storage areas. It encapsulates the knowledge of data needs within the domain of each element of functionality and also provides a mechanism for integration between the diverse elements of functionality provided by the site forecasting system (Figure 1). Furthermore, the IHFS_DB will provide a common database framework for hydrologic applications at all WFOs and all RFCs. This common framework will allow both field offices and the Office of Hydrology to develop hydrologic applications for the national baseline and for unique local use in a coordinated way that promotes application and information sharing across offices.

Figure 1. The Site Forecasting System Uses a Single Logical Site Level Database 3. THE IHFS DATABASE The first released version of the IHFS_DB is the result of the merge of two independent but overlapping predecessor relational databases, the RFC_DB and the WHFS_DB discussed in Sections 4 and 5 following. Delivery of services by RFCs and WFOs is an ongoing, mission critical priority for the NWS. The support provided by the Office of Hydrology recognizes this priority by ensuring that system improvements are delivered in a way that does not disrupt ongoing delivery of services, and by ensuring that the life-cycle costs of application software are properly considered. Based on these considerations, we have chosen to implement our long term data management vision in a series of shorter duration steps rather than taking a larger amount of time to produce a single "big-bang" delivery. The results of the first group of steps (referred to as "the merge") are discussed in Section 6, and in Section 7, we provide a look ahead to the future. 4. THE RFC DATABASE 4.1 History of Evolution By early 1983, data encoded in the newly developed Standard Hydrologic Exchange Format (SHEF) were passing across NWS communications circuits, and for the first time, being automatically decoded and posted to the RFC Gateway systems at RFCs. The next step was to provide a mechanism for automated transfer of the data to the NWS River Forecast System (NWSRFS). RFC Gateway systems were based on a software system (DATACOL) developed at the California-Nevada RFC and modified at the Missouri Basin RFC. DATACOL ran on local minicomputers whereas NWSRFS was run in batch mode at the National Oceanic and Atmospheric Administration (NOAA) Central Computing Facility (NCCF) using Remote Job Entry Systems. Recognition of the problems associated with the separation of data from the forecast system and the differences between RFC Gateway and NCCF computing environments led to the initiation of a series of projects to bring the data and the forecast systems together. The first steps in this direction were made with the development of a database using a commercial relational database product on RFC-based mini-computers and the porting of NWSRFS to the mini-computer environment. By 1985, the system was being used operationally, and data were arriving at the RFC, being posted in the relational database tables for review by forecasters and being transferred to NWSRFS using automated procedures. The initial system was replicated and implemented at several RFCs so that by the late 1980s a considerable body of operational experience had been developed and the system had been enhanced and tuned by its users. By 1992, the system had been transformed both functionally and in terms of computing environment. SQL compliant relational database management products had been adopted and the computing platform was established as a site network of UNIX-based workstations. The major functional transformations involved significant enhancements and additions to the physical database tables. These enhancements were made to support the Weather Surveillance Radar - 1988 Doppler (WSR-88D) Precipitation Processing Subsystem and more automated and operationally effective transfer of data to NWSRFS. These systems were again replicated and used for operations at most RFCs. 4.2 Lessons Learned From Operational Experience Since the early 1980s at RFCs, versions of RFC_DB have been in continuous operational use. The evolution of this technology has therefore been molded by and proven in the operational environment. The development has responded to the reliability and performance needs of real forecasters. While significant portions of the system were being maintained and enhanced by the Office of Hydrology, individual RFCs were also making their own enhancements and sharing them with each other. This evolution amounts to approximately 15 years of testing and improvement in the real-world, operational environment of a range of diverse RFCs. Throughout that period, RFCs have relied on these systems and have continued to deliver critical services (including during such extreme events as the Great Flood of 1993). A number of lessons have been learned from this experience: - Commercial relational database products can be used to provide the reliability and performance necessary for operational forecasting. - With the appropriate mix of skills, scientific software with a FORTRAN legacy can be effectively wedded with more recent programming languages, tools, environments, and architectures. - Evolution of database structures by addition of functionality at the physical level without a guiding data architecture framework leads to designs that become more complex and difficult to maintain and enhance over time. This is a natural phenomenon for software systems as well, not just physical data structures. - To provide an effective data management environment over the long term, data needs must be considered from the point of view of the enterprise and the various elements of functionality performed within the enterprise, rather than viewing the data needs in a "stove-pipe" manner from the point of view of independent applications or a narrow range of functionality..5. THE WHFS DATABASE The second of the two relational databases that evolved to form the IHFS_DB is the WHFS_DB. Work on the WHFS_DB commenced in 1993. Because of its later design, which built upon lessons learned from the RFC_DB, many trademark conceptual aspects of the WHFS_DB have endured into the design of the IHFS_DB, as is discussed in Section 6. 5.1 History of Evolution The WHFS_DB is the relational database that supported the WHFS at WFOs. The beginnings of the WHFS_DB can be traced to the development of the Service Hydrologist Information Management System (SHIMS) database and user interface software (Office of Hydrology, 1997). SHIMS was initiated in 1986 by NWS Central Region staff, using Rbase 5000 database software. By 1990, the SHIMS database definition had stabilized and the database was ported to the Paradox database software. Deployment of SHIMS at all NWS WFOs began in 1992, when it was decided that it would serve as the basis for the initial AWIPS hydrologic database at WFOs. SHIMS was established to provide an automated system for storing and managing the information used by the Service Hydrologist and NWS forecasters. The Service Hydrologist, or in some offices the hydrologic focal point, is responsible for the hydrologic program at WFOs, which work together with the RFCs to provide the operational hydrologic services of the NWS. SHIMS data consists of information contained in NWS Form E-19, which is the official form describing a hydrometeorological data collection and/or forecast location, and Form B-44, which contains location-specific information pertinent to the cooperative observer program. The SHIMS databases were populated at each site using a custom Paradox user interface operating on personal computers from the information on E-19 and B-44 standard forms. The E-19 information comprises most of the SHIMS data definition, and consists of reference and historical information of a static nature. The SHIMS database does not contain operational data such as current river stage and precipitation reports, but rather it supports operations by containing location reference information needed in times of weather events and for general day-to-day inquiries. In late 1993, with the advent of the WHFS project, attention turned to the relational database that would serve as the database for all application software. The importance of a well-structured database became critical as it was decided that all operational and reference data would be stored in the relational database. The first incarnation of WHFS_DB relied heavily on the SHIMS design for the static reference data. However, a fresh approach incorporating lessons learned from RFC_DB was used for non-static data. The RFC_DB identified operational data as either observed or forecast data. The WHFS_DB used this same concept but instead of having application pre-processors extract data from the tables associated with these two categories, it used an additional level of data storage by denormalizing all observed and forecast data to a set of additional tables stratified by physical element. Therefore, there was a table for observed stage values, a table for observed precipitation, and tables for other physical elements. The same was true for forecast data. This resulted in a disaggregation of the data for a given station as it was now spread out in multiple tables. However, it significantly improved database performance. The WHFS applications are typically very interactive and require a fast response time after the arrival of data. The division of the operational data by physical element is one of the basic concepts of the WHFS_DB, and is present in the IHFS_DB, as discussed later. 5.2 Lessons Learned From Operational Experience Since 1994, versions of WHFS_DB have been in continuous operational use at WFOs, and as with the RFC_DB, a number of lessons have been learned in the course of taking the database from the blackboard to the operations floor of a weather office: - Designs resulting from perfectly logical interpretations of the database attributes resulted in unacceptable performance in some cases. In particular, the division of observed data by physical element resulted in very large and heavily used tables for two primary parameters - river stage and precipitation. Queries for multiple station's data from these tables were slow. For stage data, a redundant table containing only the most recent data for a location was added. For precipitation, a pre-processor that populated a denormalized table of precipitation accumulation for given durations was implemented. These two changes greatly improved performance. - The other observed physical elements and the forecast data do not comprise sufficiently large volumes for performance to be an issue. - There is overlap between long and short term forecast time series of river stage for a location - both have the same creation date but different issuance times. Additional processing was added to merge them into one virtual time series by abutting the time series in sequence, based on their issuance times. - Tables listing valid values (e.g. valid lists), such as lists of WFO identifiers, counties, states, etc., can be used efficiently and effectively to provide referential integrity via foreign key relationships. 6. THE DATABASE MERGE In early 1996 the Office of Hydrology decided to pursue the consolidation of the two separate but overlapping relational databases discussed earlier in sections 4 and 5, the RFC_DB and the WHFS_DB. It was decided to treat this consolidation as the first concrete step toward the creation of the IHFS database that would unify all data structures that serve all hydrologic applications supported and developed by the Office of Hydrology. 6.1 Overlapping Databases The overlap between the two relational databases was concentrated around the storage of dynamic operational data (i.e., point station observations and forecasts) and not station reference data. The only significant overlap of reference data was a table in both databases that described locations of reporting stations (e.g., river stage stations and precipitation gages). Therefore, most of the work of merging the two databases occurred in refining the storage mechanism for operational data in a way that is useful for both RFC and WFO applications. When the database merge was proposed, the WHFS_DB contained about 95 tables and the RFC_DB contained about 25 tables. There were 5 or 6 key tables that were directly overlapping between the two databases. Also, the RFC_DB database contained denormalized application-centric tables of operational data that did not overlap in structure with the WHFS_DB. The RFC_DB was augmented with several classes of host files (i.e., files maintained by an operating system external to a relational database) that contained data that were excellent candidates for inclusion into the merged relational database. 6.2 Merge Approach The proposed merge of the two relational databases was approached from the perspective of using the WHFS_DB as the baseline and extending it to satisfy all of the requirements previously satisfied by the RFC_DB. This approach, rather than using the RFC_DB as the baseline or starting from a clean sheet of paper, was chosen for several reasons. First, the method used by the WHFS_DB to store operational point data was less application-centric and more data-centric, therefore easier to extend for future data requirements. Second, the WHFS_DB had the advantage of possessing a more recent design that took advantage of lessons learned from the experience of using the RFC_DB in the field. Third, the WHFS_DB contained a much broader base of reference data entities and attributes that had evolved from the earlier SHIMS database design. Lastly, the years of successful operational experience gained from field deployments of both the RFC_DB at RFCs and the WHFS_DB at WFOs negated the need to start from scratch. Just because the WHFS_DB was used as the baseline for the merge, it was not assumed that its structures were inviolate. Those structures were scrutinized from an architectural viewpoint at the same time that the RFC_DB's requirements were inserted into the design of the merged database. Prior to the merge of the two predecessor relational databases, their designs were captured via reverse engineering into a Computer-Aided Software Engineering (CASE) tool that supported entity-relationship data modeling according to Chen (1977). The reverse engineering resulted in a series of entity-relationship diagrams (ERDs) and a comprehensive data dictionary that described all entities (i.e., tables and files) and all attributes (i.e., columns and fields). The data modeling required to merge the two databases proceeded from this initial set of CASE tool information. Today, the design (i.e., ERDs and data dictionary) of the merged IHFS_DB is maintained and extended with the CASE tool (Office of Hydrology, 1997a). 6.3 Merge Results Each RFC_DB entity (data table or file) was examined and compared against existing structures within the WHFS_DB. For tables that clearly overlapped in function and form between the two databases (e.g., the station location table and the radar location table), a single common table structure was devised that accommodated WFO functionality and RFC functionality. The RFC_DB included several host data files that contained reference information (e.g., application control parameters). These files were re-implemented as relational database tables. RFC_DB grids (WSR-88D radar products and precipitation analysis grids) were brought into the merged database by defining relational tables to hold grid attributes, one of which points to a host file that holds the grid data values. Most of the application-centric structures that held operational point data for the RFC_DB were eliminated in favor of more data-centric structures already in place in the WHFS_DB. Figure 2 illustrates the principal processes and data stores of the merged database. Encoded data messages are decoded and posted to the two main tables for observations (i.e., ObsValue) and forecasts (i.e., FcstValue). Data from unknown stations (i.e., not in the Location table) are posted to the UnkStnValue table. The user can choose to store the encoded messages in the TextProduct table. Observations and forecasts are then denormalized via database triggers and stored procedures into a series of tables stratified by physical element type. The IngestFilter table controls which physical parameters are allowed to be propagated into the denormalized parameter-specific data tables for each station. The operational data denormalization does not result in completely duplicated data since data are typically stored for about 30 days in the parameter-specific tables and only for about one day in the two main tables.

Figure 2. Principal Processes And Flows of the Merged Database For the operational data tables of the target merged database, we took this opportunity to recast all of the date/time attributes so that the change to the year 2000 will be properly handled by the database. The structure in the WHFS_DB that held data ingest control parameters was eliminated in favor of the more robust and flexible corresponding structure found in the RFC_DB. The use of foreign key relationships to maintain referential integrity was continued into the design of the merged database. Most data attributes that take on values from specific valid lists (e.g., physical element code) are connected to lookup tables via foreign keys to make sure that invalid values are not permitted for those data attributes. Figure 3 shows an example of how values in columns of the LocRangeCheck table are being controlled by foreign keys which point to valid lists of physical element name codes, physical element durations, and station location identifiers in other tables.

Figure 3. Example of the Use of Foreign Keys and Valid Lists 6.4 Implementation Strategy Since the RFC_DB and the WHFS_DB are in operational use at RFCs and WFOs respectively, a strategy was needed to achieve the unified database in a reasonable amount of time while minimizing disruption to field operations. In accordance with the merge approach described above, the field impact was seen to be far more extensive at RFCs than at WFOs, partly because the physical changes were greater for RFC_DB than for WHFS_DB, and partly because the RFCs have developed many local applications over the years that depended on the structure of the RFC_DB. A plan was developed to divide the task of migrating the RFC_DB structures into the unified database into several "themes" that could be incrementally released to the RFCs. Initially four themes were identified, with a schedule that spanned a little less than a year. The merge was actually accomplished in three themes that took a bit more than a year (from early summer 1996 to mid-summer 1997). By August 1997 a single unified relational database (i.e., IHFS_DB Version 1.0) was completed and delivered to the NWS AWIPS Program for deployment to field offices with AWIPS Build 3.1. It is now being beta tested at a couple of field offices prior to its general release through AWIPS this fall and winter. Along with deployment of the new IHFS database, we have provided software to convert existing data at WFOs (from the old WHFS_DB) and RFCs (from the old RFC_DB) to the new database. By the end of September 1997, the Missouri Basin RFC had successfully adjusted and tested the critical operational core of their local applications with the new IHFS_DB. The other RFCs still have work to do to adjust their local applications, but some of that work has already been done due to the phased changeover from the RFC_DB to the IHFS_DB. At WFOs, migration from the old database to the new database requires minimal effort because they have little or no local applications developed against it. 7. IHFS DATABASE FUTURE GROWTH With the delivery of IHFS_DB V1.0, we have established a single, common set of database structures and procedures for those hydrologic systems relying on relational technology. We are positioned to move forward from this common baseline, by extending it in the areas discussed below. 7.1 Functional Extension IHFS_DB V1.0 supports all of the hydrologic functionality supported by WHFS_DB and RFC_DB. However there still remains a significant body of hydrologic modeling code supported by custom built host file data repositories developed in the mainframe era. In addition, new functionality must be added to the existing operational systems to improve our mission performance and make effective use of the organizational, procedural, scientific, and technological changes being introduced as part of NWS modernization. NWSRFS had its genesis as a hydrologic modeling system developed on batch mainframe systems in the 1970s. It used FORTRAN for managing data in random access, indexed file systems designed specifically for the purpose. While the software architecture of the NWSRFS data system was well thought out at the time, it suffers from two major problems: First, it uses custom software that is relatively expensive to maintain and that is available today as part of more cost effective commercial data management systems; Secondly, the physical structure was designed specifically for speed in a narrow domain of hydrologic modeling. Not only are the operational requirements expanding beyond the purely modeling domain, but improvements are being made in modeling science. The old physical design of the NWSRFS data system is constraining our ability to add new science and new functionality to the forecasting system. 7.2 Software Engineering Issues The application code that relies on relational technology is not well insulated from the details of the physical implementation of the database. This places us in the position of having to modify application code each time we make modifications to the physical structure of the database. This is not so with the older NWSRFS code where a concerted effort was made to provide separation between application and data management code. We plan to develop an insulating layer that hides the physical details of IHFS_DB from the application layer. This layer will allow applications to refer to data in the database based on natural data attributes. We will be able to treat the data as "natural data objects" rather than as artifacts of a particular physical implementation. Once this layer is developed, we plan to insert it into NWSRFS, thereby providing the mechanism to convert NWSRFS from reliance on the old host file data repositories to reliance on IHFS_DB. 7.3 Data Engineering Issues The evolution of RFC_DB and WHFS_DB involved development at the physical level of database design. While some architectural consideration made its way into the design, it was based more on the skill and experience of the developers and less on a deliberate approach to software engineering and data engineering in particular. The merge allowed a greater level of architectural consideration; however it was focused more on producing a single common baseline than on a rigorous architectural approach. We are currently reexamining the data domain of the NWS hydrologic enterprise. We have developed an initial conceptual model of the data and are extending it to cover additional territory (referred to in Section 7.1 above). An object-oriented approach for the analysis is being used, with plans to follow through to physical design using object-oriented engineering. We also expect that we will move the physical tool from purely relational technology to commercially available object-relational technology. 8. SUMMARY The release of the IHFS_DB, resulting from the merge of two independent but overlapping predecessor relational databases, was the first step in a longer term effort to unify all operational and reference data used by all hydrologic applications, under one common database design. The development of IHFS_DB is rooted in extensive operational experience and feedback from forecasters. The stage has been set for integrating support for hydrologic modeling activities that today rely on technology from the mainframe batch processing era. The IHFS_DB will ultimately support the full range of functionality required for conduct of the entire hydrologic mission of the NWS at all NWS field sites. 9. REFERENCES Chen, Peter, 1977: The Entity-Relationship Approach to Logical Database Design. Q E D Publishing Co. Office of Hydrology, 1997: The Complete SHIMS Manual, Version 4.11. National Weather Service (Office of Hydrology, WFO Louisville, WFO Indianapolis, and Southern Region Hydrologic Services Division), April 1997. Office of Hydrology, 1997a: IHFS_DB Version 1.0 Database Design Model (maintained in CASE tool). National Weather Service Office of Hydrology, August 1997.

As required by 17 U.S.C. 403, third parties producing works consisting predominantly of the material appearing in NWS Web pages must provide notice with such subsequently produced work(s) identifying such incorporated material and stating that such material is not subject to copyright protection.