78th Annual AMS Meeting Phoenix, Arizona January 1998 RECENT DATABASE DEVELOPMENTS AT THE NATIONAL WEATHER SERVICE OFFICE OF HYDROLOGYJon Roe Geoff Bonnin Mark Glaudemans Charles Gobs Paul Tilles Office of Hydrology NOAA/National Weather Service 1325 East-West Highway Silver Spring, Maryland 20910
1. INTRODUCTION The National Weather Service (NWS) Office of Hydrology has initiated an effort to unify all operational and reference data used by all hydrologic applications under one common database, the Integrated Hydrologic Forecast System (IHFS) Database or IHFS_DB. The intent is that the IHFS_DB provides a common database framework for hydrologic applications at all Weather Forecast Offices (WFOs) and all River Forecast Centers (RFCs). This common framework will allow field offices and the Office of Hydrology to develop hydrologic applications in a coordinated way that promotes application and information sharing across offices. The framework also promotes incorporation of field-developed applications into the national baseline delivered by the Automated Weather Interactive Processing System (AWIPS). The first released version of the IHFS_DB is the result of the merge of two independent but overlapping predecessor relational databases, the RFC_DB and the WHFS_DB. The RFC_DB evolved from work in the early 1980s at RFCs to store operational data in a commercial database. The WHFS_DB is the relational database that supported the WFO Hydrologic Forecast System (WHFS) at WFOs. The WHFS_DB was developed on an independent path in the early 1990s. Since the early 1980s at RFCs and since 1994 at WFOs, versions of the predecessor relational databases have been in continuous operational use. The evolution of this technology has therefore been heavily influenced and proven by the functional needs of operational forecasters and by the reliability and performance needs of the operational environment. The IHFS_DB provides an enterprise rather than an application specific approach to data management. The merge of the predecessor databases has set the stage for additional significant change such as integrated support for hydrologic modeling activities that today rely on technology from the mainframe batch processing era. The IHFS_DB will ultimately support the full range of functionality required to conduct the hydrologic mission of the NWS at all NWS field sites. 2. IHFS DATA MANAGEMENT VISION The Office of Hydrology has the vision of unifying all operational and reference data used by all hydrologic applications under one common database. IHFS_DB will be the single logical repository at the site (forecast office) level. Each element of functionality at a site will use this repository and the knowledge of what data is needed will reside exclusively within the function. This avoids the problems of staging data in many redundant storage areas. It encapsulates the knowledge of data needs within the domain of each element of functionality and also provides a mechanism for integration between the diverse elements of functionality provided by the site forecasting system (Figure 1). Furthermore, the IHFS_DB will provide a common database framework for hydrologic applications at all WFOs and all RFCs. This common framework will allow both field offices and the Office of Hydrology to develop hydrologic applications for the national baseline and for unique local use in a coordinated way that promotes application and information sharing across offices. Figure 1. The Site Forecasting System Uses a Single Logical Site Level Database 3. THE IHFS DATABASE The first released version of the IHFS_DB is the result of the merge of two independent but overlapping predecessor relational databases, the RFC_DB and the WHFS_DB discussed in Sections 4 and 5 following. Delivery of services by RFCs and WFOs is an ongoing, mission critical priority for the NWS. The support provided by the Office of Hydrology recognizes this priority by ensuring that system improvements are delivered in a way that does not disrupt ongoing delivery of services, and by ensuring that the life-cycle costs of application software are properly considered. Based on these considerations, we have chosen to implement our long term data management vision in a series of shorter duration steps rather than taking a larger amount of time to produce a single "big-bang" delivery. The results of the first group of steps (referred to as "the merge") are discussed in Section 6, and in Section 7, we provide a look ahead to the future. 4. THE RFC DATABASE 4.1 History of Evolution By early 1983, data encoded in the newly developed Standard Hydrologic Exchange Format (SHEF) were passing across NWS communications circuits, and for the first time, being automatically decoded and posted to the RFC Gateway systems at RFCs. The next step was to provide a mechanism for automated transfer of the data to the NWS River Forecast System (NWSRFS). RFC Gateway systems were based on a software system (DATACOL) developed at the California-Nevada RFC and modified at the Missouri Basin RFC. DATACOL ran on local minicomputers whereas NWSRFS was run in batch mode at the National Oceanic and Atmospheric Administration (NOAA) Central Computing Facility (NCCF) using Remote Job Entry Systems. Recognition of the problems associated with the separation of data from the forecast system and the differences between RFC Gateway and NCCF computing environments led to the initiation of a series of projects to bring the data and the forecast systems together. The first steps in this direction were made with the development of a database using a commercial relational database product on RFC-based mini-computers and the porting of NWSRFS to the mini-computer environment. By 1985, the system was being used operationally, and data were arriving at the RFC, being posted in the relational database tables for review by forecasters and being transferred to NWSRFS using automated procedures. The initial system was replicated and implemented at several RFCs so that by the late 1980s a considerable body of operational experience had been developed and the system had been enhanced and tuned by its users. By 1992, the system had been transformed both functionally and in terms of computing environment. SQL compliant relational database management products had been adopted and the computing platform was established as a site network of UNIX-based workstations. The major functional transformations involved significant enhancements and additions to the physical database tables. These enhancements were made to support the Weather Surveillance Radar - 1988 Doppler (WSR-88D) Precipitation Processing Subsystem and more automated and operationally effective transfer of data to NWSRFS. These systems were again replicated and used for operations at most RFCs. 4.2 Lessons Learned From Operational Experience Since the early 1980s at RFCs, versions of RFC_DB have been in continuous operational use. The evolution of this technology has therefore been molded by and proven in the operational environment. The development has responded to the reliability and performance needs of real forecasters. While significant portions of the system were being maintained and enhanced by the Office of Hydrology, individual RFCs were also making their own enhancements and sharing them with each other. This evolution amounts to approximately 15 years of testing and improvement in the real-world, operational environment of a range of diverse RFCs. Throughout that period, RFCs have relied on these systems and have continued to deliver critical services (including during such extreme events as the Great Flood of 1993). A number of lessons have been learned from this experience: - Commercial relational database products can be used to provide the reliability and performance necessary for operational forecasting. - With the appropriate mix of skills, scientific software with a FORTRAN legacy can be effectively wedded with more recent programming languages, tools, environments, and architectures. - Evolution of database structures by addition of functionality at the physical level without a guiding data architecture framework leads to designs that become more complex and difficult to maintain and enhance over time. This is a natural phenomenon for software systems as well, not just physical data structures. - To provide an effective data management environment over the long term, data needs must be considered from the point of view of the enterprise and the various elements of functionality performed within the enterprise, rather than viewing the data needs in a "stove-pipe" manner from the point of view of independent applications or a narrow range of functionality..5. THE WHFS DATABASE The second of the two relational databases that evolved to form the IHFS_DB is the WHFS_DB. Work on the WHFS_DB commenced in 1993. Because of its later design, which built upon lessons learned from the RFC_DB, many trademark conceptual aspects of the WHFS_DB have endured into the design of the IHFS_DB, as is discussed in Section 6. 5.1 History of Evolution The WHFS_DB is the relational database that supported the WHFS at WFOs. The beginnings of the WHFS_DB can be traced to the development of the Service Hydrologist Information Management System (SHIMS) database and user interface software (Office of Hydrology, 1997). SHIMS was initiated in 1986 by NWS Central Region staff, using Rbase 5000 database software. By 1990, the SHIMS database definition had stabilized and the database was ported to the Paradox database software. Deployment of SHIMS at all NWS WFOs began in 1992, when it was decided that it would serve as the basis for the initial AWIPS hydrologic database at WFOs. SHIMS was established to provide an automated system for storing and managing the information used by the Service Hydrologist and NWS forecasters. The Service Hydrologist, or in some offices the hydrologic focal point, is responsible for the hydrologic program at WFOs, which work together with the RFCs to provide the operational hydrologic services of the NWS. SHIMS data consists of information contained in NWS Form E-19, which is the official form describing a hydrometeorological data collection and/or forecast location, and Form B-44, which contains location-specific information pertinent to the cooperative observer program. The SHIMS databases were populated at each site using a custom Paradox user interface operating on personal computers from the information on E-19 and B-44 standard forms. The E-19 information comprises most of the SHIMS data definition, and consists of reference and historical information of a static nature. The SHIMS database does not contain operational data such as current river stage and precipitation reports, but rather it supports operations by containing location reference information needed in times of weather events and for general day-to-day inquiries. In late 1993, with the advent of the WHFS project, attention turned to the relational database that would serve as the database for all application software. The importance of a well-structured database became critical as it was decided that all operational and reference data would be stored in the relational database. The first incarnation of WHFS_DB relied heavily on the SHIMS design for the static reference data. However, a fresh approach incorporating lessons learned from RFC_DB was used for non-static data. The RFC_DB identified operational data as either observed or forecast data. The WHFS_DB used this same concept but instead of having application pre-processors extract data from the tables associated with these two categories, it used an additional level of data storage by denormalizing all observed and forecast data to a set of additional tables stratified by physical element. Therefore, there was a table for observed stage values, a table for observed precipitation, and tables for other physical elements. The same was true for forecast data. This resulted in a disaggregation of the data for a given station as it was now spread out in multiple tables. However, it significantly improved database performance. The WHFS applications are typically very interactive and require a fast response time after the arrival of data. The division of the operational data by physical element is one of the basic concepts of the WHFS_DB, and is present in the IHFS_DB, as discussed later. 5.2 Lessons Learned From Operational Experience Since 1994, versions of WHFS_DB have been in continuous operational use at WFOs, and as with the RFC_DB, a number of lessons have been learned in the course of taking the database from the blackboard to the operations floor of a weather office: - Designs resulting from perfectly logical interpretations of the database attributes resulted in unacceptable performance in some cases. In particular, the division of observed data by physical element resulted in very large and heavily used tables for two primary parameters - river stage and precipitation. Queries for multiple station's data from these tables were slow. For stage data, a redundant table containing only the most recent data for a location was added. For precipitation, a pre-processor that populated a denormalized table of precipitation accumulation for given durations was implemented. These two changes greatly improved performance. - The other observed physical elements and the forecast data do not comprise sufficiently large volumes for performance to be an issue. - There is overlap between long and short term forecast time series of river stage for a location - both have the same creation date but different issuance times. Additional processing was added to merge them into one virtual time series by abutting the time series in sequence, based on their issuance times. - Tables listing valid values (e.g. valid lists), such as lists of WFO identifiers, counties, states, etc., can be used efficiently and effectively to provide referential integrity via foreign key relationships. 6. THE DATABASE MERGE In early 1996 the Office of Hydrology decided to pursue the consolidation of the two separate but overlapping relational databases discussed earlier in sections 4 and 5, the RFC_DB and the WHFS_DB. It was decided to treat this consolidation as the first concrete step toward the creation of the IHFS database that would unify all data structures that serve all hydrologic applications supported and developed by the Office of Hydrology. 6.1 Overlapping Databases The overlap between the two relational databases was concentrated around the storage of dynamic operational data (i.e., point station observations and forecasts) and not station reference data. The only significant overlap of reference data was a table in both databases that described locations of reporting stations (e.g., river stage stations and precipitation gages). Therefore, most of the work of merging the two databases occurred in refining the storage mechanism for operational data in a way that is useful for both RFC and WFO applications. When the database merge was proposed, the WHFS_DB contained about 95 tables and the RFC_DB contained about 25 tables. There were 5 or 6 key tables that were directly overlapping between the two databases. Also, the RFC_DB database contained denormalized application-centric tables of operational data that did not overlap in structure with the WHFS_DB. The RFC_DB was augmented with several classes of host files (i.e., files maintained by an operating system external to a relational database) that contained data that were excellent candidates for inclusion into the merged relational database. 6.2 Merge Approach The proposed merge of the two relational databases was approached from the perspective of using the WHFS_DB as the baseline and extending it to satisfy all of the requirements previously satisfied by the RFC_DB. This approach, rather than using the RFC_DB as the baseline or starting from a clean sheet of paper, was chosen for several reasons. First, the method used by the WHFS_DB to store operational point data was less application-centric and more data-centric, therefore easier to extend for future data requirements. Second, the WHFS_DB had the advantage of possessing a more recent design that took advantage of lessons learned from the experience of using the RFC_DB in the field. Third, the WHFS_DB contained a much broader base of reference data entities and attributes that had evolved from the earlier SHIMS database design. Lastly, the years of successful operational experience gained from field deployments of both the RFC_DB at RFCs and the WHFS_DB at WFOs negated the need to start from scratch. Just because the WHFS_DB was used as the baseline for the merge, it was not assumed that its structures were inviolate. Those structures were scrutinized from an architectural viewpoint at the same time that the RFC_DB's requirements were inserted into the design of the merged database. Prior to the merge of the two predecessor relational databases, their designs were captured via reverse engineering into a Computer-Aided Software Engineering (CASE) tool that supported entity-relationship data modeling according to Chen (1977). The reverse engineering resulted in a series of entity-relationship diagrams (ERDs) and a comprehensive data dictionary that described all entities (i.e., tables and files) and all attributes (i.e., columns and fields). The data modeling required to merge the two databases proceeded from this initial set of CASE tool information. Today, the design (i.e., ERDs and data dictionary) of the merged IHFS_DB is maintained and extended with the CASE tool (Office of Hydrology, 1997a). 6.3 Merge Results Each RFC_DB entity (data table or file) was examined and compared against existing structures within the WHFS_DB. For tables that clearly overlapped in function and form between the two databases (e.g., the station location table and the radar location table), a single common table structure was devised that accommodated WFO functionality and RFC functionality. The RFC_DB included several host data files that contained reference information (e.g., application control parameters). These files were re-implemented as relational database tables. RFC_DB grids (WSR-88D radar products and precipitation analysis grids) were brought into the merged database by defining relational tables to hold grid attributes, one of which points to a host file that holds the grid data values. Most of the application-centric structures that held operational point data for the RFC_DB were eliminated in favor of more data-centric structures already in place in the WHFS_DB. Figure 2 illustrates the principal processes and data stores of the merged database. Encoded data messages are decoded and posted to the two main tables for observations (i.e., ObsValue) and forecasts (i.e., FcstValue). Data from unknown stations (i.e., not in the Location table) are posted to the UnkStnValue table. The user can choose to store the encoded messages in the TextProduct table. Observations and forecasts are then denormalized via database triggers and stored procedures into a series of tables stratified by physical element type. The IngestFilter table controls which physical parameters are allowed to be propagated into the denormalized parameter-specific data tables for each station. The operational data denormalization does not result in completely duplicated data since data are typically stored for about 30 days in the parameter-specific tables and only for about one day in the two main tables. As required by 17 U.S.C. 403, third parties producing works consisting predominantly of the material appearing in NWS Web pages must provide notice with such subsequently produced work(s) identifying such incorporated material and stating that such material is not subject to copyright protection. |
Main Link Categories: Home | OHD | NWS |