National Weather Service United States Department of Commerce
Presented at AMS Conference
12th International Conference on Interactive Information and Processing System (IIPS) for Meteorology, Oceanography, and Hydrology
Atlanta, Georgia
January 28-February 2, 1996



The NOAA Hydrologic Data System



Geoffrey M. Bonnin
Office of Hydrology
NOAA/National Weather Service, W/OH3
1325 East-West Hwy., RM 8426
Silver Spring, Maryland 20910

Table of Contents


1. Introduction
2. Why Do We Need Observed Data?
3. How Do We Use Observed Data?
4. The Data Domain
5. Disaggregated Data Systems
6. Context
    6.1 Data for Operations
    6.2 AWIPS
7. Summary


1. Introduction



The National Weather Service (NWS) Office of Hydrology (OH) has begun development of the NOAA Hydrologic Data System (NHDS). What distinguishes the NHDS from previous data management approaches is its scope, its level of integration, and its cross-cutting nature. Current data management systems have been designed for specific, isolated purposes, or in some cases for a few purposes. The NHDS will satisfy a diverse range of clients and needs and will provide access to the broad range of information required by new hydrologic data assimilation and analysis techniques. The information managed by the NHDS spans the time domain from real-time to historical, and spans the quality domain from as-is raw data to highly processed, high-quality data. Information within the NHDS will be transitioned across the time and quality domains from as-is real-time to high-quality historical. This paper discusses the NHDS from the point of view of the problems it is to address, as well as the interplay between the NHDS and the context in which the development takes place.

2. Why Do We Need Observed Data?

While this may seem a trite question, it is useful to reexamine why data is needed and how it is used in order to establish a basis for operationally useful data systems. Data is used in both the operational and development environments. The basic assumption we make when forecasting is that if we can simulate the past, then we can successfully predict the future given a knowledge of the appropriate inputs. In an operational environment, this means that we need observations to know if we have been successful in simulating the processes that led to current conditions. If the simulation has not been successful, we use information contained in the observations to bring the simulation to a reasonable representation of reality. In the development environment, we try to produce models whose physics are appropriate and whose parameters enable the physics to represent the hydrology of particular basins. Again, we use the information contained in observations as a representation of the reality we are trying to simulate, and then we vary the model physics and the model parameters until we are satisfied that we have a useful model for the particular basin. In addition to these two situations, we can use simulations to infer information that is difficult to measure. In these cases, the inference is a function of the model physics, the model parameters, and the inputs that we used. For example, soil moisture accounting models can be used to infer information about moisture fluxes between the atmosphere and the soil surface - a current area of investigation that is not easily characterized by measurement.

3. How Do We Use Observed Data?

Observed data is used to infer the reality we are trying to represent by simulation. Generally, the physics we are trying to represent is distributed in space and time, whereas the observations we make are merely samples of the distribution. The task then is to interpret the samples sufficiently to make them useful to the physics of the models. We can use statistical tools, guided by an understanding of the physical process, to infer the actual time/space distribution that the observations have sampled. In contrast to "data analysis", which is the process of inferring reality from the samples, "quality analysis" gives us information relating to both potential errors in the sample and potential errors in the inference. Errors in the sample can arise not only from variation in the parameter being sampled, but also from errors in communications, transcription, equipment, etc. For example, there are cases where several devices for measuring river stage are used at a single location but with different vertical references. This can lead to simultaneous observations which are apparently quite different if knowledge of the vertical frame of reference is not associated with the particular observation. If these apparently different observations are used for data analysis without being corrected during quality analysis, a false picture of the variation in the parameter itself will be implied.
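The vertical-reference example can be illustrated with a minimal sketch. The device names, the elevations, and the function name are hypothetical and the numbers are invented; the only point is that two simultaneous readings that look different become consistent once each carries its own vertical reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageReading:
    """A river stage observation together with its vertical reference."""
    device: str               # hypothetical device label
    stage_m: float            # water height above the device's own zero, in metres
    gauge_zero_elev_m: float  # elevation of that zero above a common datum, in metres


def water_surface_elevation(reading: StageReading) -> float:
    """Convert a device-relative stage to an elevation above the common datum."""
    return reading.stage_m + reading.gauge_zero_elev_m


# Two devices at one site, read at the same time (invented values).
staff_gauge = StageReading("staff", stage_m=3.20, gauge_zero_elev_m=100.00)
wire_weight = StageReading("wire_weight", stage_m=1.70, gauge_zero_elev_m=101.50)

# Without the reference, the readings appear to disagree by 1.5 m.
raw_difference = abs(staff_gauge.stage_m - wire_weight.stage_m)

# With the reference applied, they describe the same water surface.
corrected_difference = abs(
    water_surface_elevation(staff_gauge) - water_surface_elevation(wire_weight)
)
```

Storing the reference with each observation, rather than in the head of the analyst, is what allows quality analysis to make this correction before data analysis begins.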

4. The Data Domain

Data exists in and spans a variety of domains. This point can be clearly understood by looking at the time and space domains. The time at which the observation was made, i.e., at which the datum is valid, is the point in the time domain at which the data exists. The time attribute may be complicated by the fact that the data may apply to a non-instantaneous period, e.g., it may be an average for a period or an accumulation for a period. Similarly, the data exists in the space domain in that it has location. The location attribute may be a point, a line, or an area, etc., and the datum may be an average over the location or some typical or extreme value over the location (spatial representation). While there are many domains in which the data may exist, there are a few which are particularly relevant to the NHDS.
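One way to picture these attributes is as fields on a single record. The sketch below is illustrative only: the class and field names are hypothetical, and the convention that the valid time marks the end of a non-instantaneous period is an assumption, not something defined for the NHDS.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class TimeRepresentation(Enum):
    INSTANTANEOUS = "instantaneous"
    AVERAGE = "average"
    ACCUMULATION = "accumulation"


class SpatialExtent(Enum):
    POINT = "point"
    LINE = "line"
    AREA = "area"


@dataclass(frozen=True)
class Datum:
    """A single value plus its position in the time and space domains."""
    value: float
    units: str
    valid_time: datetime   # assumed here to mark the END of the period
    duration: timedelta    # zero for an instantaneous value
    time_representation: TimeRepresentation
    spatial_extent: SpatialExtent
    location_id: str       # hypothetical identifier for the point, line, or area

    def period_start(self) -> datetime:
        """Start of the period to which the value applies."""
        return self.valid_time - self.duration


# A six-hour precipitation accumulation over a basin (invented values).
basin_precip = Datum(
    value=12.5,
    units="mm",
    valid_time=datetime(1996, 1, 28, 12, 0),
    duration=timedelta(hours=6),
    time_representation=TimeRepresentation.ACCUMULATION,
    spatial_extent=SpatialExtent.AREA,
    location_id="BASIN_A",
)
```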

Clearly the time and location domains are important. The time domain can also be classified into "operational" and "historical", where the term operational applies to near real time observations and the term historical applies to data which is older. By using this classification, it becomes apparent that operational data becomes historical as time moves forward.

The position of the data in the data type domain (what is the data type?) can be used to determine if the data should be continuous over space and time, or discrete. This knowledge helps in determining appropriate data analysis procedures. The quality domain provides information about the power of the information content of the data to imply the reality of the parameter being described. Another way of looking at this domain is to assume that it is synonymous with the level of processing (which is not always the case). For example, the NEXRAD Stage III precipitation product has a much higher level of processing and a much higher information content than an individual rain gage measurement.

Knowing that the information being combined exists at different positions in the various data domains helps in choosing appropriate data analysis techniques.

The classification of the time domain into operational and historical also serves to highlight different approaches that exist today within the NWS hydrology program. Operational data is obtained from sources with a wide range of quality. The requirement to produce forecasts in near real time places a constraint on the level of processing that can be applied to the data. Data analysis applied in near real time is minimal; however, the NWS is attempting to improve the level of analysis in order to obtain a better estimate of reality. An example of this higher level of processing is the NEXRAD Precipitation Processing System. Operational data has generally been lost! NWS offices have not been charged with archiving data. Historical data to be used in model development and calibration is generally obtained from agencies that are charged with archiving, such as the National Climatic Data Center. However, significant amounts of operational data do not enter these archives, and often the sources of the archived data are different from the real time sources. This results in statistical differences between the operational and historical data sets. Running a model on operational data when it has been developed and calibrated using historical data raises questions about the model output.

Historical data often undergoes a broad range of quality analysis and/or data analysis conducted by the archiving agencies. In some cases information pertaining to the operations performed on the data is published; in other cases it is not, adding to the complexity of the data analysis problem when combining the data in subsequent analyses.

Another issue for the NHDS to deal with is that while in the past we have focused on individual points of data and their associated quality, newer modeling techniques are driving us to use estimates of the time/space distribution of the sampled parameters. This requires a new approach to data analysis, one that considers the data as multiple samples of a time/space distribution rather than as independent and discrete events.
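As a concrete illustration of treating observations as samples of a spatial distribution, the sketch below estimates a value at an unsampled location by inverse-distance weighting. This technique is chosen here only because it is simple; the paper does not prescribe it, and it ignores both the time dimension and any physical understanding of the process. The function name and the gage values are hypothetical.

```python
import math


def idw_estimate(samples, x, y, power=2.0):
    """Estimate a value at (x, y) from point samples by inverse-distance weighting.

    samples is a list of (sample_x, sample_y, value) tuples.
    """
    if not samples:
        raise ValueError("at least one sample is required")

    weighted_sum = 0.0
    weight_total = 0.0
    for sample_x, sample_y, value in samples:
        distance = math.hypot(x - sample_x, y - sample_y)
        if distance == 0.0:
            # The target coincides with a sample, so return that sample exactly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two rain gages on a line (invented coordinates in km, values in mm).
gages = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw_estimate(gages, 1.0, 0.0)
```

At the midpoint both gages receive equal weight, so the estimate is the mean of the two values; nearer to either gage, that gage dominates.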

5. Disaggregated Data Systems

Data systems in the NWS have generally grown as an adjunct to either communications systems or science applications such as modeling software. As a result, a variety of data systems have evolved, each tailored to the specific application it is associated with. The result of this evolution is that there are a number of data systems in hydrology, all of which do some of the job of managing hydrologic data and none of which do all of the job. This situation of "disaggregated" or "anarchic" data systems creates problems when a data user tries to get all appropriate data regardless of domain. The task becomes one of aggregating data from a variety of systems, in a variety of formats, into a single usable data set (which then becomes yet another system). The NWS hydrology program relies to a large extent on the NWS River Forecast System (NWSRFS), which incorporates a custom data management system. While the NWSRFS data management system meets the majority of the needs of NWSRFS operations, it still displays the characteristics of having been developed as an adjunct to the specific applications. This evolution is in contrast to an approach which considers a data system initially in the context of the data to be managed. Such an approach reflects more of the natural architecture of the data and, while considering the access patterns of particular applications, does not become overly constrained by them. The result is a data system that allows for evolution in the use of the data more readily than systems that evolve as adjuncts to particular applications.
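The aggregation burden described above can be sketched as follows. Both source systems and their formats are entirely hypothetical (they do not describe any actual NWS system); the sketch only shows that each additional format needs its own translation before the data can be used together.

```python
FEET_TO_METRES = 0.3048  # exact by definition


def from_system_a(line):
    """Hypothetical system A: a text line 'station,time,stage in feet'."""
    station, time_text, stage_ft = line.split(",")
    return {
        "station": station.strip(),
        "time": time_text.strip(),
        "stage_m": float(stage_ft) * FEET_TO_METRES,
    }


def from_system_b(record):
    """Hypothetical system B: a dict with keys 'id', 't', and 'stage_m'."""
    return {
        "station": record["id"],
        "time": record["t"],
        "stage_m": float(record["stage_m"]),
    }


def aggregate(system_a_lines, system_b_records):
    """Translate both sources to one record layout and order by station and time."""
    merged = [from_system_a(line) for line in system_a_lines]
    merged += [from_system_b(record) for record in system_b_records]
    return sorted(merged, key=lambda rec: (rec["station"], rec["time"]))


combined = aggregate(
    ["STN1, 1996-01-28T12:00, 10.0"],
    [{"id": "STN1", "t": "1996-01-28T06:00", "stage_m": 2.9}],
)
```

A shared data system with a common data definition would make these per-source translators unnecessary for each new user of the data.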

6. Context

While it is important that a data system be designed from the point of view of the data to be managed, it would be unreasonable to imply that the system is independent of its context. The following paragraphs discuss a number of the contexts that are pertinent to the NHDS.

6.1 Data for Operations

The problem of statistical differences between operational and historical data referred to earlier can be mitigated by retaining operational data for later use (merging it with the more traditional historical data sources). This implies that one of the requirements for the NHDS is that it be able to transition data from the operational to the historical time domain. Another aspect of the disaggregation problem is that the results of data analysis done in the context of one application are not easily transferred to the context of other applications. This reduces the effectiveness of model calibration and development efforts and militates against the results of the intuitive analysis done in the midst of an event being available later in the historical domain. Applications would be more readily able to share information if it were stored in a shared data management system.
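The transition requirement can be sketched as relabelling aged observations instead of discarding them. The record layout, the quality labels, and the length of the operational window are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Observation:
    station_id: str
    valid_time: datetime
    value: float
    quality: str = "as-is"       # hypothetical labels: "as-is", "reviewed"
    domain: str = "operational"  # "operational" or "historical"


def transition(observations, now, operational_window):
    """Return a new list in which observations older than the window are
    marked historical. Nothing is dropped, and any quality label assigned
    during the event is carried along unchanged."""
    cutoff = now - operational_window
    return [
        replace(obs, domain="historical") if obs.valid_time < cutoff else obs
        for obs in observations
    ]


now = datetime(1996, 2, 1, 0, 0)
store = [
    Observation("STN1", datetime(1996, 1, 1, 0, 0), 2.4, quality="reviewed"),
    Observation("STN1", datetime(1996, 1, 31, 18, 0), 3.1),
]
updated = transition(store, now, operational_window=timedelta(days=7))
```

Because the older record keeps its "reviewed" label, analysis done during the event remains available to later calibration work.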

In the future, the NWS will be making greater use of ESP (Extended Streamflow Prediction) technology. ESP uses information contained in historical data to establish viable distributions of future outcomes. The technology uses high quality historical data sets in real-time computation, resulting in the need for historical data in an operational or real time data management system.
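The idea of using historical data to establish a distribution of future outcomes can be illustrated with a toy ensemble. The sketch assumes that each historical year's precipitation is replayed from the same current model state; the linear-reservoir model, the function names, and all numbers are invented for illustration and are not the NWSRFS models.

```python
def simulate_trace(initial_storage, precip_series, recession=0.5):
    """Toy linear reservoir: each step adds precipitation to storage and
    releases a fixed fraction of the storage as flow."""
    storage = initial_storage
    flows = []
    for precip in precip_series:
        storage += precip
        flow = recession * storage
        storage -= flow
        flows.append(flow)
    return flows


def esp_ensemble(initial_storage, historical_precip_by_year):
    """Run the model once per historical year, always starting from the
    current state, to obtain one possible future trace per year."""
    return {
        year: simulate_trace(initial_storage, series)
        for year, series in historical_precip_by_year.items()
    }


def exceedance_probability(ensemble, threshold):
    """Fraction of traces whose peak flow exceeds the threshold.
    Every trace must contain at least one time step."""
    peaks = [max(trace) for trace in ensemble.values()]
    return sum(peak > threshold for peak in peaks) / len(peaks)


# Invented historical precipitation for the same forecast window in four years.
history = {
    1990: [0.0, 0.0, 0.0],
    1991: [10.0, 0.0, 0.0],
    1992: [20.0, 0.0, 0.0],
    1993: [2.0, 2.0, 2.0],
}
ensemble = esp_ensemble(initial_storage=10.0, historical_precip_by_year=history)
chance_of_exceeding = exceedance_probability(ensemble, threshold=8.0)
```

With these numbers, two of the four traces peak above the threshold. The sketch also shows why historical series must be reachable from the real-time system: every forecast run reads them.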

6.2 AWIPS

AWIPS (Advanced Weather Interactive Processing System) is the future processing environment for NWS operations. Any complacency resulting from the long gestation period of AWIPS must be dispelled. AWIPS is almost here and we must be ready for it. One of the fundamental assumptions in the design of AWIPS is that it will be an integrated system, providing all of the functionality necessary for operations. Key necessary conditions for an integrated system that are relevant to NHDS are:

    - functions cooperate through shared data
    - data can be shared if there is a common definition of the data
    - data integrity is maintained through controlled access
    - there is a common look and feel (or user interface)
    - the system has an effective user and systems operations concept

For AWIPS, these concepts will be promoted and maintained through a software architecture that provides common system functions to applications through defined Application Programmers Interfaces (APIs). Data management integration is achieved through the Data Management API, common look and feel through the Human Interface API, and system operations concepts through the Communications, System Support, and Monitoring and Control APIs. If it is to be more than yet one more data management system, NHDS must exist in the AWIPS context, making use of the AWIPS system architecture including the APIs. As AWIPS is a system that will evolve through continuous improvement (Pre-Planned Product Improvement or P3I in AWIPS parlance), NHDS must also be able to accommodate planned improvements in NWS hydrologic applications such as are foreshadowed by the Advanced Hydrologic Prediction System.

7. Summary

New data analysis procedures and science are required to generate the high quality historical input to calibration, ESP, and model development. The NHDS will provide tools to generate and use high quality historical data sets in a productive environment that provides integrated access to all relevant data. The environment will provide unified logical access for all hydrology applications, whether developed by OH or in the field, and the results of data analysis will remain available for future reuse. The NHDS will exist within the context of AWIPS, where many elements of NHDS functionality will be provided by AWIPS.
