National Weather Service United States Department of Commerce
About the Stage III data
(Last modified: 2/28/02)


The purpose of this write-up is to provide the DMIP participants with some sense as to how Stage III precipitation estimates were produced, what the known error sources and characteristics are, and what may be expected when using Stage III data as precipitation forcing in hydrologic models.

The main ingredients of Stage III data are the Digital Precipitation Array (DPA) products, operational hourly rain gauge data, and interactive quality control by the Hydrometeorological Analysis and Service (HAS) forecasters at the River Forecast Center (RFC). The DPA products, sometimes referred to as the Hourly Digital Precipitation (HDP) products, are generated by the Precipitation Processing Subsystem (PPS), which is one of many automatic algorithms in the WSR-88D Radar Product Generator (RPG). For a description of PPS, the reader is referred to Fulton et al. (1998). Even though it has "precipitation" in its name, PPS is designed to estimate rainfall, and rainfall only. As such, its products are of highly suspect quality in times and areas of snowfall.

The DPA products are radar-only estimates of hourly accumulation of rainfall on an approximately 4 x 4 km rectilinear grid. This grid, referred to as HRAP (Hydrologic Rainfall Analysis Project), is based on the polar stereographic projection. It is a subset of the Limited Fine Mesh (LFM) grid used by the Nested Grid Model (NGM) at the NWS National Centers for Environmental Prediction (NCEP). For further details of this mapping, the reader is referred to Greene and Hudlow (1982) and Reed and Maidment (1999).
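For readers who wish to relate gauge or basin locations to HRAP bins, the latitude/longitude-to-HRAP mapping can be sketched as follows, using the projection constants described by Reed and Maidment (1999): standard latitude 60N, standard longitude 105W, and a nominal mesh length of 4.7625 km at 60N. This is an illustrative reimplementation, not the operational code.

```python
import math

def latlon_to_hrap(lat, lon):
    """Convert latitude/longitude (degrees; lon negative west) to HRAP
    grid coordinates on the polar stereographic projection (standard
    latitude 60N, standard longitude 105W)."""
    earth_radius = 6371.2   # km
    mesh_len = 4.7625       # km, grid spacing at 60N
    lat_rad = math.radians(lat)
    # Distance from the pole on the projection plane
    r = earth_radius * (1.0 + math.sin(math.radians(60.0))) \
        * math.cos(lat_rad) / (1.0 + math.sin(lat_rad))
    theta = math.radians(lon + 105.0)   # rotate so 105W points "down"
    x_km = r * math.sin(theta)
    y_km = -r * math.cos(theta)
    # Offsets place the CONUS window at positive grid indices
    return x_km / mesh_len + 401.0, y_km / mesh_len + 1601.0
```

For example, a point near Oklahoma City (35.4N, 97.6W) lands at roughly HRAP (567, 323), within the CONUS window.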

The accuracy of the DPA products is affected mostly by the following factors: 1) how well the radar can see precipitation near the surface, given the sampling geometry of the radar beams and the reflectivity morphology of the precipitating cloud; 2) how accurately the microphysical parameters of the precipitation system are known (Z-R, hail cap, etc.); 3) how accurate the radar hardware calibration is; and 4) various sampling errors in the radar measurement of returned power (how many pulses per sampling volume, how many scans per hour, beam width, etc.).

The first, known as the vertical profile of reflectivity (VPR) effect, can introduce overestimation by up to a factor of two (where the radar beam intercepts the bright band layer) and underestimation by a factor of ten or more at far ranges of the radar (where the radar beam samples ice particles rather than liquid precipitation) in well-developed stratiform precipitation in the cool season. The following rule of thumb may be useful in assessing the presence and spatial extent of the VPR effect in WSR-88D precipitation estimation. The axis of the lowest radar beam (approximately 0.5° elevation angle) reaches altitudes of 1, 2, 3, 4, and 5 km at ranges of approximately 60, 120, 160, 200, and 230 km, respectively. Hence, if the freezing level is at 2 km above the ground, one may expect bright band enhancement at and around the 120 km range (resulting in overestimation of rainfall if the Z-R parameters are applicable to the surface rainfall, which very often is not the case) and radar sampling of ice particles beyond that range (resulting in severe underestimation of rainfall if the Z-R parameters are applicable to the surface rainfall). Note that, at Oklahoma City, the climatological freezing level is at or below 2 km in the months of February and March, and at or below 3 km through May (Smith et al. 1997).
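The rule of thumb above can be approximately reproduced with the standard 4/3-effective-earth-radius beam propagation model, a common approximation from the radar literature (not something specified in this write-up). The heights returned are above the antenna; adding the antenna elevation above ground brings the values close to the quoted 1 to 5 km.

```python
import math

def beam_axis_height_km(range_km, elev_deg=0.5):
    """Height of the radar beam axis above the antenna, using the
    4/3-effective-earth-radius model for standard refraction."""
    ka = (4.0 / 3.0) * 6371.0  # effective earth radius, km
    theta = math.radians(elev_deg)
    return math.sqrt(range_km ** 2 + ka ** 2
                     + 2.0 * range_km * ka * math.sin(theta)) - ka

for rng in (60, 120, 160, 200, 230):
    print(f"{rng:3d} km -> {beam_axis_height_km(rng):.1f} km")
```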

One of the more important changes in the production of DPA, related to the sampling geometry of the radar beams, occurred in the spring of 1996 when bi-scan maximization (see Fulton et al. 1998 for details) in PPS was essentially disabled. What that means is that DPAs after the spring of 1996 suffer less from bright band contamination and are less range-dependent. The net effect of this change on the overall quality of Stage III data over the DMIP basins, however, is less clear, because bi-scan maximization tended to compensate, to an extent, for radar underestimation of rainfall due to nonuniform vertical profile of reflectivity (Seo et al. 2000) and inaccurate Z-R parameters. It is difficult to pinpoint the exact timing of this change in the Stage III product (which is based on DPAs from many sites: see below) because each radar is operated independently and hence the timing of the change varies from site to site. For a summary of radar-only and radar-gauge evaluation of DPA products prior to the disabling of bi-scan maximization, the reader is referred to Smith et al. (1996). For similar analyses based on the DPA products since the disabling of bi-scan maximization, the reader is referred to Smith et al. (1997).

As for the microphysical parameters, the Z-R relationship is the most important. Initially, only the "convective" Z-R parameters were used: Z = 300R^1.4. Though they work well for deep convective precipitation systems, the convective parameters underestimate, often severely, for other types of storms. In 1997, the "tropical" Z-R parameters, Z = 250R^1.2, were added for use with hurricanes, tropical storms, small-scale deep-saturated storms fed by tropical oceanic moisture, etc. In December of 1999, the "stratiform" Z-R parameters were also added, for use with general stratiform events (Z = 200R^1.6) and with winter stratiform events at sites east (Z = 130R^2.0) and west (Z = 75R^2.0) of the continental divide. (The use of the stratiform parameters does not intersect the DMIP simulation period, and hence is not of direct interest here.) Loosely speaking, the tropical Z-R produces about a factor of two more rainfall than the convective. It is not known, however, whether any specific events through 1997 have been identified as "tropical" based on post analysis.
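For illustration, inverting Z = aR^b gives the rain rate implied by a given reflectivity under each parameter set; the roughly factor-of-two difference between the tropical and convective rates is visible at typical reflectivities. This is a sketch of the conversion only, not the PPS implementation (which also applies thresholds such as the hail cap).

```python
def rain_rate_mm_per_hr(dbz, a, b):
    """Invert Z = a * R**b for rain rate R (mm/h), given reflectivity in dBZ."""
    z_linear = 10.0 ** (dbz / 10.0)   # dBZ -> linear reflectivity factor
    return (z_linear / a) ** (1.0 / b)

ZR = {
    "convective": (300.0, 1.4),
    "tropical":   (250.0, 1.2),
    "stratiform": (200.0, 1.6),
}
for name, (a, b) in ZR.items():
    print(f"{name:10s} 40 dBZ -> {rain_rate_mm_per_hr(40.0, a, b):.1f} mm/h")
```

At 40 dBZ, the convective parameters give about 12 mm/h and the tropical about 22 mm/h.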

Lack of radar calibration also had an effect on the quality of Stage III data in the study area. It is known, for example, that KTLX (Twin Lakes, OK) was biased low (or "cold" in the NEXRAD lingo) in the early years (at least through 1995), resulting in rather significant underestimation, up to a factor of two, of rainfall (Smith et al. 1996, Seo et al. 1999).

Whereas the errors described above affect many bins over a relatively large area in more or less the same way, the effects of sampling errors are much more random and can vary from one HRAP bin to the next. Operational experience with Stage III data is limited to lumped models, for which the effect of the sampling errors tends to average out. The effect of the sampling errors in distributed modeling is largely unknown.

Another important source of error in the DPA product, which has been fully identified only recently, is strictly computational. Due to CPU and RAM limitations in the "legacy" RPG, PPS uses I*2 (16-bit integer) arithmetic rather than I*4. Inconsistencies were found in the arithmetic that resulted in truncation, as opposed to rounding, of rainfall amounts. The net effect of this bug (which was mostly fixed in 2001) is minimal for most rainfall events. For long-lasting stratiform events, however, the total loss of rainfall (due to not counting very small amounts) can be rather significant (see http://hsp.nws.noaa.gov/oh/hrl/papers/2001mou/Mou01_PDF.html). Also, it is estimated that this error is a large contributing factor to the conditional bias seen in the DPA products, i.e., the smaller the rainfall estimate in the DPA product, the larger the bias (on the low side) relative to the gauge rainfall (Seo et al. 1996).
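The truncation effect can be illustrated with a toy fixed-point accumulation. The resolution of 0.01 mm per count used here is an assumption for illustration, not the actual PPS encoding; the point is only that truncating (rather than rounding) each small hourly amount loses rainfall systematically over a long event.

```python
import math
from fractions import Fraction

def accumulate(amounts_mm, scale=100, truncate=True):
    """Accumulate rainfall in fixed-point integer counts (`scale` counts
    per mm), truncating or rounding each hourly amount as it is added."""
    total = 0
    for a in amounts_mm:
        counts = a * scale              # exact, since amounts are Fractions
        total += math.floor(counts) if truncate else round(counts)
    return total / scale

# 48 hours of light stratiform rain at 0.255 mm/h (true total: 12.24 mm)
hours = [Fraction(255, 1000)] * 48
print(accumulate(hours, truncate=True))    # truncation drops the fraction each hour
print(accumulate(hours, truncate=False))
```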

Once the DPAs were transmitted to the RFC, they were fed into Stage II. Stage II consisted basically of three algorithms: mean field bias adjustment, gauge-only analysis, and radar-gauge analysis. Stage II was run on a radar site-by-site basis before its products were mosaicked in Stage III. This practice, favored by the first designers of the system for computational and programmatic reasons (Hudlow 1988), has significant drawbacks, as will be explained shortly. Of the Stage II algorithms, the mean field bias adjustment had by far the biggest quantitative impact. Note that, in terms of the scatter plot between the gauge and the matching radar rainfall estimates, the mean field bias adjustment has the effect of pulling the line of scatter closer to the diagonal, and hence can greatly impact the catchment-wide volume of water being estimated (see also Steiner et al. 1999).

For mean field bias adjustment, the algorithm of Smith and Krajewski (1991) was initially used. Operational experience, however, indicated that the algorithm tended to significantly undercorrect, resulting in significant underestimation in the mean field bias-adjusted rainfall. After a period of redevelopment (Seo et al. 1997, Anagnostou et al. 1998), the algorithm was replaced by that of Seo et al. (1997) in the late spring/early summer of 1997.
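In its simplest form, the mean field bias is the ratio of gauge totals to collocated radar totals, applied as a single multiplier to the whole radar field. The operational algorithms (Smith and Krajewski 1991; Seo et al. 1997) estimate this ratio recursively in time with smoothing, so the static version below is only a sketch of the idea.

```python
def mean_field_bias(gauge, radar):
    """Ratio of gauge to radar totals over collocated positive pairs;
    multiplying the radar field by this ratio pulls the gauge-radar
    scatter toward the diagonal."""
    pairs = [(g, r) for g, r in zip(gauge, radar) if g > 0.0 and r > 0.0]
    if not pairs:
        return 1.0   # no usable pairs: leave the field unadjusted
    return sum(g for g, _ in pairs) / sum(r for _, r in pairs)

gauge = [4.0, 10.0, 6.0, 0.0]          # gauge accumulations, mm
radar = [2.0, 6.0, 4.0, 0.0]           # collocated radar estimates, mm
bias = mean_field_bias(gauge, radar)   # 20/12, i.e. radar is biased low
adjusted = [bias * r for r in radar]
```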

Initially, gauge-only and radar-gauge analyses were carried out, respectively, by the reciprocal (or inverse) distance-squared method and by a scheme that performed linear weighted averaging, at each bin, of the gauge analysis estimate and the matching raw radar rainfall estimate. Based on operational experience, these algorithms were replaced in the spring of 1996 by the kriging-like and cokriging-like algorithms, respectively, of Seo (1996a, 1996b). It is important to note that the quality (or lack thereof) of the gauge-only analysis figures prominently in the quality of the Stage III data, particularly in the early days of NEXRAD, when not all WSR-88Ds were in place and DPAs were frequently missing, due primarily to communication problems.
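The initial gauge-only analysis, reciprocal distance-squared weighting, can be sketched as below. The coincidence tolerance `eps` and the exact handling of edge cases are illustrative assumptions; the operational code, and its kriging-like successor (Seo 1996a), differ in the details.

```python
def idw_estimate(x, y, gauges, eps=1e-6):
    """Reciprocal distance-squared estimate at (x, y) from gauge
    observations given as (gx, gy, value) triples."""
    num = den = 0.0
    for gx, gy, val in gauges:
        d2 = (x - gx) ** 2 + (y - gy) ** 2   # squared distance to gauge
        if d2 < eps:                          # point coincides with a gauge
            return val
        w = 1.0 / d2                          # inverse distance-squared weight
        num += w * val
        den += w
    return num / den

# Two gauges; the midpoint gets the equal-weight average of their values.
print(idw_estimate(1.0, 0.0, [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]))  # 15.0
```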

If the DPA was unavailable or missing for a WSR-88D umbrella, the gauge-only analysis field was used for that site in the Stage III mosaicking process. Gauge-only analysis at the hourly scale, however, is sensitive to the number of available real-time hourly gauge reports as well as to their spatial configuration (which varies from analysis time to analysis time). Note, for example, that if there are no gauge data over a large area, the gauge-only analysis algorithm has no choice, in the absence of radar data, but to assume no rainfall. As such, depending on the density of the real-time hourly gauge network in the area, the gauge-only analysis could significantly underestimate, due primarily to lack of detection of precipitation by sparse gauges. An opposite problem of sorts also existed with the initial gauge-only analysis algorithm (used up to the spring of 1996), which could produce significant overestimates because it assumed too large a radius of influence. It is also worth noting that not all gauges that are available now were available in the earlier years of NEXRAD. As such, the quality of Stage II/III products suffered not only from frequently missing DPAs but also from fewer rain gauge data to work with.

Radar-gauge analysis does not have as great a quantitative impact as the mean field bias adjustment. In the context of the scatter plot of gauge and matching radar data, the role of radar-gauge merging is primarily to reduce the scatter (as opposed to pulling the line of scatter toward the 45° line). Because the merging algorithm assumes (see Seo 1996b for details) that the mean field bias-adjusted radar rainfall estimates are bias-free (not only in the amount, given that the radar successfully detected rainfall, but also in the radar detection of rainfall), the quality of radar-gauge analysis depends directly on the quality of mean field bias adjustment. This has two large consequences. The first is that, in the early years of NEXRAD, when the initial mean field bias adjustment algorithm tended to significantly undercorrect the bias, the radar-gauge estimates (and hence the Stage III data, which are the mosaic of Stage II data) also tended to underestimate. The other is that, at far ranges from the radar where beam overshooting occurs (i.e., where the radar beam overshoots the cloud top, thus failing to detect precipitation), particularly for low-topped precipitation systems in the cool season, the radar-gauge estimates necessarily tend to underestimate.

In the summer of 1996, ABRFC implemented a local bias adjustment algorithm called Process 1 (P1). P1 calculates HRAP bin-specific ratios of gauge-to-radar rainfall at gauge locations and performs spatial interpolation of the ratios based on triangulation of the gauge locations (Young et al. 2000, Seo and Breidenbach 2002). Because it relies exclusively on data from the current hour for the adjustment, P1 is susceptible to sampling errors (both in space and time), but works well in relatively uniform widespread cool-season precipitation (for which Stage III is known to perform poorly, due in large part to its "adjust-and-mosaic," as opposed to "mosaic-and-adjust," strategy of data processing: see below). The general practice at ABRFC has been that Stage III is preferred in the warm season and P1 in the cool season. For a comparative analysis of Stage III and P1 products, the reader is referred to Young et al. (2000).
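Within one triangle of the gauge triangulation, interpolating the bin-specific gauge/radar ratios amounts to barycentric (linear) weighting of the three vertex ratios. The sketch below handles a single triangle only; the operational P1 triangulates the full gauge network (Young et al. 2000), and its handling of points outside the triangulation is not shown here.

```python
def interpolate_ratio(p, tri, ratios):
    """Barycentric interpolation of gauge/radar bias ratios at point `p`
    inside the triangle with vertices `tri` and vertex ratios `ratios`."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    px, py = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * ratios[0] + w2 * ratios[1] + w3 * ratios[2]

# Gauge/radar ratios at three gauge bins; a bin inside the triangle is
# adjusted by the locally interpolated bias.
tri = ((0.0, 0.0), (3.0, 0.0), (0.0, 3.0))
ratios = (1.0, 2.0, 3.0)
local_bias = interpolate_ratio((1.0, 1.0), tri, ratios)  # centroid: mean ratio
```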

Once Stage II is run for all WSR-88D sites in the RFC service area and vicinity, the Stage II products (typically the radar-gauge merging estimates) are mosaicked in Stage III. The sole mosaicking rule used initially in Stage III was simple arithmetic averaging: average all estimates from all radars in the coverage overlap, and that is the Stage III estimate. This scheme often had grievous consequences in that it degraded good estimates from a close-in radar by mixing them with poor estimates at far ranges from another radar. To ameliorate the situation within the constraints of the software, another mosaicking rule was added in the early summer of 1997, which took the maximum among all estimates in the coverage overlap as the best estimate.
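The two mosaicking rules can be contrasted directly. Using `None` to mark radars with no coverage at a bin is an assumption for illustration, not the actual Stage III encoding.

```python
def mosaic(estimates, rule="average"):
    """Combine estimates from all radars covering an HRAP bin: the
    initial Stage III rule averaged the overlap; the rule added in 1997
    took the maximum instead."""
    vals = [v for v in estimates if v is not None]
    if not vals:
        return None
    return sum(vals) / len(vals) if rule == "average" else max(vals)

# A close-in radar sees 8 mm; a distant, overshooting radar sees 1 mm.
print(mosaic([8.0, 1.0], "average"))  # 4.5: the good estimate is degraded
print(mosaic([8.0, 1.0], "max"))      # 8.0: the close-in estimate survives
```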

Because of the variety of error sources in radar-based and radar-aided precipitation estimation, the HAS forecasters play a critical role in improving the quality and accuracy of Stage III data. The primary tool used for this man-machine interaction was the Stage III Graphical User Interface (GUI). In the initial Stage III GUI, the forecasters could, for example, add or remove gauge observations, or ignore radar data, and rerun the analysis algorithms. In the spring of 1999, the "draw-in precipitation" function was added to Stage III, which allowed manual addition and subtraction of precipitation amounts over a user-specified area. (P1 has its own GUI, with which the forecaster could also "create snow.")

The role of the HAS forecasters is particularly important in quality-controlling rain gauge data. Real-time hourly rain gauge data are subject to all kinds of errors (see, e.g., Steiner et al. 1999), and it is well known that an alarmingly large fraction of all observations that come into the RFC is unusable. Also, because the majority of the gauges are not heated in the winter (and hence are purposefully blocked out by the RFC), gauge-aided precipitation estimates from Stage II in winter may not necessarily be much of an improvement over the radar-only DPA estimates.

Based on several years of operational experience with Stage II/III, much of the software was overhauled in 2000 and redeveloped into the Multisensor Precipitation Estimator (MPE). At ABRFC, MPE has been running since March 2001 (along with Stage II/III and P1). Because the historical Stage III data used for the current phase of DMIP do not intersect the MPE era, a detailed description of MPE is not of direct interest here. We only list the key features of MPE: a "mosaic-and-adjust" (rather than "adjust-and-mosaic") strategy; "rational" mosaicking based on data-driven delineation of the effective coverage of each radar (Breidenbach et al. 1999); improved mean field bias adjustment (Seo and Breidenbach 2001); ingestion of satellite-derived precipitation estimates (Fortune et al. 2002); implementation of local bias correction (Seo and Breidenbach 2002); and ordinary (as opposed to simple) kriging- and cokriging-like gauge-only and radar-gauge analysis algorithms to improve unbiasedness.

Because the quantitative use of Stage III data at the RFC has been limited to lumped models for rather large basins, by far the biggest problem at the RFC with the Stage III data has been the systematic (mostly low-side) biases, particularly in the earlier days of NEXRAD (say, from 1993 through mid-1997). Indeed, many of the recent changes and improvements have been aimed primarily at reducing systematic biases in the Stage III data. What this means, in terms of error statistics, is that the efforts thus far have been geared more toward reducing the mean error (ME) and the conditional (on the precipitation amount) ME at the basin scale than toward reducing the root mean square error (RMSE) at the HRAP scale. Note that unbiasedness matters particularly acutely at the RFCs, where the hydrologic model is run in a continuous mode, and hence even a relatively small bias in precipitation forcing can result, after some duration, in unrealistic drying-up of the model soil moisture.
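The distinction between these error statistics can be made concrete: the ME measures systematic bias (the target of the Stage III improvements), while the RMSE is dominated by bin-scale scatter. Conditioning the ME on wet hours only, as below, is one simple form of the conditional ME; the stratification by amount used in practice may be finer.

```python
def error_stats(estimates, truth):
    """Mean error (bias), conditional mean error over hours with nonzero
    true precipitation, and root mean square error."""
    n = len(truth)
    errors = [e - t for e, t in zip(estimates, truth)]
    me = sum(errors) / n
    rmse = (sum(err * err for err in errors) / n) ** 0.5
    wet = [err for err, t in zip(errors, truth) if t > 0.0]
    cme = sum(wet) / len(wet) if wet else 0.0
    return me, cme, rmse
```

A low-biased estimate series shows a negative ME even when the RMSE looks moderate, which is exactly the failure mode that matters most for a continuously run model.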

Because all hourly rain gauge data that were available in real time had already been used in the generation of Stage III data, it is difficult to assess the accuracy of the Stage III product at the hourly scale via independent validation. Independent validation at the daily scale, nevertheless, should be possible based on daily observations that were not reported in real time. Such an evaluation of the Stage III data, however, is beyond the scope of this phase of DMIP, and will be explored as a future endeavor.

To assess systematic biases in Stage III data, a number of studies have been carried out (Johnson et al. 1999, Stellman et al. 2000, Wang et al. 2000) to compare the Stage III data-derived mean areal precipitation estimate (referred to as "MAPX") with the rain gauge data-derived mean areal precipitation estimate (referred to as "MAP") on a long-term scale. The general finding for the Illinois basins is that, overall, MAPX is about 5 to 10 percent lower than MAP, and, as one would expect, the magnitude of the bias varies from basin to basin, season to season, and period to period (particularly between periods of Stage III-only use and of combined P1 and Stage III use). On the event scale, the general experience is that, in some periods (particularly in the early years), Stage III data are subject to much larger biases, which may render some comparisons essentially meaningless: for some events, the error bound in the streamflow simulation due to error in the precipitation data may be larger than the inter-model differences in streamflow simulation due to model errors.

Because of the wide spectrum of error sources and algorithm changes, it is difficult to identify, at the event scale, which errors and algorithm changes may be affecting the accuracy of the Stage III estimates, and how. Even if they could be identified and their effects qualitatively assessed, it is not possible, without independent validation using high-quality rain gauge data, to quantify the magnitude of the errors in the Stage III data. It is possible, however, to gain some sense of the event-specific volumetric bias that may be present in the Stage III data based on streamflow observations. For example, one may run the hydrologic model of choice many times using different adjustment factors to the Stage III data until the resulting simulated hydrograph is reasonably close, at least in the volumetric sense, to the observed. Obviously, the resulting bias estimate, representing the bias in the Stage III data aggregated at the space and time scales of the basin and the basin response, respectively, is subject to model errors and uncertainties in the initial conditions, and hence must be interpreted with due caution (much more so in the model warm-up period). Nevertheless, in the absence of any direct evidence (in the form of high-quality rain gauge data), such inference may be the only way to glimpse the magnitude of the first-order errors in the Stage III data at the event scale of temporal aggregation.
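The adjustment-factor search described above can be sketched as follows. The trivial runoff-ratio "model" in the usage example stands in for the hydrologic model of choice; the actual exercise reported below used the Sacramento model-unit hydrograph combination with variational assimilation, not a grid search.

```python
def estimate_precip_bias(precip, observed_flow, simulate, factors=None):
    """Find the multiplier on the precipitation series that best matches
    simulated to observed flow volume. `simulate` maps a precipitation
    series to a simulated flow series."""
    if factors is None:
        factors = [0.5 + 0.05 * i for i in range(31)]   # search 0.50 .. 2.00
    target = sum(observed_flow)
    return min(factors,
               key=lambda f: abs(sum(simulate([f * p for p in precip]))
                                 - target))

# Toy "model": 40% of precipitation becomes flow, no storage or routing.
simulate = lambda p: [0.4 * x for x in p]
precip = [10.0, 20.0, 5.0]       # Stage III series, biased low
observed = [5.6, 11.2, 2.8]      # flow that 1.4x the precipitation would yield
best = estimate_precip_bias(precip, observed, simulate)   # about 1.40
```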

Such an exercise, based on the Sacramento model-unit hydrograph combination in the lumped mode, was carried out for TIFM7, WTTO2 and BLUO2 in the context of variational assimilation, which produces bias estimates in precipitation forcing as a by-product (see Seo et al. 2002 for details). The event-specific bias estimates ranged from 0.86 to 2.14 for TIFM7, 0.83 to 1.39 for WTTO2, and 0.85 to 1.68 for BLUO2. It is also seen that, for TIFM7, the Stage III data in the first year or so are of highly suspect quality and may not be taken seriously, and that, for BLUO2, a consistent and significant low bias exists in the Stage III data well into 1996.

Because many of the error sources are tied to the sampling geometry of radar (and to that of gauges to some extent), very often, visualizing Stage III data (say, at the temporal scale of aggregation of a day) over the entire domain offers very good clues as to the kinds of errors that the Stage III data may be subject to. As such, the DMIP participants are encouraged to visually examine the Stage III data (e.g., at http://www.abrfc.noaa.gov/archive) associated with significant flood events for signs of artifacts and anomalies.

Obviously, the event-specific bias estimates described above (even if they are in the ball park) shed little light on the magnitude of error at a finer scale (say, at the HRAP and hourly scales). The hope is that, given that unbiasedness at a larger scale is a necessary condition for that at a smaller scale, such estimates may offer some guidance as to how much stock one may put in the model calibration and/or intercomparison results at a smaller scale.

In summary, due to a variety of error sources (sampling-geometrical, reflectivity-morphological, microphysical, sampling by sparse rain gauges, algorithm changes, etc.), the Stage III data are subject to systematic errors that may vary over various time scales (a storm scale, an intra-storm scale, seasonal, etc.). As such, care must be exercised in accepting and interpreting the model simulation results. The participants are also strongly encouraged to visually examine the Stage III data and to perform, e.g., sensitivity analysis to help gauge the magnitude of error that may be present in the Stage III data.

REFERENCES

Anagnostou, E. N., W. F. Krajewski, D.-J. Seo, and E. R. Johnson, 1998: Mean-field rainfall bias studies for WSR-88D. J. Hydrol. Eng., 3(3), 149-159.

Breidenbach, J. P., D.-J. Seo, P. Tilles, and K. Roy, 1999: Accounting for radar beam blockage patterns in radar-derived precipitation mosaics for River Forecast Centers, Preprints, 15th Conf. on IIPS, Amer. Meteorol. Soc., 5.22, Dallas, TX.

Fortune, M. A., J. P. Breidenbach, and D.-J. Seo, 2002: Integration of bias corrected, satellite- based estimates of precipitation into AWIPS at River Forecast Centers, Preprints, Int. Symp. on AWIPS, Amer. Meteorol. Soc., J7.4, Orlando, FL.

Fulton, R. A., J. P. Breidenbach, D.-J. Seo, and D. A. Miller, 1998: The WSR-88D rainfall algorithm. Wea. Forecasting, 13, 377-395.

Greene, D. R. and M. D. Hudlow, 1982: Hydrometeorologic grid mapping procedures. AWRA Int. Symp. on Hydrometeor. June 13-17, Denver, CO. (available upon request from NWS/HL)

Hudlow, M. D., 1988: Technological development in real-time operational hydrologic forecasting in the United States. J. Hydrol., 102, 69-92.

Johnson, D., M. Smith, V. Koren, and B. Finnerty, 1999: Comparing mean areal precipitation estimates from NEXRAD and rain gauge networks. J. Hydrol. Eng., 4(2), 117-124.

Reed, S. M., and D. R. Maidment, 1999: Coordinate transformations for using NEXRAD data in GIS-based hydrologic modeling. J. Hydrol. Eng., 4, 174-183.

Seo, D.-J., and J. P. Breidenbach, 2002: Real-time correction of spatially nonuniform bias in radar rainfall data using rain gauge measurements. to appear in J. Hydrometeor.

Seo, D.-J., R. A. Fulton, and J. P. Breidenbach, 1997: Final report for Interagency MOU among the NEXRAD Program, WSR-88D OSF and NWS/OH/HRL, NWS/OH/HRL, Silver Spring, MD. (Available upon request from NWS/HL)

Seo, D.-J., 1998b: Real-time estimation of rainfall fields using radar rainfall and rain gauge data. J. Hydrol., 208, 37-52.

Seo, D.-J., J. P. Breidenbach, and E. R. Johnson, 1999: Real-time estimation of mean field bias in radar rainfall data. J. Hydrol., 131-147.

Seo, D.-J., V. Koren, and N. Cajina, 2002: Real-time variational assimilation of hydrologic and hydrometeorological data into operational hydrologic forecasting. Submitted to J. Hydrometeor. (available upon request from NWS/HL)

Seo, D.-J., J. P. Breidenbach, R. A. Fulton, D. A. Miller, and T. O'Bannon, 2000: Real-time adjustment of range-dependent bias in WSR-88D rainfall data due to nonuniform vertical profile of reflectivity. J. Hydrometeor., 1(3), 222-240.

Smith, J. A., and W. F. Krajewski, 1991: Estimation of the mean field bias of radar rainfall estimates. J. Appl. Meteor., 30, 397-412.

Smith, J. A., D.-J. Seo, M. L. Baeck, and M. D. Hudlow, 1996: An intercomparison study of NEXRAD precipitation estimates. Water Resour. Res., 32, 2035-2045.

Smith, J. A., M. L. Baeck, and M. Steiner, 1997: Hydrometeorological assessment of the NEXRAD rainfall algorithms. Final report to NOAA/NWS/OH/HRL, Dept. of Civil Eng. and Oper. Res., Princeton Univ., Princeton, NJ. (available upon request from NWS/HL)

Steiner, M. J., J. A. Smith, S. J. Burges, C. V. Alonso, and R. W. Darden, 1999: Effect of bias adjustment and rain gauge data quality control on radar rainfall estimation. Water Resour. Res., 35, 2487-2503.

Stellman, K. M., H. E. Fuelberg, R. Garza, and M. Mullusky, 2000: An examination of radar- and rain gauge-derived mean areal precipitation over Georgia watersheds. Wea. Forecasting, 16(1), 133-144.

Wang, D., M. B. Smith, Z. Zhang, S. Reed, and V. Koren, 2000: Statistical comparison of mean areal precipitation estimates from WSR-88D, operational and historical gauge networks. Preprints, 15th Conf. on Hydrol., Amer. Meteor. Soc., Long Beach, CA, 107-110.

Young, C. B., A. A. Bradley, W. F. Krajewski and A. Kruger, 2000: Evaluating NEXRAD multisensor precipitation estimates for operational hydrologic forecasting. J. Hydrometeor., 1, 241-254.
