Verification Team Report (DRAFT)

1. Introduction

In January 2001, a team was assembled to explore river forecast verification and to assist in implementing practical verification methods at the River Forecast Centers (RFCs). Prior national attempts at river forecast verification have met with limited success. The primary reasons for developing and implementing a standardized forecast verification program are to target resources to the areas that would provide the greatest improvement and to develop metrics upon which goals for improvement can be based. See the attached Verification Team Charter.

2. Summary of Team Activities

The team first met in Silver Spring, Maryland, from February 27 to March 1, 2001. Other team meetings were conducted via conference calls. The agenda and presentations from the Verification Workshop, as well as a list of attendees, can be found on the web at:

http://hsp.nws.noaa.gov/oh/hrl/presentations/verificationworkshop.htm

The first day and a half focused on the science of forecast verification and on the national database and tools that would be made available for river forecast verification. The remainder of the meeting focused on the actual implementation of the verification database and tools, the metrics provided by those tools, and immediate problems to be resolved, as well as recommendations for future enhancements and changes. Another point of discussion was the metrics to be presented to the Corporate Board in the short term.

A short summary of these items follows:

A. What do we present to the Corporate Board? We are verifying forecasts out to 3 days, for three response times (SLOW, MEDIUM, FAST), and are separating stages ABOVE and BELOW flood stage. The statistic of choice for now is Mean Absolute Error (MAE); we know we need to come up with more robust and meaningful statistical measures. (A formula sketch follows item C below.)

B. Verification data are being archived at all RFCs. Statistics have been delivered by each RFC to Headquarters and to the Regional HSDs. Because there is as yet no software to analyze the statistics delivered to Headquarters, HSD staff have been using a Quattro Pro spreadsheet to aggregate statistics across multiple months and multiple RFCs for presentation to the Corporate Board. Bill Lerner's group is responsible for taking these data and preparing presentations for the Corporate Board. Aggregated statistics for multiple months and multiple RFCs should become available on the web once OCWWS develops the software to prepare these presentations.

C. The Verification GUI (IVP) has been delivered to the RFCs and Regions; it can be used to examine all of the paired data, subset it, and compute statistics on it in various ways.
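For reference, the MAE of item A, computed over n forecast/observed stage pairs (f_i, o_i), is the standard

    \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert f_i - o_i \rvert

One property worth noting for the metrics discussion in Section 3: MAEs computed on subsets (by month, basin, or RFC) can be aggregated exactly, provided each subset's pair count n_k is carried along:

    \mathrm{MAE}_{\mathrm{agg}} = \frac{\sum_k n_k \, \mathrm{MAE}_k}{\sum_k n_k}

Statistics without such a combining rule lose information when aggregated.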

Concerns were raised that the software, together with the database elements currently available, could not ensure that the desired forecast information was actually transferred from the IHFS database to the verification database. Joe Ostrowski of MARFC examined this problem and documented a redesign that would address this concern. See the attached document VDMRedesign2.wpd.


3. Team Recommendations

The storage requirements of the verification database may tax the resources of some RFCs. The historical archive database will include verification data, and redesign information defining the needs of the verification program has been forwarded to that team. However, storage problems may occur before the archive database is implemented. We need to determine which RFCs may run out of hard drive space if the current configuration remains in place for the next 2 years or longer; a rough sizing approach is sketched below.
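A minimal sizing sketch (every quantity is a placeholder to be measured per RFC, not a known value): if an RFC archives roughly R rows of forecast and observed data per day and each row, including index overhead, occupies B bytes, then two years of retention requires approximately

    S \approx R \times B \times 730 \ \text{bytes}

Comparing S against the free disk space at each office would identify the RFCs at risk.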

Regarding the software tools for extracting and pairing data in the verification database, most of the minor bugs have been fixed. The major hurdles, those given highest priority by the team, are:

1. Improve the performance of the verify software. From the status report of the Verification Team's progress, posted on the RFC Development Manager's home page: "The team ... determined that the number one priority was to improve the software performance to generate statistics for up to 100 forecast points. This work is to be done by OHD but will be delayed until October so that OHD can respond to a NWS-wide call to move AWIPS capabilities to Linux." The SQL statements used in the verify program were analyzed (using the Informix "set explain on" command) and tested thoroughly. Although the SQL statements themselves are as efficient as possible, it was recognized that the program would run much more efficiently if they were simplified and more of the sorting were done in the C/C++ program. With the current version of the software, the only way to improve performance, particularly on large datasets, is to update statistics on the tables to be accessed before running the verify pairing/stats option; Joe Ostrowski developed a script that does this, which improves extraction performance. For further improvement, the source code needs to be rewritten so that the pairing is done in the C program rather than in Informix (a sketch of this approach follows the list under item 7). The Verification Team gave this enhancement a high priority. It is not currently in the top 50 requirements at OHD (finding a better metric is the only verification item that is a high priority right now), but it needs to be grandfathered into the next round of requirements.
2. Enhance the verify program to use the redesigned data ingest process proposed by the Verification Team, including the ability to select paired data according to data QC codes (a sketch of such a filter follows the list under item 7). Several issues involve quality control. Northern-latitude RFCs need a way to indicate via the QC flag that stage data are ice-affected. Also, the verify software pairs every observation with every forecast stage that has been extracted to the verification database (this can be limited now using the Type Source flag). RVD forecast information is also ingested at some RFCs, and there is a strong desire to NOT include RVDs from the WFOs in the RFC verification. At times, raw data rather than QC'd data have been extracted to the VDB for pairing (CNRFC); this requires checking each "pairs" file to make sure that non-quality-controlled data did not make it through. To distinguish between the various sources and quality levels of forecast data, it is recommended that the database redesign be implemented; however, this may be done in conjunction with the Historical Archive Database redesign.
3. Develop statistics that can be aggregated without losing information. MAE is the best of a poor selection for a national river stage forecast verification metric, and the available statistics are useful mainly at the basin level. To set meaningful goals for improvement, better statistics are needed at the RFC, regional, and national scales. A new metric based on distributions is being added to the IVP, but additional metrics should be sought, tested, and evaluated.
4. Develop the ability to extract information for categorical statistics (a sketch follows the list under item 7). River flood verification statistics based on the categories of minor, moderate, and major flooding have been used in the Southern Region, and all agree that this information is very important. HSD has plans to make software changes incorporating the categorical statistics into the IVP; as currently planned, this capability will exist only in the IVP interface. It also needs to be an option in the verify program and part of the verify output. To make this happen, it will have to be pushed through the HSD chiefs on the next round of the requirements process.
5. The verify software needs to fill in the river response column in the verification matrix generated by the RFCs. Currently, this field "flow_size" is left blank in the output file even when the IHFS database has a value. This is a bug and should be treated as such and fixed immediately.
6. Port historical forecast and observed data to the VDB. Several RFCs (particularly MBRFC, ABRFC, NCRFC, CNRFC, and NERFC) have a significant amount of historical verification data that needs to be ported into the verification database. This has been accomplished to the extent possible at MBRFC but still needs to be done at the other RFCs. Those RFCs with historical verification data already have local applications with which they can manipulate it; porting should wait for the implementation of the national archive database.
7. Develop consensus on standard Type Source (TS) values (see SHEF Manual, Table 4). A standardized set of TS values would provide a tool to measure the "value" added to forecast stages by the hydrologic and HAS forecasters. For example, using the sorting capability to look only at "FF" forecasts, one can verify the value of QPF; looking at "FX" forecasts, one can verify the added value of the NEXRAD radar precipitation estimates (the DPA product) and even compare those results to the gage-only MAP forecasts. The proposed values:

A. Forecast from External User – RVD from a WFO (FE)
B. FMAP and Mods (FF)
C. MAPX and no Mods (FX)
D. No FMAP, Mods (FA)
E. MAPX and Mods (FB)
F. FMAT and Mods (FC)
G. No FMAP, No Mods (FU)
H. FMAP, No Mods (FV)
I. "True" FMAP, No Mods (FW) (Calculate MAP, rerun IFP using "True" MAP in place of FMAP in post processing)

It is recommended that a follow-up verification team be tasked with items 3, 4, 5, and 7. Future improvements (lower-priority items) that could also be included in the charter of a follow-up team are listed below; any current team members with an interest would certainly be encouraged to continue as members of the follow-up team.

8. Develop a methodology to capture the "big miss": the case where no stages above flood level are forecast for a flood-only forecast point and flooding occurs. In this case, no forecast/observed pair is produced because no forecast was issued.
9. Develop and test verification tools for probabilistic forecasts. As AHPS becomes a reality, these will be needed.
10. Develop procedures for producing the different model outputs (see the list under item 7 above).
11. Examine additional metrics: LEPS, BIS, Heidke, persistence skill, etc. (a Heidke sketch follows this list).
12. Verify on flows rather than stages; this requires good historical rating curve information.
13. Add the ability to use the *+ format for dates in the verify input files.
14. Add the ability to sort by category (in batch mode).
15. Compare calibration RMSE with forecast RMSE.
16. Add the ability to sort by basin area, as well as by other basin characteristics.
17. Add the ability to sort by synoptic time.
18. Add the ability to compute statistics on the change in stage.
19. Add the ability to verify by individual forecaster (for forecaster calibration), computing both single-forecaster and anonymous aggregate statistics.
20. Permit group development by making the source code available to all.
21. Add the ability to do Log transforms.
22. Provide a milestone table in VDB. This would record important events (such as a recalibration) that would explain why verification statistics had changed for a particular station.
23. Compute RMSE for flows above baseflow (a formula sketch follows this list). Separating out baseflow and verifying only the difference between observed flows and baseflow would make RMSE a more useful metric for river forecast verification.
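Regarding item 11, the Heidke skill score is the standard measure of categorical accuracy relative to chance. For a contingency table with N total pairs, diagonal (correct) counts n_{ii}, and row and column totals n_{i\cdot} and n_{\cdot i}:

    \mathrm{HSS} = \frac{PC - E}{1 - E}, \qquad
    PC = \frac{1}{N} \sum_i n_{ii}, \qquad
    E = \frac{1}{N^2} \sum_i n_{i\cdot} \, n_{\cdot i}

PC is the proportion correct and E the proportion expected correct by chance; HSS = 1 is a perfect categorical forecast and HSS = 0 is no better than chance.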
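Regarding item 23, one plausible formulation (an interpretation offered for discussion, not a definition the team adopted) restricts the RMSE sample to times when the observed flow o_i exceeds the baseflow q_b:

    \mathrm{RMSE}_{>q_b} = \sqrt{ \frac{1}{\lvert S \rvert} \sum_{i \in S} (f_i - o_i)^2 },
    \qquad S = \{\, i : o_i > q_b \,\}

An alternative is to normalize each error by (o_i - q_b), so that errors on small rises above baseflow are weighted more heavily; either variant would need testing before adoption.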

 