Verification Team Report (DRAFT)
1. Introduction
In January 2001, a team was assembled to explore river forecast verification and to assist in the implementation of practical verification methods at the River Forecast Centers (RFCs). National river forecast verification attempts prior to this have met with limited success. The primary reasons for developing and implementing a standardized forecast verification program are to target resources to the areas that would provide the greatest improvement and to develop metrics upon which goals for improvement could be based. See the attached Verification Team Charter.
2. Summary of Team Activities
The team first met in Silver Spring, Maryland from February 27 to March 1, 2001. Other team meetings were conducted via conference calls. The agenda and presentations from the Verification Workshop, as well as a list of attendees, can be found on the web at:
http://hsp.nws.noaa.gov/oh/hrl/presentations/verificationworkshop.htm
The first day and a half were focused on the science of forecast verification and on the national database and tools that would be made available for river forecast verification. The remainder of the meeting focused on the actual implementation of the verification database and tools, the metrics provided by those tools, and immediate problems to be resolved, as well as recommendations for future enhancements and changes. Another point of discussion involved the metrics to be presented to the Corporate Board in the short term.
A short summary of these items follows:
A. What do we present to the Corporate Board? We are verifying forecasts out to 3 days, with three response times (SLOW, MEDIUM, FAST), and are separating ABOVE and BELOW Flood Stage. The statistic of choice for now is Mean Absolute Error (MAE), although we know we need to come up with more robust and meaningful statistical measures.
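For reference, MAE is simply the mean of the absolute forecast errors over all pairs in a stratum. The minimal Python sketch below shows the computation stratified by lead day and by above/below Flood Stage; the pair records and flood stage value are hypothetical and do not reflect the actual verification database layout.

    # Minimal MAE sketch; pair records and flood stage are hypothetical.
    # Each pair: (lead_day, forecast_stage_ft, observed_stage_ft).
    pairs = [(1, 12.4, 11.9), (1, 15.2, 16.0), (2, 14.1, 12.8), (3, 18.5, 20.2)]
    flood_stage = 16.0  # flood stage for one hypothetical forecast point

    def mae(errors):
        return sum(abs(e) for e in errors) / len(errors)

    for day in (1, 2, 3):
        for label, keep in (("below", lambda o: o < flood_stage),
                            ("above", lambda o: o >= flood_stage)):
            errs = [f - o for d, f, o in pairs if d == day and keep(o)]
            if errs:
                print(f"Day {day}, {label} flood stage: MAE = {mae(errs):.2f} ft")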
B. Verification data is being archived at all RFCs. Statistics have been delivered by each RFC to Headquarters and to the Regional HSDs. Because there is currently no software to analyze the statistics delivered to Headquarters, HSD staff have been using a Quattro Pro spreadsheet to aggregate statistics for multiple months and multiple RFCs for presentation to the Corporate Board. Bill Lerner's group is responsible for taking these data and preparing presentations for the Corporate Board. Aggregated statistics for multiple months and multiple RFCs should be available on the web once OCWWS develops the software to prepare these presentations.
C. The Verification GUI (IVP) has been delivered to the RFCs and Regions; it can be used to examine all of the paired data, slice it, and compute statistics on it in various ways.
Concerns were raised that the software and the available database elements could not ensure that the desired forecast information was actually transferred from the IHFS database to the verification database. Joe Ostrowski of MARFC examined this problem and documented a redesign that would address this concern. See the attached document VDMRedesign2.wpd.
3. Team Recommendations
The storage requirements of the verification database may tax the resources of some RFCs. The historical archive database will include verification data, and redesign information defining the needs of the verification program has been forwarded to that team. However, storage issues may occur before the archive database is implemented. We need to determine which RFCs may run out of hard drive space if the current configuration remains in place for the next 2 years or longer.
Regarding the software tools for extracting and pairing data in the verification database, most of the minor bugs in the software have been fixed. The major hurdles, those given highest priority by the team, include:
1. The following is from the status report of the Verification Team's progress, located on the RFC Development Manager's home page: "The team ... determined that the number one priority was to improve the software performance to generate statistics for up to 100 forecast points. This work is to be done by OHD but will be delayed until October so that OHD can respond to a NWS-wide call to move AWIPS capabilities to Linux." The SQL statements used in the verify program were analyzed (using the Informix "set explain on" command) and tested thoroughly. Although the SQL statements are as efficient as possible, it was recognized that the program would run much more efficiently if the SQL statements were simplified and more of the sorting were done in the C/C++ program. With the current version of the software, the only way to improve performance, particularly on large datasets, is to update statistics on the tables to be accessed prior to running the verify pairing/stats option. Joe Ostrowski developed a script that updates the statistics, and this improves the extraction process performance. For additional improvement, the source code needs to be rewritten to do the pairing in the C program rather than in Informix. This verification system enhancement was given a high priority by the Verification Team. It is not currently in the top 50 requirements at OHD (finding a better metric is the only verification item that is a high priority right now), but it needs to be grandfathered into the next round of requirements.
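As we understand the recommended rewrite, the complex SQL join would be replaced with simple per-table queries, with the pairing and sorting done in application code. The sketch below illustrates the idea in Python for brevity (the actual work would be in the verify program's C/C++ source); the record layouts, time units, and matching window are hypothetical.

    # Sketch of application-side pairing: fetch forecasts and observations
    # with two simple SELECTs, then match each forecast to the closest
    # observation in time. Layouts and the time window are hypothetical.
    from bisect import bisect_left

    forecasts = [(100, 12.1), (200, 13.0), (300, 14.2)]        # (valid_time, stage)
    observations = sorted([(105, 12.0), (205, 13.4), (310, 14.8)])
    WINDOW = 30  # maximum time difference accepted for a pair

    obs_times = [t for t, _ in observations]
    pairs = []
    for ftime, fstage in forecasts:
        i = bisect_left(obs_times, ftime)
        # The closest observation is a neighbor of the insertion point.
        nearby = [observations[j] for j in (i - 1, i) if 0 <= j < len(observations)]
        otime, ostage = min(nearby, key=lambda obs: abs(obs[0] - ftime))
        if abs(otime - ftime) <= WINDOW:
            pairs.append((ftime, fstage, ostage))
    print(pairs)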
2. Enhance the verify program to use the redesigned data ingest process proposed by the Verification Team. Include in this enhancement the ability to select paired data according to data QC codes. Several issues involve quality control. Northern-latitude RFCs need a way to indicate via the QC flag that stage data are ice-affected. Also, the verify software pairs every observation with every forecast stage that has been extracted to the verification database (this can be limited now using the Type Source flag). RVD forecast information is also ingested at some RFCs, and there is a strong desire to NOT include RVDs from the WFOs in the RFC verification. At times, raw data rather than QC'd data have been extracted to the VDB for pairing (CNRFC). This requires a check of each "pairs" file to make sure that non-quality-controlled data did not make it through. In order to distinguish between the various sources and quality levels of forecast data, it is recommended that the database redesign be implemented; however, this may be done in conjunction with the Historical Archive Database redesign.
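To make the intent concrete, the sketch below shows the kind of screening the redesign would enable: dropping ice-affected observations via a QC flag and excluding externally produced (WFO RVD) forecasts via Type Source. The field names and code values are illustrative; "FE" follows the TS convention proposed in item 7 below.

    # Sketch: screen paired data on QC code and SHEF Type Source before
    # statistics are computed. Field names and code values are illustrative.
    pairs = [
        {"ts": "FF", "qc": "G", "fcst": 12.3, "obs": 12.0},
        {"ts": "FE", "qc": "G", "fcst": 11.8, "obs": 12.0},  # WFO RVD: exclude
        {"ts": "FF", "qc": "I", "fcst": 9.1,  "obs": 8.0},   # ice-affected: exclude
    ]
    GOOD_QC = {"G"}       # hypothetical code(s) for quality-controlled data
    EXCLUDED_TS = {"FE"}  # external (WFO RVD) forecasts

    usable = [p for p in pairs
              if p["qc"] in GOOD_QC and p["ts"] not in EXCLUDED_TS]
    print(f"{len(usable)} of {len(pairs)} pairs retained")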
3. Develop statistics that can be aggregated without losing their information. MAE is the best of a poor selection for a national river stage forecast verification metric. At this point, the available statistics are useful only at the basin level. In order to set meaningful goals for improvement, better statistics are needed at the RFC, Regional, and National scales. A new metric based on distributions is being added to the IVP, but additional metrics should be sought, tested, and evaluated.
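One property worth requiring of any new metric is that it be built from sufficient statistics, so basin-level numbers roll up exactly to RFC and national numbers. For MAE this means carrying the error sum and the pair count rather than the precomputed ratio; a short Python illustration with made-up numbers:

    # Sketch: aggregate MAE from sufficient statistics (error sum, pair count).
    # Averaging already-computed basin MAEs weights basins incorrectly.
    basin_stats = {  # basin: (sum of |fcst - obs|, number of pairs); made up
        "basin_a": (42.0, 30),
        "basin_b": (10.5, 5),
    }
    err_sum = sum(s for s, _ in basin_stats.values())
    n = sum(c for _, c in basin_stats.values())
    print("aggregated MAE:", err_sum / n)  # 1.50, correctly weighted by pairs
    print("mean of basin MAEs:",
          sum(s / c for s, c in basin_stats.values()) / len(basin_stats))  # 1.75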
4. Develop the ability to extract information for categorical statistics. River flood verification statistics based on the categories of minor, moderate, and major flooding have been used in the Southern Region, and all agree that this information is very important. HSD has plans to make software changes incorporating the categorical statistics into the IVP. Categorical capability is currently planned to exist only in the back-end interface called IVP. This capability needs to be an option in the verify program and part of the verify output. To make this happen, it will have to be pushed through the HSD chiefs in the next round of the requirements process.
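As a sketch of what a categorical option in the verify program might compute, the Python fragment below bins forecast and observed stages into flood categories and tallies a contingency table; the thresholds and pairs are illustrative, not Southern Region's actual values.

    # Sketch: categorical verification against flood categories (illustrative).
    from collections import Counter

    def category(stage, minor=10.0, moderate=14.0, major=18.0):
        if stage >= major:    return "major"
        if stage >= moderate: return "moderate"
        if stage >= minor:    return "minor"
        return "none"

    pairs = [(9.0, 8.5), (11.0, 10.2), (15.0, 13.5), (19.0, 19.4)]  # (fcst, obs)
    table = Counter((category(f), category(o)) for f, o in pairs)
    hits = sum(n for (fc, oc), n in table.items() if fc == oc)
    print("contingency table:", dict(table))
    print("fraction correct:", hits / len(pairs))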
5. The verify software needs to fill in the river response column in the verification matrix generated by the RFCs. Currently, this field, "flow_size", is left blank in the output file even when the IHFS database has a value. This is a bug and should be treated as such and fixed immediately.
6. Port historical forecast and observed data to the VDB. Several RFCs have a significant amount of historical verification data that needs to be ported into the verification database (particularly MBRFC, ABRFC, NCRFC, CNRFC, and NERFC). This has been accomplished to the extent possible at MBRFC, but it still needs to be done at the other RFCs. Those RFCs with historical verification data already have local applications with which they can manipulate their historical data. Porting of this historical verification data should wait for the implementation of the national archive database.
7. Develop consensus on standard Type Source (TS) values (see the SHEF Manual, Table 4). If this set of TS values is standardized, it will provide a tool to measure the amount of "value" added to the forecast stages by the hydrologic and HAS forecasters. For example, using the sorting capability to look at only "FF" forecasts, one can verify the value of QPF. Looking at "FX" forecasts, one can verify the added value of the NEXRAD radar precipitation estimates (the DPA product) and even compare those results to the gage-only MAP forecasts. (A comparison sketch follows the list below.)
A. Forecast from External User – RVD from a WFO (FE)
B. FMAP and Mods (FF)
C. MAPX and no Mods (FX)
D. No FMAP, Mods (FA)
E. MAPX and Mods (FB)
F. FMAT and Mods (FC)
G. No FMAP, No Mods (FU)
H. FMAP, No Mods (FV)
I. "True" FMAP, No Mods (FW) (calculate MAP, rerun IFP using "True" MAP in place of FMAP in post-processing)
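Once pairs carry standardized TS codes, the value-added comparison reduces to computing the same statistic per code. A minimal Python sketch with made-up data, comparing MAE for "FF" (FMAP and Mods) against "FX" (MAPX and no Mods):

    # Sketch: measure value added by comparing MAE across Type Source codes.
    pairs_by_ts = {  # TS code: list of (forecast, observed) pairs; made up
        "FF": [(12.0, 11.5), (14.0, 14.8)],
        "FX": [(12.0, 11.0), (14.0, 15.5)],
    }
    for ts, pairs in sorted(pairs_by_ts.items()):
        mae = sum(abs(f - o) for f, o in pairs) / len(pairs)
        print(f"{ts}: MAE = {mae:.2f} ft over {len(pairs)} pairs")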
It is recommended that a follow-up verification team be tasked with items 3, 4, 5, and 7. Future improvements (lower-priority items) that could also be included in the charter of a follow-up team (any current team members who have the interest would certainly be encouraged to continue as members of the follow-up team) include the following:
8. Develop a methodology to capture the "big miss" forecast, where no stages above flood level are forecast for a flood-only forecast point and flooding occurs. In this case, no pair of forecast and observed data is produced because no forecast was issued.
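One possible approach, sketched below with hypothetical structures, is to scan the observed record for flood-stage exceedances and flag any that have no corresponding forecast pair in the VDB:

    # Sketch: flag "big miss" events, i.e., observed exceedances of flood
    # stage for which no forecast (hence no pair) exists. Illustrative data.
    flood_stage = 16.0
    observations = [(1, 15.0), (2, 17.2), (3, 18.0)]  # (time, stage)
    paired_times = {1}  # times at which a forecast/observed pair exists

    big_misses = [(t, s) for t, s in observations
                  if s >= flood_stage and t not in paired_times]
    print("unforecast flood observations:", big_misses)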
9. Develop and test verification tools for probabilistic
forecasts. As AHPS becomes a reality, these will be needed.
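For example, the Brier score is a standard starting point for verifying a probabilistic flood/no-flood forecast; the values below are made up:

    # Sketch: Brier score for probabilistic flood occurrence forecasts.
    forecast_prob = [0.9, 0.2, 0.7, 0.1]  # forecast probability of flooding
    occurred      = [1,   0,   0,   0]    # 1 if flooding observed, else 0
    brier = sum((p - o) ** 2
                for p, o in zip(forecast_prob, occurred)) / len(occurred)
    print(f"Brier score: {brier:.4f}")  # 0 is perfect, 1 is worst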
10. Develop procedures for producing the different model outputs (see item 7 above).
11. Examine additional metrics: LEPS, BIS, Heidke, persistence skill, etc.
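As one example, the Heidke skill score measures categorical hits relative to chance; the sketch below computes it for a two-category (flood/no-flood) contingency table with made-up counts:

    # Sketch: Heidke skill score for a 2x2 contingency table (counts made up).
    a, b, c, d = 20, 5, 10, 65  # hits, false alarms, misses, correct negatives
    n = a + b + c + d
    # Number of correct classifications expected by chance from the marginals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n
    hss = (a + d - expected) / (n - expected)
    print(f"HSS = {hss:.3f}")  # 1 is perfect, 0 is no skill over chance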
12. Verify on flows rather than stages; this requires good historical rating curve information.
13. Add the ability to use the *+ format for dates in the
verify input files.
14. Add the ability to sort by category (in batch mode).
15. Compare calibration RMSE vs. forecast RMSE.
16. Add the ability to sort by area size, as well as by other basin characteristics.
17. Add the ability to sort by synoptic time.
18. Add the ability to compute statistics on the change in
stage.
19. Add the ability to verify by individual forecaster (for forecaster calibration), computing both single and anonymous aggregate statistics.
20. Permit group development – make source code available
to all.
21. Add the ability to do Log transforms.
22. Provide a milestone table in the VDB. This would record important events (such as a recalibration) that would explain why verification statistics had changed for a particular station.
23. Compute RMSE for flows above baseflow. Separating out baseflow and verifying only the flow difference between observed flows and baseflow would allow RMSE to become a more useful metric for river forecast verification.
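One reading of this recommendation, sketched below with made-up flows, is to restrict the RMSE computation to pairs whose observed flow exceeds baseflow, so that long low-flow periods do not dominate the statistic:

    # Sketch: RMSE restricted to flows above baseflow (values made up, cfs).
    import math

    baseflow = 200.0
    pairs = [(950.0, 1100.0), (400.0, 380.0), (180.0, 210.0), (150.0, 140.0)]

    def rmse(errs):
        return math.sqrt(sum(e * e for e in errs) / len(errs))

    all_errs = [f - o for f, o in pairs]
    event_errs = [f - o for f, o in pairs if o > baseflow]
    print(f"RMSE, all pairs:            {rmse(all_errs):.1f} cfs")
    print(f"RMSE, flows above baseflow: {rmse(event_errs):.1f} cfs")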