Ideally, weather records would be complete for every station – no missing values for any day, month or year. In reality, unfortunately, many values are missing. An example of a temperature record with several missing monthly values is shown below. The Rochester record spans a total of 416 months, of which 118 (about 28%) are missing.
Data values can be missing from weather records for several reasons:
- Historically – and still at several weather stations today – weather measurements are taken by human observers. If the observer is absent, no observations are made for that day (or for the period of absence).
- Equipment may fail, and until replacements arrive there are no observations.
- Stations may be closed down and opened again later, or moved to a new location.
- Extreme weather events, such as floods, may damage or destroy weather stations, taking them offline for a period of time.
- Some observations may fail quality control checks and may be removed from the record.
If too many daily values are missing, an accurate monthly average cannot be calculated and is better left out. In this way, missing daily values lead to missing monthly values, which in turn lead to missing annual values.
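The way gaps propagate from days to months to years can be sketched as below. This is a minimal illustration, not the procedure used for the Rochester record: the limit of five missing days per month (`MAX_MISSING_DAYS`) is a hypothetical threshold chosen only for the example, and the function names are likewise made up. Missing values are represented as `None`.

```python
from statistics import mean
from typing import Optional, Sequence

# Hypothetical threshold: the text does not say how many missing days are "too many".
MAX_MISSING_DAYS = 5


def monthly_mean(daily: Sequence[Optional[float]],
                 max_missing: int = MAX_MISSING_DAYS) -> Optional[float]:
    """Average the available daily values, or return None if too many are missing."""
    present = [v for v in daily if v is not None]
    if len(daily) - len(present) > max_missing or not present:
        return None  # the month is left out rather than estimated from too few days
    return mean(present)


def annual_mean(monthly: Sequence[Optional[float]]) -> Optional[float]:
    """Strict assumption for illustration: any missing month makes the year missing."""
    if len(monthly) != 12 or any(m is None for m in monthly):
        return None
    return mean(monthly)
```

For example, a 31-day January with 6 missing days, `monthly_mean([1.0] * 25 + [None] * 6)`, returns `None`, and that single missing month then makes `annual_mean` return `None` for the whole year.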
What can be done to improve this situation? The missing values can be estimated and used to infill the incomplete record. There are two general approaches to infilling:
- Data from the same record can be used to estimate the missing values. For example, to estimate a missing January value in a monthly record (such as the Rochester one shown above), the average of all available January values in that record can be used as a substitute.
- Data from neighbouring stations can be used to estimate missing values.
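The first approach, filling each gap with the average of the same calendar month across the rest of the record, might look like the sketch below. The record layout (a dictionary keyed by `(year, month)` with `None` for missing values) and the function name are assumptions made for illustration only.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (year, month) -> monthly mean temperature, None if missing.
Record = Dict[Tuple[int, int], Optional[float]]


def infill_with_monthly_climatology(record: Record) -> Record:
    """Replace each missing value with the mean of all available values for that calendar month."""
    by_month: Dict[int, List[float]] = {}
    for (_, month), value in record.items():
        if value is not None:
            by_month.setdefault(month, []).append(value)
    climatology = {m: mean(vals) for m, vals in by_month.items()}

    filled: Record = {}
    for key, value in record.items():
        if value is None:
            # A calendar month with no observations at all stays missing (None).
            filled[key] = climatology.get(key[1])
        else:
            filled[key] = value
    return filled
```

With Januaries of -4.0 and -6.0 available, a missing January is filled with -5.0 regardless of how cold that particular month actually was, which is exactly the limitation discussed next.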
The drawback of the first approach above is that it cannot capture unusual conditions. Suppose the missing January mentioned above was exceptionally cold. Other measurements from the same site cannot reveal this, because averaging the other Januaries produces a typical value rather than a cold one. A better approach in this case is to look at neighbouring weather stations – those that do have a value recorded for the missing month.
The missing value can then be estimated by combining data from these neighbours and exploiting the known historical differences between the station in question and each of its neighbours.
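One simple way to exploit those historical differences is a mean-offset method, sketched below. The text does not specify how neighbours are combined, so this is an assumed technique for illustration and not necessarily the method in the report. For each neighbour, it computes the average difference (target minus neighbour) over years in which both stations have a value for the same calendar month, adds that offset to the neighbour's value for the missing month, and averages the resulting estimates. The record layout and function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (year, month) -> monthly mean temperature, None if missing.
Record = Dict[Tuple[int, int], Optional[float]]


def mean_difference(target: Record, neighbour: Record, month: int) -> Optional[float]:
    """Average of (target - neighbour) over years where both have this calendar month."""
    diffs = [
        t - neighbour[key]
        for key, t in target.items()
        if key[1] == month and t is not None and neighbour.get(key) is not None
    ]
    return mean(diffs) if diffs else None


def estimate_from_neighbours(target: Record, neighbours: List[Record],
                             key: Tuple[int, int]) -> Optional[float]:
    """Estimate target[key] as the average of (neighbour value + historical offset)."""
    estimates = []
    for nb in neighbours:
        nb_value = nb.get(key)
        if nb_value is None:
            continue  # this neighbour is missing the same month, so it cannot help
        offset = mean_difference(target, nb, key[1])
        if offset is not None:
            estimates.append(nb_value + offset)
    return mean(estimates) if estimates else None
```

Using made-up numbers: if the target station has Januaries of -3.0 and -5.0, one neighbour is consistently 2.0 warmer and recorded -10.0 in the missing January, and another is consistently 1.0 colder and recorded -13.0, both give an estimate of -12.0. A same-record average would give -4.0 instead, missing the exceptional cold.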
More technical details are available in this report.