remotes::install_github("https://github.com/aphalo/fmi2",
                        ref = "HY2-develop")
1 Introduction
The Finnish Meteorological Institute (FMI) shares a large amount of data openly. One way of accessing observations from meteorological stations is through its open data web site. Data can be obtained either by downloading them manually or automatically, using a program or script that calls an API. Package ‘fmi2’ implements calls to some of the WPS API functions, allowing downloads directly from R.
The FMI also makes available climate data from 1961 to the present spatially interpolated to a 10 km x 10 km grid. However, I haven’t found an API to access the gridded data.
Monthly climatology for individual weather stations is also available. There are separate pages for each variable.
I find it convenient to use an R script to download weather data from FMI, but data can also be downloaded through FMI’s web pages. The script below caches a local copy of the data and, when run again, appends the new data.
The FMI WPS API limits the amount of data that can be downloaded per query. In addition, if we frequently need fresh data from the same station or group of stations, we can retrieve only the missing data and append them to those previously downloaded, instead of fetching the same data again and again. Of course, if data change retrospectively in the FMI service, we will also want to retrieve a fresh copy of the older data.
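Working within a per-query limit amounts to splitting the requested period into slices. The base R sketch below shows one way of computing such slices; `make_slices()` and its `max_days` argument are hypothetical names used only for illustration and are not part of ‘fmi2’. Unlike the download loops later in this document, which start each slice one minute after the end of the previous one, this sketch makes the slices contiguous.

```r
# Hypothetical helper (not part of 'fmi2'): split the period from `start`
# to `end` into consecutive slices no longer than `max_days` days.
make_slices <- function(start, end, max_days = 28) {
  stopifnot(inherits(start, "POSIXct"), inherits(end, "POSIXct"), start < end)
  step <- max_days * 24 * 3600          # slice length in seconds
  from <- seq(start, end, by = step)    # start time of each slice
  from <- from[from < end]              # drop a start that coincides with `end`
  to <- from + step
  to[to > end] <- end                   # clip the last slice to `end`
  data.frame(from = from, to = to)
}

slices <- make_slices(as.POSIXct("2023-01-01 00:00", tz = "UTC"),
                      as.POSIXct("2023-03-01 00:00", tz = "UTC"))
print(slices)
```

Each row of `slices` could then be used as the start and end time of one query.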
2 Installing R package ‘fmi2’
The original version is available at (https://github.com/rOpenGov/fmi2).
An edited version that adds a function to retrieve solar radiation and a version of this file as a second vignette is available at (https://github.com/aphalo/fmi2). The package can be installed directly from GitHub with:
Its dependencies may need to be installed manually from CRAN.
The documentation for the modified ‘fmi2’ package is available on-line and can be read without need to locally install the package.
3 Set up
library(fmi2)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(sf)
Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
library(lubridate)
Attaching package: 'lubridate'
The following objects are masked from 'package:base':
date, intersect, setdiff, union
library(ggplot2)
library(ggpmisc)
Loading required package: ggpp
Registered S3 methods overwritten by 'ggpp':
method from
heightDetails.titleGrob ggplot2
widthDetails.titleGrob ggplot2
Attaching package: 'ggpp'
The following object is masked from 'package:ggplot2':
annotate
We can query by FMI station ID; for simplicity we set it here.
# ID code of the station from where we fetch data
stn_fmisid <- 101004 # Kumpula, change as needed, both weather and radiation
# stn_fmisid <- 101154 # Lammi, change as needed, only weather
starttime.char <- "2022-12-31 22:00" # UTC at midnight in Finland
We can query information about the station.
4 Downloading hourly data values
We store data locally in a file and, if the file exists, we load it and append the missing data between its end and “now”. We need to be careful with time zones! It is simplest to use UTC for the data and only change the time zone for plotting.
To force the data to be retrieved again, we only need to delete the file we use to store it locally (or “cache”).
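Deleting the cache file can also be done from R. The sketch below works on a throwaway file in the session's temporary folder so that it is harmless to run; `cache_file` and `demo_data` are names used only in this illustration, and in real use the path would be that of the cache file used by the script.

```r
# Illustration only: a throwaway cache file in the session's temporary folder
cache_file <- file.path(tempdir(), "demo-weather-cache.Rda")
demo_data <- data.frame(time = as.POSIXct("2023-01-01 00:00", tz = "UTC"),
                        TA_PT1H_AVG = -1.2)
save(demo_data, file = cache_file)
cache_existed <- file.exists(cache_file)

# Removing the file makes the next run of a caching script start from scratch
if (file.exists(cache_file)) {
  file.remove(cache_file)
}
cache_exists_after <- file.exists(cache_file)
```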
if (!file.exists("fmi-weather-data-wide.Rda")) {
  # Used only once or when replacing all data
  starttime <- ymd_hm(starttime.char, tz = "UTC")
  wide_weather_data <- data.frame()
} else {
  load("fmi-weather-data-wide.Rda")
  # we start 59 min after end of previously downloaded data
  starttime <- force_tz(max(wide_weather_data$time), tzone = "UTC") + minutes(59)
}
# endtime <- trunc(now(), units = "mins")
endtime <- starttime # do not read new data
# we read the new data to a new dataframe
# (to avoid appending repeatedly to a long one)
new_wide_data <- data.frame()
while (starttime < endtime) {
  sliceendtime <- starttime + days(28) # keep query size at max of 4 weeks
  if (sliceendtime > endtime) {
    sliceendtime <- endtime
  }
  stn_data <- obs_weather_hourly(starttime = as.character(starttime),
                                 endtime = as.character(sliceendtime),
                                 fmisid = stn_fmisid)
  slice_data <- stn_data %>%
    tidyr::spread(variable, value) %>%
    # convert the sf object into a regular tibble
    sf::st_set_geometry(NULL)
  new_wide_data <- rbind(new_wide_data, slice_data)
  starttime <- sliceendtime + minutes(1)
  cat(".")
}
range(new_wide_data$time) # freshly read
Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
Inf
Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
-Inf
[1] Inf -Inf
wide_weather_data <- rbind(wide_weather_data, new_wide_data)
range(wide_weather_data$time) # all data to be saved
[1] "2022-12-31 22:00:00 UTC" "2023-08-01 12:00:00 UTC"
colnames(wide_weather_data)
[1] "time" "PA_PT1H_AVG" "PRA_PT1H_ACC" "PRI_PT1H_MAX"
[5] "RH_PT1H_AVG" "TA_PT1H_AVG" "TA_PT1H_MAX" "TA_PT1H_MIN"
[9] "WAWA_PT1H_RANK" "WD_PT1H_AVG" "WS_PT1H_AVG" "WS_PT1H_MAX"
[13] "WS_PT1H_MIN"
save(wide_weather_data, file = "fmi-weather-data-wide.Rda")
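When older data are fetched again, for example after a retrospective change in the FMI service, the cached and the fresh rows can overlap in time. A minimal base R sketch of appending while keeping only the most recent copy of each time stamp follows; `append_new_rows()` is a hypothetical helper, not part of ‘fmi2’, and the small data frames are made-up values used only to exercise it.

```r
# Hypothetical helper (not part of 'fmi2'): append `fresh` to `cached`,
# letting fresh rows replace cached rows that share the same time stamp.
append_new_rows <- function(cached, fresh) {
  if (nrow(cached) == 0) return(fresh)
  if (nrow(fresh) == 0) return(cached)
  combined <- rbind(cached, fresh)
  # keep the last occurrence of each time stamp, i.e. the fresh one
  combined <- combined[!duplicated(combined$time, fromLast = TRUE), , drop = FALSE]
  combined <- combined[order(combined$time), , drop = FALSE]
  rownames(combined) <- NULL
  combined
}

# Made-up example values, one variable only
cached <- data.frame(
  time = as.POSIXct(c("2023-01-01 00:00", "2023-01-01 01:00"), tz = "UTC"),
  TA_PT1H_AVG = c(1.0, 1.5)
)
fresh <- data.frame(
  time = as.POSIXct(c("2023-01-01 01:00", "2023-01-01 02:00"), tz = "UTC"),
  TA_PT1H_AVG = c(1.6, 2.0)
)
merged <- append_new_rows(cached, fresh)
print(merged)
```

In this sketch the row for 01:00 is taken from `fresh`, so corrected values would overwrite the cached ones.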
The description of the variables can be obtained from the server.
fmi2::describe_variables(colnames(wide_weather_data)[-1])
# A tibble: 12 × 6
   variable       label           base_phenomenon unit  stat_function agg_period
   <chr>          <chr>           <chr>           <chr> <chr>         <chr>
 1 PA_PT1H_AVG    Air pressure    Air pressure    hPa   avg           PT1H
 2 PRA_PT1H_ACC   Precipitation … Amount of prec… mm    acc           PT1H
 3 PRI_PT1H_MAX   Maximum precip… Amount of prec… mm/h  max           PT1H
 4 RH_PT1H_AVG    Relative humid… Humidity        %     avg           PT1H
 5 TA_PT1H_AVG    Air temperature Temperature     degC  avg           PT1H
 6 TA_PT1H_MAX    Highest temper… Temperature     degC  max           PT1H
 7 TA_PT1H_MIN    Lowest tempera… Temperature     degC  min           PT1H
 8 WAWA_PT1H_RANK Present weathe… Weather         <NA>  rank          PT1H
 9 WD_PT1H_AVG    Wind direction  Wind            deg   avg           PT1H
10 WS_PT1H_AVG    Wind speed      Wind            m/s   avg           PT1H
11 WS_PT1H_MAX    Maximum wind s… Wind            m/s   max           PT1H
12 WS_PT1H_MIN    Minimum wind s… Wind            m/s   min           PT1H
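In the table above, the names of the hourly variables appear to be built from three parts separated by underscores: a code for the quantity, the aggregation period and the summary function, matching the `agg_period` and `stat_function` columns. This pattern is inferred from the table and is an assumption, not documented behaviour; other FMI variables may not follow it. The base R sketch below splits a few of the names accordingly; `name_parts` and its `quantity` column are names chosen for this example.

```r
# Three variable names copied from the table above
vars <- c("TA_PT1H_AVG", "PRA_PT1H_ACC", "WAWA_PT1H_RANK")

# Split on "_" (assumed pattern: quantity, aggregation period, function)
parts <- do.call(rbind, strsplit(vars, "_", fixed = TRUE))
name_parts <- data.frame(
  variable = vars,
  quantity = parts[, 1],
  agg_period = parts[, 2],
  stat_function = tolower(parts[, 3])
)
print(name_parts)
```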
ggplot(wide_weather_data, aes(with_tz(time, tzone = "EET"), TA_PT1H_AVG)) +
geom_line()
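The plot above keeps the data in UTC and converts the time zone only for display. The base R sketch below illustrates what such a conversion does: the instant stays the same and only the clock time shown changes, which is what with_tz() from ‘lubridate’ does. The zone name "Europe/Helsinki" and the variable names are choices made for this example.

```r
# An instant stored in UTC, as in the cached data
t_utc <- as.POSIXct("2022-12-31 22:00", tz = "UTC")

# A copy with a different display time zone: the instant itself is unchanged
t_local <- t_utc
attr(t_local, "tzone") <- "Europe/Helsinki"
winter_local <- format(t_local, "%Y-%m-%d %H:%M")

# In summer Finnish clocks are 3 h ahead of UTC because of daylight saving time
summer_local <- format(as.POSIXct("2023-06-30 21:00", tz = "UTC"),
                       "%Y-%m-%d %H:%M", tz = "Europe/Helsinki")

print(c(winter = winter_local, summer = summer_local))

# Both objects refer to the same number of seconds since the epoch
same_instant <- as.numeric(t_local) == as.numeric(t_utc)
```

Because the offset from UTC changes between winter and summer, storing the data in UTC avoids duplicated or missing hours around the clock changes.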
5 Downloading radiation data at 1 min
The station ID was set above, and we use it again. However, few weather stations measure radiation: for example, Kumpula does, but Lammi does not.
if (!file.exists("fmi-sun-data-wide.Rda")) {
  # Used only once or when replacing all data
  starttime.char <- "2023-01-15 22:00" # UTC at midnight in Finland
  starttime <- ymd_hm(starttime.char)
  wide_sun_data <- data.frame()
} else {
  load("fmi-sun-data-wide.Rda")
  # we start 1 h after end of previously downloaded data
  starttime <- max(wide_sun_data$time) + minutes(1) + hours(2) # convert to UTC + 2h
}
# endtime <- trunc(now() - minutes(30), units = "mins")
endtime <- starttime # do not read new data
# we read the new data to a new dataframe
# (to avoid appending repeatedly to a long one)
new_wide_data <- data.frame()
while (starttime < endtime) {
  sliceendtime <- starttime + days(1) # keep query size at max of 1 day
  if (sliceendtime > endtime) {
    sliceendtime <- endtime
  }
  stn_data <- obs_radiation_minute(starttime = as.character(starttime),
                                   endtime = as.character(sliceendtime),
                                   fmisid = 101004) # Kumpula
  slice_data <- stn_data %>%
    tidyr::spread(variable, value) %>%
    # convert the sf object into a regular tibble
    sf::st_set_geometry(NULL)
  new_wide_data <- rbind(new_wide_data, slice_data)
  starttime <- sliceendtime + minutes(1)
  cat(".")
}
range(new_wide_data$time)
Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
Inf
Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
-Inf
[1] Inf -Inf
wide_sun_data <- rbind(wide_sun_data, new_wide_data)
range(wide_sun_data$time)
[1] "2023-01-15 22:00:00 UTC" "2023-06-29 20:57:00 UTC"
colnames(wide_sun_data)
[1] "time" "DIFF_1MIN" "DIR_1MIN" "GLOB_1MIN" "LWIN_1MIN"
[6] "LWOUT_1MIN" "NET_1MIN" "REFL_1MIN" "SUND_1MIN" "UVB_U"
save(wide_sun_data, file = "fmi-sun-data-wide.Rda")
fmi2::describe_variables(colnames(wide_sun_data)[-1])
# A tibble: 9 × 6
  variable   label                 base_phenomenon unit  stat_function agg_period
  <chr>      <chr>                 <chr>           <chr> <chr>         <chr>
1 DIFF_1MIN  "Diffuse radiation"   Solar radiation W/m2  avg           PT1M
2 DIR_1MIN   "Direct solar radia…  Solar radiation W/m2  avg           PT1M
3 GLOB_1MIN  "Global radiation"    Solar radiation W/m2  avg           PT1M
4 LWIN_1MIN  "Long wave solar ra…  UV radiation    W/m2  avg           PT1M
5 LWOUT_1MIN "Long wave outgoing…  UV radiation    W/m2  avg           PT1M
6 NET_1MIN   "Radiation balance"   Solar radiation W/m2  avg           PT1M
7 REFL_1MIN  "Reflected radiatio…  Solar radiation W/m2  avg           PT1M
8 SUND_1MIN  "Sunshine duration"   Sunshine durat… s     acc           PT1M
9 UVB_U      "Ultraviolet irradi…  UV radiation    index avg           PT1M
ggplot(wide_sun_data, aes(with_tz(time, tzone = "EET"), GLOB_1MIN)) +
geom_line()