class: center, middle, inverse, title-slide # Introduction to Geodata in R ## CORRELCON 2020 ### Michael Matiu ### 2020/11/07 --- ## What is geospatial data? Everything that can be pointed to somewhere on the earth (for now). .center[<a title="https://clauswilke.com/dataviz/geospatial-data.html#fig:world-orthographic" href="https://clauswilke.com/dataviz/geospatial-data.html#fig:world-orthographic"><img src="https://clauswilke.com/dataviz/geospatial_data_files/figure-html/world-orthographic-1.png" height="400px"></a>] .tiny[Source: Claus O. Wilke, *Fundamentals of Data Visualization*] --- ## Why projections? Problem: The world is a sphere, but maps are 2D. Task: Peel an orange, and lay the peels flat on the table. .pull-left[.center[ <br><br> <a title="Evan-Amos, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Orange-Whole-%26-Split.jpg"><img width="512" alt="Orange-Whole-&-Split" src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/7b/Orange-Whole-%26-Split.jpg/512px-Orange-Whole-%26-Split.jpg"></a> ]] .pull-right[ ![](intro-spatial-r_files/figure-html/worldmap-simple-1.png)<!-- --> ] --- ## Projections overview .panelset[ .panel[.panel-name[Issue] Exact projection of curves surfaces on flat surfaces impossible. Options: - preserve shapes (*conformal*) - preserve area (*equa-area*) - preserve direction (*azimuthal*) - preserve distance (*equidistant*) - preserve none of the above exactly, but approximate multiple features ] .panel[.panel-name[Ex1: Goode] Goode homolosine projection (equal-area) <a title="Strebe, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Goode_homolosine_projection_SW.jpg"><img width="auto" height="350px" alt="Goode homolosine projection SW" src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f2/Goode_homolosine_projection_SW.jpg/1024px-Goode_homolosine_projection_SW.jpg"></a> .tiny[Source: <a href="https://commons.wikimedia.org/wiki/File:Goode_homolosine_projection_SW.jpg">Strebe</a>, <a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a>, via Wikimedia Commons] ] .panel[.panel-name[Ex2: Mercator] Mercator (conformal) <a title="Strebe, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Mercator_projection_Square.JPG"><img width="auto" height="350px" alt="Mercator projection Square" src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/73/Mercator_projection_Square.JPG/1024px-Mercator_projection_Square.JPG"></a> .tiny[Source: <a href="https://commons.wikimedia.org/wiki/File:Mercator_projection_Square.JPG">Strebe</a>, <a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a>, via Wikimedia Commons] ] .panel[.panel-name[Ex3: Mollweide] Mollweide (equal-area) <a title="Strebe, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Mollweide_projection_SW.jpg"><img width="auto" height="350px" alt="Mollweide projection SW" src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Mollweide_projection_SW.jpg/1024px-Mollweide_projection_SW.jpg"></a> .tiny[Source: <a href="https://commons.wikimedia.org/wiki/File:Mollweide_projection_SW.jpg">Strebe</a>, <a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a>, via Wikimedia Commons] ] .panel[.panel-name[Conclusion] Keep in mind that different projections exist. In most cases, though, no need to bother: - projection is often predetermined - software usually has good defaults ] ] --- ## Spatial data .panelset[ .panel[.panel-name[Vector] .center[<a href="https://geocompr.robinlovelace.net/spatial-class.html#fig:sf-ogc"><img src="https://geocompr.robinlovelace.net/figures/sf-classes.png" height="400px"></a>] .tiny[Source: Robin Lovelace, Jakub Nowosad, Jannes Muenchow, *Geocomputation with R*] ] .panel[.panel-name[Raster] .center[<a href="https://geocompr.robinlovelace.net/spatial-class.html#fig:raster-intro-plot"><img src="https://geocompr.robinlovelace.net/02-spatial-data_files/figure-html/raster-intro-plot-1.png" height="400px"></a>] .tiny[Source: Robin Lovelace, Jakub Nowosad, Jannes Muenchow, *Geocomputation with R*] ] .panel[.panel-name[CRS] Coordinate Reference System. Defines how spatial elements relate to the earth. Two types of coordinate systems: - **geographic**: longitude and latitude, datum (defines to what sphere/ellipsoid the lon & lat refer to) - **projected**: x (Easting) and y (Northing) on implicit flat surface, origin, and linear units (usually meters) Conversion between these two, and between all projections handled by PROJ library. Information on CRS usually given as proj4string (e.g. "+proj=longlat +datum=WGS84 +no_defs"), as EPSG code (e.g. 4326) or as WKT (well-known text representation of CRS). ] .panel[.panel-name[Attributes and values] Additional information on the spatial object (point, line, polygon). Examples: - Name - Country - Population - Date - Temperature - Elevation For rasters the cell values are of main interest. Values must be numeric. ] ] --- ## Spatial objects in R ### Basic The one-stop-shop for vector data: the **sf** package For raster data: **raster** and **stars** Quick exploration: **mapview** ### And a ton of other packages... ...for any need you might imagine. `->` [talk by Alexandra](https://alexandrakapp.github.io/correlcon_presentation/#/) .tiny[Example data used here is coming from https://www.gadm.org/, https://www.naturalearthdata.com, and https://chelsa-climate.org/.] --- ## sf .panelset[ .panel[.panel-name[why sf?] - succesor to **sp** package. - *supercharged* data.frame + attributes are data.frame columns + one additional column that holds the geometry (spatial objects) + plus stores CRS, extent (bounding box) + tidy: each row is one spatial object (point, line, polygon, or multi*) - closely integrated into the tidyverse (dplyr, pipe, geom_sf) - consistent naming: functions in **sf** start with `st_*` ] .panel[.panel-name[read] ```r # NUTS level 1 of Croatia sf_polygons <- st_read("data/gadm36_HRV_shp/gadm36_HRV_1.shp") ``` ``` ## Reading layer `gadm36_HRV_1' from data source `C:\projects-r-web\slides\2020-11-07-correlcon\data\gadm36_HRV_shp\gadm36_HRV_1.shp' using driver `ESRI Shapefile' ## Simple feature collection with 21 features and 10 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 13.48958 ymin: 42.38543 xmax: 19.43518 ymax: 46.55052 ## geographic CRS: WGS 84 ``` ] .panel[.panel-name[overview] ```r class(sf_polygons) ``` ``` ## [1] "sf" "data.frame" ``` ```r print(sf_polygons, n = 5) # print only first 5 instead of default 10 features ``` ``` ## Simple feature collection with 21 features and 10 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 13.48958 ymin: 42.38543 xmax: 19.43518 ymax: 46.55052 ## geographic CRS: WGS 84 ## First 5 features: ## GID_0 NAME_0 GID_1 NAME_1 VARNAME_1 ## 1 HRV Croatia HRV.1_1 Bjelovarska-Bilogorska Bjelovar-Bilagora ## 2 HRV Croatia HRV.2_1 Brodsko-Posavska Slavonski Brod-Posavina ## 3 HRV Croatia HRV.3_1 Dubrovacko-Neretvanska Dubrovnik-Neretva ## 4 HRV Croatia HRV.4_1 Grad Zagreb <NA> ## 5 HRV Croatia HRV.5_1 Istarska Istra ## NL_NAME_1 TYPE_1 ENGTYPE_1 CC_1 HASC_1 geometry ## 1 <NA> Županija County <NA> HR.BB MULTIPOLYGON (((17.40924 45... ## 2 <NA> Županija County <NA> HR.SP MULTIPOLYGON (((17.14642 45... ## 3 <NA> Županija County <NA> HR.DN MULTIPOLYGON (((16.27514 42... ## 4 <NA> Grad City <NA> HR.GZ MULTIPOLYGON (((16.06084 45... ## 5 <NA> Županija County <NA> HR.IS MULTIPOLYGON (((13.87487 44... ``` ] .panel[.panel-name[extract] ```r st_bbox(sf_polygons) ``` ``` ## xmin ymin xmax ymax ## 13.48958 42.38543 19.43518 46.55052 ``` ```r # st_crs(sf_polygons) # CRS object st_crs(sf_polygons)$epsg ``` ``` ## [1] 4326 ``` ```r st_crs(sf_polygons)$proj4string ``` ``` ## [1] "+proj=longlat +datum=WGS84 +no_defs" ``` ] .panel[.panel-name[tidyverse] ```r sf_polygons %>% dplyr::select(starts_with("NAM")) %>% dplyr::filter(stringr::str_starts(NAME_1, "B")) ``` ``` ## Simple feature collection with 2 features and 2 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 16.47865 ymin: 45.05429 xmax: 18.56575 ymax: 46.09102 ## geographic CRS: WGS 84 ## NAME_0 NAME_1 geometry ## 1 Croatia Bjelovarska-Bilogorska MULTIPOLYGON (((17.40924 45... ## 2 Croatia Brodsko-Posavska MULTIPOLYGON (((17.14642 45... ``` ] .panel[.panel-name[plot] ```r plot(sf_polygons) ``` <img src="intro-spatial-r_files/figure-html/unnamed-chunk-5-1.png" height="350px" /> ] .panel[.panel-name[mapview] In external file, because of size (mapview object holds *all* data) ```r mapview(sf_polygons) ``` ] ] --- ## More vector data and operations .panelset[ .panel[.panel-name[points & lines] ```r # naturalearth cities & rivers sf_points <- st_read("data/ne_10m_populated_places_simple/ne_10m_populated_places_simple.shp") ``` ``` ## Reading layer `ne_10m_populated_places_simple' from data source `C:\projects-r-web\slides\2020-11-07-correlcon\data\ne_10m_populated_places_simple\ne_10m_populated_places_simple.shp' using driver `ESRI Shapefile' ## Simple feature collection with 7343 features and 38 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -179.59 ymin: -90 xmax: 179.3833 ymax: 82.48332 ## geographic CRS: WGS 84 ``` ```r sf_lines <- st_read("data/ne_10m_rivers_lake_centerlines/ne_10m_rivers_lake_centerlines.shp") ``` ``` ## Reading layer `ne_10m_rivers_lake_centerlines' from data source `C:\projects-r-web\slides\2020-11-07-correlcon\data\ne_10m_rivers_lake_centerlines\ne_10m_rivers_lake_centerlines.shp' using driver `ESRI Shapefile' ## Simple feature collection with 1455 features and 34 fields (with 1 geometry empty) ## geometry type: MULTILINESTRING ## dimension: XY ## bbox: xmin: -164.9035 ymin: -52.15773 xmax: 177.5204 ymax: 75.79348 ## geographic CRS: WGS 84 ``` ] .panel[.panel-name[cropping & intersecting] - crop: bounding box (rectangle) - intersection: within polygon ```r sf_points_crop <- st_crop(sf_points, sf_polygons) sf_points_intersect <- st_intersection(sf_points, sf_polygons) ``` ```r mapview(list(sf_points_crop, sf_points_intersect, sf_polygons)) ``` ] .panel[.panel-name[mapview multiple] ```r sf_lines_crop <- st_crop(sf_lines, sf_polygons) ``` ```r mapview(list(sf_points_crop, sf_lines_crop, sf_polygons)) ``` ] ] --- ## ggplot2 integration .panelset[ .panel[.panel-name[intro] The **sf** package provides three geoms and a coord. - `geom_sf()` is used for points, lines, and polygons. Draws the appropriate geom depending on feature. - `geom_sf_text()` and `geom_sf_label()` for text and labels. - `coord_sf()` makes sure that all layers use a common CRS. By default, takes the CRS the first `geom_sf()`, but can be overridden. The **stars** packages provides `geom_stars()`. Important argument is `downsample=`, which allows to plot only every x-th value (the plot function for raster has by default a `maxpixels=500000` argument). ] .panel[.panel-name[code] ```r ggplot()+ geom_sf(data = sf_polygons)+ geom_sf(data = sf_points_intersect, aes(size = pop_max/1000))+ coord_sf()+ # optional theme_minimal() ``` ] .panel[.panel-name[plot] <img src="intro-spatial-r_files/figure-html/unnamed-chunk-12-1.png" height="450px" /> ] ] --- ## Manually creating sf objects Often not needed, since most spatial data comes as shape files or similar. The only use is for point data that comes as text. ```r tbl_ber <- tibble(name = "Berlin", long = 13.408333, lat = 52.5186) sf_ber <- st_as_sf(tbl_ber, coords = c("long", "lat"), crs = 4326) # 4326 is the EPSG code for longlat projection # also possible to supply CRS object or proj4string, but this is shorter sf_ber ``` ``` ## Simple feature collection with 1 feature and 1 field ## geometry type: POINT ## dimension: XY ## bbox: xmin: 13.40833 ymin: 52.5186 xmax: 13.40833 ymax: 52.5186 ## geographic CRS: WGS 84 ## # A tibble: 1 x 2 ## name geometry ## * <chr> <POINT [°]> ## 1 Berlin (13.40833 52.5186) ``` --- ## Raster .panelset[ .panel[.panel-name[read] ```r # average November precipitation (mm) rr <- raster("data/CHELSA_prec_11_V1.2_land.tif") rr ``` ``` ## class : RasterLayer ## dimensions : 20880, 43200, 902016000 (nrow, ncol, ncell) ## resolution : 0.008333333, 0.008333333 (x, y) ## extent : -180.0001, 179.9999, -90.00014, 83.99986 (xmin, xmax, ymin, ymax) ## crs : +proj=longlat +datum=WGS84 +no_defs ## source : C:/projects-r-web/slides/2020-11-07-correlcon/data/CHELSA_prec_11_V1.2_land.tif ## names : CHELSA_prec_11_V1.2_land ## values : -32768, 32767 (min, max) ``` ] .panel[.panel-name[plot] ```r plot(rr) ``` <img src="intro-spatial-r_files/figure-html/unnamed-chunk-15-1.png" width="800px" height="400px" /> ] .panel[.panel-name[crop] Cropping is again for the bbox. The raster package is still using sp (*Spatial*) classes, but no worries. ```r rr_crop <- crop(rr, as(sf_polygons, "Spatial")) plot(rr_crop) ``` <img src="intro-spatial-r_files/figure-html/unnamed-chunk-16-1.png" height="350px" /> ] .panel[.panel-name[mask] Masking is similar to intersection for `sf`. ```r rr_mask <- mask(rr_crop, as(sf_polygons, "Spatial")) # mask is faster, when pre-cropped plot(rr_mask) ``` <img src="intro-spatial-r_files/figure-html/unnamed-chunk-17-1.png" height="350px" /> ] .panel[.panel-name[multiple layers] Rasters can contain multiple layers (think cube or 3D array instead of 2D). `raster()` reads by default the first layer (band), but can be changed with `raster(x, band = .)` To read in all layers of a file, use `brick()`. Then there is `stack()` which can create a collection of layers from multiple objects or multiple files. (necessary to have the same extent and resolution) ] ] --- ## Combining vector with raster .panelset[ .panel[.panel-name[points] ```r nov_prec <- extract(rr, as(sf_points, "Spatial")) str(nov_prec) # only values ``` ``` ## num [1:7343] 106 97 100 92 85 24 17 22 10 12 ... ``` ```r sf_points2 <- sf_points %>% mutate(nov_prec = nov_prec) # add them to the sf_points ``` ] .panel[.panel-name[points example] ```r # show the wettest cities in Germany in November sf_points2 %>% dplyr::select(nameascii, adm0_a3, nov_prec) %>% dplyr::filter(adm0_a3 == "DEU") %>% dplyr::arrange(-nov_prec) ``` ``` ## Simple feature collection with 58 features and 3 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: 6.750017 ymin: 47.85035 xmax: 14.32997 ymax: 54.78375 ## geographic CRS: WGS 84 ## First 10 features: ## nameascii adm0_a3 nov_prec geometry ## 1 Freiburg DEU 94 POINT (7.869948 48.00042) ## 2 Wuppertal DEU 90 POINT (7.169991 51.25001) ## 3 Saarbrucken DEU 80 POINT (6.970003 49.25039) ## 4 Rosenheim DEU 80 POINT (12.13331 47.85035) ## 5 Heidelberg DEU 79 POINT (8.699975 49.41999) ## 6 Flensburg DEU 77 POINT (9.433315 54.78375) ## 7 Kiel DEU 76 POINT (10.13002 54.33039) ## 8 Essen DEU 73 POINT (7.016615 51.45) ## 9 Emden DEU 72 POINT (7.216655 53.36668) ## 10 Bremerhaven DEU 72 POINT (8.579982 53.55044) ``` ] .panel[.panel-name[raster to data.frame] We take the smallest (masked) raster. Things can get quite big, so watch out. ```r df_rr_mask <- as.data.frame(rr_mask, xy = T, na.rm = T) tibble(df_rr_mask) ``` ``` ## # A tibble: 93,826 x 3 ## x y CHELSA_prec_11_V1.2_land ## <dbl> <dbl> <int> ## 1 16.3 46.5 76 ## 2 16.3 46.5 76 ## 3 16.4 46.5 75 ## 4 16.4 46.5 74 ## 5 16.4 46.5 74 ## 6 16.3 46.5 76 ## 7 16.3 46.5 76 ## 8 16.3 46.5 77 ## 9 16.3 46.5 77 ## 10 16.3 46.5 76 ## # ... with 93,816 more rows ``` ] .panel[.panel-name[polygons] Option A: Use `extract()` as in the points case. Option B: Rasterize the shape file, and then use raster operations or coerce to data.frame. ```r rr_polygons <- rasterize(as(sf_polygons, "Spatial"), rr_mask) rr_polygons ``` ``` ## class : RasterLayer ## dimensions : 500, 713, 356500 (nrow, ncol, ncell) ## resolution : 0.008333333, 0.008333333 (x, y) ## extent : 13.49153, 19.43319, 42.38319, 46.54986 (xmin, xmax, ymin, ymax) ## crs : +proj=longlat +datum=WGS84 +no_defs ## source : memory ## names : layer ## values : 1, 21 (min, max) ## attributes : ## ID GID_0 NAME_0 GID_1 NAME_1 VARNAME_1 ## from: 1 HRV Croatia HRV.1_1 Bjelovarska-Bilogorska Bjelovar-Bilagora ## to : 21 HRV Croatia HRV.21_1 Zagrebacka Zagreb ## NL_NAME_1 TYPE_1 ENGTYPE_1 CC_1 HASC_1 ## <NA> Županija County <NA> HR.BB ## <NA> Županija County <NA> HR.ZG ``` ] .panel[.panel-name[polygons example] ```r plot(rr_polygons) ``` <img src="intro-spatial-r_files/figure-html/unnamed-chunk-22-1.png" height="400px" /> ] ] --- ## Further topics not covered here **Spatial operations** - vector: set operations, transformations - raster: functions on cell values, layers, ... **Reprojecting** - necessary for data analysis when data comes in different projections (just for visualization not always needed) - for raster data, this usually involves interpolation (nearest neighbour, bilinear, ...) **File formats** - Shape files (`.shp`) and GeoTIFFs (`.tif`) are most common - For climate grids standard is NetCDF (`.nc`) - but also other formats in the wild: GeoJSON (`.geojson`), Ascii (`.asc`), KML (`.kml`), ... - and integration into databases (e.g. PostGIS `->` talk by Malte) <!-- **Spatial statistics** - spatial correlation - variograms - kriging **Satellite image analysis** - layer (band) operations - classification - nice intro at last WhyR conference: --> --- ## Links - Actually a dataviz book. Only one section on geodata, but very concise:<br><br> Claus O. Wilke, *Fundamentals of Data Visualization*, https://clauswilke.com/dataviz/ - In depth view of spatial data analysis with R:<br><br> Robin Lovelace, Jakub Nowosad, Jannes Muenchow, *Geocomputation with R*, https://geocompr.robinlovelace.net/ - For the hard-core technical stuff:<br><br> Spatial Data Science with R — R Spatial, https://rspatial.org/ --- class: center, middle # `? -> ! -> ???`