2. Available data

Accessing registry tables

The chronosphere provides access to thousands of data items. The available data can be queried with the client package, or inspected on the project’s (in-development) website. To access the available data in R, we need the client package.

library("chronosphere")

## Chronosphere - Evolving Earth System Variables
## Important: never fetch data as a superuser / with admin. privileges!
## 
## Note that the package was split for efficient maintenance and development:
##  - Plate tectonic calculations -> package 'rgplates'
##  - Arrays of raster and vector spatials -> package 'via'

Overview

The available data can be inspected with the datasets() function. This function connects to the remote where the registry files are stored and downloads an overview of the registered data items.

ds <- datasets()

## Use datasets(src = <src>) to see available versions and resolutions.

This function returns a data.frame object which, with the default arguments, gives an overview of the available data. The function also checks for important changes to the remotes, which might indicate that some action (e.g. an update of the client) is required. Every row represents a unique series (src-ser combination), together with some information about its defaults (version, resolution, class, etc.):

str(ds)

## 'data.frame':    54 obs. of  9 variables:
##  $ topic        : chr  "Traits" "Traits" "Organism distribution" "Organism distribution" ...
##  $ sourceName   : chr  "Ancient Reef Traits Database" "Ancient Reef Traits Database" "BioDeepTime" "BioDeepTime" ...
##  $ src          : chr  "AncientReefTraits" "AncientReefTraits" "biodeeptime" "biodeeptime" ...
##  $ seriesName   : chr  "Reference list" "Denormalized Trait Data" "Denormalized biogeographic observations" "Bchron ages for records from the Neotoma Paleoecology Database" ...
##  $ ser          : chr  "refs" "traits" "denormalized" "neotoma-bchron" ...
##  $ defaultSeries: logi  FALSE TRUE TRUE FALSE FALSE TRUE ...
##  $ defaultClass : chr  "data.frame" "data.frame" "data.frame" "data.frame" ...
##  $ infoURL      : chr  "https://chronosphere.info/data/AncientReefTraits/refs/" "https://chronosphere.info/data/AncientReefTraits/traits/" "https://chronosphere.info/data/biodeeptime/denormalized/" "https://chronosphere.info/data/biodeeptime/neotoma-bchron/" ...
##  $ defaultVer   : chr  "1.04" "1.04" "1.0" "1.0" ...

Typically, the default is 1) the most recent version of the series, with 2) the coarsest resolution and 3) the fastest expected loading time, which means that some data can already be downloaded with a simple src-ser combination. Please check out the basic overview of the chronosphere’s data model to read more about what src, ser, etc. mean.
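Because the overview is an ordinary data.frame, it can be filtered with base R. The sketch below is only an illustration: ds_demo is a hand-made stand-in that copies a few columns of the first four rows from the str() output above, so that the code runs without a network connection. In a live session you would work on the ds object returned by datasets() instead.

```r
# Stand-in for the first four rows of the overview table shown above.
# In a live session, use ds <- datasets() instead of this mock object.
ds_demo <- data.frame(
  topic = c("Traits", "Traits",
            "Organism distribution", "Organism distribution"),
  src = c("AncientReefTraits", "AncientReefTraits",
          "biodeeptime", "biodeeptime"),
  ser = c("refs", "traits", "denormalized", "neotoma-bchron"),
  defaultSeries = c(FALSE, TRUE, TRUE, FALSE),
  defaultVer = c("1.04", "1.04", "1.0", "1.0"),
  stringsAsFactors = FALSE
)

# Keep only the series that are the default of their source
defaults <- ds_demo[ds_demo$defaultSeries, c("src", "ser", "defaultVer")]

# Count the registered series per topic
series_per_topic <- table(ds_demo$topic)

print(defaults)
print(series_per_topic)
```

The same two lines of subsetting and tabulation should work unchanged on the full table, as long as the column names match those listed by str(ds).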

Note that this tutorial was built on 2024-11-15, so the exact results that you see might be different!

Source-specific registry files

Note that the object that we just created does not contain all available items that come from a series. There can be hundreds of these. This information is tabulated for every source (src) and needs to be accessed separately. This ensures that the growth of the chronosphere’s data library will not cause performance issues when it comes to finding data. To get these source-specific registry files, all we have to do is provide the source (src) argument. For instance, if we want to access the data items that are related to the PALEOMAP project (registered as paleomap on the chronosphere), we have to provide the src="paleomap" argument:

pm <- datasets(src="paleomap")

This results in a different data.frame, which includes all items of the paleomap source. You can either look into the table with the View() function (or an equivalent), or you can list the available src-ser-ver-res combinations with this chunk of code:

unique(pm[, c("src", "ser", "ver", "resolution")])

##         src             ser               ver resolution
## 1  paleomap           areas                 7         NA
## 2  paleomap             dem          20180801        1.0
## 3  paleomap             dem          20180801        0.1
## 4  paleomap             dem            v24221        0.1
## 5  paleomap             dem            v24221        1.0
## 6  paleomap            gmst scotese02a_v21321        1.0
## 7  paleomap           model        v3-GPlates         NA
## 8  paleomap           model          v19o_r1c         NA
## 9  paleomap      paleoatlas                v3        0.1
## 10 paleomap paleocoastlines                 7         NA
## 11 paleomap        rainfall        scotese_02        1.0

Note that this large data.frame object also includes additional metadata, such as the long names of the series, the associated references, the class, the URL of the data file that is associated with the item, and the code that will be used to instantiate the object.
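The source-specific table can be narrowed down to a single series in the same way. The following sketch is an illustration only: pm_demo is a hand-made copy of the first six rows of the output listed above, so that the code runs offline. It also assumes that the resolution values are cell sizes, so that a larger number means a coarser grid. With a live connection, the pm object from datasets(src="paleomap") would take the place of pm_demo.

```r
# Stand-in for the first six rows of the src-ser-ver-resolution listing above.
# In a live session, use pm <- datasets(src = "paleomap") instead.
pm_demo <- data.frame(
  src = "paleomap",
  ser = c("areas", "dem", "dem", "dem", "dem", "gmst"),
  ver = c("7", "20180801", "20180801", "v24221", "v24221",
          "scotese02a_v21321"),
  resolution = c(NA, 1.0, 0.1, 0.1, 1.0, 1.0),
  stringsAsFactors = FALSE
)

# All registered items of the 'dem' series
dem <- pm_demo[pm_demo$ser == "dem", ]

# Largest resolution value per version
# (assumed here to be the coarsest grid; this is an assumption)
coarsest <- aggregate(resolution ~ ver, data = dem, FUN = max)

print(dem)
print(coarsest)
```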

Saving the registry files

By default, the registry files are saved to a temporary directory, which is deleted when your R session ends. This means that you will have to re-download the registry files whenever you execute the datasets() function. This can be avoided with the datadir argument, which has to point to an existing directory to which you have write access. The registry files will be saved there.

# create chronosphere directory in the user's home
dir <- "~/chronosphere"
dir.create(dir, showWarnings=FALSE)
pm <- datasets(src="paleomap", datadir=dir)

The datadir argument will be added as a package-wide variable in the next update, so that you will not have to define it separately for datasets() and fetch() function calls.
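Since datadir must point to an existing, writable directory, it can help to check this before calling datasets(). The helper below, prepare_datadir(), is hypothetical and not part of the chronosphere package. It is a minimal sketch that uses only base R, and the example writes into the session's temporary directory so that it can be run without touching your home folder.

```r
# Hypothetical helper (not part of chronosphere): make sure that a directory
# exists and is writable before it is passed as the datadir argument.
prepare_datadir <- function(path) {
  path <- path.expand(path)
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  # file.access() returns 0 if the requested access (mode 2 = write) is granted
  if (file.access(path, mode = 2) != 0) {
    stop("Directory is not writable: ", path)
  }
  path
}

# Demonstration in a temporary location; replace with e.g. "~/chronosphere"
demo_dir <- prepare_datadir(file.path(tempdir(), "chronosphere"))

# The checked path can then be passed on (requires a network connection):
# pm <- datasets(src = "paleomap", datadir = demo_dir)
```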

Now that we know how to inspect the available data, it is time to see how we can actually access these.