When it comes to practical learning or teaching, I often use simulated data, but sometimes it’s preferable to have real or real-looking data to work with.
Here are sources of data that I’ve found or I’ve been referred to:
- package datasets
- package NRAIA for non-linear datasets – not on CRAN anymore, but available via install.packages("NRAIA", repos="http://R-Forge.R-project.org")
- package growthrates for population growth datasets
- datasets in package mclust, an “R package for model-based clustering, classification, and density estimation”
- https://allisonhorst.github.io/palmerpenguins/ – an alternative to data(iris)
- https://opendata.dc.gov/datasets/urban-forestry-street-trees/ – geo-referenced tree data with information on species and health status
- Data from “Skeletal correlates for body mass estimation in modern and fossil flying birds.” PLOS One 8: e82000
- gage height of the Potomac river from the USGS: https://nwis.waterdata.usgs.gov/nwis/uv?cb_00060=on&cb_00065=on&format=rdb&site_no=01618000&period=&begin_date=2020-01-14&end_date=2021-01-21
- USGS data on earthquakes that have occurred in the past 24 hours: https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv
The list of datasets contained in an R package can be obtained with data(package='packagename'), where packagename is to be replaced with the name of the package of interest.
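As a minimal sketch of that call, here is what it looks like for the datasets package, which ships with R. The variable names info and ds are just illustrative choices; the results component and its "Item" and "Title" columns are part of what data() returns.

```r
# List the datasets shipped with a package; "datasets" comes with base R
info <- data(package = "datasets")

# The returned object stores a character matrix in `results`;
# its "Item" and "Title" columns hold dataset names and short descriptions
ds <- as.data.frame(info$results[, c("Item", "Title")],
                    stringsAsFactors = FALSE)
head(ds)

# Load one of the listed datasets by name and inspect its structure
data("iris", package = "datasets")
str(iris)
```

The same pattern should work for any installed package, for example data(package = "mclust") once mclust is installed.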