For hands-on learning or teaching I often use simulated data, but sometimes it’s preferable to have real or real-looking data to work with.
Here are sources of data that I’ve found or I’ve been referred to:
- NRAIA for non-linear datasets – not on CRAN anymore, but available via
- growthrates for population growth datasets
- datasets in package mclust, an “R package for model-based clustering, classification, and density estimation”
- https://allisonhorst.github.io/palmerpenguins/ – an alternative to
- https://opendata.dc.gov/datasets/urban-forestry-street-trees/ – geo-referenced tree data with information on species and health status
- Data from “Skeletal correlates for body mass estimation in modern and fossil flying birds.” PLOS One 8: e82000
- gage height of the Potomac River from the USGS: https://nwis.waterdata.usgs.gov/nwis/uv?cb_00060=on&cb_00065=on&format=rdb&site_no=01618000&period=&begin_date=2020-01-14&end_date=2021-01-21
- USGS data on earthquakes that have occurred in the past 24 hours: https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv
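The two USGS links above return plain text that R can read directly. Here is a minimal sketch, assuming the gage-height link returns the usual USGS "rdb" layout: comment lines starting with `#`, one header line, one line of column formats, then tab-separated rows. `read_rdb()` is a hypothetical helper name, and the sample rows and column names are made up for illustration, not real readings.

```r
# read_rdb() is a hypothetical helper, not part of any package.
# Assumed layout: '#' comment lines, one header line, one line of
# column formats (e.g. "5s", "14n"), then tab-separated data rows.
read_rdb <- function(lines, ...) {
  # Drop empty lines and '#' comment lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # lines[1] holds the column names, lines[2] the column formats (dropped)
  read.delim(text = lines[-2], stringsAsFactors = FALSE, ...)
}

# Made-up sample in that layout; names and values are illustrative only.
sample_rdb <- c(
  "# hypothetical comment header",
  "agency_cd\tsite_no\tdatetime\tgage_height",
  "5s\t15s\t20d\t14n",
  "USGS\t01618000\t2020-01-14 00:00\t2.5",
  "USGS\t01618000\t2020-01-14 00:15\t2.6"
)
gage <- read_rdb(sample_rdb)
str(gage)

# With an internet connection, the live data could be read along these lines
# (potomac_url would hold the full gage-height URL from the list above):
# gage   <- read_rdb(readLines(potomac_url))
# quakes <- read.csv("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv")
```

By default `site_no` is read as a number, which drops its leading zero. Passing `colClasses = "character"` through `...` keeps every column as text, so you can convert only the columns you need.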