How to read data in R

This is one of the first steps in R, and the one that troubles people the most, but it’s not that hard if you follow a few important rules.

First, before reading data in R it’s crucial to enter data properly. Most people do it in a spreadsheet (Excel or similar). When doing so, make sure that there are no empty cells. If you did not observe something, write “NA”. Make sure that there are no spaces where there should not be any (cells containing numbers, etc.).
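To see why stray spaces and explicit “NA” entries matter, here is a small sketch. The sheep data, the column names and the strip.white option are my own assumptions for illustration; read.csv(text = ...) reads from a string instead of a file so that the example can run by itself.

```r
# Hypothetical data: the second row has a trailing space after "FarmA"
# and an unobserved weight entered as NA.
csv_text <- "sheep_id,farm,weight
s01,FarmA,52.3
s02,FarmA ,NA
s03,FarmB,48.1"

# Default import: the trailing space is kept, so "FarmA " and "FarmA"
# are treated as two different farm names.
raw <- read.csv(text = csv_text)
n_farms_raw <- length(unique(as.character(raw$farm)))

# strip.white = TRUE removes leading/trailing spaces from unquoted text fields.
clean <- read.csv(text = csv_text, strip.white = TRUE)
n_farms_clean <- length(unique(as.character(clean$farm)))

# "NA" in the spreadsheet becomes a proper missing value in R,
# and the weight column stays numeric.
missing_weights <- sum(is.na(clean$weight))
```

Cleaning the spreadsheet itself is still the better habit; an option like strip.white is only a safety net.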

Most statistical analyses in R require data to be provided in “long” format, namely one row for each observation. Example: if you want to compare the weight of sheep across 10 different farms, resist the temptation to have one row for each farm. One measurement = one row. All weight measurements will be in one column. For each weight measurement you’ll need all the information of interest, such as the sheep’s unique ID (in one column) and the farm it belongs to (in another column). If each sheep was weighed more than once it’s still one weight measurement = one row, but you’ll add a column to specify when the weight measurement was taken (e.g. date, or season, or before/after a certain treatment… Whatever serves your purpose).
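As an illustration of the long format described above, here is a minimal sketch with made-up sheep data (all names and numbers are hypothetical). It starts from a “wide” table with one row per sheep and two weight columns, and converts it to long format with the base R function reshape(), so that each weight measurement ends up in its own row.

```r
# Hypothetical "wide" data: one row per sheep, two weight columns.
wide <- data.frame(
  sheep_id      = c("s01", "s02"),
  farm          = c("FarmA", "FarmB"),
  weight_before = c(52.3, 48.1),
  weight_after  = c(55.0, 49.7)
)

# Convert to long format: one row per weight measurement.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("weight_before", "weight_after"), # columns to stack
  v.names   = "weight",                           # name of the stacked column
  timevar   = "treatment",                        # new column saying when it was measured
  times     = c("before", "after"),               # labels, in the same order as 'varying'
  idvar     = "sheep_id"
)
rownames(long) <- NULL  # drop the automatic row names added by reshape()

# Long data can go straight into most analyses, e.g. mean weight per farm.
mean_by_farm <- aggregate(weight ~ farm, data = long, FUN = mean)
```

If you enter the data in long format from the start, the reshape() step is not needed at all.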

It is possible to read MS Excel files in R, but I find it much easier to read .csv files or tab-delimited files (saved as .txt). To save a spreadsheet as .csv (comma-separated) or as tab-delimited .txt, select “Save as…” in your spreadsheet editor (probably Excel).

To read a .csv file in R, use:

mydata <- read.csv("FILE_LOCATION/FILE_NAME.csv")

To read a tab-delimited .txt file in R, use:

mydata <- read.delim("FILE_LOCATION/FILE_NAME.txt")

Here, mydata is a name of your choice, while FILE_LOCATION and FILE_NAME must be the folder and the name of the file where your data are stored. To avoid having to type FILE_LOCATION and FILE_NAME by hand, you can use this trick:

mydata <- read.csv(file.choose())
mydata <- read.delim(file.choose())

(The two lines of code above jam my Mac, but they work fine on Windows.)
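If file.choose() misbehaves on your machine, one possible alternative is to build the path with file.path() and check it with file.exists() before reading. The sketch below is only an illustration: the folder, the file name and the data are hypothetical, and the file is first written to a temporary folder so that the example can run by itself. With your own data you would skip the write.csv() step and set folder and file_name to your real location.

```r
# Hypothetical location: a temporary folder stands in for FILE_LOCATION.
folder    <- tempdir()
file_name <- "sheep_example.csv"
path      <- file.path(folder, file_name)  # joins folder and file name with "/"

# Write a small made-up data set so there is something to read back.
example <- data.frame(sheep_id = c("s01", "s02"), weight = c(52.3, NA))
write.csv(example, path, row.names = FALSE)

# Check that the file is where we think it is, then read it.
if (!file.exists(path)) stop("File not found: ", path)
mydata <- read.csv(path)

# Inspect the result: column types, then the first rows.
str(mydata)
head(mydata)
```

Looking at str(mydata) right after reading is a quick way to catch numbers that were imported as text.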

To read data from an Excel file:

if (!require(openxlsx)) install.packages('openxlsx') # this installs package "openxlsx" if it isn't already installed.
library(openxlsx) # load package "openxlsx"
?read.xlsx # to read how to use the function
mydata <- read.xlsx("FILE_LOCATION/FILE_NAME.xlsx", sheet="SHEET_OF_INTEREST")