How to load data into R

    This is one of the first steps in R that troubles people the most, but it’s not that hard if you observe a few important rules.

    First, before reading data into R, it’s crucial to enter the data properly. Most people do it in a spreadsheet (Excel or similar). When doing so, make sure that there are no empty cells. If you did not observe something, write “NA” in the corresponding cell. Make sure that there are no spaces where there should not be any (cells containing numbers, etc.).
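
    As a small illustration of why these rules matter, the sketch below reads two tiny tables from inline text rather than from a file. The column names and values are made up for the example. It only shows how R reacts to an “NA” cell and to a stray space inside a number.

```r
# Made-up data: "NA" marks a missing weight.
clean <- read.csv(text = "sheep_id,weight
S1,50
S2,NA
S3,47")

# Same data, but with a stray space typed inside the last number.
messy <- read.csv(text = "sheep_id,weight
S1,50
S2,NA
S3,4 7")

is.numeric(clean$weight)  # the column is read as numbers
is.na(clean$weight[2])    # the "NA" cell becomes a proper missing value
is.numeric(messy$weight)  # the stray space stops R from reading the column as numbers
```

    If a column you expect to be numeric is not, a stray space or other typo in the spreadsheet is a likely culprit.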

    Most statistical analyses in R require data to be provided in “long” format, namely one row for each observation. Example: if you want to compare the weight of sheep across 10 different farms, resist the temptation of having one row for each farm. One measurement = one row. All weight measurements will be in one column. For each weight measurement you’ll need all the information of interest, such as the sheep’s unique ID (in one column) and the farm it belongs to (in another column). If each sheep was weighed more than once, you’ll still have each measurement in a different row, but you’ll add a column to identify repeated weight measures unambiguously (e.g. by date, or season, or before/after a certain treatment… Whatever serves your purpose).
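
    To make the “long” format concrete, here is a minimal sketch of what such a table could look like once it is in R. The sheep IDs, farms, seasons and weights are invented, and the column names are only one possible choice.

```r
# Made-up long-format data: one row per weight measurement.
sheep <- data.frame(
  sheep_id = c("S1", "S1", "S2", "S2", "S3", "S3"),
  farm     = c("A", "A", "A", "A", "B", "B"),
  season   = c("spring", "autumn", "spring", "autumn", "spring", "autumn"),
  weight   = c(52.1, 58.4, 49.0, 55.2, 61.3, 66.0)
)

# With one measurement per row, summarising by farm is a one-liner.
mean_by_farm <- aggregate(weight ~ farm, data = sheep, FUN = mean)
print(mean_by_farm)
```

    Each sheep appears on two rows (one per season), and the season column is what tells the repeated measurements apart.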

    It is possible to read Microsoft Excel files in R, but I find it much easier to read .csv files or tab-delimited files (saved as .txt). To save a spreadsheet as .csv (comma-separated) or as tab-delimited .txt, select “Save as…” in your spreadsheet editor (probably Excel) and select the corresponding option.

    To load a .csv file into R, use:

    mydata <- read.csv("FILE_LOCATION/FILE_NAME.csv")

    To load a tab-delimited .txt file into R, use:

    mydata <- read.delim("FILE_LOCATION/FILE_NAME.txt")

    Where mydata is a name of your choice, while FILE_LOCATION and FILE_NAME must be those of the file where your data are stored. To avoid having to type FILE_LOCATION and FILE_NAME by hand, you can use this trick:

    mydata <- read.csv(file.choose())
    mydata <- read.delim(file.choose())

    (Using file.choose() jams my Mac but it works fine on Windows and in RStudio).
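
    If you want to try read.csv() and read.delim() without a data file of your own, the following sketch writes a small made-up table to temporary files and reads it back. The data and the variable names are purely illustrative. Checking the result with str() and head() is a habit worth forming, not something the functions require.

```r
# Made-up example data (names and values are illustrative only).
example <- data.frame(sheep_id = c("S1", "S2", "S3"),
                      weight   = c(50, 47.5, 61))

# Write the same table to temporary .csv and tab-delimited .txt files.
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
write.csv(example, csv_path, row.names = FALSE)
write.table(example, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read them back with the functions shown above.
from_csv <- read.csv(csv_path)
from_txt <- read.delim(txt_path)

# Quick checks that the data arrived as expected.
str(from_csv)   # column names and types
head(from_txt)  # first few rows
```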

    To load data into R from an Excel file:

    if (!require(openxlsx)) install.packages('openxlsx') # this installs package "openxlsx" if it isn't already installed.
    library(openxlsx) # load package "openxlsx"
    ?read.xlsx # to read how to use the function
    mydata <- read.xlsx("FILE_LOCATION/FILE_NAME.xlsx", sheet="SHEET_OF_INTEREST")
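
    If you are not sure what to put in place of SHEET_OF_INTEREST, you can ask openxlsx for the sheet names first. The sketch below assumes the package is installed and skips itself otherwise. The example data and the temporary file are made up so that the code runs without an Excel file of your own.

```r
# Assumes the "openxlsx" package is installed; the code is skipped otherwise.
if (requireNamespace("openxlsx", quietly = TRUE)) {
  # Made-up example data (illustrative only).
  example <- data.frame(sheep_id = c("S1", "S2", "S3"),
                        weight   = c(50, 47.5, 61))

  # Create a small Excel file in a temporary location.
  xlsx_path <- tempfile(fileext = ".xlsx")
  openxlsx::write.xlsx(example, xlsx_path)

  # List the sheets in the file, then read the first one by name.
  sheet_names <- openxlsx::getSheetNames(xlsx_path)
  from_xlsx <- openxlsx::read.xlsx(xlsx_path, sheet = sheet_names[1])
  str(from_xlsx)
}
```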