2.3 Data Import/Export in R

2.3.1 Reading Data from a Text File

  • The easiest way to import data into R is from a tabular format saved in a text file.

  • To import tabular data from a text file, R provides the function \(\mathtt{read.table()}\). It is the most convenient function for this task and is well suited to data files of small or moderate size with data in a rectangular format. The arguments which can be passed to \(\mathtt{read.table()}\) are given below.

function (file, header = FALSE, sep = "", quote = "\"'", dec = ".", 
    numerals = c("allow.loss", "warn.loss", "no.loss"), row.names, 
    col.names, as.is = !stringsAsFactors, na.strings = "NA", 
    colClasses = NA, nrows = -1, skip = 0, check.names = TRUE, 
    fill = !blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE, 
    comment.char = "#", allowEscapes = FALSE, flush = FALSE, 
    stringsAsFactors = default.stringsAsFactors(), fileEncoding = "", 
    encoding = "unknown", text, skipNul = FALSE) 
  • Some of the important arguments of \(\mathtt{read.table}\) are discussed below; for the rest, see the help file using \(\mathtt{help(read.table)}\).
\(\mathtt{file}\): The name of the tabular (text) file to import, along with the full path if the file is not in the working directory.
\(\mathtt{header}\): A logical value specifying whether the names of the variables are available in the first row.
\(\mathtt{sep}\): The character that separates the fields. The default "" treats any white space (one or more spaces, tabs or newlines) as the separator.
\(\mathtt{quote}\): The set of characters used to quote character fields. The default accepts both double and single quotes.
\(\mathtt{stringsAsFactors}\): A logical value specifying whether character vectors should be converted to factors. The default behaviour (in R 4.0.0 and later) is to read characters as characters and not factors.
\(\mathtt{strip.white}\): A logical value specifying whether extra leading and trailing white space should be removed from character fields. It is used only when sep has been specified, i.e. when sep != "".
\(\mathtt{fill}\): A logical value specifying whether blank fields should be added to rows that have fewer fields than the others.
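  • Before turning to a file on disk, the arguments above can be tried on a small inline string passed through the text argument of \(\mathtt{read.table}\). This is only a sketch: the sample values and the object names raw and demo are made up for illustration.

```r
# Inline, comma-separated sample data (made up for illustration).
# The second data row is short: it has no MSFT value.
raw <- "Date, AAPL, MSFT
2/01/1998, 4.06, 16.39
5/01/1998, 3.97"

demo <- read.table(text = raw,
                   header = TRUE,            # first row holds the variable names
                   sep = ",",                # fields are separated by commas
                   strip.white = TRUE,       # drop the spaces around each field
                   fill = TRUE,              # pad the short row with NA
                   stringsAsFactors = FALSE) # keep Date as character
str(demo)
```

Setting fill = FALSE in this call should make \(\mathtt{read.table}\) stop with an error about the short row, which is a quick way to see what the argument does.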
  • The example below imports a tab delimited text file.

  • Note the use of "\t" in the sep argument for tab delimited data. The header argument is also TRUE here, as our dataset has variable names in the first row.

  • Note that in the example below, the working directory of the RStudio session has already been set so that the relative path data/demo_data.txt points to the file in the data folder. If the working directory is different, either change it using \(\mathtt{setwd}\) or RStudio’s GUI, or provide the full path to the file along with the file name.

data_readtable = read.table("data/demo_data.txt", sep = "\t", header = TRUE)
# show the first six rows of the imported data
head(data_readtable)
       Date AAPL  MSFT
1 2/01/1998 4.06 16.39
2 5/01/1998 3.97 16.30
3 6/01/1998 4.73 16.39
4 7/01/1998 4.38 16.20
5 8/01/1998 4.55 16.31
6 9/01/1998 4.55 15.88
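  • Whether the relative path resolves from the current working directory can be checked before importing. The sketch below uses only base R functions (getwd, file.path, file.exists, normalizePath); the folder and file name follow the example above, and the object name data_path is an assumption for illustration.

```r
# Where is the R session currently pointed?
getwd()

# Build the relative path used in the example; file.path() inserts the separator
data_path <- file.path("data", "demo_data.txt")

# Import only if the file can be found from the current working directory
if (file.exists(data_path)) {
  data_readtable <- read.table(data_path, sep = "\t", header = TRUE)
} else {
  message("File not found: ", normalizePath(data_path, mustWork = FALSE),
          "\nChange the working directory with setwd() or supply the full path.")
}
```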
  • After importing from a text file, the data can be saved in .Rdata format using \(\mathtt{save}\) or written to another text file using \(\mathtt{write.table}\), as shown below:
# saving data as an object in .Rdata format
save(data_readtable, file = "data/data1.Rdata")
# saving data into another text file
write.table(data_readtable, file = "data/data1.txt")
  • Another convenient way to store the data is the RDS format.
saveRDS(data_readtable, file = "data/data1_rds.Rds")
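  • These data files can then be loaded using the \(\mathtt{load}\) and \(\mathtt{readRDS}\) functions. \(\mathtt{load}\) restores the object under the name it was saved with.
load("data/data1.Rdata")
head(data_readtable)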
       Date AAPL  MSFT
1 2/01/1998 4.06 16.39
2 5/01/1998 3.97 16.30
3 6/01/1998 4.73 16.39
4 7/01/1998 4.38 16.20
5 8/01/1998 4.55 16.31
6 9/01/1998 4.55 15.88
  • Data stored in the Rds format can be loaded into a different object.
data_readtable2 = readRDS("data/data1_rds.Rds")
# inspect the structure of the loaded object
str(data_readtable2)
'data.frame':	3936 obs. of  3 variables:
 $ Date: chr  "2/01/1998" "5/01/1998" ...
 $ AAPL: num  4.06 3.97 4.73 4.38 4.55 ...
 $ MSFT: num  16.4 16.3 ...
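  • The practical difference between the two formats is how the object comes back: \(\mathtt{load}\) recreates the object under its saved name, whereas \(\mathtt{readRDS}\) returns the object so that it can be assigned to any name. A minimal sketch, using a small made-up data frame and temporary files (the names prices, original, rdata_file and rds_file are illustrative):

```r
# Small made-up data frame standing in for the imported data
prices <- data.frame(Date = c("2/01/1998", "5/01/1998"),
                     AAPL = c(4.06, 3.97),
                     stringsAsFactors = FALSE)

rdata_file <- tempfile(fileext = ".Rdata")
rds_file   <- tempfile(fileext = ".Rds")

save(prices, file = rdata_file)      # stores the object together with its name
saveRDS(prices, file = rds_file)     # stores the object only

original <- prices                   # keep a copy for comparison
rm(prices)                           # remove the object from the workspace

restored_names <- load(rdata_file)   # recreates `prices`; returns the names loaded
prices_copy    <- readRDS(rds_file)  # the caller chooses the name

all.equal(prices, original)          # compare the object restored by load()
all.equal(prices_copy, original)     # compare the object returned by readRDS()
```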

2.3.2 Reading Data from CSV files

  • Reading data from a CSV file is made easy by the \(\mathtt{read.csv}\) function, a wrapper around \(\mathtt{read.table}\) that facilitates direct import of data from CSV files. \(\mathtt{read.csv}\) takes the same arguments as \(\mathtt{read.table}\), with defaults suited to comma-separated data: header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE and comment.char = "".
  • The following example imports a CSV file with the same data as previously imported from a text file.
# Check the working directory before importing else provide full path
data_readcsv = read.csv("data/demo_data.csv")
# show the first six rows of the imported data
head(data_readcsv)
       Date AAPL  MSFT
1 2/01/1998 4.06 16.39
2 5/01/1998 3.97 16.30
3 6/01/1998 4.73 16.39
4 7/01/1998 4.38 16.20
5 8/01/1998 4.55 16.31
6 9/01/1998 4.55 15.88
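  • Because \(\mathtt{read.csv}\) calls \(\mathtt{read.table}\) with defaults suited to comma-separated files, the two calls in the sketch below should give the same data frame. The inline sample data are made up, and the text argument is used so that no file is needed; the object names are illustrative.

```r
# Made-up CSV content held in a string
csv_text <- "Date,AAPL,MSFT
2/01/1998,4.06,16.39
5/01/1998,3.97,16.30"

# read.csv: header = TRUE and sep = "," are the defaults
via_csv <- read.csv(text = csv_text)

# read.table: the same choices spelled out explicitly
via_table <- read.table(text = csv_text, header = TRUE, sep = ",",
                        quote = "\"", dec = ".", fill = TRUE,
                        comment.char = "")

# compare the two results
identical(via_csv, via_table)
```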
  • Similar to \(\mathtt{write.table}\), data can also be written to an external CSV file using \(\mathtt{write.csv}\). The following example uses an inbuilt data set in R and exports it to a CSV file.
  • Notice the use of row.names = FALSE to avoid creating one more column in the CSV file with row numbers.
data(iris)  # R inbuilt dataset
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
write.csv(iris, "data/data_iris.csv", row.names = FALSE)
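  • To see what row.names = FALSE changes, the sketch below writes the first rows of iris to two temporary CSV files, once with the default setting and once with row.names = FALSE, and then compares the column names read back. The object names are illustrative.

```r
iris_head <- head(iris, 3)

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(iris_head, with_rn)                        # default: row.names = TRUE
write.csv(iris_head, without_rn, row.names = FALSE)  # row numbers are not written

names(read.csv(with_rn))     # an extra first column (read back as "X") holds the row numbers
names(read.csv(without_rn))  # only the five original columns
```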

2.3.3 Reading from Excel Files

  • R provides methods to import data from Excel files with the help of external packages such as \(\mathtt{readxl}\), \(\mathtt{gdata}\), \(\mathtt{XLConnect}\) and \(\mathtt{xlsx}\).
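  • As a minimal sketch with readxl (assuming the package is installed), read_excel() reads one worksheet into a data frame. The workbook used here is the example file bundled with readxl, located through readxl_example(); the object names xlsx_path and data_excel are illustrative. For your own data, replace xlsx_path with the path to your file.

```r
# Runs only if readxl is installed: install.packages("readxl")
if (requireNamespace("readxl", quietly = TRUE)) {
  # example workbook bundled with the readxl package
  xlsx_path <- readxl::readxl_example("datasets.xlsx")

  # list the worksheet names in the workbook
  print(readxl::excel_sheets(xlsx_path))

  # import the first worksheet
  data_excel <- readxl::read_excel(xlsx_path, sheet = 1)
  print(head(data_excel))
} else {
  data_excel <- NULL  # package not available; nothing is imported
}
```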

2.3.4 Reading from Data Files from other Statistical Systems

When migrating from software like SPSS, Stata or Matlab, users might want to use their old datasets generated by these systems in R. This requires methods for importing these datasets into R. Packages like \(\mathtt{haven}\), \(\mathtt{foreign}\) and \(\mathtt{R.matlab}\) provide this functionality.
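As a rough illustration with the foreign package, the sketch below first writes a small made-up data frame to a temporary Stata file, so that there is something to import, and then reads it back with read.dta(). The object names sample_df, dta_file and data_stata are assumptions for illustration, and the block is skipped if foreign is not installed.

```r
# foreign is a recommended package that ships with most R installations
if (requireNamespace("foreign", quietly = TRUE)) {
  # made-up data standing in for a dataset created in Stata
  sample_df <- data.frame(id = 1:3, AAPL = c(4.06, 3.97, 4.73))

  dta_file <- tempfile(fileext = ".dta")
  foreign::write.dta(sample_df, dta_file)     # create a Stata file to import

  data_stata <- foreign::read.dta(dta_file)   # import it as a data frame
  str(data_stata)
}
```

The other packages follow the same pattern of one function per file type, for example read_sav() and read_dta() in haven for SPSS and Stata files, and readMat() in R.matlab for Matlab files.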

2.3.5 Importing Data using RStudio

  • To import data, click on Import Dataset \(\rightarrow\) From Excel.. \(\rightarrow\) browse for the file to import.

  • Remember that the file should be in a tabular format; a text file or a CSV file are the best options. On clicking Import, the data will be imported into a data frame and displayed by RStudio.

  • This will also generate the basic command used for importing and viewing the file, which appears in the RStudio console as shown in the figure below. Note that the path in the command shown in the console has been scrambled, as it will be different for every computer.

Figure 2.2: Import Dataset wizard in RStudio