In the R programming language, one of the most common tasks for data analysts and data scientists is importing data from external files. While base R functions like read.csv() provide an easy-to-use approach to importing data, they can be relatively slow when handling large files. Fortunately, there’s a faster alternative: fread().
The fread() function is part of the data.table package, a powerful R package that extends the data.frame structure to deliver better speed and memory usage. It is often favored over base R functions for importing large datasets because of its efficiency.
In this guide, we will dive into the usage of the fread() function and demonstrate how it can make data importing in R much faster and more efficient.
Overview of fread()
The fread() function is a faster and more flexible alternative to base R functions such as read.csv(). Here is the basic syntax, showing only the most commonly used arguments:
fread(input, sep = "auto", header = "auto", nrows = Inf, skip = "__auto__", ...)
input: a character string specifying the file name or a system command whose standard output should be read.
sep: the field separator character. The default is “auto”, which automatically detects the separator.
header: whether the first line (after any skipped lines) should be used as column names. The default is “auto”, which automatically detects whether the first line is a header.
nrows: the maximum number of data rows to read. The default in current data.table releases is Inf, which reads all rows; older releases used -1 for the same purpose.
skip: the number of lines to skip before reading data. The default in current releases is "__auto__", under which fread() starts at the top of the file and looks for the first line with a consistent number of columns; older releases defaulted to 0.
...: the remaining arguments, omitted here for brevity; see ?fread for the full list.
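To make the arguments above concrete, here is a minimal sketch that reads a few lines of made-up, semicolon-separated data. The sample values and variable names are hypothetical, and the data is passed through the text argument (available in recent data.table releases) so that no file is needed.

```r
library(data.table)

# Hypothetical sample data, one element per line of the "file".
csv_lines <- c(
  "id;name;score",
  "1;Ann;9.5",
  "2;Bob;7.25",
  "3;Cy;8"
)

# With the defaults, fread() should detect the semicolon separator
# and treat the all-character first line as column names.
dt_auto <- fread(text = csv_lines)

# The same read with the arguments spelled out,
# limited to the first two data rows.
dt_explicit <- fread(text = csv_lines, sep = ";", header = TRUE, nrows = 2)

print(dt_auto)
print(dt_explicit)
```

Spelling out sep and header is mostly useful when automatic detection guesses wrong, for example on very short or irregular files.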
Installing and Loading the data.table Package
Before you can use fread(), you need to install and load the data.table package. You can install it using install.packages() and load it using library():
# Install the data.table package
install.packages("data.table")
# Load the data.table package
library(data.table)
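In scripts that are run repeatedly, reinstalling the package on every run is unnecessary. One common pattern, sketched here, installs data.table only when it is missing. The repos URL is an assumption; substitute whichever CRAN mirror you normally use.

```r
# Install data.table only if it is not already available.
# The mirror URL below is an assumption, not a requirement.
if (!requireNamespace("data.table", quietly = TRUE)) {
  install.packages("data.table", repos = "https://cloud.r-project.org")
}

# Load the package so that fread() is available.
library(data.table)

# Report which version was loaded.
print(packageVersion("data.table"))
```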
Once the package is loaded, you can use the fread() function.
Basic Usage of fread()
Using fread() to read a .csv file is straightforward. Here’s an example:
# Use fread() to read a .csv file
data <- fread("data.csv")
# Print the first few rows of the data
print(head(data))
In this example, fread() reads the “data.csv” file and returns a data.table, which is a high-performance version of a data frame.
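If you do not have a “data.csv” file at hand, the following self-contained sketch may help. It writes a small made-up table to a temporary file and reads it back, then shows the class of the result. The column names and values are hypothetical.

```r
library(data.table)

# Write a small, made-up table to a temporary file
# so the example does not depend on an existing file.
tmp_csv <- tempfile(fileext = ".csv")
fwrite(data.frame(id = 1:3, city = c("Oslo", "Lima", "Pune")), tmp_csv)

# Read it back with fread(); the result is a data.table,
# which is also a data.frame.
dt <- fread(tmp_csv)
print(class(dt))

# Ask for a plain data.frame if downstream code expects one.
df <- fread(tmp_csv, data.table = FALSE)
print(class(df))

unlink(tmp_csv)
```

Because a data.table inherits from data.frame, most code written for data frames should accept it unchanged; the data.table = FALSE option is there for the cases where it does not.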
Using fread() with Large Files
One of the main advantages of fread() is its speed, which makes it particularly well-suited for reading large files. The function is designed to efficiently read files with millions of rows, making it a powerful tool for big data analysis.
Here’s how you can read a large .csv file with fread():
# Use fread() to read a large .csv file
large_data <- fread("large_data.csv")
# Print the first few rows of the data
print(head(large_data))
Despite the large size of the file, fread() can typically read it much faster than read.csv().
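One way to check the speed difference on your own machine is to time both functions on the same file. The sketch below generates a synthetic file for that purpose; the row count, column names, and values are arbitrary assumptions, and the timings will vary with hardware and data.table version.

```r
library(data.table)

# Build a synthetic table; n is an arbitrary size, adjust it to suit your machine.
set.seed(1)
n <- 200000L
big <- data.table(
  id    = seq_len(n),
  group = sample(letters, n, replace = TRUE),
  value = rnorm(n)
)

# Write it to a temporary .csv file.
tmp_csv <- tempfile(fileext = ".csv")
fwrite(big, tmp_csv)

# Time the same read with base R and with fread().
t_base  <- system.time(df <- read.csv(tmp_csv))
t_fread <- system.time(dt <- fread(tmp_csv))

print(t_base["elapsed"])
print(t_fread["elapsed"])

unlink(tmp_csv)
```

A single run like this is only a rough indication; repeating the measurement a few times gives a steadier picture.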
Selecting Specific Rows with fread()
If you want to read only a subset of rows from a file, you can use the nrows and skip arguments. The nrows argument specifies the maximum number of rows to read, and the skip argument specifies the number of lines to skip before reading data.
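Here’s how you can read lines 101 to 200 of a .csv file:
# Use fread() to read lines 101 to 200 of the file
subset_data <- fread("data.csv", skip = 100, nrows = 100)
# Print the data
print(subset_data)
In this example, fread() skips the first 100 lines of the file (lines 1 to 100) and then reads the next 100 lines (lines 101 to 200). Note that skip counts lines of the file, not data rows: if line 1 is a header, it is skipped as well, so the result holds data rows 100 to 199 and the columns receive default names such as V1 and V2 unless you supply names with col.names. Passing header = FALSE makes it explicit that the first line read is data.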
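Because skip counts file lines and therefore also skips a header line, one possible way to read data rows 101 to 200 while keeping the column names is to read the header separately. The sketch below uses a made-up file with hypothetical columns id and value.

```r
library(data.table)

# Made-up file: one header line followed by 300 data rows.
tmp_csv <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:300, value = (1:300) * 2), tmp_csv)

# Read zero data rows to recover just the column names.
col_names <- names(fread(tmp_csv, nrows = 0))

# Skip the header line plus the first 100 data rows (101 lines in total),
# read the next 100 rows, and reattach the column names.
rows_101_200 <- fread(
  tmp_csv,
  skip = 101,
  nrows = 100,
  header = FALSE,
  col.names = col_names
)

# Show the first and last id that were read.
print(range(rows_101_200$id))

unlink(tmp_csv)
```

Setting header = FALSE here prevents fread() from guessing whether the first line it reads is a header.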
In this article, we’ve explored the fread() function from the data.table package in R. This function provides a fast and flexible way to import data from .csv files and other delimited text files. Its speed makes it particularly useful for large datasets. With the ability to select specific rows and automatically detect separators and headers, fread() offers a powerful tool for data importing in R.