Pandas to_csv – write a dataframe to a csv file.

In this post, you will learn –

A. How to write a dataframe to a csv file in pandas.

B. Save dataframe to a csv file using different separators.

C. Save to a csv file with or without index.

D. Write a csv file without headers.

E. Write only few columns to csv.

F. Use different encoding when saving a csv file.

G. Specifying datetime format when saving a csv file.

H. Create a zip file using pandas.

Related Post – Pandas read_csv – read a csv file in Python.

A. How to write a dataframe to a csv file in pandas.

To write or save or export a dataframe to a csv file, we use the pandas to_csv function.

Let’s read a dataset to work with.

# import pandas
import pandas as pd

url="https://raw.githubusercontent.com/bprasad26/lwd/master/data/clothing_store_sales.csv"
df = pd.read_csv(url)
df.head()

To save a dataframe to a csv file, you have to pass the file_path + file_name to the to_csv function.

# save a dataframe as csv file
df.to_csv('../data/store_sales1.csv', index=False)

B. Save dataframe to a csv file using different separators –

By default, to_csv save the dataframe as a comma separated file. If you want, you can also save a file using different separators like semicolon (;), a tab (\t), a space ( ) and a pipe (|).

To save a file with different separators we use the sep parameter. Let’s save a csv file using tab.

# save a dataframe to csv with tab separator
df.to_csv('../data/store_sales2.csv', sep='\t', index=False)

C. Save to a csv file with or without index –

When you save a dataframe to a csv file, you also have the options to save row names or the index of the dataframe. Sometimes it is desirable and sometimes it’s not. By default it is index=True.

Let’s check the effect of both options.

# with index=True
df.to_csv('../data/store_with_index.csv')

# with index=False
df.to_csv('../data/store_without_index.csv', index=False)

Now, let’s read them back.

# with index=True
with_index = pd.read_csv('../data/store_with_index.csv')
with_index.head()

You can see when we used the index=True, the row names also gets added to the file which is not desirable in this case.

# with index=False
without_index = pd.read_csv('../data/store_without_index.csv')
without_index.head()

But with index=False no row indexes are saved which is what you want most of the time.

D. Write a csv file without headers –

Sometimes you may not want to write the columns names or headers of the dataframe when saving the file. To do that you can use the header parameter.

# without headers
df.to_csv('../data/store_without_headers.csv', header=False, index=False)

without headers = pd.read_csv('../data/store_without_headers.csv')
without_headers.head()

You can see column names have been removed when you use header=False. By default it is header=True and it will also write the column names.

E. Write only few columns to a csv file –

If you want to save only few columns of the dataframe to a csv file then you have to use the columns parameters.

# save only Gender and Age data 
df.to_csv('../data/gen_age.csv', columns=['Gender','Age'], index=False)

gen_age = pd.read_csv('../data/gen_age.csv')
gen_age.head()

F. Use different encoding when saving a csv file –

By default when you save a dataframe, pandas uses the ‘utf-8’ format which you can change using the encoding parameter.

# save a csv file with utf-16 encoding
df.to_csv('../data/store_utf16.csv', index=False, encoding='utf-16')

Now, if you try to read this file without passing the right encoding parameter, you will get the UnicodeDecodeError.

To read it correctly, you have to use the encoding=’utf-16′ in the read_csv function as this is the encoding we used when saving the file.

utf16 = pd.read_csv('../data/store_utf16.csv', encoding='utf-16')
utf16.head()

G. Specifying datetime format when saving a csv file –

If you have a datetime data and you want to write the file with different format then use the date_format parameter.

url='https://raw.githubusercontent.com/bprasad26/lwd/master/data/tesla_stock_prices.csv'

tesla = pd.read_csv(url, date_parser=['Date'])
tesla.head()

At this moment the date is in the ‘%Y-%m-%d’ format. Let’s change this to ‘%d-%m-%Y’ format.

# save as day-month-year format
tesla.to_csv('../data/tesla_dmy.csv', date_format='%d-%m-%Y', index=False)

H. Create a zip file using pandas.

Let’s say you want to save the dataframe to a csv file but you also want to compress it. For this you can use the compression parameter.

The possible values for the compression mode are –  {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}

# create a zip file
compression_opts = dict(method='zip',
                        archive_name='tesla.csv')  
tesla.to_csv('tesla.zip', compression=compression_opts)

This will create a tesla.zip folder and inside of it there will be a tesla.csv file.

Related Post – Pandas read_csv() – read a csv file in Python.

If you liked this post then please share it with others and also subscribe to our blog below to learn more about pandas.

Rating: 1 out of 5.

Leave a Reply