How to Create a Pandas DataFrame in Python

What is a DataFrame?

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It’s similar to a spreadsheet, SQL table, or a dictionary of Series objects. It is generally the most commonly used pandas object and is designed to handle a wide variety of data types, including numerical, categorical, datetime, and textual data.

Creating a DataFrame

Pandas DataFrames can be created in various ways. You can create them from lists, dictionaries, Series, and even other DataFrames. Let’s dive into each method:

Creating DataFrame from Lists

One of the simplest ways to create a DataFrame is from a list of lists.

import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

Here, each sublist in the data list represents a row in the DataFrame. The columns parameter is a list of column names.
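
To confirm how the rows and columns were laid out, it can help to inspect the result. The following is a minimal sketch that reuses the data from the example above; the checks shown are just one way to look at it.

```python
import pandas as pd

# Each inner list becomes one row; `columns` names the positions in order.
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

print(df.shape)           # (3, 2): three rows, two columns
print(list(df.columns))   # ['Name', 'Age']
print(df.loc[1, 'Name'])  # Bob (the default row labels are 0, 1, 2)
```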

Creating DataFrame from Dict

We can also create a DataFrame from a dictionary, where the keys correspond to column names, and the values (which are lists or arrays) correspond to the data in the columns.

data = {'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]}
df = pd.DataFrame(data)
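
Because every dictionary value becomes a full column, the lists need to have the same length. The sketch below reuses the example data and then tries a deliberately mismatched dictionary; the name mismatch_rejected is only an illustrative helper variable.

```python
import pandas as pd

data = {'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]}
df = pd.DataFrame(data)
columns = list(df.columns)  # dictionary keys become column names, in order

# Lists of unequal length cannot be lined up into rows, so pandas refuses.
try:
    pd.DataFrame({'Name': ['Tom', 'Nick'], 'Age': [20, 21, 19]})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(columns)
print(mismatch_rejected)
```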

Creating DataFrame from Series

A DataFrame can also be created from pandas Series:

series_dict = {
    'Column 1': pd.Series([1, 2, 3]),
    'Column 2': pd.Series(['one', 'two', 'three'])
}

df = pd.DataFrame(series_dict)
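
One detail worth knowing when building from Series: the columns are aligned on index labels rather than on position. The sketch below uses made-up labels ('a', 'b', 'c') that are not part of the example above, just to show what happens when one Series is shorter.

```python
import pandas as pd

# Hypothetical index labels, chosen only to illustrate alignment.
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series(['one', 'two'], index=['a', 'b'])

df = pd.DataFrame({'Column 1': s1, 'Column 2': s2})

# Row 'c' has no entry in s2, so 'Column 2' holds a missing value there.
missing = df['Column 2'].isna()
print(df)
```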

Creating DataFrame from another DataFrame

A DataFrame can be created from another DataFrame:

df2 = pd.DataFrame(df, copy=True)

Here, copy=True ensures that changes to the new DataFrame don’t affect the original.
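
A quick way to see that the copy is independent is to modify it and check the original. This sketch assumes the Name/Age data from the dictionary example; it also shows df.copy(), which is a common alternative spelling for the same idea.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]})

df2 = pd.DataFrame(df, copy=True)  # copy via the constructor
df3 = df.copy()                    # copy via the method (deep by default)

# Change the copies only.
df2.loc[0, 'Age'] = 99
df3.loc[1, 'Age'] = 55

print(df['Age'].tolist())  # [20, 21, 19]: the original is unchanged
```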

Handling Indexes

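DataFrames have an index that labels each row. By default, this is a sequence of integers that starts at 0 and increments by 1 for each row. Labels are usually unique, although pandas does not require it. You can specify the index when creating a DataFrame, as long as the number of labels matches the number of rows: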

df = pd.DataFrame(data, index=['first', 'second', 'third'])
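
With a custom index in place, rows can be looked up by label as well as by position. A minimal sketch, assuming data is the Name/Age dictionary from the earlier example:

```python
import pandas as pd

data = {'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]}
df = pd.DataFrame(data, index=['first', 'second', 'third'])

row = df.loc['second']  # label-based lookup
same_row = df.iloc[1]   # position-based lookup of the same row

print(row['Name'])  # Nick
```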

Specifying Data Types (dtypes)

When creating a DataFrame, pandas infers data types from the data. If you want to specify data types, you can do so using the dtype parameter:

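numeric_data = [[1, 2], [3, 4]]
df = pd.DataFrame(numeric_data, columns=['a', 'b'], dtype=float)

This casts every column to float, so the data must be entirely numeric; a text column such as Name from the earlier examples cannot be converted. If you want to specify data types per column, you can do so after creating the DataFrame: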

df['column_name'] = df['column_name'].astype('int')
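
astype also accepts a dictionary that maps column names to types, which keeps several conversions in one call. The sketch below assumes the Name/Age data from earlier and picks float64 purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]})

# Convert only the listed columns; 'Name' is left as it was.
df = df.astype({'Age': 'float64'})
print(df.dtypes)

# dtype= in the constructor applies to every column, so it suits
# all-numeric data like this small hypothetical table.
numeric = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], dtype=float)
print(numeric.dtypes)
```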

Creating DataFrame from Files

Pandas provides functions to read data from sources such as CSV files, Excel files, and SQL databases directly into a DataFrame:

# From a CSV file
df = pd.read_csv('file.csv')

# From an Excel file (needs an Excel engine such as openpyxl installed)
df = pd.read_excel('file.xlsx')

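# From a SQL query
from sqlalchemy import create_engine
# An in-memory SQLite database starts empty, so point the URL
# at a database that actually contains my_table.
engine = create_engine('sqlite:///:memory:')
df = pd.read_sql_query('SELECT * FROM my_table', engine)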
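
The snippets above depend on files and tables that exist on your machine. To try the same functions without any external files, here is a self-contained sketch: the CSV text is held in memory, and the SQL part uses Python's built-in sqlite3 module instead of SQLAlchemy. The sample rows and the Age > 20 filter are made up for illustration.

```python
import io
import sqlite3

import pandas as pd

# read_csv accepts any file-like object, not only a path.
csv_text = "Name,Age\nTom,20\nNick,21\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Create the table first so the query has something to read.
conn = sqlite3.connect(':memory:')
df_csv.to_sql('my_table', conn, index=False)
df_sql = pd.read_sql_query('SELECT * FROM my_table WHERE Age > 20', conn)
conn.close()

print(df_csv)
print(df_sql)
```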


This guide covered creating pandas DataFrames from lists, dictionaries, Series, other DataFrames, and files. The DataFrame is one of the most used pandas structures because it holds columns of different types under labeled rows and works with the rest of the library's tools for handling and analyzing data.