Pandas DataFrames


What is a DataFrame?

A DataFrame is a two-dimensional labeled data structure whose columns can hold different types. You can think of it as a spreadsheet, a SQL table, or a dictionary of Series objects that share one index. It is generally the most commonly used pandas object.
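
To make the "dictionary of Series" picture concrete, here is a minimal sketch. The variable names and the index labels 'a', 'b', 'c' are illustrative assumptions, not something the article prescribes.

```python
import pandas as pd

# Two Series with different dtypes that share the same index labels.
names = pd.Series(['Alex', 'Bob', 'Clarke'], index=['a', 'b', 'c'])
ages = pd.Series([10, 12, 13], index=['a', 'b', 'c'])

# Each dictionary key becomes a column label; each Series becomes a column.
people = pd.DataFrame({'Name': names, 'Age': ages})

print(people.shape)    # (rows, columns)
print(people.dtypes)   # one dtype per column
```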

Creating a DataFrame

Creating a DataFrame is simple and can be done in multiple ways, such as from a list, dictionary, or by reading from a file. Here are some examples:

From a list:

import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

From a dictionary:

import pandas as pd
data = {'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]}
df = pd.DataFrame(data)

Viewing Data

To view a small sample of a DataFrame, use the head() and tail() methods. head() returns the first n rows (default is 5), while tail() returns the last n rows (default is 5).
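
As a quick illustration, the sketch below builds a hypothetical ten-row frame (the column names 'n' and 'square' are made up for the example) and takes the first five and last three rows.

```python
import pandas as pd

# Hypothetical ten-row frame, just large enough to show the defaults.
df = pd.DataFrame({'n': range(10), 'square': [i * i for i in range(10)]})

first_five = df.head()    # n defaults to 5
last_three = df.tail(3)   # explicit n

print(first_five)
print(last_three)
```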


DataFrame Information

It is often useful to get a quick description of the data, especially in large DataFrames. The info() and describe() methods can help in this regard. info() prints a summary of the DataFrame, including each column's data type, its count of non-null values, and the memory usage. describe() returns descriptive statistics (count, mean, standard deviation, minimum, quartiles, and maximum) and, by default, covers only the numeric columns.
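
A minimal sketch of both calls, assuming the small Name/Age frame from the dictionary example above:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]})

df.info()               # prints the summary directly and returns None

stats = df.describe()   # with default arguments, numeric columns only
print(stats)
```

Passing include='all' to describe() is one way to bring non-numeric columns such as Name into the result.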


Selecting Data

You can select data in a DataFrame using column names, or using iloc and loc for position-based or label-based data selection, respectively.

Select a column by name:
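
A minimal sketch, assuming the Name/Age frame used earlier; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]})

ages = df['Age']               # a single label returns a Series
subset = df[['Name', 'Age']]   # a list of labels returns a DataFrame
```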


Select by position:
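
A minimal sketch with iloc, again assuming the Name/Age frame. Positions are zero-based integers and slices exclude the end position, as with Python lists.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]})

first_row = df.iloc[0]     # first row, returned as a Series
first_two = df.iloc[0:2]   # rows at positions 0 and 1
cell = df.iloc[1, 1]       # row 1, column 1 -> 21
```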


Select by label:
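
A minimal sketch with loc. The string index labels 'a', 'b', 'c' are an assumption added here so that labels and positions are visibly different things.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]},
    index=['a', 'b', 'c'],   # hypothetical row labels
)

row_b = df.loc['b']                # row labelled 'b', returned as a Series
a_to_b = df.loc['a':'b', 'Age']    # label slices include both endpoints
adults = df.loc[df['Age'] >= 20]   # loc also accepts a boolean mask
```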


Sorting Data

You can sort a DataFrame by any column with sort_values(). To sort by multiple columns, pass a list of column names. The method returns a new, sorted DataFrame rather than changing the original.

df = df.sort_values('Age')
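
Sorting by several columns can be sketched as follows. The extra row for 'Amy' is a made-up addition so that two rows tie on Age and the second sort key has something to do.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'John', 'Amy'],
    'Age': [20, 21, 19, 20],
})

# Oldest first; ties on Age are broken alphabetically by Name.
ranked = df.sort_values(['Age', 'Name'], ascending=[False, True])
print(ranked)
```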

Applying Functions

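You can apply a function to every element of a column, or to every row or column of a DataFrame, with apply(). Combined with a lambda function, this is a quick way to run a custom computation. Note that apply() calls the function once per element (or per row or column), so it is not truly vectorized; for simple arithmetic, a built-in operation such as df['Age'] + 1 gives the same result and is usually faster.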

df['Age'] = df['Age'].apply(lambda x: x + 1)
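
For comparison, the sketch below computes the same result two ways on the Name/Age frame: once with apply() and once with plain column arithmetic, which pandas evaluates on the whole column at once. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]})

via_apply = df['Age'].apply(lambda x: x + 1)   # lambda runs once per element
via_arithmetic = df['Age'] + 1                 # vectorized column arithmetic

# Both approaches produce the same values.
print(via_apply.tolist() == via_arithmetic.tolist())
```

apply() remains useful when the logic has no built-in vectorized equivalent.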

Missing Data

Pandas primarily uses the special float value NaN (Not a Number) to represent missing data. Methods like isnull() and notnull() detect missing values, while dropna() and fillna() remove or replace them.

# Detect missing values (True where a value is missing)
df.isnull()

# Drop rows with missing values
df = df.dropna()

# Fill missing values with a specified value
df = df.fillna(value=0)
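
To see these methods on real missing values, here is a minimal sketch. The frame with a NaN in the Age column is an assumption made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John'], 'Age': [20, np.nan, 19]})

missing_mask = df.isnull()                # same shape as df, True where missing
missing_per_column = missing_mask.sum()   # number of missing values per column

dropped = df.dropna()         # removes Nick's row
filled = df.fillna(value=0)   # Nick's Age becomes 0.0
```

dropna() and fillna() are alternatives: each starts from the original frame and returns a new one.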

Grouping Data

Grouping data is done via the groupby() function. You can group by a single column or by a list of columns. After grouping, you can apply aggregation functions like sum(), count(), mean(), etc.
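
A minimal sketch of grouping and aggregating. The 'Team' and 'Score' columns are hypothetical data invented for the example.

```python
import pandas as pd

# Hypothetical data: each row is one score recorded for a team.
df = pd.DataFrame({
    'Team': ['red', 'blue', 'red', 'blue', 'red'],
    'Score': [10, 20, 30, 40, 50],
})

# One aggregate per group.
mean_by_team = df.groupby('Team')['Score'].mean()

# Several aggregates at once; the result has one column per function.
summary = df.groupby('Team')['Score'].agg(['count', 'sum'])

print(mean_by_team)
print(summary)
```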


Merging, Joining, and Concatenating

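There are several ways to combine DataFrames: merge() performs database-style joins on one or more columns, join() combines DataFrames on their index, and concat() stacks DataFrames along an axis.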

# Database-style join on a shared column
df1.merge(df2, on='common_column')

# Stack the rows of df2 beneath those of df1
pd.concat([df1, df2])
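
The following sketch runs all three operations on two small frames. The contents of df1 and df2, including the 'left_value' and 'right_value' columns, are assumptions made up for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'common_column': [1, 2, 3], 'left_value': ['a', 'b', 'c']})
df2 = pd.DataFrame({'common_column': [2, 3, 4], 'right_value': ['x', 'y', 'z']})

# merge(): inner join by default, so only keys 2 and 3 survive.
merged = df1.merge(df2, on='common_column')

# join(): matches on the index and is a left join by default,
# so all three rows of df1 are kept.
joined = df1.set_index('common_column').join(df2.set_index('common_column'))

# concat(): stacks rows; ignore_index=True renumbers the result 0..5.
stacked = pd.concat([df1, df2], ignore_index=True)
```

Which one to use depends mostly on whether the rows should be matched on column values, matched on the index, or simply stacked.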

Reading and Writing to Files

Pandas can read data from many sources, including CSV files, Excel workbooks, and SQL databases. Data can be written back to these formats as well.

Reading a CSV file:

df = pd.read_csv('file.csv')

Writing to a CSV file:
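
A minimal sketch of a write followed by a read, assuming the Name/Age frame from earlier. The file name 'people.csv' and the temporary directory are illustrative choices so that the example cleans up after itself.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'people.csv')   # hypothetical file name
    df.to_csv(path, index=False)                 # index=False omits the row labels
    round_trip = pd.read_csv(path)               # read the same file back

print(round_trip)
```

Without index=False, to_csv() also writes the row index as an extra first column.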



The pandas DataFrame is a powerful data manipulation tool that underpins much Python-based data analysis. Its flexibility, breadth of functionality, and ease of use make it a go-to choice for data scientists.
