The **sample** method in pandas lets us draw a random sample of rows (or columns) from a DataFrame.

### Syntax –

`dataframe.sample(n, frac, replace, weights, random_state, axis)`

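**n –** The number of rows to return. Defaults to 1 when **frac** is not given. It cannot be combined with **frac**.

**frac –** The fraction of rows to return, like 0.5 for 50% of the rows.

**replace –** Whether to sample with or without replacement. By default, sampling is without replacement.

**weights –** Sampling weights that make some rows (or columns) more likely to be picked than others. By default, every item has the same probability.

**random_state –** The seed of the random number generator, used to make the sample reproducible.

**axis –** Whether to sample rows (`axis=0`) or columns (`axis=1`). By default, rows are sampled.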
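To see two of these parameters in isolation, here is a minimal sketch on a small hypothetical DataFrame (the names `toy`, `a`, `b`, `c` are made up for illustration). It samples columns with `axis=1` and checks that passing both `n` and `frac` is rejected.

```python
import pandas as pd

# Small hypothetical DataFrame, used only for illustration
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# axis=1 samples columns instead of rows: 2 of the 3 columns, all rows kept
cols = toy.sample(n=2, axis=1, random_state=0)
print(list(cols.columns))

# n and frac are mutually exclusive; pandas raises a ValueError if both are given
both_rejected = False
try:
    toy.sample(n=2, frac=0.5)
except ValueError as err:
    both_rejected = True
    print("ValueError:", err)
```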

## Example –

Let’s read a dataset to work with.

```
import pandas as pd
url = 'https://raw.githubusercontent.com/bprasad26/lwd/master/data/clothing_store_sales.csv'
df = pd.read_csv(url)
df.head()
```

### 1. Randomly Sample N Data Points from the DataFrame –

To randomly sample n data points (rows) from the DataFrame, we can use the **n** parameter of the sample method.

Let’s say we want to randomly sample 10 data points from the dataframe.

`df.sample(n=10, random_state=42)`

We set the **random_state** parameter for reproducibility: if you run the code above again, you will get the same 10 rows.
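A quick way to check the reproducibility claim is to draw twice with the same seed and compare the results. This sketch uses a small hypothetical stand-in (`sales`, with invented values) instead of the clothing-store CSV so that it runs offline.

```python
import pandas as pd

# Hypothetical stand-in for the clothing-store data (values invented)
sales = pd.DataFrame({"item": list("abcdefghij"),
                      "Net Sales": range(10, 110, 10)})

# The same integer seed produces the same sample on every call
first = sales.sample(n=3, random_state=42)
second = sales.sample(n=3, random_state=42)
print(first.equals(second))  # True
```

Leaving out `random_state` makes each call draw from fresh randomness, so repeated runs will usually return different rows.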

### 2. Randomly Sample a Fraction of Data Points from the DataFrame –

To randomly sample a fraction of the data points, we can use the **frac** parameter of the sample method.

Let’s say we want to randomly sample 20% of the data from the dataframe.

`df.sample(frac=0.2, random_state=42)`
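The number of rows returned is the fraction times the number of rows, rounded. The sketch below uses a hypothetical 10-row DataFrame (`sales`, invented values), so `frac=0.2` yields 2 rows. It also shows the common idiom `frac=1`, which returns every row in random order, i.e. a shuffle.

```python
import pandas as pd

# 10 hypothetical rows
sales = pd.DataFrame({"Net Sales": range(10, 110, 10)})

# 20% of 10 rows -> 2 rows
part = sales.sample(frac=0.2, random_state=42)
print(len(part))  # 2

# frac=1 keeps all rows but in a random order (a shuffle)
shuffled = sales.sample(frac=1, random_state=42)
print(len(shuffled))  # 10
```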


### 3. Random Sampling Without Replacement –

By default, pandas samples without replacement, i.e. the same data point can’t be selected more than once. You can also set this explicitly with the **replace** parameter.

`df.sample(n=5, replace=False)`
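Without replacement, each index label can appear at most once, which also means you cannot ask for more rows than the DataFrame has. A minimal sketch on a hypothetical 5-row DataFrame (`sales`, invented values):

```python
import pandas as pd

# 5 hypothetical rows
sales = pd.DataFrame({"Net Sales": [10, 20, 30, 40, 50]})

# Drawing all 5 rows without replacement: no label can repeat
no_repeat = sales.sample(n=5, replace=False, random_state=1)
print(no_repeat.index.is_unique)  # True

# Asking for 6 rows from 5 is rejected with a ValueError
too_many_rejected = False
try:
    sales.sample(n=6, replace=False)
except ValueError:
    too_many_rejected = True
print(too_many_rejected)  # True
```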

### 4. Random Sampling With Replacement –

To do random sampling with replacement, i.e. allow the same data point to be selected more than once, set the **replace** parameter to **True**.

`df.sample(n=5, replace=True)`
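With replacement, `n` may exceed the number of rows, and index labels can repeat. This is the usual way to draw a bootstrap-style resample. A minimal sketch on a hypothetical 5-row DataFrame (`sales`, invented values):

```python
import pandas as pd

# 5 hypothetical rows
sales = pd.DataFrame({"Net Sales": [10, 20, 30, 40, 50]})

# 15 draws from only 5 rows is allowed when replace=True
boot = sales.sample(n=15, replace=True, random_state=7)
print(len(boot))  # 15

# With 15 draws from 5 rows, some labels must appear more than once
print(boot.index.duplicated().any())  # True
```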

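### 5. Using a DataFrame Column as Weights –

When sampling rows, you can pass the name of a column as **weights**. Rows with a larger value in that column are proportionally more likely to be sampled: pandas normalizes the weights so they sum to 1, and missing values are treated as zero.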

`df.sample(n=5, weights='Net Sales')`
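The sketch below makes the weighting concrete on a hypothetical DataFrame (`sales`, invented values). It computes the normalized weights by hand (each value divided by the column total) and shows that rows with a weight of 0 are never drawn.

```python
import pandas as pd

# Hypothetical sales figures; items "a" and "b" have zero sales
sales = pd.DataFrame({"item": ["a", "b", "c", "d", "e"],
                      "Net Sales": [0, 0, 100, 300, 600]})

# Normalized weights: each value divided by the column total (1000)
probs = sales["Net Sales"] / sales["Net Sales"].sum()
print(probs.tolist())  # [0.0, 0.0, 0.1, 0.3, 0.6]

# Weighted sample of 2 rows; zero-weight rows can never be picked
picked = sales.sample(n=2, weights="Net Sales", random_state=0)
print(picked["item"].tolist())
```

Note that without replacement, `n` cannot exceed the number of rows with a non-zero weight (3 here).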