4 Functions to Know If You Are Planning to Switch from Pandas to Polars



Both Pandas and Polars code are included

Soner Yıldırım
Towards Data Science
Photo by israel palacio on Unsplash

Pandas can be difficult to work with when datasets are large. The two main issues are that Pandas performs its analytics in memory and that it creates intermediate copies of the data.

On the other hand, Pandas’ user-friendly API and rich selection of flexible functions make it one of the most popular data analysis and manipulation libraries.

Polars is a great alternative to Pandas, especially when the data becomes too large for Pandas to handle easily. The syntax of Polars is somewhere between that of Pandas and PySpark.

In this article, we’ll go over four must-know functions for data cleaning, processing, and analysis with both Pandas and Polars.

First things first. We, of course, need data to learn how these functions work. I prepared sample data, which you can download from my datasets repository. The dataset we’ll use in this article is called “data_polars_practicing.csv”.

Let’s start by reading the dataset into a DataFrame, which is the two-dimensional data structure in both Polars and Pandas.

import polars as pl

df_pl = pl.read_csv("data_polars_practicing.csv")

# display the first five rows
df_pl.head()


(image by author)

import pandas as pd

df_pd = pd.read_csv("data_polars_practicing.csv")

# display the first five rows
df_pd.head()


(image by author)

As the code snippets above show, the head method displays the first five rows of the DataFrame in both Polars and Pandas. One important difference is that the Polars output shows the data type of each column, whereas the Pandas output does not. In both libraries, we can also use the dtypes attribute to see the column data types.

We now have a Polars DataFrame called df_pl and a Pandas DataFrame called df_pd.

1. Filter