Pandas Tricks Part — 4

Published in

Analytics Vidhya

4 min read · Jun 2, 2020

In this article, I continue the series on data analysis and manipulation tricks using pandas in Python. In my last article, we discussed data types and their conversion, and how to concatenate data frames by rows as well as by columns. These techniques have helped me a lot in my professional career when working with data and drawing valuable insights from it.

Today, we will use the movies data frame from IMDb to demonstrate some useful pandas tricks. These tricks can then be applied to a variety of business data.

Let’s read in the data frame first:

import pandas as pd
movies = pd.read_csv('http://bit.ly/imdbratings')

Then, you can take a quick look at the data frame with movies.head(), which shows the first five rows:

movies.head()

To check how many rows the dataset has, you can use the built-in len() function as shown:

len(movies)  ## To check the number of rows in the movies dataset

Length or number of rows in the movies data set

We can also extract a chunk of the data set into another data set. This is especially useful when a dataset is very large and we want to study a fraction of it first. Splitting the data lets us analyze each subset separately, compare the subsets with each other and with the full data set, and see where a subset behaves differently from the data as a whole.

Below, we create two data sets by randomly assigning 75% of the rows to movies_1 and the remaining 25% to movies_2.

movies_1 = movies.sample(frac=0.75, random_state=1234)
movies_2 = movies.drop(movies_1.index)
len(movies_1) + len(movies_2)  ## To check whether the division has been done correctly

Division of movies data set into two chunks of 75% and 25%
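The same split can be tried offline on a small data frame. The sketch below is only an illustration: the frame toy and the names part_1 and part_2 are made up for this example and are not part of the IMDb data.

```python
import pandas as pd

# Hypothetical toy data standing in for the IMDb movies data frame
toy = pd.DataFrame({
    "title": [f"movie_{i}" for i in range(8)],
    "genre": ["Drama", "Action", "Crime", "Drama",
              "Comedy", "Action", "Drama", "Western"],
})

# Draw 75% of the rows at random; random_state makes the draw reproducible
part_1 = toy.sample(frac=0.75, random_state=1234)

# Everything that was not sampled goes to the second part (dropped by index label)
part_2 = toy.drop(part_1.index)

# The two parts should share no rows and together cover the whole frame
print(len(part_1), len(part_2), len(part_1) + len(part_2) == len(toy))
```

Fixing random_state means the same rows land in each part on every run, which helps when the results need to be reproduced later.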

Note: This approach will not work if the index values are not unique, because drop() removes rows by index label. We can sort the index of each data set to confirm that no label appears in both:

movies_1.index.sort_values()  ## Sort and check
movies_2.index.sort_values()  ## Sort and check
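To see why unique index values matter, here is a minimal sketch on a made-up frame (dup is a hypothetical name) whose labels each appear twice. Resetting the index before splitting is one possible remedy, shown here as an assumption rather than as the only way to handle it.

```python
import pandas as pd

# Hypothetical frame whose index labels repeat (for example after a row-wise concat)
dup = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60]},
                   index=[0, 0, 1, 1, 2, 2])
print(dup.index.is_unique)  # False

# Sampling 3 of 6 rows must split at least one pair of duplicated labels
sampled = dup.sample(frac=0.5, random_state=1234)

# drop() removes every row that carries a sampled label, so extra rows vanish
rest = dup.drop(sampled.index)
broken_total = len(sampled) + len(rest)
print(broken_total < len(dup))  # True: some rows ended up in neither part

# One possible fix: give every row its own label before splitting
clean = dup.reset_index(drop=True)
sampled_ok = clean.sample(frac=0.5, random_state=1234)
rest_ok = clean.drop(sampled_ok.index)
fixed_total = len(sampled_ok) + len(rest_ok)
print(fixed_total == len(clean))  # True
```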

Now, if we want to know which unique genres appear in our movies data set, we can use the unique() method as shown:

movies.genre.unique()  ## Shows us the unique Genres

If we want to select only a specific set of genres, we can combine conditions with the OR operator (|) or use the isin() method.

Similarly, with the help of ‘~’, we can display the movies whose genre is not one of those listed in the isin() method, as shown:
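As a rough sketch of both options, assuming a small made-up frame (toy, with_or and with_isin are hypothetical names used only here):

```python
import pandas as pd

# Hypothetical toy data standing in for the IMDb movies data frame
toy = pd.DataFrame({
    "title": [f"movie_{i}" for i in range(8)],
    "genre": ["Drama", "Action", "Crime", "Drama",
              "Comedy", "Action", "Drama", "Western"],
})

# Option 1: chain conditions with |; each condition needs its own parentheses
with_or = toy[(toy.genre == "Action") |
              (toy.genre == "Drama") |
              (toy.genre == "Western")]

# Option 2: pass the wanted genres to isin() as a list
with_isin = toy[toy.genre.isin(["Action", "Drama", "Western"])]

# Both filters select exactly the same rows
print(with_or.equals(with_isin))  # True
```

The isin() version tends to stay readable as the list of genres grows, while the OR version needs one more condition for every genre added.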

movies[~movies.genre.isin(['Action', 'Drama', 'Western'])].head()

To get Genres other than the ones in isin() method

Now, if we want to know the number of movies in each genre, we can use the value_counts() method.

To get the counts of the most frequent genres, we can use the nlargest() method.
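A minimal sketch of value_counts(), assuming a small made-up frame (toy is a hypothetical name and its genres are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data standing in for the IMDb movies data frame
toy = pd.DataFrame({
    "genre": ["Drama", "Action", "Crime", "Drama", "Comedy",
              "Action", "Drama", "Crime", "Action", "Drama"],
})

# Number of rows per genre, sorted from most to least frequent
counts = toy.genre.value_counts()
print(counts)

# The same information as a share of all rows
shares = toy.genre.value_counts(normalize=True)
print(shares)
```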

counts = movies.genre.value_counts()  ## Number of movies per genre
counts.nlargest(3)  ## The 3 most frequent genres and their counts
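The index of the nlargest() result holds the genre names, so it can be fed back into isin() to keep only the movies from the most frequent genres. The sketch below assumes a small made-up frame; toy, top_3 and top_movies are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data standing in for the IMDb movies data frame
toy = pd.DataFrame({
    "title": [f"movie_{i}" for i in range(10)],
    "genre": ["Drama", "Action", "Crime", "Drama", "Comedy",
              "Action", "Drama", "Crime", "Action", "Drama"],
})

# Count movies per genre and keep the three largest counts
counts = toy.genre.value_counts()
top_3 = counts.nlargest(3)

# top_3.index holds the genre names; use them to filter the original frame
top_movies = toy[toy.genre.isin(top_3.index)]
print(top_movies.genre.unique())
```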

With this, I conclude this article. We will discuss some other useful pandas tricks in the next one. I hope you learned something and enjoyed it.


Written by Kumar Brar