[0:00] Hello and welcome to another episode of the Data Bender show. Today we're going to be talking about data analysis and we're going to be doing some data analysis live during the show. So feel free to join along, use the comments to interact with us. Let us know what you think. We're going to be using Google Colab. It's a free service from Google that allows you to run Jupyter notebooks in the cloud and it comes with all the good stuff that you need, like Python and a lot of the common data science libraries pre-installed. So if you want to follow along, you can just go to colab.research.google.com and start a new notebook.
And we're going to be using a data set today from Kaggle. It's a data set about video games. And it contains information like the name of the game, the platform, the year of release, the genre, the publisher, and also sales numbers in millions of dollars for different regions. So we're going to be using this data set to answer some questions like what are the most popular genres, what are the most popular platforms, and how have sales changed over time? And we're also going to be looking at some of the relationships between these different variables. So let's get started.
I'm going to share my screen now and we're going to jump right into the code. So the first thing we're going to do is import the libraries that we need. We're going to be using pandas for data manipulation, Matplotlib for plotting, and Seaborn for more advanced plotting. And then we're going to load the data set. So I've already downloaded the data set from Kaggle and uploaded it to my Google Drive. So I'm just going to mount my Google Drive here and then load the CSV file into a pandas DataFrame. And then we're going to take a look at the data. So we're going to use the head method to look at the first few rows of the DataFrame and then the info method to look at the data types and see if there are any missing values. So it looks like we have a pretty clean data set.
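The loading and inspection steps described above might look roughly like the sketch below. The code shown on screen is not part of the transcript, so everything here is an assumption: the column names (`Year_of_Release`, `Publisher`, `Global_Sales`), the helper `load_games`, and the four sample rows, which are made up so the example runs without the Kaggle file. The Drive mount is left as a comment because it only works inside Colab.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the Kaggle CSV.
# Column names are assumed; the real file may use different ones.
SAMPLE_CSV = """Name,Platform,Year_of_Release,Genre,Publisher,Global_Sales
Game A,PS2,2004,Action,Publisher X,1.50
Game B,DS,2008,Sports,,0.75
Game C,PS3,,Action,Publisher Y,2.10
Game D,X360,2008,Shooter,Publisher X,3.00
"""


def load_games(source):
    """Read the CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


# In Colab you would mount Drive first and pass a real path instead:
#   from google.colab import drive
#   drive.mount("/content/drive")
#   df = load_games("/content/drive/MyDrive/<your-file>.csv")
df = load_games(io.StringIO(SAMPLE_CSV))

print(df.head())           # first few rows
df.info()                  # dtypes and non-null counts per column
missing = df.isna().sum()  # number of missing values per column
print(missing)
```

With this sample, the year column is read as a float because one value is missing, which matches the situation described in the video.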
There are no missing values in most of the columns, but there are some missing values in the year of release column and in the publisher column. So we're going to have to deal with those. For now, we're just going to drop the rows with missing values in the year of release column because it's a small number of rows and it won't significantly affect our analysis. And for the publisher column, we're just going to fill the missing values with a placeholder like unknown because we don't want to lose those rows. And then we're going to convert the year of release column to an integer because it's currently a float. And then we're going to start our analysis.
So the first thing we're going to do is look at the distribution of genres. So we're going to use the value_counts method to count the number of games in each genre and then we're going to plot that as a bar chart. And it looks like action games are the most popular genre, followed by sports and then miscellaneous. And then we're going to do the same for platforms. So we're going to look at the distribution of platforms and plot that as a bar chart. And it looks like the PS2 is the most popular platform, followed by the DS and then the PS3.
And then we're going to look at sales over time. So we're going to group the data by year of release and sum the sales for each year and then plot that as a line chart. And it looks like sales peaked around 2008 and then started to decline.
And then we're going to look at the relationship between genre and sales. So we're going to group the data by genre and sum the sales for each genre and then plot that as a bar chart. And it looks like action games have the highest sales, followed by sports and then shooter. And then we're going to do the same for platform and sales. So we're going to group the data by platform and sum the sales for each platform and then plot that as a bar chart. And it looks like the PS2 has the highest sales, followed by the X360 and then the PS3.
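A minimal sketch of the cleaning and aggregation steps just described, using pandas only. The column names, the helper `clean_games`, and the five rows of data are hypothetical and chosen for illustration; they are not the Kaggle data and will not reproduce the rankings mentioned in the video.

```python
import pandas as pd

# Hypothetical rows; real column names in the Kaggle file may differ.
df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D", "Game E"],
    "Platform": ["PS2", "DS", "PS3", "PS2", "X360"],
    "Year_of_Release": [2004.0, 2008.0, None, 2008.0, 2010.0],
    "Genre": ["Action", "Sports", "Action", "Action", "Shooter"],
    "Publisher": ["Publisher X", None, "Publisher Y", "Publisher X", "Publisher Z"],
    "Global_Sales": [1.5, 0.75, 2.1, 3.0, 2.0],
})


def clean_games(frame):
    """Drop rows without a year, fill missing publishers, cast year to int."""
    out = frame.dropna(subset=["Year_of_Release"]).copy()
    out["Publisher"] = out["Publisher"].fillna("Unknown")
    out["Year_of_Release"] = out["Year_of_Release"].astype(int)
    return out


clean = clean_games(df)

# Number of games per genre and per platform, most common first.
genre_counts = clean["Genre"].value_counts()
platform_counts = clean["Platform"].value_counts()

# Total sales per year (indexed by year, ready for a line chart).
sales_by_year = clean.groupby("Year_of_Release")["Global_Sales"].sum()

# Total sales per genre and per platform, highest first.
sales_by_genre = (
    clean.groupby("Genre")["Global_Sales"].sum().sort_values(ascending=False)
)
sales_by_platform = (
    clean.groupby("Platform")["Global_Sales"].sum().sort_values(ascending=False)
)

print(genre_counts)
print(sales_by_year)
print(sales_by_platform)
```

In a Colab notebook, where Matplotlib is available, each of these Series could be charted directly, for example `genre_counts.plot(kind="bar")` for the bar charts or `sales_by_year.plot(kind="line")` for sales over time.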
So that's it for our data analysis today. We've looked at the distribution of genres and platforms, sales over time, and the relationship between genre and sales and platform and sales. I hope you've enjoyed this episode of the Data Bender show. If you have any questions or comments, please leave them below. And don't forget to like and subscribe. We'll see you next time.
