Approach:

Using Python and the Gapminder dataset, I first grouped the countries into continents and added a new column, “Continent”, to the dataset.
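A minimal pandas sketch of this first step follows. The post does not list the country-to-continent lookup that was used, so the `COUNTRY_TO_CONTINENT` dictionary below is hypothetical. Only Afghanistan and India (ID 2) come from the sample rows shown later, and the Albania entry is purely illustrative.

```python
import pandas as pd

# Hypothetical lookup: Afghanistan and India use ID 2 as in the sample rows;
# the Albania entry is illustrative only, not the author's actual mapping.
COUNTRY_TO_CONTINENT = {
    "Afghanistan": 2,
    "India": 2,
    "Albania": 3,
}


def add_continent_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'continent' ID looked up from 'country'."""
    out = df.copy()
    # Countries missing from the lookup get NaN instead of raising an error.
    out["continent"] = out["country"].map(COUNTRY_TO_CONTINENT)
    return out


data = pd.DataFrame({"country": ["Afghanistan", "India", "Albania"]})
data = add_continent_column(data)
print(data)
```

Because `Series.map` with a dictionary leaves unmatched countries as NaN, gaps in the lookup are easy to find afterwards with `data["continent"].isna()`.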

Next, I removed the continents with few countries by replacing their IDs with NumPy’s nan (missing-value) marker. I then mapped each continent ID back to its original name and displayed the result. I also used the crosstab function to get a clearer view of the data, and finally combined two or three data frames into one overview and displayed its top 10 rows.

The picture below shows sample records from the Gapminder dataset:

P2

The columns are:

country, continent, incomeperperson, alcconsumption, armedforcesrate, breastcancerper100th, co2emissions, femaleemployrate, hivrate, internetuserate, lifeexpectancy, oilperperson, polityscore, relectricperperson, suicideper100th, employrate, and urbanrate

Sample rows (for the columns above):

Afghanistan, 2, 100, 0.03, 0.5696534, 26.8, 75944000, 25.60000038, 3.654121623, 48.673, 0, 6.6843853, 55.70000076, 24.04

India, 2, 786.7000981, 2.69, 0.5739204, 19.1, 30391317000, 32.29999924, 0.3, 7.499995878, 65.438, 0.126978752, 9, 110.7054664, 18.58382607, 55.40000153, 29.54

The pictures above and below show the following outputs:

P3

Output 1 – The number of rows and columns in the dataset.
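Output 1 corresponds to the DataFrame's `shape` attribute. The loading code and file name are not shown in the post, so this sketch assumes a CSV and replaces the real file with a one-row stand-in built from the India sample row (via `io.StringIO`). It also converts the measurement columns with `pandas.to_numeric(..., errors="coerce")`, which turns blank or unparseable cells into NaN.

```python
import io

import pandas as pd

# Column names as listed in the post.
COLUMNS = [
    "country", "continent", "incomeperperson", "alcconsumption",
    "armedforcesrate", "breastcancerper100th", "co2emissions",
    "femaleemployrate", "hivrate", "internetuserate", "lifeexpectancy",
    "oilperperson", "polityscore", "relectricperperson", "suicideper100th",
    "employrate", "urbanrate",
]

# One-row stand-in for the real CSV file, built from the India sample row.
CSV_TEXT = ",".join(COLUMNS) + "\n" + (
    "India,2,786.7000981,2.69,0.5739204,19.1,30391317000,32.29999924,0.3,"
    "7.499995878,65.438,0.126978752,9,110.7054664,18.58382607,55.40000153,29.54\n"
)

data = pd.read_csv(io.StringIO(CSV_TEXT))

# Coerce every column except 'country' to numeric; unparseable cells become NaN.
for col in COLUMNS[1:]:
    data[col] = pd.to_numeric(data[col], errors="coerce")

rows, cols = data.shape  # Output 1: number of rows and columns
print(rows, cols)  # prints: 1 17
```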

Output 2 – The continent IDs and the number of countries in each.
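A count like Output 2 can be produced with `Series.value_counts`, sorted by ID so the continents appear in order. The ten-country frame below is made up for illustration. Only the column name `continent` comes from the post.

```python
import pandas as pd

# Hypothetical continent IDs for ten countries (illustrative, not Gapminder data).
data = pd.DataFrame({
    "country": [f"Country{i}" for i in range(10)],
    "continent": [1, 2, 2, 2, 3, 3, 4, 4, 5, 6],
})

# Number of countries per continent ID, ordered by ID rather than by count.
counts = data["continent"].value_counts().sort_index()
print(counts)
```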

Output 3 – Continent IDs 1 and 5 were removed, because they contained few countries, by replacing them with NumPy’s nan value. This created a new “CONTN” column from the original “Continent” column, which was displayed as output.
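Note that `numpy.nan` is a constant, not a function. The removal in Output 3 amounts to replacing the unwanted IDs with that missing-value marker. Here is a minimal sketch on made-up IDs; the `CONTN` name follows the post.

```python
import numpy as np
import pandas as pd

# Illustrative continent IDs (not real Gapminder data).
data = pd.DataFrame({"continent": [1, 2, 2, 2, 3, 3, 4, 4, 5, 6]})

# Replace the sparsely populated IDs 1 and 5 with NaN in a new 'CONTN' column,
# leaving the original 'continent' column untouched.
data["CONTN"] = data["continent"].replace([1, 5], np.nan)

# NaN rows are excluded from value_counts by default (dropna=True).
print(data["CONTN"].value_counts().sort_index())
```

Because NaN is a float, the new column's dtype becomes float64 even though the original IDs were integers.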

Output 4 – The original continent IDs were replaced with continent names by recoding them (mapping each ID to its name).
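A recode like this is commonly done in pandas with a dictionary and `Series.map`. The post does not list the names assigned to each ID, so the `CONTINENT_NAMES` mapping below is a hypothetical example. The output column is called `CONTS`, the name the post uses for the recoded data.

```python
import numpy as np
import pandas as pd

# Hypothetical ID -> name mapping; the post does not list the author's labels.
CONTINENT_NAMES = {2: "Asia", 3: "Europe", 4: "Africa", 6: "Americas"}

data = pd.DataFrame({"CONTN": [np.nan, 2, 2, 3, 4, np.nan, 6]})

# map() looks up each ID in the dictionary; NaN and unknown IDs stay NaN.
data["CONTS"] = data["CONTN"].map(CONTINENT_NAMES)
print(data)
```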

Output 5 – The details from Output 4 were displayed again with the crosstab function, which gives a clearer view. The pandas.cut function was used to categorize the “Continent” column and the “CONTS” column from Output 4, and the results were used as columns in the crosstab.
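The post does not show exactly what was passed to `pandas.cut`, so this sketch makes one assumption. It bins the original continent IDs into three ranges (the bin edges are arbitrary) and uses those ranges as the crosstab columns, with the recoded names as rows. Data and names are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data: original IDs, IDs with 1 and 5 removed, and hypothetical names.
CONTINENT_NAMES = {2: "Asia", 3: "Europe", 4: "Africa", 6: "Americas"}
data = pd.DataFrame({"continent": [1, 2, 2, 2, 3, 3, 4, 4, 5, 6]})
data["CONTN"] = data["continent"].replace([1, 5], np.nan)
data["CONTS"] = data["CONTN"].map(CONTINENT_NAMES)

# pandas.cut bins the numeric IDs into ranges; the bin edges are an assumption.
data["ID_RANGE"] = pd.cut(data["continent"], bins=[0, 2, 4, 6])

# Cross-tabulate names (rows) against ID ranges (columns). Rows whose CONTS is
# NaN (the removed IDs 1 and 5) are left out of the counts.
table = pd.crosstab(data["CONTS"], data["ID_RANGE"])
print(table)
```

A crosstab of recoded values against the originals also works as a quick check that the recode mapped each ID to the intended name.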

Output 6 – A new variable, “sub1”, was created by combining several data frames, such as “CONTS” (the continent description) and the original “Continent” column, and adding further columns. The top 10 rows were displayed using the head function.
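One way to assemble a combined table like “sub1” is `pandas.concat` along the column axis. The post does not say which function was used, so treat this as an assumption. The 12-country frame and the continent names below are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative 12-country frame; names and IDs are hypothetical.
CONTINENT_NAMES = {2: "Asia", 3: "Europe", 4: "Africa", 6: "Americas"}
data = pd.DataFrame({
    "country": [f"Country{i}" for i in range(12)],
    "continent": [1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 2],
})

# Drop IDs 1 and 5, then recode the rest to names, as in Outputs 3 and 4.
conts = data["continent"].replace([1, 5], np.nan).map(CONTINENT_NAMES).rename("CONTS")

# Combine the original frame and the recoded column side by side (axis=1);
# both share the same row index, so the rows line up.
sub1 = pd.concat([data, conts], axis=1)

print(sub1.head(10))  # first 10 rows, as in Output 6
```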
