Part 1

Question#1:

Suppose you are given the DataFrame below. Compute the mean of the data1 column, grouped by the labels in key1.

Note: access data1 and call groupby with the key1 column (a Series).

• My output may not match yours, since we are using `np.random.randn(5)`.

In :

import numpy as np

import pandas as pd

df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],
                   'key2' : ['one', 'two', 'one', 'two', 'one'],
                   'data1' : np.random.randn(5),
                   'data2' : np.random.randn(5)})

df

Out:

In :

grouped =  #TO DO — Compute groupby

grouped  #TO DO — Compute mean

Out:

key1

a    0.170724

b   -0.247155

Name: data1, dtype: float64
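
One possible way to fill in the two TO DO lines is sketched here. The seeded generator (`np.random.default_rng(0)`) and the variable name `means` are additions for reproducibility and are not part of the exercise, so the printed values will differ from the output shown.

```python
import numpy as np
import pandas as pd

# Seeded generator so reruns give the same numbers (an assumption; the
# exercise itself uses unseeded np.random.randn).
rng = np.random.default_rng(0)

df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': rng.standard_normal(5),
                   'data2': rng.standard_normal(5)})

# Select the data1 column, then group it by the key1 Series.
grouped = df['data1'].groupby(df['key1'])

# Aggregate each group with its mean; the result is a Series indexed by key1.
means = grouped.mean()
print(means)
```

`grouped` on its own is a lazy GroupBy object; nothing is computed until an aggregation such as `mean()` is called on it.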

Question#2:

Compute the mean of data1 grouped by state and year, using the states and years arrays below as the group keys.

In :

states = np.array(['Ohio', 'California', 'California', 'Ohio', 'Ohio'])

years = np.array([2005, 2005, 2006, 2005, 2006])

df['data1'] #TO DO - Compute groupby and mean

Out:

California  2005   -1.028313

2006    0.887314

Ohio        2005   -0.443082

2006    1.045025

Name: data1, dtype: float64
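
A minimal sketch of grouping by external arrays follows. It assumes a standalone, seeded Series named `data1` in place of `df['data1']` so that it runs on its own; the name `state_year_means` is also an addition.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded for reproducibility (an assumption)
data1 = pd.Series(rng.standard_normal(5), name='data1')

states = np.array(['Ohio', 'California', 'California', 'Ohio', 'Ohio'])
years = np.array([2005, 2005, 2006, 2005, 2006])

# Group keys do not have to be columns: any arrays with the same length as
# the Series work. Passing a list of two arrays gives a two-level index.
state_year_means = data1.groupby([states, years]).mean()
print(state_year_means)
```

Rows 0 and 3 are both (Ohio, 2005), so that entry is the average of two values; every other state and year pair contains a single row.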

Question#3:

Explain the code and output below

In :

df['data1'].describe() #TO DO - Complete the Code

Out:

count    5.000000

mean     0.003572

std      1.128177

min     -1.381624

25%     -1.028313

50%      0.495459

75%      0.887314

max      1.045025

Name: data1, dtype: float64

Question#4:

Explain the code and output below

In :

df['data2'].describe()

Out:

count    5.000000

mean    -0.220463

std      0.628949

min     -1.014234

25%     -0.577313

50%     -0.325994

75%      0.296270

max      0.518958

Name: data2, dtype: float64
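
To help explain the `describe()` output in Questions 3 and 4, the sketch below rebuilds each row with a separate Series method. The five values are hypothetical, chosen so the statistics are easy to verify by hand, and the names `s`, `summary`, and `manual` are illustrative only.

```python
import pandas as pd

# Hypothetical values, chosen so the statistics are easy to check by hand.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0], name='example')

summary = s.describe()
print(summary)

# The same numbers, computed one at a time.
manual = pd.Series({
    'count': s.count(),          # number of non-missing values
    'mean': s.mean(),
    'std': s.std(),              # sample standard deviation (ddof=1)
    'min': s.min(),
    '25%': s.quantile(0.25),     # first quartile
    '50%': s.median(),           # the 50% row is the median
    '75%': s.quantile(0.75),     # third quartile
    'max': s.max(),
})
print(manual)
```

For a numeric Series, `describe()` reports the count, the mean, the sample standard deviation, the minimum, the three quartiles, and the maximum.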

Reference(s) title & URL:

## Question#1:

### Write a Python program to draw a line with suitable labels on the x axis and y axis, and a title.

In :

`import matplotlib.pyplot as plt`
`X = range(1, 50)`
`Y = [value * 3 for value in X]`
`print("Values of X:")`
`print(*X)`
`print("Values of Y (thrice of X):")`
`print(Y)`
` # TO DO - Plot lines and/or markers to the Axes.`
` #  TO DO - Set the x axis label of the current axis.`
` #  TO DO - Set the y axis label of the current axis.`
` #  TO DO - Set a title `
` #  TO DO - Display the figure.`
`Values of X:`
`1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49`
`Values of Y (thrice of X):`
`[3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99, 102, 105, 108, 111, 114, 117, 120, 123, 126, 129, 132, 135, 138, 141, 144, 147]`
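
One way the five TO DO steps could be completed is sketched below. The label and title strings are placeholders, not values given by the exercise, and the `line` and `ax` handles are kept only so the result can be inspected afterwards.

```python
import matplotlib.pyplot as plt

X = range(1, 50)
Y = [value * 3 for value in X]

line, = plt.plot(X, Y)        # plot lines and/or markers to the Axes
plt.xlabel('x - axis')        # set the x axis label (placeholder text)
plt.ylabel('y - axis')        # set the y axis label (placeholder text)
plt.title('Draw a line')      # set a title (placeholder text)
ax = plt.gca()                # handle to the current Axes, for inspection
plt.show()                    # display the figure
```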

## Question#2:

### Write a Python program to display a bar chart of the popularity of programming languages.

Sample data: Programming languages: Java, Python, PHP, JavaScript, C#, C++. Popularity: 22.2, 17.6, 8.8, 8, 7.7, 6.7.

In :

`import matplotlib.pyplot as plt`
`x = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']`
`popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]`
`x_pos = [i for i, _ in enumerate(x)]`
`plt.bar(x_pos, popularity, color='blue')`
`plt.xlabel("Languages")`
`plt.ylabel("Popularity")`
`plt.title("Popularity of Programming Language\n" + "Worldwide, Oct 2017 compared to a year ago")`
`plt.xticks(x_pos, x)`
`# Turn on the grid`
`plt.minorticks_on()`
`plt.grid(which='major', linestyle='-', linewidth='0.5', color='red')`
`# Customize the minor grid`
`plt.grid(which='minor', linestyle=':', linewidth='0.5', color='black')`
` #  TO DO - draw the plot`
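
The remaining TO DO is typically a call to `plt.show()`. The sketch below is an alternative arrangement of the same chart, not the exercise's own code: it uses an explicit `fig, ax` pair and passes the language names straight to `bar`, which removes the need for `x_pos` and `plt.xticks`. The shortened title is a placeholder.

```python
import matplotlib.pyplot as plt

x = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]

fig, ax = plt.subplots()

# String categories can be passed directly; Matplotlib places them at
# positions 0..5 and uses the strings as tick labels.
bars = ax.bar(x, popularity, color='blue')

ax.set_xlabel('Languages')
ax.set_ylabel('Popularity')
ax.set_title('Popularity of Programming Languages')  # placeholder title
ax.grid(which='major', linestyle='-', linewidth=0.5, color='red')

plt.show()  # draw the plot
```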

## Question#3:

### Write a Python program to display a horizontal bar chart of the popularity of programming languages.

In :

`import matplotlib.pyplot as plt`
`x = ['Java', 'Python', 'PHP', 'JS', 'C#', 'C++']`
`popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]`
`x_pos = [i for i, _ in enumerate(x)]`
`plt.barh(x_pos, popularity, color='green')`
`plt.xlabel("Popularity")`
`plt.ylabel("Languages")`
`plt.title("Popularity of Programming Language\n" + "Worldwide, Oct 2017 compared to a year ago")`
`plt.yticks(x_pos, x)`
`# Turn on the grid`
`plt.minorticks_on()`
`plt.grid(which='major', linestyle='-', linewidth='0.5', color='red')`
`# Customize the minor grid`
`plt.grid(which='minor', linestyle=':', linewidth='0.5', color='black')`
`#  TO DO - draw the plot`

## Question#4:

### Write a Python program to display a bar chart of the popularity of programming languages. Use a different color for each bar.

In :

`import matplotlib.pyplot as plt`
`x = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']`
`popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]`
`x_pos = [i for i, _ in enumerate(x)]`
`plt.bar(x_pos, popularity, color=['red', 'black', 'green', 'blue', 'yellow', 'cyan'])`
`plt.xlabel("Languages")`
`plt.ylabel("Popularity")`
`plt.title("Popularity of Programming Language\n" + "Worldwide, Oct 2017 compared to a year ago")`
`plt.xticks(x_pos, x)`
`# Turn on the grid`
`plt.minorticks_on()`
`plt.grid(which='major', linestyle='-', linewidth='0.5', color='red')`
`# Customize the minor grid`
`plt.grid(which='minor', linestyle=':', linewidth='0.5', color='black')`
` #  TO DO - draw the plot`

## Question#5:

### Write a Python program to create a pie chart of the popularity of programming languages.

Sample data: Programming languages: Java, Python, PHP, JavaScript, C#, C++. Popularity: 22.2, 17.6, 8.8, 8, 7.7, 6.7.

In :

`import matplotlib.pyplot as plt`
`# Data to plot`
`languages = 'Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++'`
`popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]`
`colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]`
`# explode 1st slice`
`explode = (0.1, 0, 0, 0, 0, 0)`
`# Plot`
`plt.pie(popularity, explode=explode, labels=languages, colors=colors,`
`autopct='%1.1f%%', shadow=True, startangle=140)`
`plt.axis('equal')`
` #  TO DO - draw the plot`
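
A compact sketch of the pie chart with the final `plt.show()` call is given below. It uses an explicit `fig, ax` pair and omits the custom colors and shadow for brevity; those simplifications are choices made here, not requirements of the exercise.

```python
import matplotlib.pyplot as plt

languages = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]
explode = (0.1, 0, 0, 0, 0, 0)  # pull the first slice out slightly

fig, ax = plt.subplots()
ax.pie(popularity, explode=explode, labels=languages,
       autopct='%1.1f%%', startangle=140)
ax.axis('equal')  # equal aspect ratio so the pie is drawn as a circle

plt.show()  # draw the plot
```

Because the sample values sum to 71.0 rather than 100, the percentages written by `autopct` are shares of that total: the Java slice is labeled 31.3%, not 22.2%.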

## Question#6:

### A population pyramid can show the distribution of groups ordered by volume. It can also show the stage-by-stage filtering of a population, as in the code below, which shows how many people pass through each stage of a marketing funnel.

In :

`import pandas as pd`
`import matplotlib.pyplot as plt`
`import seaborn as sns`
`# Read data`
`df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/email_campaign_funnel.csv")`
`# Draw Plot`
`plt.figure(figsize=(13,10), dpi= 80)`
`group_col = 'Gender'`
`order_of_bars = df.Stage.unique()[::-1]`
`colors = [plt.cm.Spectral(i/float(len(df[group_col].unique())-1)) for i in range(len(df[group_col].unique()))]`
`for c, group in zip(colors, df[group_col].unique()):`
`    sns.barplot(x='Users', y='Stage', data=df.loc[df[group_col]==group, :], order=order_of_bars, color=c, label=group)`
` #  TO DO - draw the plot  xlabel`
` #  TO DO - draw the plot  ylabel`
`plt.yticks(fontsize=12)`
` #  TO DO -  plot the title`
` #  TO DO -  plot the legend`
` #  TO DO -  plot the graph`
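
The sketch below shows one way the labeling steps could be completed, under several assumptions that should be checked against the real CSV. It uses a small invented DataFrame instead of downloading the file, it stores one group's counts as negative numbers so that its bars extend left of zero, and it swaps `sns.barplot` for plain `ax.barh` so that only pandas and Matplotlib are needed. The stage names, counts, and title are all hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the CSV. One group's counts are negative so its
# bars grow to the left of zero and the other group's bars grow to the right.
df = pd.DataFrame({
    'Stage': ['Sent', 'Opened', 'Clicked'] * 2,
    'Gender': ['Male'] * 3 + ['Female'] * 3,
    'Users': [-900, -400, -100, 1000, 500, 150],
})

fig, ax = plt.subplots(figsize=(8, 5))

# One barh call per group; identical stage names share the same y position.
for group, part in df.groupby('Gender', sort=False):
    ax.barh(part['Stage'], part['Users'], label=group)

ax.invert_yaxis()                                 # first stage at the top
ax.set_xlabel('Users')                            # xlabel
ax.set_ylabel('Stage')                            # ylabel
ax.set_title('Funnel by stage (stand-in data)')   # title
ax.legend()                                       # legend
plt.show()                                        # plot the graph
```

With the pyplot interface used in the exercise, the equivalent calls would be `plt.xlabel`, `plt.ylabel`, `plt.title`, `plt.legend`, and `plt.show`.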

Reference(s) title & URL:

