Part 1
Question#1:
Suppose you are provided the Dataframe below. Compute the mean of the data1 column using the labels from key1.
Note: access data1 and call groupby with the column (a Series) at key1.
- My output may not match yours, since we are using `np.random.randn(5)`.
In [0]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],
                   'key2' : ['one', 'two', 'one', 'two', 'one'],
                   'data1' : np.random.randn(5),
                   'data2' : np.random.randn(5)})
df
Out[5]:
  | key1 | key2 | data1 | data2 |
0 | a | one | 0.495459 | 0.296270 |
1 | a | two | -1.028313 | -1.014234 |
2 | b | one | 0.887314 | 0.518958 |
3 | b | two | -1.381624 | -0.577313 |
4 | a | one | 1.045025 | -0.325994 |
In [0]:
grouped = # TO DO - Compute groupby
grouped # TO DO - Compute mean
Out[9]:
key1
a 0.170724
b -0.247155
Name: data1, dtype: float64
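One possible way to approach Question#1 is sketched here. It is an illustration, not the official answer. To make the result comparable with Out[9], the sketch hard-codes the data1 and data2 values printed in Out[5] instead of calling `np.random.randn(5)`; that substitution is an assumption made only for reproducibility.

```python
import pandas as pd

# Values copied from the DataFrame shown in Out[5] (an assumption made for
# reproducibility; the exercise itself draws fresh random numbers).
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [0.495459, -1.028313, 0.887314, -1.381624, 1.045025],
    "data2": [0.296270, -1.014234, 0.518958, -0.577313, -0.325994],
})

# Select the data1 column, then group it by the key1 Series.
grouped = df["data1"].groupby(df["key1"])

# The GroupBy object is lazy; calling mean() performs the aggregation.
means = grouped.mean()
print(means)
```

The result is a Series indexed by the unique values of key1, which is the shape shown in Out[9].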
Question#2:
Compute the mean of the data1 Series below, grouped by the states and years arrays.
In [0]:
states = np.array(['Ohio', 'California', 'California', 'Ohio', 'Ohio'])
years = np.array([2005, 2005, 2006, 2005, 2006])
df['data1'] # TO DO - Compute groupby and mean
Out[12]:
California  2005   -1.028313
            2006    0.887314
Ohio        2005   -0.443082
            2006    1.045025
Name: data1, dtype: float64
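A minimal sketch of one way to answer Question#2 follows. It assumes the data1 values printed in Out[5], so the result can be checked against Out[12]. The point it illustrates is that grouping keys do not have to be DataFrame columns: any arrays with one entry per row can be passed to groupby.

```python
import numpy as np
import pandas as pd

# data1 values copied from Out[5] (assumed here for reproducibility).
data1 = pd.Series([0.495459, -1.028313, 0.887314, -1.381624, 1.045025],
                  name="data1")

states = np.array(["Ohio", "California", "California", "Ohio", "Ohio"])
years = np.array([2005, 2005, 2006, 2005, 2006])

# Passing a list of arrays groups by every (state, year) combination
# and produces a result with a two-level index.
means = data1.groupby([states, years]).mean()
print(means)
```

Only the (Ohio, 2005) group contains two rows (index 0 and 3), so it is the only entry that differs from a raw data1 value.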
Question#3:
Explain the code and output below
In [0]:
df['data1'].describe() # TO DO - Complete the Code
Out[14]:
count 5.000000
mean 0.003572
std 1.128177
min -1.381624
25% -1.028313
50% 0.495459
75% 0.887314
max 1.045025
Name: data1, dtype: float64
Question#4:
Explain the code and output below
In [0]:
df['data2'].describe()
Out[15]:
count 5.000000
mean -0.220463
std 0.628949
min -1.014234
25% -0.577313
50% -0.325994
75% 0.296270
max 0.518958
Name: data2, dtype: float64
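As a starting point for explaining the describe() output in Question#3 and Question#4, the sketch below recomputes each summary entry with a dedicated Series method. It assumes the data1 values printed in Out[5]; the same comparison applies to data2.

```python
import pandas as pd

# data1 values copied from Out[5] (assumed here for reproducibility).
data1 = pd.Series([0.495459, -1.028313, 0.887314, -1.381624, 1.045025],
                  name="data1")

summary = data1.describe()

# Recompute every row of describe() by hand.
manual = pd.Series({
    "count": data1.count(),        # number of non-missing values
    "mean": data1.mean(),
    "std": data1.std(),            # sample standard deviation (ddof=1)
    "min": data1.min(),
    "25%": data1.quantile(0.25),   # first quartile
    "50%": data1.median(),         # second quartile, the median
    "75%": data1.quantile(0.75),   # third quartile
    "max": data1.max(),
})

print(pd.DataFrame({"describe": summary, "manual": manual}))
```

With five values, the 25%, 50% and 75% entries fall exactly on the second, third and fourth smallest observations, which is why they equal raw data values in Out[14].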
Reference(s) title & URL:
Part 2
Question#1:
Write a Python program to draw a line with suitable labels on the x axis and y axis, and a title.
In [0]:
import matplotlib.pyplot as plt
X = range(1, 50)
Y = [value * 3 for value in X]
print("Values of X:")
print(*range(1,50))
print("Values of Y (thrice of X):")
print(Y)
# TO DO - Plot lines and/or markers to the Axes.
# TO DO - Set the x axis label of the current axis.
# TO DO - Set the y axis label of the current axis.
# TO DO - Set a title
# TO DO - Display the figure.
Values of X:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
Values of Y (thrice of X):
[3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99, 102, 105, 108, 111, 114, 117, 120, 123, 126, 129, 132, 135, 138, 141, 144, 147]
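The five TO DO steps in the cell above could be filled in roughly as follows. This is a sketch rather than the expected answer; the label and title strings are placeholders chosen for illustration.

```python
import matplotlib.pyplot as plt

X = range(1, 50)
Y = [value * 3 for value in X]

# Plot lines and/or markers to the Axes; plot() returns a list of Line2D.
line, = plt.plot(X, Y)

# Label the current axes (the strings are illustrative placeholders).
plt.xlabel("X values")
plt.ylabel("Y values (3 * X)")
plt.title("Line plot of Y = 3X")

# Display the figure.
plt.show()
```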
Question#2:
Write a Python program to display a bar chart of the popularity of programming languages.
Sample data:
Programming languages: Java, Python, PHP, JavaScript, C#, C++
Popularity: 22.2, 17.6, 8.8, 8, 7.7, 6.7
In [0]:
import matplotlib.pyplot as plt
x = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]
x_pos = [i for i, _ in enumerate(x)]
plt.bar(x_pos, popularity, color='blue')
plt.xlabel("Languages")
plt.ylabel("Popularity")
plt.title("Popularity of Programming Languages\n" + "Worldwide, Oct 2017 compared to a year ago")
plt.xticks(x_pos, x)
# Turn on the grid
plt.minorticks_on()
plt.grid(which='major', linestyle='-', linewidth=0.5, color='red')
# Customize the minor grid
plt.grid(which='minor', linestyle=':', linewidth=0.5, color='black')
# TO DO - draw the plot
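For comparison, here is a compact variant of the bar chart written against the Axes object instead of the pyplot state machine. It is a sketch, not the expected answer: it passes the language names straight to `bar`, which matplotlib treats as categories, so the separate `x_pos` list and `xticks` call are not needed. The shortened title is an illustrative choice.

```python
import matplotlib.pyplot as plt

languages = ["Java", "Python", "PHP", "JavaScript", "C#", "C++"]
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]

fig, ax = plt.subplots()

# String x values are treated as categories and become the tick labels.
bars = ax.bar(languages, popularity, color="blue")

ax.set_xlabel("Languages")
ax.set_ylabel("Popularity")
ax.set_title("Popularity of Programming Languages")

# Horizontal grid lines only, so the bar heights are easy to read.
ax.grid(which="major", axis="y", linestyle="-", linewidth=0.5, color="red")

# Draw the plot.
plt.show()
```

The same pattern carries over to the horizontal and multi-colored variants in the following questions by switching to `ax.barh` or passing a list of colors.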
Question#3:
Write a Python program to display a horizontal bar chart of the popularity of programming languages.
In [0]:
import matplotlib.pyplot as plt
x = ['Java', 'Python', 'PHP', 'JS', 'C#', 'C++']
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]
x_pos = [i for i, _ in enumerate(x)]
plt.barh(x_pos, popularity, color='green')
plt.xlabel("Popularity")
plt.ylabel("Languages")
plt.title("Popularity of Programming Languages\n" + "Worldwide, Oct 2017 compared to a year ago")
plt.yticks(x_pos, x)
# Turn on the grid
plt.minorticks_on()
plt.grid(which='major', linestyle='-', linewidth=0.5, color='red')
# Customize the minor grid
plt.grid(which='minor', linestyle=':', linewidth=0.5, color='black')
# TO DO - draw the plot
Question#4:
Write a Python program to display a bar chart of the popularity of programming languages. Use a different color for each bar.
In [0]:
import matplotlib.pyplot as plt
x = ['Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++']
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]
x_pos = [i for i, _ in enumerate(x)]
plt.bar(x_pos, popularity, color=['red', 'black', 'green', 'blue', 'yellow', 'cyan'])
plt.xlabel("Languages")
plt.ylabel("Popularity")
plt.title("Popularity of Programming Languages\n" + "Worldwide, Oct 2017 compared to a year ago")
plt.xticks(x_pos, x)
# Turn on the grid
plt.minorticks_on()
plt.grid(which='major', linestyle='-', linewidth=0.5, color='red')
# Customize the minor grid
plt.grid(which='minor', linestyle=':', linewidth=0.5, color='black')
# TO DO - draw the plot
Question#5:
Write a Python program to create a pie chart of the popularity of programming languages.
Sample data:
Programming languages: Java, Python, PHP, JavaScript, C#, C++
Popularity: 22.2, 17.6, 8.8, 8, 7.7, 6.7
In [0]:
import matplotlib.pyplot as plt
# Data to plot
languages = 'Java', 'Python', 'PHP', 'JavaScript', 'C#', 'C++'
popularity = [22.2, 17.6, 8.8, 8, 7.7, 6.7]
colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]
# explode 1st slice
explode = (0.1, 0, 0, 0, 0, 0)
# Plot
plt.pie(popularity, explode=explode, labels=languages, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
# TO DO - draw the plot
Question#6:
A population pyramid can be used to show the distribution of groups ordered by volume. It can also show the stage-by-stage filtering of a population, as it does below, where it shows how many people pass through each stage of a marketing funnel.
In [0]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Read data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/email_campaign_funnel.csv")
# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
group_col = 'Gender'
order_of_bars = df.Stage.unique()[::-1]
colors = [plt.cm.Spectral(i/float(len(df[group_col].unique())-1)) for i in range(len(df[group_col].unique()))]
for c, group in zip(colors, df[group_col].unique()):
    sns.barplot(x='Users', y='Stage', data=df.loc[df[group_col]==group, :], order=order_of_bars, color=c, label=group)
# TO DO - draw the plot xlabel
# TO DO - draw the plot ylabel
plt.yticks(fontsize=12)
# TO DO - plot the title
# TO DO - plot the legend
# TO DO - plot the graph
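The sketch below shows the mirrored-bar idea behind a funnel pyramid and one way the remaining TO DO steps (axis labels, title, legend, display) might look. It is not the expected answer. To stay self-contained it uses a small hypothetical DataFrame with the same column names (Stage, Gender, Users) instead of downloading the CSV, and plain matplotlib `barh` instead of seaborn. The stage names, counts, colors and label strings are all made up for illustration, and storing one group as negative counts so that its bars extend to the left is an assumption of this sketch.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical funnel counts, NOT the contents of email_campaign_funnel.csv.
# One group is stored as negative numbers so its bars extend to the left.
df = pd.DataFrame({
    "Stage": ["Sent", "Opened", "Clicked"] * 2,
    "Gender": ["Male"] * 3 + ["Female"] * 3,
    "Users": [-1000, -600, -200, 1100, 650, 250],
})

fig, ax = plt.subplots(figsize=(8, 5))

# Reverse the stage order so the first stage is drawn at the top.
order_of_bars = df["Stage"].unique()[::-1]

for color, group in zip(["tab:blue", "tab:orange"], df["Gender"].unique()):
    subset = df[df["Gender"] == group].set_index("Stage").loc[order_of_bars]
    ax.barh(subset.index, subset["Users"], color=color, label=group)

# Axis labels, title and legend (illustrative strings).
ax.set_xlabel("Users")
ax.set_ylabel("Stage")
ax.set_title("Population Pyramid of the Marketing Funnel")
ax.legend()

# Draw the plot.
plt.show()
```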