Instructor:
Email:
Date of Presentation (50%):
Date of submission (50%):
In this coursework you are expected to:
- Apply Data Analysis concepts and skills to a real-life business problem using Python.
- Develop teamwork competency by collaborating with your co-worker.
- Write and communicate an academic project in a professional and creative manner using Jupyter Notebook.
- Avoid plagiarism. If plagiarism is detected, all group members will get zero with a B grade in the transcript.
Group No.: […]
Student 1: Full name + Student ID
Student 2: Full name + Student ID
Group formation:
- Students must complete this group project in self-selected groups of 2 students.
- One member of each group should inform the module leader about the group formation by sending the full names and student IDs of both members by email to ***.
- If you fail to form a group, the module leader will randomly create one.
Delivery (CRWK submission):
- Use this CRWK Report template (download CRWK_CN6009_2022.ipynb from Moodle), and DO NOT use any other file/report/template.
- Convert your source file (.ipynb) to HTML and submit it. If you work with Google Colab, check the last cell.
- The submission is due on
- Use the Turnitin submission link in the “Assessment and Feedback” section of the Moodle site or here:
Marks Breakdown:
- Manage Null values [5 marks]
- Data summarization using Python libraries and methods [45 marks]
- Descriptive statistics to quantitatively describe and summarize data [30 marks]
- PPDPlan (Personal Professional Development Plan) [20 marks]
Import Libraries and Data
In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# import whatever you need

# load data
df_corona = pd.read_csv('covid-data.csv')
df_corona.sample(5)  # show 5 random rows
Manage Null Values [5 marks]
Task 1: Find null values and replace/drop them with your own approach.
In [ ]:
# explain your approach here.
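One possible approach for Task 1, sketched under assumptions: count the nulls per column, drop rows that have no continent (assuming such rows are aggregates rather than countries), then fill the remaining numeric gaps with each column's median. The small DataFrame is hypothetical toy data standing in for df_corona so the sketch runs on its own; the column names continent, new_cases and total_tests are assumed to match the CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for covid-data.csv (not real figures)
df_corona = pd.DataFrame({
    "location": ["A", "A", "B", "B", "C"],
    "continent": ["Africa", "Africa", "Europe", "Europe", None],
    "new_cases": [10.0, np.nan, 5.0, 7.0, 3.0],
    "total_tests": [100.0, 150.0, np.nan, 80.0, np.nan],
})

# 1. Inspect: count missing values per column
null_counts = df_corona.isnull().sum()
print(null_counts)

# 2. Drop rows without a continent (assumed to be non-country rows)
df_clean = df_corona.dropna(subset=["continent"]).copy()

# 3. Fill remaining numeric gaps with each column's median
numeric_cols = df_clean.select_dtypes(include="number").columns
df_clean[numeric_cols] = df_clean[numeric_cols].fillna(df_clean[numeric_cols].median())

print(df_clean.isnull().sum().sum())  # 0 when every gap has been handled
```

Median filling is only one choice: it is less sensitive to outliers than the mean, but dropping rows, or forward-filling within each country, may suit time-series columns better.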
Data Summarization [15 marks]
Task 2: Print/Visualize the sum of total_tests for countries in Africa.
In [ ]:
# add your solution here.
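A minimal sketch for Task 2, assuming the CSV has continent, location and total_tests columns. The toy rows and the names Country_A to Country_C are hypothetical: the sketch filters the African rows, then sums total_tests per country and for the continent as a whole.

```python
import pandas as pd

# Hypothetical toy rows; real data is assumed to have these three columns
df_corona = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Europe"],
    "location": ["Country_A", "Country_A", "Country_B", "Country_C"],
    "total_tests": [100.0, 150.0, 40.0, 900.0],
})

# Keep only rows for African countries
africa = df_corona[df_corona["continent"] == "Africa"]

# Sum of total_tests per African country, largest first
tests_by_country = (
    africa.groupby("location")["total_tests"].sum().sort_values(ascending=False)
)
print(tests_by_country)

# One figure for the whole continent
africa_total = africa["total_tests"].sum()
print(africa_total)
```

For the visual part, tests_by_country.plot(kind="bar") draws a bar chart through pandas' matplotlib backend. If total_tests turns out to be a running cumulative count, summing every row over-counts; using .max() per country instead of .sum() would give each country's latest total.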
Descriptive Statistics Approaches [30 marks]
Task 3: Apply your own descriptive statistics approaches to 3 columns and explain/visualize/interpret your findings.
In [ ]:
# add your solution here.
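An illustrative sketch for Task 3 on three assumed columns (new_cases, total_deaths, aged_65_over) using hypothetical toy values: describe() for central tendency and spread, skew() and kurt() for the shape of each distribution, and corr() for linear relationships between the columns.

```python
import pandas as pd

# Hypothetical toy values, not real COVID figures
df_corona = pd.DataFrame({
    "new_cases": [1.0, 2.0, 2.0, 3.0, 12.0],
    "total_deaths": [10.0, 20.0, 30.0, 40.0, 50.0],
    "aged_65_over": [3.0, 5.0, 7.0, 9.0, 11.0],
})

cols = ["new_cases", "total_deaths", "aged_65_over"]

# count, mean, std, min, quartiles and max in one table
summary = df_corona[cols].describe()

# Shape measures that describe() does not include
shape = pd.DataFrame({
    "skewness": df_corona[cols].skew(),
    "kurtosis": df_corona[cols].kurt(),
})

# Pairwise Pearson correlations between the three columns
corr = df_corona[cols].corr()

print(summary)
print(shape)
print(corr)
```

A positive skewness, as new_cases has here because of one large value, indicates a long right tail; in that case the median is usually a better "typical" value than the mean.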
Data Summarization [20 marks]
Task 4: Create a new DataFrame called DF_deaths, containing only the columns new_cases, total_deaths, and aged_65_over. Make sure it contains no Null values.
- What are the Mean, Mode, Median and Skewness of total_deaths where new_cases is greater than its mean?
- What is the 5-number summary of total_deaths where aged_65_over is less than its mean?
In [ ]:
# add your solution 4.1 here.
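A possible sketch for Task 4.1, reading "the mean" as the mean of new_cases itself. The seven toy rows are hypothetical; two of them contain a null so that dropna() visibly removes them when DF_deaths is built.

```python
import pandas as pd

# Hypothetical toy rows standing in for df_corona
df_corona = pd.DataFrame({
    "new_cases": [1.0, 2.0, None, 20.0, 10.0, 11.0, 12.0],
    "total_deaths": [5.0, 6.0, 9.0, 10.0, 20.0, 20.0, 50.0],
    "aged_65_over": [4.0, 5.0, 6.0, None, 6.0, 7.0, 20.0],
})

# Keep the three required columns and drop any row containing a null
DF_deaths = df_corona[["new_cases", "total_deaths", "aged_65_over"]].dropna()

# Rows where new_cases is above its own mean
above = DF_deaths[DF_deaths["new_cases"] > DF_deaths["new_cases"].mean()]
deaths = above["total_deaths"]

stats_41 = {
    "mean": deaths.mean(),
    "mode": deaths.mode().tolist(),  # mode() may return several values
    "median": deaths.median(),
    "skewness": deaths.skew(),
}
print(stats_41)
```

Series.skew() returns NaN when fewer than three values remain, so it is worth checking the size of the filtered subset before interpreting the result.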
In [ ]:
# add your solution 4.2 here.
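For Task 4.2, the 5-number summary (minimum, first quartile, median, third quartile, maximum) can be read off describe(). This sketch builds DF_deaths from its own hypothetical toy rows so it runs independently, and reads "the mean" as the mean of aged_65_over.

```python
import pandas as pd

# Hypothetical toy rows standing in for df_corona
df_corona = pd.DataFrame({
    "new_cases": [1.0, 2.0, None, 20.0, 10.0, 11.0, 12.0],
    "total_deaths": [5.0, 6.0, 9.0, 10.0, 20.0, 20.0, 50.0],
    "aged_65_over": [4.0, 5.0, 6.0, None, 6.0, 7.0, 20.0],
})

# Three required columns, no nulls
DF_deaths = df_corona[["new_cases", "total_deaths", "aged_65_over"]].dropna()

# Rows where aged_65_over is below its own mean
below = DF_deaths[DF_deaths["aged_65_over"] < DF_deaths["aged_65_over"].mean()]

# min, Q1, median, Q3, max taken from describe()
five_num = below["total_deaths"].describe()[["min", "25%", "50%", "75%", "max"]]
print(five_num)
```

describe() computes quartiles by linear interpolation; other conventions, such as Tukey's hinges, can give slightly different Q1 and Q3 values on small samples.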
Data Summarization [10 marks]
Task 5: Create a function to do a specific task on data.
Hint: the more challenging/advanced the function, the more marks.
In [ ]:
# add your solution here.
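One illustrative idea for Task 5. The function name top_n_by and its parameters are hypothetical and not part of any library: the helper validates its inputs, aggregates a column per group with any pandas aggregation name, and returns the n largest groups. The toy data is also hypothetical.

```python
import pandas as pd


def top_n_by(df: pd.DataFrame, group_col: str, value_col: str,
             n: int = 5, agg: str = "sum") -> pd.Series:
    """Aggregate value_col per group_col and return the n largest groups.

    agg is an aggregation name accepted by pandas GroupBy.agg,
    e.g. "sum", "mean" or "max". Rows with a null value_col are ignored.
    """
    # Fail early with a clear message if a column is missing
    missing = {group_col, value_col} - set(df.columns)
    if missing:
        raise KeyError(f"Columns not found: {sorted(missing)}")
    if n < 1:
        raise ValueError("n must be at least 1")
    grouped = df.dropna(subset=[value_col]).groupby(group_col)[value_col].agg(agg)
    return grouped.nlargest(n)


# Hypothetical toy data
df_corona = pd.DataFrame({
    "location": ["A", "A", "B", "C", "C", "D"],
    "new_cases": [5.0, 7.0, 20.0, 1.0, None, 9.0],
})

result = top_n_by(df_corona, "location", "new_cases", n=2)
print(result)
```

On the real data, a call such as top_n_by(df_corona, "continent", "new_cases", agg="mean") would rank continents by mean new_cases, assuming those columns exist.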
Complete PPDPlan (Personal Professional Development Plan) [20 marks]
Task 6: Every member must complete the PPDPlan (CN6009).dox file and submit it alongside this report.
For example, a group of two members must submit two PPDPlan (CN6009).dox files alongside the report.
How to convert .ipynb to HTML in Google Colab?
- Download your CRWK_CN6009_2022.ipynb solution file.
- Drag and drop it into Google Colab.
- Execute this:
!jupyter nbconvert --to html CRWK_CN6009_2022.ipynb
- Submit the .html report via Turnitin on the *** Moodle site.