Instructor:
Email:
Date of Presentation (50%):
Date of submission (50%):
In this coursework you are expected to:
- Apply Data Analysis concepts and skills to a real-life business problem using Python.
- Develop teamwork competency by collaborating with your teammate.
- Write and communicate an academic project in a professional and creative manner using Jupyter Notebook.
- Avoid plagiarism. If plagiarism is found, all group members will get zero, with a B grade in the transcript.
Group No.: […]
student 1: Full name + Student ID
student 2: Full name + Student ID
Group formation:
- Students must complete this group project in self-selected groups of 2 students.
- One member of each group should inform the module leader about the group formation by emailing the full names and student IDs of both members to ***.
- If you fail to form a group, the module leader will randomly assign you to one.
Delivery (CRWK submission):
- Use this CRWK Report template (download CRWK_CN6009_2022.ipynb from Moodle), and DO NOT use any other file/report/template.
- Convert your source file (.ipynb) to HTML and submit it. If you work with Google Colab, check the last cell.
- The submission is due on
- Use the Turnitin submission link in the “Assessment and Feedback” section of the Moodle site or here:
Marks Breakdown:
- Manage Null values [5 marks]
- Data summarization using Python libraries and methods [45 marks]
- Descriptive statistics to quantitatively describe and summarize data [30 marks]
- PPDPlan (Personal Professional Development Plan) [20 marks]
Import Libraries and Data
In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# import whatever you need

# load data
df_corona = pd.read_csv('covid-data.csv')
df_corona.sample(5)
Manage Null Values [5 marks]
Task 1: Find null values and replace/drop with your own approach.
In [ ]:
# explain your approach here.
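One possible approach is sketched below on a small hypothetical DataFrame, so it runs without covid-data.csv: count the nulls in each column, drop rows that are missing an identifying column, then fill the remaining numeric gaps with each column's median. The column names and the median strategy are assumptions for illustration; whichever approach you choose, explain why it suits the data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_corona (the real data comes from covid-data.csv)
df_corona = pd.DataFrame({
    "continent": ["Africa", "Africa", None, "Europe"],
    "location": ["Kenya", "Nigeria", "World", "France"],
    "total_tests": [100.0, np.nan, 500.0, 300.0],
    "new_cases": [5.0, 7.0, np.nan, 9.0],
})

# 1. Inspect: number of null values in each column
print(df_corona.isnull().sum())

# 2. Drop rows with no 'continent' value (here the hypothetical 'World' row)
df_clean = df_corona.dropna(subset=["continent"]).copy()

# 3. Fill the remaining numeric gaps with each column's median
numeric_cols = df_clean.select_dtypes(include="number").columns
df_clean[numeric_cols] = df_clean[numeric_cols].fillna(df_clean[numeric_cols].median())

# Total number of nulls left after cleaning
print(df_clean.isnull().sum().sum())
```

The median is used rather than the mean because it is less affected by a few very large values; dropping every row with any null (`df_corona.dropna()`) is simpler but can discard a lot of data.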
Data Summarization [15 marks]
Task 2: Print/Visualize the sum of total_tests for countries in Africa.
In [ ]:
# add your solution here.
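A minimal sketch, assuming covid-data.csv has `continent`, `location` and `total_tests` columns (the rows below are hypothetical): filter the African rows, then sum `total_tests` per country.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed to match covid-data.csv
df_corona = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Europe"],
    "location": ["Kenya", "Kenya", "Nigeria", "France"],
    "total_tests": [100.0, 150.0, 400.0, 900.0],
})

# Keep only African rows, then sum total_tests for each country
africa = df_corona[df_corona["continent"] == "Africa"]
tests_by_country = (
    africa.groupby("location")["total_tests"]
    .sum()
    .sort_values(ascending=False)
)
print(tests_by_country)
print("Africa overall:", tests_by_country.sum())
```

To visualize the result, `tests_by_country.plot(kind="bar")` followed by `plt.show()` draws a bar chart with matplotlib. If `total_tests` turns out to be a running (cumulative) total, summing every row counts the same tests repeatedly, so comparing against each country's `.max()` may be worthwhile before interpreting the numbers.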
Descriptive Statistics Approaches [30 marks]
Task 3: Apply your own Descriptive Statistics approaches on 3 columns and explain/visualize/interpret your findings.
In [ ]:
# add your solution here.
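As one illustrative starting point (the three columns and their values are hypothetical), the sketch below builds a single table of central tendency, spread and shape for three numeric columns using only pandas methods.

```python
import pandas as pd

# Hypothetical values for three numeric columns; substitute your own choice from df_corona
df = pd.DataFrame({
    "new_cases": [10, 12, 15, 11, 90],
    "total_deaths": [1, 2, 2, 3, 4],
    "total_tests": [100, 120, 130, 125, 140],
})
cols = ["new_cases", "total_deaths", "total_tests"]

summary = df[cols].describe().T            # count, mean, std, min, quartiles, max
summary["median"] = df[cols].median()
summary["skewness"] = df[cols].skew()      # > 0 means a longer right tail
summary["kurtosis"] = df[cols].kurt()      # excess kurtosis (normal distribution = 0)
summary["iqr"] = summary["75%"] - summary["25%"]
print(summary.round(2))
```

Reading the table: in this sample `new_cases` has a mean well above its median and a positive skewness, which points to a few unusually large values. A box plot (for example `sns.boxplot(data=df[cols])`) makes such outliers visible.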
Data Summarization [20 marks]
Task 4: Create a new DataFrame called DF_deaths, containing only the columns new_cases, total_deaths, and aged_65_over. Make sure it contains no Null values.
- What are the Mean, Mode, Median and Skewness of total_deaths where new_cases is greater than its mean?
- What is the 5-number summary of total_deaths where aged_65_over is less than its mean?
In [ ]:
# add your solution 4.1 here.
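A minimal sketch for 4.1 on hypothetical data, assuming "its mean" refers to the mean of `new_cases` computed on DF_deaths after the null rows have been dropped.

```python
import pandas as pd

# Hypothetical stand-in for df_corona; column names follow the task text
df_corona = pd.DataFrame({
    "new_cases": [5, 20, None, 40, 10, 30, 25, 35, 28],
    "total_deaths": [1, 6, 3, 8, 2, 12, 8, 20, 12],
    "aged_65_over": [3.0, 10.5, 7.0, None, 4.2, 12.1, 9.0, 15.0, 11.0],
})

# Keep only the three required columns and drop rows with any null value
DF_deaths = df_corona[["new_cases", "total_deaths", "aged_65_over"]].dropna()

# Rows where new_cases is above its mean
high_cases = DF_deaths[DF_deaths["new_cases"] > DF_deaths["new_cases"].mean()]
deaths = high_cases["total_deaths"]

print("Mean:", deaths.mean())
print("Mode:", deaths.mode().tolist())   # mode() may return more than one value
print("Median:", deaths.median())
print("Skewness:", deaths.skew())        # sample skewness; needs at least 3 values
```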
In [ ]:
# add your solution 4.2 here.
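For 4.2, the 5-number summary is the minimum, first quartile, median, third quartile and maximum. The sketch below uses a hypothetical, already null-free DF_deaths and assumes "its mean" is the mean of `aged_65_over` in that DataFrame.

```python
import pandas as pd

# Hypothetical DF_deaths that already contains no null values
DF_deaths = pd.DataFrame({
    "new_cases": [5, 20, 10, 30, 25, 12, 15, 8],
    "total_deaths": [1, 6, 2, 12, 8, 4, 5, 3],
    "aged_65_over": [3.0, 10.5, 4.2, 12.1, 9.0, 5.5, 6.0, 2.5],
})

# Rows where aged_65_over is below its mean
low_age = DF_deaths[DF_deaths["aged_65_over"] < DF_deaths["aged_65_over"].mean()]
deaths = low_age["total_deaths"]

# 5-number summary: minimum, Q1, median, Q3, maximum
five_num = deaths.quantile([0, 0.25, 0.5, 0.75, 1.0])
five_num.index = ["min", "Q1", "median", "Q3", "max"]
print(five_num)
```

`deaths.describe()` reports the same five values (plus count, mean and standard deviation), because both use pandas' default linear interpolation for the quartiles.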
Data Summarization [10 marks]
Task 5: Create a function that performs a specific task on the data.
Hint: the more challenging/advanced the function, the more marks it earns.
In [ ]:
# add your solution here.
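As one example of the kind of function this task might involve, the sketch below flags outliers with Tukey's fences: values below Q1 − k·IQR or above Q3 + k·IQR, with the conventional k = 1.5. The function name `iqr_outliers` and the sample data are hypothetical.

```python
import pandas as pd


def iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Return the rows of df whose `column` value lies outside Tukey's fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR, where IQR = Q3 - Q1.
    Rows with a null in `column` are never flagged.
    """
    if column not in df.columns:
        raise KeyError(f"{column!r} is not a column of the DataFrame")
    values = df[column].dropna()
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons with NaN are False, so null rows are excluded from the mask
    mask = (df[column] < lower) | (df[column] > upper)
    return df[mask]


# Hypothetical example data
sample = pd.DataFrame({
    "location": ["A", "B", "C", "D", "E", "F"],
    "new_cases": [10, 12, 11, 13, 12, 95],
})
print(iqr_outliers(sample, "new_cases"))
```

Making `k` a parameter lets the caller loosen the rule (for example `k=3` for only extreme outliers), and the explicit `KeyError` gives a clearer message than a failed lookup further down.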
Complete PPDPlan (Personal Professional Development Plan) [20 marks]
Task 6: Every member must complete the PPDPlan (CN6009).docx file and submit it alongside this report.
For example, a group of two members must submit two PPDPlan (CN6009).docx files alongside the report.
How to convert ipynb to HTML in Google Colab?
- Download your CRWK_CN6009_2022.ipynb solution file.
- Drag and drop it into Google Colab.
- Execute this:
!jupyter nbconvert --to html CRWK_CN6009_2022.ipynb
- Download the generated CRWK_CN6009_2022.html file from the Colab file browser.
- Submit the .HTML report via Turnitin on the *** Moodle site.


