STA20006 Assignment (Part A) Pg: 1
______________________________________________________________________________________________
Assessment Title: Assignment (Part A)
Due Date: Wednesday 11th April at 11:59 PM
Assessment weighting: 20%
Assessment type: Individual
_______________________________________________________________________________________________
Introduction
The objective of this assignment is for you to understand research scenarios that concern correlation and regression.
In particular, it is important that you master the components of multiple regression, such as moderation and
mediation, as these techniques are regularly used within the Health Sciences. It is highly recommended that you
attempt the assignment questions on a weekly basis (after completing each module) so that the material is still
fresh and the workload does not become overwhelming at the last minute.
Recommended timeline of assignment tasks:
Task 1 | after completing module 1: A review of concepts
Task 2 | after completing module 2: Correlation & regression revisited
Task 3 | after completing module 3: The basic concepts of multiple regression
Task 4 | after completing module 4: Part & partial correlations
Task 5 | after completing module 5: Report writing, testing assumptions, and moderation
To successfully complete these 5 tasks, it is highly recommended that you attend all of the lecture and tutorial
sessions, and complete the accompanying activities / questions. If you have any queries, require an extension, or have
any other related issues, please contact the unit convenors first: Minh Huynh (mhuynh@swin.edu.au) or Laura Tirlea
(lauratirlea@swin.edu.au).
Submission:
Assignments are to be submitted via Blackboard (through Turnitin) on or before the due date. Students are expected
to submit the assignments as Word or PDF documents. The file name should be of the form Surname_Task1.doc. For
example, if your name was Minh Huynh, your first workbook should be saved in a file called Huynh_Task1.doc.
SPSS Output:
All relevant SPSS outputs should be attached as an appendix. Appropriate graphs and tables of statistics should be
included in the main body of the report. DO NOT place large SPSS tables full of irrelevant statistics in the body of
your report. If there are just a few statistics, include them in the text.
Swinburne University of Technology – Department of Statistics, Data Science & Epidemiology
STA20006 Analysis of Variance & Regression
Semester 1 2018
_______________________________________________________________________________________________
Assessment Task
Non-alcoholic fatty liver disease (NAFLD) is one of the major health crises affecting the modern age. It has a global
prevalence of 25.2% (Younossi et al., 2016), and is predicted to eclipse hepatitis C as the leading cause of liver
transplants by 2020. Many researchers believe that the NAFLD epidemic is driven by skyrocketing rates of obesity,
diabetes, high cholesterol, and lifestyle choices.
In 2018 a study was conducted to explore the factors associated with NAFLD for a sample of 150 Australian adults.
The participants in this sample completed a variety of physical, mental and psychological assessment tasks, and had
their data recorded by members of the research team. The full list of variables and descriptions of the data collected
can be found below:
Information on the data collected
Variable | Variable coding | Variable description
ID | NA | The identification number allocated to each participant
Age | NA | The participant’s age (years)
Sex | 0: Female, 1: Male | The sex of the participant
Country | 0: Australian, 1: Overseas | The participant’s country of birth
M_Status | 0: Never married, 1: Married, 2: Separated | The participant’s present marital status
R_Hypertension | NA | A doctor’s assessment of the severity of the participant’s hypertension. Higher scores indicate higher levels of hypertension.
R_Obesity | NA | A doctor’s assessment of the severity of the participant’s obesity. Higher scores indicate higher levels of obesity.
SoftDrinks | NA | The number of soft drinks consumed (375 ml per serving) during the study period.
R_NAFLD | NA | A doctor’s assessment of the severity of the participant’s NAFLD. Higher scores indicate higher levels of NAFLD.
Inactivity | NA | A doctor’s assessment of the participant’s inactivity level. Higher scores indicate higher levels of inactivity.
BGL | NA | The participant’s blood glucose level (measured at the end of the study period).
CAF | NA | A doctor’s assessment of the participant’s consumption of caffeine. Higher scores indicate more caffeine consumption.
_______________________________________________________________________________________________
Task ONE (8 marks)
As the researchers were compiling the data for this study, they quickly realised that some of the data for 10 of their
participants was not entered into the SPSS data file. Fortunately, they still possessed the raw data for these 10
participants, which has been provided below:
ID: 29
Age: 21
Sex: Male
Country: Overseas
M. Status: Married
Soft Drinks consumed: 5
ID: 103
Age: 38
Sex: Female
Country: Australian
M. Status: Separated
Soft Drinks consumed: 3
ID: 50
Age: 30
Sex: Male
Country: Australian
M. Status: Never Married
Soft Drinks consumed: 8
ID: 23
Age: 27
Sex: Male
Country: Overseas
M. Status: Never Married
Soft Drinks consumed: 11
ID: 55
Age: 20
Sex: Female
Country: Overseas
M. Status: Never Married
Soft Drinks consumed: 4
ID: 65
Age: 27
Sex: Male
Country: Overseas
M. Status: Married
Soft Drinks consumed: 14
ID: 78
Age: 40
Sex: Male
Country: Australian
M. Status: Separated
Soft Drinks consumed: 5
ID: 142
Age: 32
Sex: Male
Country: Overseas
M. Status: Separated
Soft Drinks consumed: 8
ID: 88
Age: 40
Sex: Female
Country: Australian
M. Status: Separated
Soft Drinks consumed: 10
ID: 109
Age: 23
Sex: Female
Country: Australian
M. Status: Never Married
Soft Drinks consumed: 0
a. Add the data for the 10 cases provided here to the existing data file. Ensure you are using the same coding as
the existing cases (HINT: save your new data file!)
b. Generate an appropriate table for the variable: [Marital Status] and briefly comment on the output. Note:
include the SPSS generated table with your answer.
c. Generate an appropriate graph that compares the average inactivity time per month for Australian-born and
Overseas-born workers. Briefly comment on your output. Note: include the SPSS generated graph with your
answer.
Note: this task should not exceed half a page (you may need to resize your output)
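The coding step in part (a) can be sketched outside SPSS as follows. This is only an illustrative Python sketch, not a required part of the submission: the numeric codes come from the variable table above, the ten records are the raw cases listed in this task, and the names encode_case and RAW_CASES are hypothetical helpers introduced for the example.

```python
# Numeric codes taken from the variable table in this assignment.
SEX = {"female": 0, "male": 1}
COUNTRY = {"australian": 0, "overseas": 1}
M_STATUS = {"never married": 0, "married": 1, "separated": 2}

# The ten raw records from Task ONE:
# (ID, Age, Sex, Country, M. Status, Soft Drinks consumed)
RAW_CASES = [
    (29, 21, "Male", "Overseas", "Married", 5),
    (103, 38, "Female", "Australian", "Separated", 3),
    (50, 30, "Male", "Australian", "Never Married", 8),
    (23, 27, "Male", "Overseas", "Never Married", 11),
    (55, 20, "Female", "Overseas", "Never Married", 4),
    (65, 27, "Male", "Overseas", "Married", 14),
    (78, 40, "Male", "Australian", "Separated", 5),
    (142, 32, "Male", "Overseas", "Separated", 8),
    (88, 40, "Female", "Australian", "Separated", 10),
    (109, 23, "Female", "Australian", "Never Married", 0),
]


def encode_case(case):
    """Convert one raw record into the numeric coding used in the data file."""
    case_id, age, sex, country, status, soft_drinks = case
    # Lower-casing tolerates inconsistent capitalisation in the raw records.
    return {
        "ID": case_id,
        "Age": age,
        "Sex": SEX[sex.strip().lower()],
        "Country": COUNTRY[country.strip().lower()],
        "M_Status": M_STATUS[status.strip().lower()],
        "SoftDrinks": soft_drinks,
    }


encoded = [encode_case(case) for case in RAW_CASES]
for row in encoded:
    print(row)
```

Each printed row shows the values that would be typed into the matching columns of the data file for that participant ID.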
_______________________________________________________________________________________________
Task TWO (10 marks)
In 2017, a study by Siddiqi and colleagues was conducted which explored the effects of soft drink consumption on
NAFLD. In their study, the researchers provided evidence to support the hypothesis that excessive consumption of
soft drinks is associated with developing NAFLD. The abstract for this study is provided below:
Given that the above study only concerns undergraduate medical students, it is not a good representation of the
overall Australian population. Thus, you have been tasked with re-investigating this hypothesis (i.e. people who
consume more soft drinks are more likely to be at risk for NAFLD) with your completed data file (see task ONE).
Write a brief report on your results, including an appropriate hypothesis test. Include all relevant statistics and
graphs, and attach the relevant output as an appendix.
Note 1: Assume all assumptions have been met
Note 2: Check the marking rubric to see how this question is graded
Note 3: The report should be written in APA format
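As a rough illustration of the kind of statistics this report would quote, the sketch below computes a Pearson correlation and its t statistic (df = n - 2). It assumes a Pearson correlation is the analysis chosen, which this brief does not prescribe. The two short lists are invented toy values, not data from the assignment file, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom.

    Undefined when |r| is exactly 1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Invented toy values for illustration only (NOT the assignment data).
soft_drinks = [0, 3, 4, 5, 8, 10]
nafld_score = [12, 15, 14, 18, 21, 24]

r = pearson_r(soft_drinks, nafld_score)
n = len(soft_drinks)
t = t_statistic(r, n)
print(f"r({n - 2}) = {r:.3f}, t = {t:.3f}")
```

In practice SPSS reports r and its p-value directly; the sketch only shows where those numbers come from.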
_______________________________________________________________________________________________
Task THREE (20 marks)
In a follow-up study, the researchers decided to investigate the factors that affect hypertension for their sample of
Australian adults. The researchers included the following predictors in their model: (1) BGL, (2) inactivity, and (3)
sex. Using the completed data file, run a multiple regression to address this scenario and answer the following
questions:
a. From the raw correlations table, which of the predictors were significantly related to the dependent
variable? Quote relevant statistics.
b. Give the regression equation for hypertension (to two decimal places).
c. Use the regression equation to predict one’s risk of hypertension for a male with an inactivity score of 36
and a BGL reading of 7.5.
d. Interpret the partial regression coefficient for BGL.
e. Which was the most important predictor in this regression? Quote relevant statistics.
f. When all of the predictors are taken into account, which predictors contributed significantly to the multiple
regression? Quote relevant statistics.
g. Is the value of Multiple R significant? What does this tell us? Quote relevant statistics.
h. How much of the variation in hypertension can be explained by this linear model?
i. Suggest a population that may benefit from these findings. Use the statistical results to support your answer.
Note 1: Assume all assumptions have been met
Note 2: When quoting relevant statistics, use APA format
Note 3: Check the marking rubric to see how this question is graded
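The mechanics of questions (b) and (c) can be sketched as below. This is a minimal sketch, not the expected answer: the coefficients are arbitrary placeholders chosen for easy arithmetic, NOT values from any SPSS output, and the function name is hypothetical. Only the inputs (male coded as 1, inactivity of 36, BGL of 7.5) come from this task and the variable table.

```python
def predict_hypertension(bgl, inactivity, sex, coefficients):
    """Apply Y' = b0 + b_BGL*BGL + b_Inactivity*Inactivity + b_Sex*Sex."""
    return (
        coefficients["constant"]
        + coefficients["BGL"] * bgl
        + coefficients["Inactivity"] * inactivity
        + coefficients["Sex"] * sex
    )


# PLACEHOLDER coefficients for illustration only. Replace them with the
# unstandardised B values from your own regression output.
PLACEHOLDER_COEFFICIENTS = {
    "constant": 1.0,
    "BGL": 2.0,
    "Inactivity": 0.5,
    "Sex": 3.0,
}

# Inputs from question (c): male (Sex = 1), inactivity = 36, BGL = 7.5.
predicted = predict_hypertension(
    bgl=7.5, inactivity=36, sex=1, coefficients=PLACEHOLDER_COEFFICIENTS
)
print(round(predicted, 2))
```

Because sex is coded 0 for female and 1 for male, the sex coefficient is added only for male participants.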
_______________________________________________________________________________________________
Task FOUR (8 marks)
Continuing from the previous task, the researchers have written down four observations (A to D) that they have
made based upon their analyses:
A. When BGL, inactivity and sex are included in the model, they account for 75.1% of the variability in BGL.
B. The raw correlation coefficient for amount of BGL is .743. That is, by itself, BGL can account for 74.3% of the
variability in hypertension.
C. Inactivity accounts for 65.9% of the variability in hypertension, over and above the variability explained by
the other predictors.
D. The correlation between inactivity and hypertension can be explained in terms of BGL. People with higher
BGL tend to be more inactive, and people who are more inactive tend to have higher levels of hypertension.
Unfortunately, the researchers have interpreted parts of their output incorrectly, which has led to potential errors
with their observations. Your task is to identify and comment on any errors that you can detect for each of the four
observations (A to D) and discuss what should have been written instead.
Note 1: If you agree with any of the observations then write “no errors” for that section
Note 2: You will need to run the analysis yourself to check the researchers’ observations
Note 3: Assume all assumptions have been met
Note 4: Check the marking rubric to see how this question is graded
_______________________________________________________________________________________________
Task FIVE (22 marks)
Suppose the researchers were also interested in investigating the factors that affect the risk of non-alcoholic fatty liver
disease (NAFLD) for their sample of Australian adults. The researchers included the following predictors in their
model: (1) hypertension, (2) obesity, (3) caffeine, and (4) sex (female / male). The researchers hypothesised that:
1) People with more severe hypertension will have higher levels of non-alcoholic fatty liver disease
2) People with higher levels of obesity will have higher levels of non-alcoholic fatty liver disease
3) People with higher levels of caffeine consumption will have higher levels of non-alcoholic fatty liver disease
Using the data provided from the Assignment1.sav data file, write a report on the analysis addressing these
hypotheses. Include relevant and formatted tables in the body of your report.
Note 1: Assume all assumptions have been met
Note 2: Check the marking rubric to see how this question is graded
Note 3: The report should be written in APA format
End of Assignment (Part A) questions.