Type your student number here:
Part A Testing Assumptions
The data file you will need for these assessment tasks is titled “part A output.spv” and can be found on unihub. You are required to check data assumptions from a multiple regression data screening procedure by interpreting this SPSS output file.
The procedure was conducted on 170 cases with the following variables: “iv1” to “iv5”, which refer to mean scores on five independent variables, and “DV”, which refers to mean scores on a continuous dependent variable.
Section 1
These questions are about the table of correlations in the SPSS output file.
Table 1. Correlations amongst the variables in the regression model
 | DV | iv1 | iv2 | iv3 | iv4 | iv5 |
DV | | | | | | |
iv1 | .131 | | | | | |
iv2 | .063 | | | | | |
iv3 | -.010 | | | | | |
iv4 | -.032 | | | | | |
iv5 | -.021 | | | | | |
Q1 Observing Table 1 above (partially reproduced from the output file) assess whether the following statements are true or false (10 marks):
- the correlation between iv2 and the DV is statistically significant
True False
- Among the correlations between the DV and the IVs, the highest is between iv1 and the DV
True False
- In a regression model, the variables iv2, iv3, iv4, and iv5 are likely to be weak predictors of the DV
True False
- the correlation between iv5 and the DV is strong and negative
True False
- the correlation between iv1 and the DV is statistically significant
True False
Q2 Looking at the correlation table in the output file, what is the highest correlation between the IVs? Which variables is this correlation between? Is it problematic? What assumption are you assessing by examining it? (8 marks)
highest correlation: .973
Which variables : IV3 and IV5
Is it problematic: yes
What assumption: Multicollinearity
Section 2
These questions are about the Tolerance and VIF values in the SPSS output
Q3 If a variable has a Tolerance < .1 and a VIF value > 10, what does that indicate? (2 marks)
Answer: Multicollinearity is present in the data
Q4 Write the name(s) of any variables meeting the criteria in Question 3 above (if none, write “none”). (5 marks)
Answer: IV3 and IV5
Q5 What action would you advise be taken in the light of your previous answer? (5 marks)
Answer: remove one of the variables from the analysis
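The link between a high correlation among IVs and the Tolerance/VIF cut-offs in Q3 can be sketched in a few lines. This is only an illustration: it assumes a simplified case with just two predictors correlated at .973, so the values it prints will not match the SPSS output, which is based on all five IVs. The function name `tolerance_and_vif` is hypothetical.

```python
def tolerance_and_vif(r_squared):
    """Return (tolerance, VIF) for a predictor, given the R^2 obtained
    when that predictor is regressed on the other predictors."""
    tolerance = 1.0 - r_squared
    vif = 1.0 / tolerance
    return tolerance, vif


# Assumption for illustration only: two predictors correlated at r = .973,
# so each predictor's R^2 with the other is simply r squared.
r = 0.973
tol, vif = tolerance_and_vif(r ** 2)

# Cut-offs taken from Q3: Tolerance < .1 and VIF > 10.
flagged = tol < 0.1 and vif > 10

print(f"Tolerance = {tol:.3f}, VIF = {vif:.2f}")
print("Multicollinearity flagged:", flagged)
```

Because VIF is the reciprocal of Tolerance, the two cut-offs in Q3 describe the same boundary.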
Section 3
These questions are about outliers (2 marks each)
Q6 What is the maximum value of the standardised residuals in the output?
Answer: 2.848
Q7 Does this indicate any univariate outliers?
Answer: No
Q8 In the output, what is the maximum value of the statistic which assesses if outliers are influential in these data?
Answer: .077 (Cook’s Distance)
Q9 Are there any influential outliers in the data?
Answer: No
Q10 Name the distance statistic used to identify multivariate outliers.
Answer: Mahalanobis Distance
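The decisions in Q7 and Q9 amount to comparing the reported maxima against cut-offs. A minimal sketch follows. The cut-offs used here (3.29 for standardised residuals, 1 for Cook's Distance) are assumptions based on common rules of thumb and are not stated in this worksheet, so substitute the values taught on your module.

```python
# Values reported in the answers to Q6 and Q8.
max_std_residual = 2.848
max_cooks_distance = 0.077

# Assumed rule-of-thumb cut-offs (not taken from the worksheet).
STD_RESIDUAL_CUTOFF = 3.29
COOKS_CUTOFF = 1.0

has_univariate_outlier = abs(max_std_residual) > STD_RESIDUAL_CUTOFF
has_influential_case = max_cooks_distance > COOKS_CUTOFF

print("Univariate outlier (from maximum residual):", has_univariate_outlier)
print("Influential case:", has_influential_case)
```

Only the maximum residual is checked here because that is what Q6 asks for. A full check would also compare the minimum standardised residual against the negative cut-off.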
Section 4
These questions are about other assumptions
Q11 Other than normality, name two other data assumptions that can be tested by inspecting the standardised residual scatterplot (2 marks)
Answer1: Linearity
Answer2: Homoscedasticity
Q12 Referring to your answer to Q11 and the standardised residual scatterplot in the output, comment briefly on whether or not the two assumptions are met. (4 marks)
Answer1: No, the assumption of linearity is not met. The points are not evenly spread above and below the zero line and do not form a rectangular distribution.
Answer2: Yes, Homoscedasticity assumption is met as there is no evidence of a funnel-shaped distribution
Q13 What data assumption does the Durbin-Watson statistic test? (1 mark)
Answer: independence of residuals
Q14 Is this assumption met in the present data? Briefly explain how you arrived at this answer (3 marks)
Answer: No, the assumption of independence of residuals has not been met, as the value of the Durbin-Watson statistic is .061. It needs to be between 1.5 and 2.5.
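For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below shows that calculation and the 1.5 to 2.5 check used in the answer above. The function names and the example residuals are hypothetical and are not taken from the SPSS output.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def independence_met(dw, lower=1.5, upper=2.5):
    """Apply the 1.5 to 2.5 rule of thumb used in the answer to Q14."""
    return lower <= dw <= upper


# Hypothetical residuals that drift steadily upward (positive autocorrelation).
trending = [-2.0, -1.0, 0.0, 1.0, 2.0]
dw_trending = durbin_watson(trending)
print(dw_trending, independence_met(dw_trending))

# The value reported in the answer to Q14.
print(independence_met(0.061))
```

Values close to 0 indicate positive autocorrelation and values close to 4 indicate negative autocorrelation, which is why a value of .061 falls well outside the acceptable range.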
Part B Hierarchical Multiple Regression
The data file you will need for these assessment tasks is titled “Hierarchical Regression Practice Data.sav” and can be found on unihub along with this file. Please analyse the data file as it is. Do not delete or change any of the data.
A researcher measures a DV and 3 IVs. She wants to test IV3 while controlling for IV1 and IV2. Conduct a hierarchical regression with the control variables in the same block and IV3 in a separate block. Select the relevant output statistics.
Q15 Which variables are entered into each model? (4 marks)
Model 1 | Model 2 |
IV(s) = IV1 and IV2 | IV(s) = IV1 and IV2 and IV3 |
DV = DV | DV = DV |
Q16 Give the following values for each model (12 marks)
Model 1 | Model 2 |
R = .58, R2 = .34, Adjusted R2 = .31 | R = .69, R2 = .48, Adjusted R2 = .46 |
Q17 Approximately what % of the total variance in the DV is accounted for by each model? (4 marks)
Model 1 | Model 2 |
% of variance = 31-34 | % of variance = 46-48 |
Q18 For model 1, formally report the statistical test of R and its significance. Report the F ratio, degrees of freedom and p value. (4 marks)
F(2, 91) = 22.94, p < .001
Q19 For model 2, formally report the change statistics (3 marks)
R2 change = .142, F change(1, 90) = 24.44, p < .001
Q20 How much more variance does model 2 predict and is this significantly more than model 1? (2 marks)
Model 2 predicts about 14% more variance. Yes, this is significantly more than model 1.
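The F change statistic can be approximated from the R2 values using the standard formula: the R2 change divided by the number of added predictors, over the unexplained variance of the fuller model divided by its residual degrees of freedom. The sketch below uses the rounded values reported above, so its result will differ slightly from the SPSS figure, which is computed from unrounded values. The function name `f_change` is hypothetical.

```python
def f_change(r2_change, r2_full, df_added, df_residual):
    """F for the increase in R^2 when predictors are added to a model."""
    explained_per_df = r2_change / df_added
    unexplained_per_df = (1.0 - r2_full) / df_residual
    return explained_per_df / unexplained_per_df


# Rounded values from the answers to Q16 and Q19 (assumption: rounding error
# means this will not reproduce the SPSS output exactly).
approx_f = f_change(r2_change=0.142, r2_full=0.48, df_added=1, df_residual=90)
print(f"Approximate F change(1, 90) = {approx_f:.2f}")
```

A result in the same region as the reported F change is a quick sanity check that the R2 values and degrees of freedom have been copied correctly from the output.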
Q21 What are the standardised regression weights for the 3 variables for model 2 and formally report the statistic testing if they are significant? (9 marks)
IV1: beta = -.03, t(90) = .37, p = .76
IV2: beta = .23, t(90) = 2.24, p = .03
IV3: beta = .52, t(90) = 4.94, p < .001
Q22 Report and interpret the sr2 values for model 2. (12 marks)
sr2 | interpretation |
IV1 = .00078 | IV1 uniquely explains less than 0.1% of the variance in the DV |
IV2 = .0289 | IV2 uniquely explains nearly 3% of the variance in the DV |
IV3 = .142 | IV3 uniquely explains just over 14% of the variance in the DV |
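The interpretations in this table are the sr2 values expressed as percentages. The short sketch below does that conversion and also checks one useful property: when a single variable is entered alone in the final block, its sr2 equals the R2 change for that block, so the sr2 for IV3 should match the R2 change reported in Q19. The variable names are hypothetical.

```python
# sr2 values as reported in the table above.
sr2 = {"IV1": 0.00078, "IV2": 0.0289, "IV3": 0.142}

# Convert each squared semipartial correlation to a percentage of DV variance.
percent_unique = {name: round(value * 100, 3) for name, value in sr2.items()}

for name, pct in percent_unique.items():
    print(f"{name} uniquely explains {pct}% of the variance in the DV")

# R2 change for the block containing only IV3 (from the answer to Q19).
r2_change_model2 = 0.142
matches_r2_change = abs(sr2["IV3"] - r2_change_model2) < 1e-9
print("sr2 for IV3 matches the R2 change:", matches_r2_change)
```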