This is a detailed report that you will develop based on your analysis of your team project data. Make sure to include all the sections, and complete and describe the results of each of the steps detailed below.
Report Write-up Structure
- Title Page: provides the title of the study along with the names of the team members.
- Introduction: is a section introducing the study with a one-paragraph scenario that states what the study is about and what its objectives are.
- Background: is a section describing the numerical dependent (i.e., response) variable of interest as well as all the potential numerical and categorical predictor variables that will be used to develop the multiple regression model.
- Simple Regression Modeling: is a section demonstrating the complete development of the simple linear regression model that was done in Module 2. Comment on your summary of findings from that analysis.
- Multiple Regression Modeling: is a section, based on M2 and M3, demonstrating the complete development of a multiple regression model using the numerical predictor variable (used in Part 1 of the project), several other numerical predictors, and one categorical (dummy) predictor variable as necessary. The 8-step modeling process should be followed. Given the developed scenario, appropriate values of the predictor variables should be used to obtain the prediction (i.e., the value of the dependent/response variable).
- Appropriate tables and charts should be included in the body of this section or in an Appendix to the report.
- Summary of Findings: the report should end with a short paragraph that connects back to the scenario in the Introduction section.
Developing the Report
Use the same Excel project file for Part 2 of the project that you used in Part 1. As mentioned in Part 1, each Excel project file has two tabs: DATA and Variable INFO. The DATA worksheet contains a YELLOW column representing the numerical response variable. The DARK ORANGE column is the numerical predictor variable used in the simple regression modeling in Part 1 of the project. For Part 2 of the project, use the DARK ORANGE column numerical predictor variable, the BLUE column categorical variable (the “dummy” variable), and all other WHITE column numerical variables as potential predictor variables in the multiple regression analysis.
Introduction
Continuing with the introduction you developed for the Part 1 report of the team project, and using the YELLOW-highlighted numerical variable column in your Excel worksheet as the response variable of interest, create a one-paragraph “scenario” that describes the data file for your project and explains why you (the ?????? Corporation/Group) are now performing a multiple regression analysis with several predictor variables.
Background
Describe (in words) the rationale for considering these predictor variables as potential predictors of the numerical response variable. One of these predictor variables should be a two-category “dummy” variable – the BLUE-highlighted categorical variable column in your Excel worksheet.
Multiple Regression Modeling Steps
- Open the Excel worksheet containing your Team Project Data.
- As you learned in Modules 2 and 3, you will use the set of potentially meaningful numerical independent variables and the one selected two-category dummy variable in your study to develop a “best” multiple regression model for predicting your numerical response variable Y. Follow the step-by-step modeling process described in the PowerPoints at the end of Module 3.
- Start with a visual assessment of the possible relationships of your numerical dependent variable Y with each potential predictor variable by developing the scatterplot matrix (use JMP) and paste this into your report.
- Then fit a preliminary multiple regression model using these potential numerical predictor variables and, at most, one categorical dummy variable.
- Then assess collinearity with VIF until you are satisfied that you have a final set of possible predictors that are “independent,” i.e., not unduly correlated with each other. Note your observations.
- Use stepwise regression approaches to fit a multiple regression model with this set of potentially meaningful numerical independent variables (and, if appropriate, the one selected categorical dummy variable).
- (1) Based on the forward selection modeling criterion, determine which independent variables should be included in your regression model.
- (2) Based on the backward selection modeling criterion, determine which independent variables should be included in your regression model.
- (3) Based on the mixed selection modeling criterion, determine which independent variables should be included in your regression model.
- (4) Based on the adjusted r² criterion, determine which independent variables should be included in your regression model.
- Comment on the consistency of your findings in Step 2D (1)-(4). Whether or not they are the same, explain why (hint: see the VIFs).
- Paste screenshots of (1), (2), and (3) outputs from Step 2D above into your report.
- Based on Step 2D (along with the principle of parsimony if necessary), select a “best” multiple regression model. Note your finding.
- Using the predictor variables from your selected “best” multiple regression model, rerun the multiple regression model in order to assess its assumptions. You may use Excel or JMP for this step.
- Look at the set of residual plots, cut and paste them into the report, and briefly comment on the appropriateness of your fitted model.
- (1) If the assumptions are met and the fitted model is appropriate, continue to Step 2J.
- (2) If the normality assumption is problematic, state this but continue to Step 2J. Note: You do not need to check the assumption of independence in your project. That assumption is met because your project is not time-dependent.
- (3) If either the linearity or equality of variance assumption is violated in one or two scatter plots of Y with individual predictors, then transform the particular independent variables involved (try a log, square root, or similar transformation) and rerun the multiple regression model as in Step 2H.
- Assess the significance of the overall fitted model. Note your observation.
- Assess the significance of each predictor variable. Note your observations.
- Write the sample multiple regression equation for the “final best” model you have developed.
- Interpret the meaning of the Y intercept and interpret the meaning of all the slopes for your fitted model (but do this in whatever units you used for Y to build this model).
- Interpret and describe the meaning of the coefficient of multiple determination r².
- Interpret and describe the meaning of the standard error of the estimate S_YX (in the units you used to build this model).
- Determine the 95% confidence interval estimate for each coefficient estimate (that you find for the independent variables) and interpret their impact on the dependent variable (Y) accordingly. Explain why we need to consider/study the confidence interval estimate of coefficients (hint: sample data).
- Select one value for each of your independent variables in their respective relevant ranges:
- Predict ŷ and include the units in the results.
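
For teams that want to cross-check their JMP or Excel output, the arithmetic behind several of the steps above (VIF, r², adjusted r², S_YX, and the prediction ŷ) can be sketched in plain Python. This is only an illustration under stated assumptions, not part of the required workflow: the data values, the variable names (x1, x2, dummy, y), and the helper functions solve, ols, vif, and predict are all hypothetical, and the significance tests and confidence intervals are omitted because they need t and F distribution quantiles.

```python
# Illustrative sketch only: all data values and names below are hypothetical.

def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(xs, y):
    """Least-squares fit of y on the predictor columns in xs (intercept added)."""
    n, k = len(y), len(xs)
    cols = [[1.0] * n] + xs  # first column is the intercept
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    s_yx = (sse / (n - k - 1)) ** 0.5  # standard error of the estimate
    return {"beta": beta, "r2": r2, "adj_r2": adj_r2, "s_yx": s_yx}


def vif(xs, j):
    """VIF of predictor j: 1 / (1 - r2) from regressing it on the other predictors."""
    others = [x for i, x in enumerate(xs) if i != j]
    return 1.0 / (1.0 - ols(others, xs[j])["r2"])


def predict(beta, values):
    """Point prediction: intercept plus the sum of slope times predictor value."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], values))


# Hypothetical data: two numerical predictors, one 0/1 dummy, one response.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
dummy = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
y = [3.1, 5.2, 6.8, 9.9, 12.1, 12.0, 14.2, 17.8]

predictors = [x1, x2, dummy]
vifs = [vif(predictors, j) for j in range(len(predictors))]
fit = ols(predictors, y)

# One value per predictor, chosen inside each predictor's observed range.
new_values = [4.5, 4.0, 1.0]
y_hat = predict(fit["beta"], new_values)

print("VIFs:", vifs)
print("Coefficients (intercept first):", fit["beta"])
print("r2:", fit["r2"], "adjusted r2:", fit["adj_r2"], "S_YX:", fit["s_yx"])
print("Predicted y:", y_hat)
```

Dropping a predictor from the predictors list and rerunning shows how the VIFs and adjusted r² change, which mirrors the comparison asked for in the collinearity and stepwise steps. The results reported in your write-up should still come from JMP or Excel as instructed.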