This  excel workbook contains information on home sales across four ZIP codes in King County during the period of May 2014 through May 2015 (791 rows total).

The entire dataset contains about 20,000 rows and information for 70 ZIP codes. 

  • The first sheet, About, tells you what each column represents.
  • The second sheet, Data, contains the data for this project. This is the same data you used in Project Part 1.
  • The third – fifth sheets,  including CorrelationsMultiple Linear Models, and Residual Plots is where you will produce some of the graded work for this project part. 
  • The sixth sheet, Real Estate Calculator, is where you will create a home price estimator using the best model you found. results. 
  • The seventh sheet, Example Calculators, shows three different possible approaches to create your real estate calculator.

Each graded sheet already has areas laid out for you to insert your work. 

Examine Correlations Between Input Variables

Before building your multiple linear regression model, you need to pick the best variables for thus model. Try to distinguish those by calculating their correlations.

Follow these steps:

STEP 1: Create a correlation matrix.

1. Proceed to STEP 1 in the Correlations sheet.

2. Starting at cell B10, compute the correlation between the price and bedrooms variables using the CORREL() function. Continue to calculate the correlations for remaining variable pairs.

Note: You will only have to calculate the correlations for the empty white cells. As you’re filling in your white cells, it will auto-populate the other half of the correlation matrix. For example, cell B10 will have auto-filled cell C9.

3. You have now created a correlation matrix! Examine your correlations and then move on to the next step.

STEP 2: Rank the correlations

1. Proceed to STEP 2 in the Correlations sheet.

2. Re-visit the correlation coefficients you calculated in STEP 1. Find the three most correlated variables, also known as the highest correlations. (Remember, you are looking for the r-values closest to +1 or -1.)

3. Once you’ve identified the three highest r-values, add the highest r-value to cell B21, and it’s corresponding variable names to cell C21:D21. Continue to do this for the second and third highest correlations, in the bottom rows.

4. Now, time for least correlated variables. Re-visit the correlation coefficients you calculated in STEP 1. Find the three least correlated variables, also known as the lowest correlations. (Remember, you are looking for the r-values closest to 0)

5. Once you’ve identified the three lowest r-values, add the lowest r-value to cell G21, and it’s corresponding variable names to cell H21: I21. Continue to do this for the second and third lowest correlations, in the bottom rows.

STEP 3: Assess the correlations.

When you build a multiple linear model, ideally you are looking for a variety of input variables that don’t have a strong correlation to one another. For example, sqft_living and sqft_above are strongly correlated. If you build a simple linear model using sqft_living, and then build a multiple linear model by adding in sqft_above, you find that you don’t get much payoff by adding that second variable. That’s because the model already has the information it needs about the general size of the house. 

1. Proceed to STEP 3 in the Correlations sheet.

2. Time to analyze your correlations. In cell A29, write a brief assessment of your findings with regards to correlations between input variables. Answer the following question in your assessment.

  • What variables do you feel would work best for your multiple linear regression models? Why?

Create Multiple Linear Regressions

Now that you examined the correlations between your input variables, it’s time to start building some multiple linear models. 

STEP 1: Create Multiple Linear Regression Models

1. Choose two of the variable pairs from the Correlations sheet you will use to create your multiple linear regression models. 

HOW SHOULD I DECIDE WHICH VARIABLES TO CHOOSE?

You want two variables that will work together to make a strong joint model. This means you should choose variables that are both strongly correlated to price but are not strongly correlated to each other.

You don’t want to select two variables that are strongly correlated to each other because the second variable doesn’t bring much extra information. For example, if you built a simple linear model using sqft_living and then added sqft_above to your model, you wouldn’t be adding much information to your model, because sqft_living and sqft_above are highly correlated. The model had already accounted for the general idea of sqaure footage. But if you use two variables that are both related to price, but that aren’t so related to each other (perhaps sqft_living and age?) you’re bringing in more information to your model. 

2. Proceed to STEP 1 in the Multiple Linear Models sheet. Type the names of the two variables you selected into cell B7 and C7.

3. In cell A9, use the LINEST() function to compute a multiple linear model for these two input variables vs. price. Include the intercept and verbose parameters, both set to true. 

HINT

=LINEST(y values, {x1 values, x2 values}, True, True)

4. Use the information from the LINEST() and write out the corresponding equation in cell B15

HINT

The rightmost value the LINEST() returns (cell C9) is the y-intercept, and the x-coefficients are returned in the opposite order as they were input. 

5. Repeat these instructions for a second pair variable names in cell B16 and C16. (You can keep one of the variables you used the first time.) Calculate LINEST() in cell A18 and enter the corresponding linear equation in cell B23.

6. Finally, choose three variables (you can reuse any or all of the variables you used in the two-variable linear regressions) and compute a multiple linear regression for three variables. Start by including the three variable names in cells B24, C24, and D24. Calculate LINEST() in cell A26 and enter the corresponding linear equation in cell B31.

Compute RMSE

Now that you came up with multiple linear models for two pairs and one group of three, you can compare how well they performed against the simple linear regressions you computed in Part 1 of this project. 

Follow these steps:

STEP 2: Compute RMSE

1. From the Data sheet, copy the values from your top-correlated input variables starting at column H7: and I7: in the Multiple Linear Models sheet. 

2. To calculate the prediction, you need to match the coefficients to the variables correctly. In cell J7, enter the following:

=$C$9*H7 + $B$9*I7 + $D$9

This represents the product of the first coefficient-variable pair plus the product of the second coefficient-variable pair plus the y-intercept.

3. Copy the formula into the rest of column J, ensuring that the values next to column H and column I update correctly, while the coefficients and the y-intercept remain the same. 

4. In column K, compute the difference between the predicted price and the actual price. (predicted – actual)

5. In column L, square the values in column K. This is the squared error. 

6. Repeat this process for the other two multiple linear models you computed. 

Plot Residuals

When you created a scatterplot showing the residuals for simple linear models, you plotted the input variable on the x-axis. With a multiple linear model, you have more than one input variable. Instead of creating a scatterplot, you will plot the distribution of residuals using a histogram. The more tightly your residuals cluster around zero, the better your model did with its predictions. 

Follow these steps:

1. In the Multiple Linear Models sheet, select column K and insert a chart. Under Chart Type, scroll down until you find Histogram. Copy and paste this chart into “INSERT RESIDUAL PLOT HERE” under Model #1 Residual Plot in the Residual Plots sheet. 

2. Compute the r-squared value based on your predicted and actual y-values for this model in cell G7.

3. Repeat for your other two multiple linear models. 

4. Time to analyze your visualizations. Write brief assessments about the shape and range of each of the residual plots.

  • Are they normally distributed?
  • Are they skewed?
  • Are there any extreme values?
    Note: You might need to temporarily increase the size of your chart to see extreme values since they won’t have much height.

Intermission: Explore the Colab Notebook (Part 2 Supplement)

You will now briefly leave this page and work through a Colab notebook to further examine the possibilities of building multiple linear models.

  1. Navigate to the Colab notebook (Part 2 Supplement) by copying the link below.
https://anointed-dobsonfly-hebs-1622.nt.run/notebooks/8-4_C4_0_Housing_project_pt2_091119.ipynb
  1.  
  2. Follow the instructions in the Colab notebook no need to the knowledge check.
  3. When you’re finished, return to this tab to finish Part 2 of the project.

As a reminder, you will not be graded on your work in the Colab notebook (Part 2 Supplement). 

But you need to complete this part of the project in order to complete the project as a whole. 

Create a Real Estate Calculator

Use the information gathered from creating linear models of the Seattle housing data in Google Sheets (Multiple Linear Models sheet of this spreadsheet) and the Colab you just completed (Part 2 Supplement) to create a final model.

Your final model will be a Real Estate Calculator with at least:

  • three numerical predictors
  • one categorical predictor
    Note: The categorical variable can be a Boolean dummy variable or you can create a different linear model for each categorical value.

How to create a Real Estate Calculator:

You will not be creating a calculator from scratch. There is a calculator template provided for you in the Real Estate Calculator sheet. Your goal is to fill in the calculator with the statistical analysis you have completed thus far to make more accurate predictions.

In the Example Calculator sheet, there are three examples of how you can approach customizing your own calculator. Note that these examples only use two numerical predictors. Your final model must have at least three numerical predictors. 

EXAMPLE CALCULATOR EXPLANATIONS

  •  The first example uses a dropdown to specify the categorical variable, and it relies on the use of IF() statements to compute the final prediction. You can examine how this model works by looking at the contents of cell E9. 
  • The second example in the Example Calculator sheet uses three separate lines, one for each possible categorical value. Each line represents a distinct model. You can examine how these models work by looking at the contents of cells E24, E25, and E26. 
  • Finally, the third example gives an example of a model that predicts the log price. This example does not incorporate the categorical data; the model is the same for all vehicle types. You can examine how this model works by looking at the contents of cells E41 (log price) and F41 (raw price transformation). If you use this model as your inspiration, you should also look to the other two examples for ideas about how to incorporate the categorical data into your model. 

Ultimately, it’s up to you to create the most accurate model and user-friendly calculator you can. Test it out for several values to see whether they make sense. When you are happy with your model, complete the final step below, and move on to the submission instructions below.

Follow these steps:

  1. Proceed to the Real Estate Calculator sheet.
  2. Decide on the predictors for your model. Remember they have to meet the following criteria (at least):
    1. three numerical predictors
    1. one categorical predictor
  1. Once decided, replace “categorical_predictor” in cell C5 and “numerical_predictor“s in cells D5E5, and F5 with the chosen variable names.
  2. Cell G6 is where you will predict home prices based on your selection of input variables. Insert the best-fit equation based on your analysis. FEELING STUCK?

Feel free to take inspiration from the other Real Estate Calculators. 

How does their equation compare to the ones you’ve created earlier in your analysis?

  1. Test out your calculator. Change values for your predictors in cells C6:F6 to update the Predicted Value of the property.        
  2. In cell D12, write four to six sentences describing your model and how you arrived at it. 

All papers are written by ENL (US, UK, AUSTRALIA) writers with vast experience in the field. We perform a quality assessment on all orders before submitting them.

Do you have an urgent order?  We have more than enough writers who will ensure that your order is delivered on time. 

We provide plagiarism reports for all our custom written papers. All papers are written from scratch.

24/7 Customer Support

Contact us anytime, any day, via any means if you need any help. You can use the Live Chat, email, or our provided phone number anytime.

We will not disclose the nature of our services or any information you provide to a third party.

Assignment Help Services
Money-Back Guarantee

Get your money back if your paper is not delivered on time or if your instructions are not followed.

We Guarantee the Best Grades
Assignment Help Services