Multiple Regression Analysis Final Project Description
The purpose of this assignment is to apply multiple regression concepts, interpret multiple regression analysis models, and justify business predictions based upon the analysis.
For this assignment, you will use the “Strength” dataset. You will use SPSS to analyze the dataset and address the questions presented. Findings should be presented in a Word document along with the SPSS outputs.
The compressive strength (Y) of concrete is influenced by the mixing proportions and by the time that it is allowed to cure, although the exact relationship between the strength and the components is unknown. The provided data contain the results of n = 1030 concrete strength experiments, each of which records the following variables:
Strength (in MPa): The compressive strength of the concrete.
Age (in days): The number of days the concrete was allowed to cure.
Coarse_Aggregate (in kg/m3): The amount of coarse aggregate per cubic meter of mix.
Fine_Aggregate (in kg/m3): The amount of fine aggregate per cubic meter of mix.
Cement (in kg/m3): The amount of cement per cubic meter of mix.
Slag (in kg/m3): The amount of furnace slag per cubic meter of mix.
Superplasticizer (in kg/m3): The amount of superplasticizer per cubic meter of mix.
Water (in kg/m3): The amount of water per cubic meter of mix.
Ash (in kg/m3): The amount of fly ash per cubic meter of mix.
Derive the transformations of compressive strength listed below to determine which one, if any, produces a variable that most closely follows a normal distribution. To do this, produce a Q-Q plot for each transformation, and decide which version of the variable should be used as the response in a multiple linear regression model. Explain your answer and provide the SPSS output as an illustration.
Strength (no transformation)
Square root of Strength
(Natural) Log of Strength
Reciprocal of Strength
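As a numeric companion to the visual Q-Q comparison, the following is a minimal Python sketch (not SPSS, and not part of the required deliverable) that scores each of the four transformations by the correlation between the sorted values and the matching normal quantiles. A value closer to 1 means the points lie closer to a straight line on a Q-Q plot. The data here are synthetic and generated only for illustration; the function name `qq_correlation` and the plotting positions (i + 0.5)/n are assumptions of this sketch, and nothing in it reflects the actual Strength dataset.

```python
import math
import random
from statistics import NormalDist


def qq_correlation(values):
    """Pearson correlation between sorted values and normal theoretical quantiles."""
    n = len(values)
    observed = sorted(values)
    # Theoretical quantiles at plotting positions (i + 0.5) / n.
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_o = sum(observed) / n
    mean_t = sum(theoretical) / n
    cov = sum((o - mean_o) * (t - mean_t) for o, t in zip(observed, theoretical))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_t = sum((t - mean_t) ** 2 for t in theoretical)
    return cov / math.sqrt(var_o * var_t)


# SYNTHETIC, right-skewed positive values standing in for a strength column.
random.seed(0)
strength = [random.lognormvariate(3.5, 0.8) for _ in range(500)]

# The four candidate transformations named in the assignment.
transforms = {
    "none": lambda y: y,
    "sqrt": math.sqrt,
    "log": math.log,
    "reciprocal": lambda y: 1.0 / y,
}

scores = {
    name: qq_correlation([f(y) for y in strength])
    for name, f in transforms.items()
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name:>10}: {score:.4f}")
print("closest to normal (synthetic data only):", best)
```

Because the synthetic values are drawn from a log-normal distribution, the log transformation is expected to score highest in this toy example; that says nothing about which transformation suits the real data, which still has to be judged from the SPSS Q-Q plots.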
Based on the transformation selected in Part 1, build a multiple linear regression model with all eight predictors.
Use t-tests to determine if any of the predictors significantly affect the compressive strength of concrete. Explain why each variable should or should not be included in the model. Assume α = 0.05. Show the appropriate model results to explain your answer.
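For reference, the t-test reported for each coefficient in a regression output can be written as follows. The symbols are generic notation rather than anything taken from the dataset: \hat{\beta}_j is the estimated coefficient of predictor j, SE(\hat{\beta}_j) is its standard error, and k is the number of predictors in the model (k = 8 for the full model with n = 1030).

```latex
H_0:\ \beta_j = 0 \qquad \text{vs.} \qquad H_1:\ \beta_j \neq 0

t_j = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}, \qquad
df = n - k - 1 = 1030 - 8 - 1 = 1021

\text{Reject } H_0 \text{ at } \alpha = 0.05 \text{ when the two-sided p-value is below } 0.05.
```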
If any predictors in the full model are found not to be significant, remove them and re-run the model to create a reduced model (RM). Are all the remaining variables still statistically significant? Show the appropriate model results to explain your answer.
Based on the RM, should there be concern about multicollinearity among the predictors selected? Show the appropriate model results to explain your answer.
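One common multicollinearity diagnostic is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below computes it in plain Python on synthetic predictors with the hypothetical names `x1`, `x2`, and `x3`; the helper names `solve`, `r_squared`, and `vif` are assumptions of this example. It is meant only to show what the statistic measures. Rules of thumb for what counts as a concerning VIF vary (thresholds of 5 or 10 are often quoted).

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def r_squared(y, columns):
    """R^2 from an ordinary least squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(predictors):
    """Map each predictor name to its variance inflation factor."""
    result = {}
    for name, column in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(column, others))
    return result


# SYNTHETIC predictors: x2 is nearly a copy of x1, x3 is unrelated to both.
random.seed(1)
x1 = [random.gauss(0, 1) for _ in range(200)]
x2 = [v + 0.1 * random.gauss(0, 1) for v in x1]
x3 = [random.gauss(0, 1) for _ in range(200)]

vifs = vif({"x1": x1, "x2": x2, "x3": x3})
for name, value in vifs.items():
    print(f"{name}: VIF = {value:.2f}")
```

In this toy setup the two nearly identical predictors receive large VIFs while the unrelated one stays close to 1, which is the pattern to look for in the collinearity statistics of the regression output.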
After fitting the RM, derive the residual plot (standardized residuals vs. standardized predicted values) and normal probability plot. Interpret each plot.
What is the coefficient of determination, R², of the RM? How would you interpret the R²?
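As a reminder of what this statistic measures, the coefficient of determination can be written in generic notation as below, where SSE is the residual (error) sum of squares and SST is the total sum of squares of the response. If a transformed version of Strength is used as the response, both sums are computed on the transformed scale.

```latex
R^2 = 1 - \frac{SSE}{SST}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```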
Based on the RM, if the estimated compressive strength of a mix is currently 50 MPa, what would the new estimated compressive strength be after a 10-day increase in curing time? Assume all other predictors are held constant.
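How the 10-day change carries through depends on which response scale was chosen in Part 1. The worked forms below are a sketch in which b_Age is a placeholder symbol for the fitted Age coefficient of the RM (its value must come from your own output) and Y_new is the new estimated strength in MPa. Each line applies the change 10 b_Age on the model's scale and then transforms back to MPa.

```latex
\text{No transformation:} \quad Y_{new} = 50 + 10\, b_{Age}

\text{Square root:} \quad Y_{new} = \left(\sqrt{50} + 10\, b_{Age}\right)^{2}

\text{Natural log:} \quad Y_{new} = \exp\!\left(\ln 50 + 10\, b_{Age}\right) = 50\, e^{10\, b_{Age}}

\text{Reciprocal:} \quad Y_{new} = \frac{1}{\dfrac{1}{50} + 10\, b_{Age}}
```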
How would you interpret the intercept (constant) in the RM? Does the interpretation make sense given the data you used to build the RM?
Given the following mix components and curing time, what is the estimated compressive strength based on the RM?
Age: 50 days
Coarse_Aggregate: 900 kg/m3
Fine_Aggregate: 600 kg/m3
Cement: 300 kg/m3
Slag: 200 kg/m3
Superplasticizer: 7 kg/m3
Water: 190 kg/m3
Ash: 70 kg/m3
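The arithmetic behind a point estimate like this can be sketched in Python as follows. The predictor values are the ones listed above; the intercept and coefficients are PLACEHOLDERS chosen only to make the example run, not fitted estimates, and the names `predict_strength` and `BACK_TRANSFORMS` are assumptions of this sketch. Only predictors that remain in the RM would appear in the coefficient dictionary, and the back-transformation must match the response scale chosen in Part 1.

```python
import math

# Predictor values from the scenario above (kg/m3 except Age, in days).
scenario = {
    "Age": 50,
    "Coarse_Aggregate": 900,
    "Fine_Aggregate": 600,
    "Cement": 300,
    "Slag": 200,
    "Superplasticizer": 7,
    "Water": 190,
    "Ash": 70,
}

# Inverse of each candidate response transformation, to return to MPa.
BACK_TRANSFORMS = {
    "none": lambda z: z,
    "sqrt": lambda z: z ** 2,
    "log": math.exp,
    "reciprocal": lambda z: 1.0 / z,
}


def predict_strength(intercept, coefficients, values, transform="none"):
    """Evaluate the fitted linear equation, then undo the response transformation."""
    linear = intercept + sum(b * values[name] for name, b in coefficients.items())
    return BACK_TRANSFORMS[transform](linear)


# PLACEHOLDER numbers for illustration only; replace them with the intercept
# and coefficients from your own reduced-model output.
placeholder_intercept = 2.0
placeholder_coefficients = {"Age": 0.01, "Cement": 0.005}

estimate_model_scale = predict_strength(
    placeholder_intercept, placeholder_coefficients, scenario, transform="none"
)
estimate_if_log_response = predict_strength(
    placeholder_intercept, placeholder_coefficients, scenario, transform="log"
)
print(estimate_model_scale, estimate_if_log_response)
```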
What is the 95% confidence interval for the estimate in Part 3? How would you interpret this interval? (Hint: Use the SPSS scoring wizard to address this question.)
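For orientation, the textbook form of a 95% confidence interval for the mean response at a given set of predictor values is sketched below. The notation is generic and not taken from SPSS output: x_0 is the vector of predictor values with a leading 1 for the intercept, X is the design matrix of the RM, s is the residual standard error, and p is the number of predictors retained in the RM. If the response was transformed, the interval is computed on the transformed scale and its endpoints are then back-transformed to MPa.

```latex
\hat{y}_0 \;\pm\; t_{0.025,\; n - p - 1} \cdot s \sqrt{x_0^{\top} (X^{\top} X)^{-1} x_0}
```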