STAT3600
Assignment 3 (submit Q7c, Q10, Q11)
Deadline: 11 Apr, 2025
Note: (1) Numeric values should be presented to 4 decimal places. (2) Show the intermediate steps for Q4 to Q11.
1. A personnel officer in a governmental agency administered four newly developed aptitude tests to each of 25 applicants for entry-level clerical positions in the agency. For purposes of the study, all 25 applicants were accepted for positions irrespective of their test scores. After a probationary period, each applicant was rated for proficiency on the job. In order, the job proficiency score (y) and the scores on the four tests (x1, x2, x3, x4) for the 25 employees were stored in ‘job.txt’.
a. Fit a regression model for 4 predictor variables to the data. Write down the fitted regression model. State clearly the assumptions.
b. Construct the ANOVA table and hence test whether there is a regression relation. Use α = 0.05.
c. Find the 99% confidence interval for each of the regression coefficients.
d. Test whether x2 can be dropped from the regression model given that x1, x3 and x4 are retained. Use the F test statistic and α = 0.05. State the hypotheses and conclusion.
e. Determine the subset of variables that is selected by the forward selection method, based on the entry level α = 5%. Show your steps. Report the fitted final selected model.
f. Determine the subset of variables that is selected by the backward elimination method, based on the removal level α = 5%. Show your steps.
g. Find the values of R² and Cp for the full model and the selected model in (e). Comment on the results.
2. A research laboratory was developing a new compound for the relief of severe cases of hay fever. In an experiment with 36 volunteers, the amounts of the two active ingredients (factors A and B) in the compound were varied at three levels each (1 = low, 2 = medium, 3 = high). Randomization was used in assigning four volunteers to each of the nine treatments. The data on hours of relief are stored in ‘hayfever.txt’. Assume that a two-way classification model is appropriate for the above data.
a. Compile a two-way ANOVA table to test whether the effects of the two factors are additive or not, using a 5% significance level.
b. Test whether or not a main effect is present for each ingredient, using a 5% significance level.
c. Given your answer to (a), is it meaningful to test for main factor effects? Explain.
d. Construct a 95% confidence interval for the pairwise comparison in hours of relief between the low and the medium levels of ingredient A. When setting up the contrast, you may average over the three levels of factor B using equal weights.
3. The staff of a service center for electronic equipment includes three technicians who specialize in repairing three widely used makes of disk drives for desktop computers. It was desired to study the effects of technician (factor A) and make (factor B) on the service time. ‘diskdrive.txt’ stores the data showing the number of minutes required to complete the repair job in a study where each technician was randomly assigned to five jobs on each make of disk drive.
a. Formulate a two-way classification model for the above observed data.
b. Test at the 5% level whether there are significant interactions between “technician” and “make”.
c. According to your findings in (b), will it be meaningful to compare the differences among the technicians WITHOUT specifying the “make” of the disk drives?
d. Calculate the sample mean of the 15 observations for each technician (five jobs on each of the three makes).
e. Show that the variance of the difference between any two sample means considered in (d) is 2σ²/15, where σ² is the common variance assumed for the number of minutes for each repair job.
f. Calculate an estimate of σ based on the two-way classification model specified in (a).
g. Derive from (d), (e) and (f) a 95% confidence interval for a contrast in number of minutes for repair jobs between the technicians 1 and 3.
4. For the cell means model, show that the least squares estimators of the parameters μi are maximum likelihood estimators for the normal error, εij ~ N(0, σ²).
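As a hedged outline of the argument in Question 4 (not a full solution), the log-likelihood under independent normal errors can be written as below. The symbols Y_ij (j-th observation at level i), r (number of levels), n_i (observations at level i) and N (total observations) are notation assumed here; the question itself only names μi and σ².

```latex
\ell(\mu_1,\dots,\mu_r,\sigma^2)
  = -\frac{N}{2}\ln\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(Y_{ij}-\mu_i\right)^2 ,
\qquad
\frac{\partial \ell}{\partial \mu_i}
  = \frac{1}{\sigma^2}\sum_{j=1}^{n_i}\left(Y_{ij}-\mu_i\right) = 0
\;\Longrightarrow\;
\hat{\mu}_i = \bar{Y}_{i\cdot}
```

For any fixed σ², maximizing the log-likelihood over the μi is the same as minimizing the double sum of squares, which is the least squares criterion.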
5. For a 1-way ANOVA model show that
6. For a 1-way ANOVA model, show that
7. Consider a regression analysis of Y on X1 to X4 for 25 observations. The SSE values for various submodels are given below.
Regressors | SSE | Regressors | SSE | Regressors | SSE
None | 49.3302 | X1, X2 | 35.4285 | X1, X2, X3 | 30.6408
X1 | 35.4303 | X1, X3 | 30.9165 | X1, X2, X4 | 15.1429
X2 | 49.2882 | X1, X4 | 15.4802 | X1, X3, X4 | 13.7342
X3 | 40.2796 | X2, X3 | 40.0287 | X2, X3, X4 | 21.3869
X4 | 26.4931 | X2, X4 | 26.3378 | X1, X2, X3, X4 | 12.9804
 | | X3, X4 | 22.1175 | |
a. Apply a simple linear regression model with one regressor at a time. Select regressors whose effects are significant at the 10% level of significance. Report SSR, MSR, MSE, F-value and conclusion.
b. Apply a multiple regression model using all the selected regressors in (a). Test the significance of the regression coefficients at the 5% level of significance. Report the ANOVA table, F-values and conclusions.
c. Determine the subset of variables that is selected by the backward elimination method, based on the removal level F = 4. Show your steps. Report SSR, MSE and F-value at each step. Report the selected regressors.
d. Determine the subset of variables that is selected by the forward selection method, based on the entry level F = 4. Show your steps. Report SSR, MSE and F-value at each step. Report the selected regressors.
e. Determine the subset of variables that is selected by the stepwise method, based on the entry level F = 4 and removal level F = 3. Show your steps. Report SSR, MSE and F-value at each step. Report the selected regressors.
f. Determine the subset of variables that is selected by adjusted R². Show your steps. Report the fitted final selected model.
g. Determine the subset of variables that is selected by Cp. Show your steps. Report the fitted final selected model.
h. Determine the subset of variables that is selected by AIC. Show your steps. Report the fitted final selected model.
i. Determine the subset of variables that is selected by BIC. Show your steps. Report the fitted final selected model.
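The quantities asked for in Question 7 can all be derived from the SSE table alone. The following is a minimal Python sketch, not a model answer: the function names are hypothetical, and the formulas for adjusted R², Cp, AIC and BIC follow one common textbook convention (p counts the intercept), which is an assumption and may differ from the definitions used in the lectures.

```python
import math

N_OBS = 25  # number of observations stated in Question 7

# SSE values copied from the table in Question 7; keys are the regressors in the model.
SSE = {
    (): 49.3302,
    ("X1",): 35.4303, ("X2",): 49.2882, ("X3",): 40.2796, ("X4",): 26.4931,
    ("X1", "X2"): 35.4285, ("X1", "X3"): 30.9165, ("X1", "X4"): 15.4802,
    ("X2", "X3"): 40.0287, ("X2", "X4"): 26.3378, ("X3", "X4"): 22.1175,
    ("X1", "X2", "X3"): 30.6408, ("X1", "X2", "X4"): 15.1429,
    ("X1", "X3", "X4"): 13.7342, ("X2", "X3", "X4"): 21.3869,
    ("X1", "X2", "X3", "X4"): 12.9804,
}
FULL = ("X1", "X2", "X3", "X4")


def n_params(model):
    """Number of regression parameters, including the intercept."""
    return len(model) + 1


def partial_f(reduced, full):
    """Partial F statistic for the regressors in `full` that are not in `reduced`."""
    df_error = N_OBS - n_params(full)
    extra = len(full) - len(reduced)
    mse_full = SSE[full] / df_error
    return (SSE[reduced] - SSE[full]) / extra / mse_full


def criteria(model):
    """Adjusted R^2, Mallows' Cp, AIC and BIC under the assumed conventions."""
    p = n_params(model)
    sse, ssto = SSE[model], SSE[()]
    mse_full = SSE[FULL] / (N_OBS - n_params(FULL))
    return {
        "adj_r2": 1 - (sse / (N_OBS - p)) / (ssto / (N_OBS - 1)),
        "cp": sse / mse_full - (N_OBS - 2 * p),
        "aic": N_OBS * math.log(sse / N_OBS) + 2 * p,
        "bic": N_OBS * math.log(sse / N_OBS) + p * math.log(N_OBS),
    }


if __name__ == "__main__":
    # Example: F statistic for dropping X2 from the full model.
    print(round(partial_f(("X1", "X3", "X4"), FULL), 4))
    for model in SSE:
        print(model, {k: round(v, 4) for k, v in criteria(model).items()})
```

Each step of backward elimination, forward selection or stepwise selection is then a comparison of `partial_f` values against the stated entry or removal level.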
8. You are given the following matrices computed for a polynomial regression analysis Y = β0 + β1X + β2X² + ε.
The matrices are properly ordered according to the regression function given above.
a. Calculate the LSE of the regression coefficients.
b. Construct an ANOVA table and thus test at the 5% level of significance whether there is a regression of Y on X and X².
c. Test at the 5% level of significance whether the quadratic term is necessary.
d. Estimate the mean for Y when X = 1.5. Construct a 95% confidence interval for the estimation.
9. You are given the following matrices computed for a regression analysis with interaction Y = β0 + β1X1 + β2X2 + β3X1X2 + ε.
The matrices are properly ordered according to the regression function given above.
a. Calculate the LSE of the regression coefficients.
b. Construct an ANOVA table and thus test at the 5% level of significance whether there is a regression of Y on X1, X2 and the interaction term.
c. Construct a 95% confidence interval for each of the regression coefficients.
d. Test at the 5% level of significance whether there is an interaction effect.
e. Predict the value of Y for an individual whose X1 = 1.5 and X2 = -0.5. Construct a 95% prediction interval for the individual.
10. Seven observations of a dependent variable X and a factor A are given as follows.
A | 1 | 1 | 1 | 2 | 2 | 3 | 3
X | 0 | 1 | 0 | 4 | 3 | 3 | 5
Consider a one-way classification model.
a. Write down a cell-means model for the analysis.
b. Estimate the parameters in (a).
c. Obtain the fitted values for X.
d. Compile the ANOVA table.
e. Conduct a size 0.05 test to determine whether or not the means for X for the three levels of A differ.
f. Construct a 95% confidence interval for the mean for X for each level of A.
g. Using Bonferroni’s method, construct the simultaneous 95% confidence intervals for the pairwise comparisons among the three levels. Comment on the comparisons.
h. Estimate the difference between the average of the means of X for Levels 1 and 2 and the mean of X for Level 3. Construct a 95% confidence interval and hence comment on the difference.
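The sums of squares behind the one-way ANOVA table in Question 10 can be checked numerically. This is a small sketch using only the Python standard library and the seven observations given in the question; the function name `one_way_anova` and the returned dictionary keys are hypothetical, and critical values still have to be taken from an F table or statistical software.

```python
from collections import defaultdict

# Data copied from the table in Question 10.
levels = [1, 1, 1, 2, 2, 3, 3]
x = [0, 1, 0, 4, 3, 3, 5]


def one_way_anova(levels, values):
    """Return cell means, sums of squares, degrees of freedom and the F ratio."""
    groups = defaultdict(list)
    for a, v in zip(levels, values):
        groups[a].append(v)

    n_total = len(values)
    r = len(groups)
    grand_mean = sum(values) / n_total
    means = {a: sum(v) / len(v) for a, v in groups.items()}

    # Treatment sum of squares: weighted squared deviations of cell means.
    sstr = sum(len(v) * (means[a] - grand_mean) ** 2 for a, v in groups.items())
    # Error sum of squares: squared deviations within each cell.
    sse = sum((y - means[a]) ** 2 for a, v in groups.items() for y in v)

    df_tr, df_err = r - 1, n_total - r
    mstr, mse = sstr / df_tr, sse / df_err
    return {
        "means": means,
        "SSTR": sstr, "SSE": sse, "SSTO": sstr + sse,
        "df": (df_tr, df_err),
        "MSTR": mstr, "MSE": mse,
        "F": mstr / mse,
    }


if __name__ == "__main__":
    for key, value in one_way_anova(levels, x).items():
        print(key, value)
```

The estimated cell means are also the fitted values, and the MSE returned here is the variance estimate needed for the confidence intervals in parts (f) to (h).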
11. The following data of X are given for three levels of factor A and two levels of factor B. There are two observations for each treatment.
      | A = 1 | A = 2 | A = 3
B = 1 | 0, 1 | 5, 6 | 4, 3
B = 2 | 2, 2 | 7, 9 | 2, 0
a. Calculate and plot the means for X for the six treatments. Do there appear to be interaction effects between factors A and B? Explain.
b. Write down a factor-effects model for the 2-way classification model with interaction.
c. Estimate the main effects of factors A and B and the effects of interaction, respectively.
d. Compile the ANOVA table.
e. Test the treatment effects at the 5% level of significance. State the null and alternative hypotheses, decision rule and conclusion.
f. Test whether or not main effects for factor A are present, using a 5% significance level. Write down the null and alternative hypotheses, decision rule and conclusion.
g. Given your answer to (e), is it meaningful to test for main factor effects? Explain.
h. Construct Bonferroni’s simultaneous 95% confidence intervals for the six pairwise comparisons in total: those among the three levels of factor A at level 1 of factor B, and those among the three levels of factor A at level 2 of factor B. Comment on the comparisons.
i. Assume there is no interaction between the two factors. Compile the ANOVA table. Test whether or not main effects are present for factors A and B, respectively, using a 5% significance level.
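The decomposition of the total sum of squares for the balanced two-way layout in Question 11 can be sketched in the same way. The code below uses the twelve observations from the table; the function name `two_way_anova` and the dictionary layout are hypothetical, and the formulas assume an equal number of replicates in every cell, as is the case here.

```python
# Data copied from the table in Question 11; key = (level of A, level of B).
cells = {
    (1, 1): [0, 1], (2, 1): [5, 6], (3, 1): [4, 3],
    (1, 2): [2, 2], (2, 2): [7, 9], (3, 2): [2, 0],
}


def two_way_anova(cells):
    """Sums of squares and F ratios for a balanced two-way layout with interaction."""
    a_levels = sorted({i for i, _ in cells})
    b_levels = sorted({j for _, j in cells})
    a, b = len(a_levels), len(b_levels)
    n = len(next(iter(cells.values())))  # replicates per cell (assumed equal)

    cell_mean = {k: sum(v) / n for k, v in cells.items()}
    grand = sum(cell_mean.values()) / (a * b)
    a_mean = {i: sum(cell_mean[(i, j)] for j in b_levels) / b for i in a_levels}
    b_mean = {j: sum(cell_mean[(i, j)] for i in a_levels) / a for j in b_levels}

    ss = {
        "A": n * b * sum((a_mean[i] - grand) ** 2 for i in a_levels),
        "B": n * a * sum((b_mean[j] - grand) ** 2 for j in b_levels),
        "AB": n * sum(
            (cell_mean[(i, j)] - a_mean[i] - b_mean[j] + grand) ** 2
            for i in a_levels for j in b_levels
        ),
        "Error": sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
    mse = ss["Error"] / df["Error"]
    f_ratio = {src: (ss[src] / df[src]) / mse for src in ("A", "B", "AB")}
    return {"cell_mean": cell_mean, "SS": ss, "df": df, "F": f_ratio}


if __name__ == "__main__":
    result = two_way_anova(cells)
    for key in ("SS", "df", "F"):
        print(key, result[key])
```

For the no-interaction model in part (i), the interaction sum of squares and its degrees of freedom are pooled into the error term before the F ratios for A and B are recomputed.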
12. This study is to establish the suitable correlation model for describing the relationship between strain and hardness during cold rolling forming process of complex profiles. The hardness and the strain data are stored in ‘material’.
a. Consider a simple linear regression model where hardness is the response variable and strain is the independent variable.
i. Report the fitted model.
ii. Test at the 5% level of significance whether strain has an effect on hardness.
iii. Report the R² and thus comment on the goodness of fit.
b. Produce a scatter plot of hardness against strain. Is there a linear relationship?
c. Consider a fourth-order polynomial regression.
i. Report the LSE of the regression coefficients.
ii. Test at the 5% level of significance for each of the regression coefficients.
iii. Test at the 5% level of significance whether or not the non-linear terms of strain have some effect on hardness. State the null and alternative hypotheses, decision rule and conclusion.
iv. Estimate the mean of hardness when strain = 0.8. Construct a 95% confidence interval for the estimate.
13. This study aims to verify the additive role of lung CT-Volumetry in testing the efficacy of three widely distributed COVID-19 vaccinations; namely the "Sinopharm", "Oxford-AstraZeneca", and "Pfizer-BioNTech" vaccinations. The CT-Severity scores and the vaccinations used for a number of patients are stored in ‘vaccine.csv’. The variables are given as follows.
Variable | Description
CTscore | CT-score
Vaccine | Vaccination (1 = Non Immunized, 2 = Sinopharm, 3 = Oxford-AstraZeneca, 4 = Pfizer-BioNTech)
Consider a one-way classification model.
a. Compile the ANOVA table.
b. Conduct a size 0.05 test to determine whether or not the mean CT-scores differ among the four vaccinations. State the null and alternative hypotheses, decision rule and conclusion.
c. Calculate the mean CT-score for each vaccination. Construct a 95% confidence interval for each of the means.
d. Using Bonferroni’s method, construct the simultaneous 95% confidence intervals for the pairwise comparisons among the four vaccinations. Comment on the comparisons.
e. Estimate the difference between the mean CT-score of non-immunized patients and the average of the mean CT-scores of the three vaccinations. Construct a 95% confidence interval and hence comment on the difference.