STAT3600
Assignment 1 (submit Q7 – Q10)
Deadline: 26/2/2024
Note: (1) Numeric values should be given to 4 decimal places. (2) Do not use a computer, and show the intermediate steps, for Q1 to Q4 and Q7 to Q9.
1. Suppose Y ~ N_n(μ, Σ) with Σ nonsingular. Let A be a d × n constant matrix of rank d.
a) Determine the distribution of A(Y - μ).
b) Determine the distribution of (AΣA^T)^(-1/2) A(Y - μ).
c) Using (b) or otherwise, show that
(Y - μ)^T A^T (AΣA^T)^(-1) A(Y - μ) ~ χ²_d.
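A quick numerical sanity check of the claim in 1(c) can be sketched as follows. It is a simulation, not a proof, and does not replace the derivation asked for. The particular μ, Σ and A used (with n = 2 and d = 1) are made-up illustration values, not part of the question, and the function names are hypothetical.

```python
import math
import random

# Hypothetical illustration values (not from the assignment): n = 2, d = 1.
MU = [1.0, -2.0]
SIGMA = [[4.0, 1.2], [1.2, 1.0]]   # symmetric positive definite
A = [3.0, -1.0]                    # a 1 x 2 matrix of rank 1


def draw_y(rng):
    """Draw Y ~ N_2(MU, SIGMA) using a hand-computed Cholesky factor of SIGMA."""
    l11 = math.sqrt(SIGMA[0][0])
    l21 = SIGMA[1][0] / l11
    l22 = math.sqrt(SIGMA[1][1] - l21 ** 2)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [MU[0] + l11 * e1, MU[1] + l21 * e1 + l22 * e2]


def quadratic_form(y):
    """(Y - mu)^T A^T (A Sigma A^T)^(-1) A (Y - mu); a scalar ratio when d = 1."""
    a_dev = sum(a * (yi - mi) for a, yi, mi in zip(A, y, MU))
    a_sigma_at = sum(A[i] * SIGMA[i][j] * A[j] for i in range(2) for j in range(2))
    return a_dev ** 2 / a_sigma_at


def simulate(n_draws=50_000, seed=0):
    """Return the sample mean and sample variance of the simulated quadratic form."""
    rng = random.Random(seed)
    q = [quadratic_form(draw_y(rng)) for _ in range(n_draws)]
    mean = sum(q) / n_draws
    var = sum((v - mean) ** 2 for v in q) / (n_draws - 1)
    return mean, var


if __name__ == "__main__":
    mean, var = simulate()
    # A chi-square variable with d degrees of freedom has mean d and variance 2d.
    print(f"sample mean {mean:.4f}, sample variance {var:.4f}")
```

With d = 1 the sample mean and variance should land close to 1 and 2 respectively; agreement in two moments is only suggestive, which is why the analytic argument is still required.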
2. Suppose Y ~ N_n(μ, Σ), and there exists an m × n matrix Q such that
QΣQ^T = I_m and Qμ = 0.
Define Z = QY.
a) What is the distribution of Z?
b) What is the distribution of ‖Z‖² = Z^T Z?
3. Let X ~ χ²_5 and Z ~ N(0, 1) be independent random variables.
a) Calculate the 5% upper quantiles of X, Z²/X and Z/√X.
b) Denote your answers to (a) by x, f, t, respectively. Calculate
i. P(X ≤ x),
ii. P(Z²/X ≤ f),
iii. P(X/Z² ≤ f),
iv. P(Z/√X ≤ t),
v. P(|Z| > t√X)
c) Calculate the 2.5% upper quantile of Z/√X. Show that its square equals f. Explain.
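The relationship in 3(c) can be checked empirically with a small simulation, sketched below. This is only a sanity check under the stated distributions, not a substitute for the table-based calculation and explanation the question asks for. The function names and the empirical-quantile rule are assumptions made for this sketch.

```python
import math
import random


def upper_quantile(values, tail_prob):
    """Empirical upper quantile: the value exceeded by roughly a fraction tail_prob of the sample."""
    ordered = sorted(values)
    index = int(round((1.0 - tail_prob) * (len(ordered) - 1)))
    return ordered[index]


def simulate_ratios(n_draws=100_000, df=5, seed=1):
    """Draw independent Z ~ N(0, 1) and X ~ chi-square(df); return Z^2/X and Z/sqrt(X)."""
    rng = random.Random(seed)
    f_like, t_like = [], []
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        # A chi-square(df) draw is a sum of df squared independent standard normals.
        x = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        f_like.append(z * z / x)
        t_like.append(z / math.sqrt(x))
    return f_like, t_like


if __name__ == "__main__":
    f_like, t_like = simulate_ratios()
    f_hat = upper_quantile(f_like, 0.05)
    t_hat = upper_quantile(t_like, 0.025)
    print(f"5% upper quantile of Z^2/X:           {f_hat:.4f}")
    print(f"(2.5% upper quantile of Z/sqrt(X))^2: {t_hat ** 2:.4f}")
```

The two printed numbers should agree up to simulation noise, because Z/√X is symmetric about zero, so a two-sided 5% event for Z/√X is a one-sided 5% event for its square.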
4. Do not use a computer. Consider a linear regression model in which Y is regressed on X. It is given that
a) Calculate the least squares estimates of the intercept and the slope.
b) Calculate the standard errors of the estimated intercept and slope.
c) Test at the 5% level of significance whether the slope is -1.1.
d) Estimate the mean of Y when X = 0.8. Construct a 90% confidence interval for the estimate.
5. A random sample of 18 U.S. males was selected, and the following information was recorded for each individual:
x = weight (in g) of fat consumed per day,
y = total cholesterol (in mg) in blood per deciliter.
The data are given in ‘fat.csv’.
a) Plot y against x.
b) Fit a simple linear regression model to the dataset and plot the fitted regression line on the graph obtained in (a). Report the least squares estimates of the regression coefficients.
c) Test at the 5% level whether “daily fat intake” is effective in explaining the variation in cholesterol level among the U.S. males.
d) Construct a 95% confidence interval for the expected cholesterol level for people whose daily fat intake is 100g.
e) Construct a 95% prediction interval for the cholesterol level of an individual whose daily fat intake is 100g.
f) A margarine manufacturer claims that the difference between the expected blood cholesterol level of individuals consuming 100g of fat per day and that of those consuming 40g of fat per day does not exceed 35 mg/dl. If his claim is true, then perhaps some people would be willing to include extra fat in their diets, thinking that the resulting increase in cholesterol is small enough so that there is no need for concern.
Carry out a size 0.05 test for the manufacturer’s claim.
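Parts (d) and (e) differ only in the variance term, which a short sketch makes concrete. The code below is a minimal illustration assuming the usual simple linear regression formulas; the function names are hypothetical, the t critical value is supplied by the caller from a t table, and the numbers in the usage block are made up (they are not the contents of ‘fat.csv’, whose column layout is not specified here).

```python
import math


def fit_simple_ols(xs, ys):
    """Least squares fit of y = b0 + b1 * x; returns a dict of summary quantities."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1, "sse": sse, "mse": mse}


def interval_at(fit, x0, t_crit, prediction=False):
    """Interval for the mean response (default) or for a new observation at x0.

    t_crit is the t critical value on n - 2 degrees of freedom, supplied by the caller.
    """
    centre = fit["b0"] + fit["b1"] * x0
    leverage = 1.0 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    extra = 1.0 if prediction else 0.0   # a new observation carries its own error variance
    half_width = t_crit * math.sqrt(fit["mse"] * (extra + leverage))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Made-up numbers for illustration only; they are not the contents of 'fat.csv'.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    fit = fit_simple_ols(xs, ys)
    t_crit = 3.182   # upper 2.5% point of t with 3 degrees of freedom, from a t table
    print(interval_at(fit, 3.5, t_crit))                   # confidence interval for the mean
    print(interval_at(fit, 3.5, t_crit, prediction=True))  # prediction interval
```

Both intervals share the same centre; the prediction interval is always the wider of the two because of the extra error-variance term.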
6. The time (y) required for a merchandiser to stock a grocery store shelf with Coca Cola bottles and the number of cases of Coca Cola stocked (x) are stored in ‘cola.csv’. A simple linear regression model is proposed to regress the response y on the explanatory variable x, assuming i.i.d. N(0, σ²) random errors.
a) Calculate the least squares estimates of the y-intercept and the slope of the regression line.
b) Carry out a t test to determine if there is a significant linear relationship between x and y at the 5% level.
c) Based on your fitted regression line in (a), estimate the expected time required to stock zero cases of Coca Cola.
d) Do you think your answer to (c) is reasonable? Suggest a more reasonable model, which is a special case of the simple linear regression model, to describe the relationship between x and y.
e) Conduct an appropriate t test at the 5% level to test whether the model suggested in (d) is acceptable in place of the more general simple linear regression model.
7. Do not use a computer. Given the data
a) Obtain the mean-corrected data matrix, Xc.
b) Verify that the columns of Xc are linearly dependent. Specify a vector a^T = [a1, a2, a3] that establishes the dependence.
c) Obtain the sample covariance matrix and verify that it is singular.
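The mechanics behind (a) to (c) can be illustrated on a small made-up matrix. The sketch below uses hypothetical data (not the matrix given in Q7) in which the third column is the sum of the first two, and hypothetical helper names; it is meant only for checking hand calculations of this type.

```python
def column_means(rows):
    """Mean of each column of a list-of-rows matrix."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def mean_corrected(rows):
    """Subtract each column's mean from that column."""
    means = column_means(rows)
    return [[value - m for value, m in zip(row, means)] for row in rows]


def sample_covariance(rows):
    """S = Xc^T Xc / (n - 1), where Xc is the mean-corrected data matrix."""
    xc = mean_corrected(rows)
    n, p = len(xc), len(xc[0])
    return [[sum(r[i] * r[j] for r in xc) / (n - 1) for j in range(p)] for i in range(p)]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


if __name__ == "__main__":
    # Hypothetical data (not the matrix from Q7): third column = first column + second column.
    data = [[1.0, 4.0, 5.0], [3.0, 2.0, 5.0], [6.0, 1.0, 7.0], [2.0, 5.0, 7.0]]
    xc = mean_corrected(data)
    a = [1.0, 1.0, -1.0]   # candidate dependence vector for this made-up example
    print([sum(ai * v for ai, v in zip(a, row)) for row in xc])   # Xc a, row by row
    print(det3(sample_covariance(data)))                          # determinant of S
```

If Xc a = 0 for some nonzero a, then S a = Xc^T Xc a / (n - 1) = 0 as well, so S is singular; the determinant printed for the made-up data should therefore be zero up to floating-point rounding.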
8. Do not use a computer. Given
a) Find the probability P(y1 < 3).
b) Find the probability P(y1 < 3 | y2 = 2).
c) Find the distribution of
Hence, find the probability P(x < 3).
d) Find a 2 × 1 vector, a, such that Y2 and are independent.
9. Do not use a computer. Five observations of two variables are given as follows.
X |   -2 |   -1 |    0 |   1 |   2
Y | -7.1 | -4.0 | -2.1 | 1.6 | 4.3
Consider a linear regression model in which Y is regressed on X. It is given that
a) Calculate the least squares estimates of the intercept and the slope.
b) Calculate the standard errors of the estimated intercept and slope.
c) Construct a 95% confidence interval for each parameter.
d) Test at the 5% level of significance whether the slope is 3.
e) Estimate the mean of Y when X = -1.5. Construct a 95% confidence interval for the estimate.
10. A study aimed to investigate the prediction of BMI (kg/m²) from mid-upper arm circumference, MUAC (cm). The data are stored in ‘muac.csv’.
a) Fit a simple linear regression model to the dataset and plot the fitted regression line on a scatter plot of BMI against MUAC. Report the estimates of the regression coefficients.
b) Find the coefficient of determination and comment on the goodness of fit of the model.
c) Interpret the slope quantitatively.
d) Find the predicted values and residuals of the first 2 observations.
e) Find SSE and MSE.
f) Find the standard errors of the intercept and the slope.
g) Test at the 5% level whether MUAC is effective in explaining the variation in BMI.
h) Construct a 90% confidence interval for the true slope. Hence, test at the 10% level of significance whether the slope is 1.
i) Construct a 95% confidence interval for the mean BMI at MUAC = 25 cm.
j) Construct a 95% prediction interval for the difference between the average BMI of two groups of subjects. The first group of subjects have MUAC values of 25, 26 and 28 cm. The second group of subjects have MUAC values of 27 and 29 cm.