STAT3600
Statistical Analysis
Assignment 1 (submit Q15, Q16, Q17) Deadline: 24/2/2025
Note: (1) Numeric values should be presented to 4 decimal places. (2) Do not use a computer, and show the intermediate steps, for Q1 to Q12 and Q15 to Q16.1.
1. For the matrices below, obtain (1) A + B, (2) A − B, (3) AC, (4) AB^T, (5) B^T A.
State the dimension of each resulting matrix.
2. Let A and B be defined as follows:
a) Are the column vectors of A linearly dependent?
b) What is the rank of A?
c) Calculate the determinant of A.
d) Calculate the inverse of A.
e) Repeat (a) – (d) for B.
3. Is A given below idempotent?
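A matrix A is idempotent when it is square and A·A = A. For checking a hand calculation afterwards, a minimal sketch of that test could look as follows. The matrix `P` and the helper names `matmul` and `is_idempotent` are hypothetical illustrations, not the matrix from the question.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def is_idempotent(a, tol=1e-10):
    """Return True when a is square and a*a equals a entry by entry (within tol)."""
    size = len(a)
    if any(len(row) != size for row in a):
        return False  # only square matrices can be idempotent
    aa = matmul(a, a)
    return all(abs(aa[i][j] - a[i][j]) <= tol
               for i in range(size) for j in range(size))


# Hypothetical example matrix (NOT the matrix given in the assignment)
P = [[0.5, 0.5],
     [0.5, 0.5]]
print(is_idempotent(P))
```

The tolerance is there only because decimal entries are stored as floating-point numbers; by hand, the comparison is exact.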
4.
a) Find a matrix A of the quadratic form.
b) For the matrix:
Find the quadratic form of the observations Y1, Y2, and Y3.
5. Are the following matrices positive definite or non-negative definite?
6. The data of five pairs of two random variables are given as follows.
x1 | -1 | 1 | 0 | 1 | 2
x2 | 1 | 2 | 4 | 3 | 2
a) Calculate the sample mean vector for (x1, x2).
b) Calculate the sample covariance matrix for (x1, x2).
c) Calculate the sample correlation matrix for (x1, x2).
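For checking the hand calculation afterwards, a short sketch of the three quantities could look like this. It assumes the sample covariance uses the divisor n − 1 (an assumption; use n instead if the course defines it that way). The helper names are illustrative only.

```python
from math import sqrt

# Data from the table in Q6
x1 = [-1, 1, 0, 1, 2]
x2 = [1, 2, 4, 3, 2]


def mean(v):
    return sum(v) / len(v)


def sample_cov(u, v):
    """Sample covariance with divisor n - 1 (assumed convention)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


cols = [x1, x2]
mean_vec = [mean(c) for c in cols]
S = [[sample_cov(u, v) for v in cols] for u in cols]   # covariance matrix
R = [[S[i][j] / sqrt(S[i][i] * S[j][j]) for j in range(2)]
     for i in range(2)]                                # correlation matrix

print("mean vector:", [f"{m:.4f}" for m in mean_vec])
print("covariance matrix:", [[f"{v:.4f}" for v in row] for row in S])
print("correlation matrix:", [[f"{v:.4f}" for v in row] for row in R])
```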
7. The mean vector and the variance-covariance matrix of a random vector Y = (Y1, Y2, Y3)^T are given as:
a) Calculate the mean and variance for
Z = 2Y1 - 3Y2 + 5Y3
b) Calculate the mean vector and variance-covariance matrix for
Z1 = Y1 + Y2 - Y3 + 2
Z2 = 2Y1 - 3Y2 + Y3 - 1
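Both parts rest on the standard rules for the mean and variance of a linear transformation of a random vector with mean μ and variance-covariance matrix Σ. As a sketch, the two transformations in Q7 can be written in matrix form as below. The symbols a, C, and d are notation introduced here for illustration and do not appear in the question.

```latex
\[
Z = a^{T}Y,\quad a = (2,\,-3,\,5)^{T}
\;\Longrightarrow\;
E(Z) = a^{T}\mu,\qquad \operatorname{Var}(Z) = a^{T}\Sigma a .
\]
\[
\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = CY + d,\quad
C = \begin{pmatrix} 1 & 1 & -1 \\ 2 & -3 & 1 \end{pmatrix},\quad
d = \begin{pmatrix} 2 \\ -1 \end{pmatrix}
\;\Longrightarrow\;
E(CY + d) = C\mu + d,\qquad \operatorname{Var}(CY + d) = C\Sigma C^{T}.
\]
```

The constant vector d shifts the mean but does not affect the variance-covariance matrix.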
8. It is given that
a) Calculate P(Y1 ≤ 2)
b) Calculate P(Y1 ≤ 4 | Y2 = -1, Y3 = 2)
c) Show whether the following two random variables are independent.
Z1 = Y1 - Y2 + Y3, Z2 = 2Y1 + Y2
9. Suppose Y ~ N_n(μ, Σ) with Σ nonsingular. Let A be a d × n constant matrix of rank d.
a) Determine the distribution of A(Y - μ).
b) Determine the distribution of (AΣA^T)^(-1/2) A(Y - μ).
c) Using (b) or otherwise, show that
10. Suppose Y ~ N_n(μ, Σ), and there exists an m × n matrix Q such that
QΣQ^T = I_m and Qμ = 0.
Define Z = QY.
a) What is the distribution of Z?
b) What is the distribution of ||Z||^2 = Z^T Z?
11. Let and Z~N(0,1) be independent random variables.
a) Calculate the 5% upper quantiles of X, Z^2/X and Z/√X.
b) Denote your answers to (a) by x, f, and t, respectively. Calculate
i. P(X ≤ x),
ii. P(Z^2/X ≤ f),
iii. P(X/Z^2 ≤ f),
iv. P(Z/√X ≤ t),
v. P(|Z| > t√X).
c) Calculate the 2.5% upper quantile of Z/√X. Show that its square equals f. Explain.
12. The data of n observations are given as x1, x2, …, xn. The sum of squared errors is defined as Σ (xi − μ)^2, summed over i = 1, …, n. Derive the formula for the least squares estimator of μ.
13. A random sample of 18 U.S. males was selected, and the following information was recorded for each individual:
x = weight (in g) of fat consumed per day,
y = total cholesterol (in mg) in blood per deciliter.
The data are given in ‘fat.csv’.
a) Plot y against x.
b) Fit a simple linear regression model to the dataset and plot the fitted regression line on the graph obtained in (a). Report the least squares estimates of the regression coefficients.
c) Test at the 5% level whether “daily fat intake” is effective in explaining the variation in cholesterol level among the U.S. males.
d) Construct a 95% confidence interval for the expected cholesterol level for people whose daily fat intake is 100g.
e) Construct a 95% prediction interval for the cholesterol level of an individual whose daily fat intake is 100g.
f) A margarine manufacturer claims that the difference between the expected blood cholesterol level of individuals consuming 100g of fat per day and that of those consuming 40g of fat per day does not exceed 35 mg/dl. If his claim is true, then perhaps some people would be willing to include extra fat in their diets, thinking that the resulting increase in cholesterol is small enough so that there is no need for concern.
Carry out a size 0.05 test for the manufacturer’s claim.
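Parts (b), (d), and (e) rest on the same least squares quantities. A minimal sketch of the computation, under stated assumptions, might look like the following. The function names `fit_slr` and `intervals_at` are hypothetical, the toy data are invented purely for illustration, and loading ‘fat.csv’ is left out because the sheet does not state its column names. The critical value `t_crit` is assumed to be read from a t table with n − 2 degrees of freedom.

```python
from math import sqrt


def fit_slr(x, y):
    """Least squares fit of y = b0 + b1*x + error; returns summary quantities."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = ybar - b1 * xbar               # intercept estimate
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                  # unbiased estimate of the error variance
    return {"n": n, "xbar": xbar, "sxx": sxx, "b0": b0, "b1": b1,
            "sse": sse, "s2": s2, "se_b1": sqrt(s2 / sxx)}


def intervals_at(fit, x0, t_crit):
    """Point estimate, CI for the mean response, and PI for a new response at x0."""
    y0 = fit["b0"] + fit["b1"] * x0
    leverage = 1 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"]
    half_mean = t_crit * sqrt(fit["s2"] * leverage)
    half_pred = t_crit * sqrt(fit["s2"] * (1 + leverage))
    return y0, (y0 - half_mean, y0 + half_mean), (y0 - half_pred, y0 + half_pred)


# Invented toy data (NOT the contents of 'fat.csv')
toy_x = [1, 2, 3, 4]
toy_y = [2, 4, 5, 8]
fit = fit_slr(toy_x, toy_y)
# 4.3027 is the upper 2.5% point of t with 2 degrees of freedom (n - 2 for the toy data)
y0, ci, pi = intervals_at(fit, 2.5, t_crit=4.3027)
print(f"b0 = {fit['b0']:.4f}, b1 = {fit['b1']:.4f}, t for slope = {fit['b1'] / fit['se_b1']:.4f}")
print("CI for mean response:", ci)
print("prediction interval:", pi)
```

The prediction interval is always wider than the confidence interval at the same x0, because it adds the variance of a new error term to the variance of the estimated mean.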
14. The time (y) required for a merchandiser to stock a grocery store shelf with Coca Cola bottles and the number of cases of Coca Cola stocked (x) are stored in ‘cola.csv’. A simple linear regression model is proposed to regress the response y on the explanatory variable x, assuming i.i.d. N(0, σ^2) random errors.
a) Calculate the least squares estimates of the y-intercept and the slope of the regression line.
b) Carry out a t test to determine if there is a significant linear relationship between x and y at the 5% level.
c) Based on your fitted regression line in (a), estimate the expected time required to stock zero cases of Coca Cola.
d) Do you think your answer to (c) is reasonable? Suggest a more reasonable model, which is a special case of the simple linear regression model, to describe the relationship between x and y.
e) Conduct an appropriate t test at the 5% level to test whether the model suggested in (d) is acceptable in place of the more general simple linear regression model.
15. Consider a linear regression model when Y is regressed on X for 10 observations. It is given that
a) Calculate the least squares estimates of the intercept and the slope.
b) Calculate the sum of squared errors and the unbiased estimate of the variance of the error.
c) Calculate the standard errors of the estimated intercept and slope.
d) Construct a 90% confidence interval for the intercept and the slope.
e) Predict an individual value of Y when X = 2. Construct a 99% prediction interval for the prediction.
15.1 Refer to Q15
a) Estimate the covariance between the estimators of the intercept and the slope.
b) Test at the 5% level of significance whether the slope is -1.5.
c) Estimate the mean of Y when X = 2. Construct a 90% confidence interval for the estimate.
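When a regression question supplies summary quantities rather than raw data, the estimates in Q15 and Q15.1 can be assembled from corrected sums. The following is a sketch that assumes the given quantities are (or can be reduced to) n, Σx, Σy, Σx², Σy², and Σxy; the symbols S_xx, S_xy, S_yy, b_0, b_1, and s² are notation introduced here.

```latex
\[
S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n},\qquad
S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n},\qquad
S_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n}
\]
\[
b_1 = \frac{S_{xy}}{S_{xx}},\qquad
b_0 = \bar{y} - b_1\bar{x},\qquad
\mathrm{SSE} = S_{yy} - b_1 S_{xy},\qquad
s^2 = \frac{\mathrm{SSE}}{n-2}
\]
\[
\operatorname{se}(b_1) = \frac{s}{\sqrt{S_{xx}}},\qquad
\operatorname{se}(b_0) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}},\qquad
\widehat{\operatorname{Cov}}(b_0, b_1) = -\frac{\bar{x}\,s^2}{S_{xx}}
\]
\[
\text{mean response at } x_0:\;
b_0 + b_1 x_0 \pm t_{\alpha/2,\,n-2}\; s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}},\qquad
\text{new observation at } x_0:\;
b_0 + b_1 x_0 \pm t_{\alpha/2,\,n-2}\; s\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}
\]
```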
16. Four observations of two variables are given as follows.
X | -1 | 0 | 1 | 2
Y | -2.6 | -0.5 | 0.4 | 1.6
Consider a linear regression model when Y is regressed on X. It is given that
a) Write down the data matrix.
b) Calculate the least squares estimates of the intercept and the slope.
c) Calculate the sum of squared errors and the unbiased estimate of the variance of the error.
d) Estimate the covariance matrix of the estimators of the intercept and the slope.
e) Calculate the standard errors of the estimated intercept and slope.
f) Test at the 10% level of significance whether the slope is 1.5.
g) Estimate the mean of Y when X = 1.5. Construct a 95% confidence interval for the estimate.
16.1 Refer to Q16
a) Construct a 95% confidence interval for the intercept and the slope.
b) Predict an individual value of Y when X = 2.5. Construct a 90% prediction interval for the prediction.
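For checking the hand calculation afterwards, the matrix approach to Q16 can be sketched as below. It builds the data matrix with a column of ones, solves the normal equations through the explicit 2 × 2 inverse of X^T X, and scales that inverse by the estimated error variance to obtain the estimated covariance matrix. All variable names are illustrative assumptions.

```python
# Data from the table in Q16
x = [-1, 0, 1, 2]
y = [-2.6, -0.5, 0.4, 1.6]
n = len(x)

# Data (design) matrix: a column of ones for the intercept, then x
X = [[1, xi] for xi in x]

# X^T X (2 x 2) and X^T y (length 2)
xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(2)]

# Explicit inverse of the 2 x 2 matrix X^T X
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
xtx_inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]

# Least squares estimates: beta = (X^T X)^(-1) X^T y
beta = [sum(xtx_inv[i][j] * xty[j] for j in range(2)) for i in range(2)]

# Residuals, SSE, and the unbiased variance estimate (n - 2 degrees of freedom)
resid = [yi - (beta[0] + beta[1] * xi) for xi, yi in zip(x, y)]
sse = sum(e * e for e in resid)
s2 = sse / (n - 2)

# Estimated covariance matrix of (intercept, slope): s^2 (X^T X)^(-1)
cov_beta = [[s2 * v for v in row] for row in xtx_inv]

print(f"intercept = {beta[0]:.4f}, slope = {beta[1]:.4f}")
print(f"SSE = {sse:.4f}, s^2 = {s2:.4f}")
print("covariance matrix:", [[f"{v:.4f}" for v in row] for row in cov_beta])
```

The standard errors in part (e) are the square roots of the diagonal entries of `cov_beta`.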
17. A study applies a linear regression model to a moisture detection technique for building materials. The permittivity (F/m) and the moisture (vol%) of a number of bricks are stored in ‘material’.
a) Plot moisture against permittivity. Is a linear regression appropriate?
b) Fit a simple linear regression model to the dataset and plot the fitted regression line on the graph obtained in (a). Report the least squares estimates of the regression coefficients.
c) Interpret the intercept and the slope quantitatively.
d) Test at the 5% level of significance whether permittivity is effective in explaining the variation in moisture.
e) Predict the moisture of an individual brick with permittivity = 3.6 F/m. Construct a 95% prediction interval for the prediction.
17.1 Refer to Q17.
a) Construct a 90% confidence interval for each of the estimates in (b).
b) Test at the 5% level of significance whether the slope is 0.1 vol% (F/m)^-1.
c) Estimate the expected moisture of the bricks with permittivity = 4.4 F/m. Construct a 90% confidence interval for the estimate.
d) Construct a 95% prediction interval for the difference between the average moisture of two groups of bricks. The first group has permittivities 3.4 and 3.5 F/m, and the second group has permittivities 4.1 and 4.3 F/m.