Semester 1 2020

Assignment 2

Due: 9 am, Monday 18 May

• Assignments are to be submitted (uploaded) via Canvas.

• Show all relevant working and reasoning, keeping it brief, as marks will be given for

method as well as for correct answers. Please spell-check your document.

• Paste any relevant R code and output, kept brief, into the appropriate places so that it can be seen easily

along with your other work. Graphics from R can be resized within your document; make them smaller

as necessary.

• Assignments count for 50% of the assessment in this subject. This one is worth 15%, and covers the

work done in chapters 4 to 6.

• The number of marks given for each question may be fine-tuned. The total number of marks for this

assignment is 45.

• Tutors will not help you directly with assignment questions. However, they may give some help with

R.

• Solutions to the assignment questions will be made available later.

• When constructing a panel of graphs with multiple plots, it is good to use the R command

par(mfrow = c(nrows,ncols)) where nrows is the number of rows and ncols the number of columns

in the panel. The default is c(1, 1), that is, a single plot per page.
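As a minimal sketch of the panel layout described in the last bullet, the code below draws two plots side by side and then restores the previous settings. The built-in `cars` data set and the axis labels are used purely for illustration and are not part of the assignment.

```r
# Save the current graphics settings while switching to a 1 x 2 panel
old <- par(mfrow = c(1, 2))

# Left panel: histogram of one variable (illustrative built-in data)
hist(cars$speed, main = "Speed", xlab = "Speed (mph)")

# Right panel: scatterplot of two variables
plot(dist ~ speed, data = cars,
     main = "Stopping distance", xlab = "Speed (mph)", ylab = "Distance (ft)")

# Restore the previous layout so that later plots are not affected
par(old)
```

Saving the value returned by `par()` and passing it back afterwards is one common way to undo the layout change; calling `par(mfrow = c(1, 1))` directly would also work.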

MAST90044 Thinking and Reasoning with Data Assignment 2

Q1 The following table of frequencies shows age at first pregnancy by incidence of cervical cancer diagnosed

in women aged 50–59. Reference: Graham S and Shotz W (1979), Epidemiology of cancer of the cervix

in Buffalo, J National Cancer Inst 63(1):23–27.

Age at first pregnancy    Control    Cervical cancer
≤ 25                      203        42
> 25                      114        7

(a) Enter the data into R and perform a chi-squared test of the association between age at first

pregnancy and incidence of cervical cancer. Is this test justified here? Briefly explain.

(b) Perform a test of the association using Fisher’s exact test and compare your conclusion here to

that from part (a). Explain briefly when the Fisher test would be preferred to the chi-squared test

of association.

[6 + 6 = 12 marks]
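One possible way to enter a 2 × 2 table of this kind and run both tests is sketched below. It assumes the first row of counts (203, 42) belongs to age at first pregnancy ≤ 25 and the second (114, 7) to > 25, with the columns in the order Control, Cervical cancer. The object names are illustrative only, and the interpretation is left to the reader.

```r
# Counts entered row by row; the row/column assignment is an assumption
# based on the layout of the table in the question
cervical <- matrix(c(203, 42,
                     114,  7),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(age_first_pregnancy = c("<=25", ">25"),
                                   group = c("Control", "Cervical cancer")))

# Chi-squared test of association (R applies a continuity correction
# to 2 x 2 tables by default)
chi <- chisq.test(cervical)
chi$expected   # expected counts, useful when judging whether the test is justified
chi

# Fisher's exact test on the same table
fisher <- fisher.test(cervical)
fisher
```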

Q2 Ophthalmologists from Victoria and Western Australia have surveyed children in the Western Desert

in Western Australia to assess the prevalence and severity of trachoma. The data below come from two

years of a longitudinal survey. There are six stages of trachoma, of increasing severity. In this study,

children were observed to have trachoma up to the fourth stage. The data below show the stages of

trachoma including an additional level: those with no signs of trachoma.

Stage                             1993    2003
None                              124     264
Stage 1: Follicular               88      46
Stage 2: Intense inflammatory     7       3
Stage 3: Trachomatous scarring    0       2
Stage 4: Trichiasis               2       0

(a) Perform a suitable test to examine the association between severity of trachoma and year of survey.

What is your conclusion?

(b) Assess the validity of a politician’s claim that the prevalence (widespread presence) in 2003 was

20%.

[4 + 4 = 8 marks]
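A sketch of how a table with several rows might be entered and checked for sparse cells follows. It assumes the five pairs of counts belong, in order, to None and Stages 1 to 4, with columns 1993 and 2003. The Monte Carlo p-value shown is only one option for tables with small expected counts; the question does not say which test is intended, and the seed and number of replicates are arbitrary choices.

```r
# Counts entered row by row (assumed order: None, Stage 1, ..., Stage 4)
trachoma <- matrix(c(124, 264,
                      88,  46,
                       7,   3,
                       0,   2,
                       2,   0),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(stage = c("None", "Stage 1", "Stage 2",
                                             "Stage 3", "Stage 4"),
                                   year = c("1993", "2003")))

# Number of children surveyed in each year
colSums(trachoma)

# Expected counts under independence; the warning about the chi-squared
# approximation is suppressed here because only the expected counts are wanted
expected <- suppressWarnings(chisq.test(trachoma))$expected
expected

# One option when some expected counts are small: a simulated p-value
set.seed(1)
sim <- chisq.test(trachoma, simulate.p.value = TRUE, B = 10000)
sim
```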

Q3 An investigator wished to determine whether epinephrine has the effect of elevating plasma cholesterol

levels in humans. Twelve adult males were selected and given both a placebo and the drug. Blood

samples were taken following injection of the placebo and again after injection of epinephrine. Analysis

of the blood samples resulted in the following data:

Cholesterol Levels (mg/100mL)

subject placebo epinephrine

1 178 184

2 240 243

3 210 210

4 184 189

5 190 200

6 181 191

7 156 150

8 220 226

9 210 220

10 165 163

11 188 192

12 214 216

These data are also available in TRD=asst03data.csv on LMS.

(a) Formulate an appropriate statistical model, defining all the terms. State the null and two-sided

alternative hypotheses which reflect the research question of interest.

(b) Enter the data into R, and calculate the means for placebo and epinephrine.

Find a 95% confidence interval for the mean difference in cholesterol levels between the placebo

and epinephrine. Use the confidence interval to test your null hypothesis.

(c) Would a 99% confidence interval contain zero? Briefly explain.

[4 + 4 + 2 = 10 marks]
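Because each subject is measured under both conditions, one way to approach the interval is through the within-subject differences. The sketch below types the data in directly from the table and assumes the difference is defined as epinephrine minus placebo; the variable names are illustrative.

```r
# Cholesterol levels (mg/100mL) for subjects 1 to 12, copied from the table
placebo     <- c(178, 240, 210, 184, 190, 181, 156, 220, 210, 165, 188, 214)
epinephrine <- c(184, 243, 210, 189, 200, 191, 150, 226, 220, 163, 192, 216)

# Sample means for each condition
mean(placebo)
mean(epinephrine)

# Within-subject differences (assumed direction: epinephrine - placebo)
d <- epinephrine - placebo

# Paired t intervals for the mean difference at two confidence levels
ci95 <- t.test(epinephrine, placebo, paired = TRUE, conf.level = 0.95)$conf.int
ci99 <- t.test(epinephrine, placebo, paired = TRUE, conf.level = 0.99)$conf.int
ci95
ci99

# The same 95% interval computed by hand from the differences
n <- length(d)
mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * sd(d) / sqrt(n)
```

Changing only `conf.level` makes it easy to see how the width of the interval depends on the confidence level.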

Q4 Transient hypothyroxinemia is a common finding in premature infants. It is not thought to have long-term consequences, or to require treatment. A study was performed to investigate whether it might

have long-term effects, and to this end, blood thyroxine values were obtained on routine screening in

the first week of life for a sample of infants who weighed 2000g or less at birth and were born at 33

weeks gestation or earlier. These results will later be related to motor and cognitive development.

Our aim here is to develop a model to estimate the thyroxine level for a specified gestational age. The

data are available in (TRD=asstQ4data.csv) on LMS:

g.age thyroxine

30 8.1

28 7.2

31 9.2

…

…

(a) Read the data into R and produce an appropriate graphical summary (with meaningful labels) of

the relationship between thyroxine level and gestational age.

(b) Write down an appropriate statistical model for examining the relationship, and fit the model in

R.

(c) i. Give a non-statistical interpretation of the coefficient of g.age.

ii. Find a 95% confidence interval for this coefficient.

iii. Is thyroxine level related to gestational age? Explain.

iv. What percentage of the total variation in thyroxine level is explained by gestational age?

(d) A record of a new baby became available. Find an interval within which the thyroxine level of this

premature baby of gestational age 31 weeks is likely to lie. Use 95% confidence.

(e) Examine appropriate diagnostic plots and comment on anything that is noteworthy or that may

challenge the assumptions of the model.

[2 + 4 + 4 + 2 + 3 = 15 marks]
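The R calls that are typically involved in a simple linear regression workflow of this kind can be sketched as follows. The data frame below contains only the three rows printed in the question and serves as a stand-in so that the code runs; any numbers it produces are not meaningful, and the full data set should be read from the course file instead. A straight-line model is assumed here for illustration, and the object names are hypothetical.

```r
# Stand-in data: ONLY the three rows shown in the question. Replace with the
# full data set, read from the course file with read.csv()
babies <- data.frame(g.age     = c(30, 28, 31),
                     thyroxine = c(8.1, 7.2, 9.2))

# Scatterplot with meaningful axis labels
plot(thyroxine ~ g.age, data = babies,
     xlab = "Gestational age (weeks)", ylab = "Thyroxine level")

# Simple linear regression of thyroxine level on gestational age (an assumed model)
fit <- lm(thyroxine ~ g.age, data = babies)
abline(fit)
summary(fit)

# 95% confidence interval for the coefficient of g.age
slope_ci <- confint(fit, "g.age", level = 0.95)

# Proportion of variation in thyroxine explained by gestational age
r_squared <- summary(fit)$r.squared

# 95% prediction interval for a single new baby with gestational age 31 weeks
new_baby <- predict(fit, newdata = data.frame(g.age = 31),
                    interval = "prediction", level = 0.95)

# Diagnostic plots, meaningful only once the full data set is used:
# par(mfrow = c(2, 2)); plot(fit)
```

Note the use of `interval = "prediction"` rather than `interval = "confidence"`: the former concerns an individual new observation, the latter the mean response at that gestational age.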

Total marks = 45