ECON3208/5048, Applied Econometric Methods, 2020 T3

ECON3208/5408 Course Project Description, 2020T3
The Topic and Data
The topic is based on Kenkel and Terza (2001): The effect of physician advice on alcohol consumption
(, also included in the kit), where a major task is to estimate the effect of
advice on drinks. The data (KTDATA.DTA) and a do-file ( for reading the data are provided.
This topic involves various issues that may be encountered in empirical research. The issues
include endogeneity and some special data features. Mostly, these issues have been discussed in
ECON3208 and two assignments. You should carry out this project using the tools and techniques
covered in our course (up to the end of Ch17.2, and up to p18 of Slides-W5-1a) although they may not be
perfect for the data.
You are not required to replicate the above Kenkel and Terza (KT) article (as some techniques
and methods there are not covered in ECON3208). You should use this article to gain a good
understanding of the topic, motivation, questions of interest, issues involved, and data to be analysed.
The Report
You should read Chapter 19 of Wooldridge to get insights about how to proceed with an empirical
project. You should report your analysis in the following 7 sections. You should limit your report to 8
pages (excluding the cover sheet).

  1. Introduction (1 page). You may discuss why the topic is of interest and how it is related to previous
    literature (referring to two or three related articles discussed in KT). You should outline the
    econometric issues, your modelling strategies, and provide a summary of your findings.
  2. Data (0.5 page). You may briefly describe the data, including the data source, variable definitions,
    important descriptive statistics, and the main features of key variables. You should let readers see
    what you see as important.
  3. Conceptual Model (1 page). You may very briefly describe the empirical economic model, on which
    your econometric models are based. This can motivate your choice of regressors in the econometric
    models. You should read Section 2 of KT for this part.
  4. Econometric Models (2 pages). You may describe your econometric models in detail, and discuss
    how you address various issues in econometric analysis (such as suspected endogeneity and data
    features – drinks being nonnegative with many zeros and advice being binary). The main assumptions
    and estimation method for each econometric model should be briefly discussed. You may need to
    complete this section in conjunction with your computation in Stata, which could involve many
    trial-and-error iterations. See also the “Econometric Analysis” section below.
  5. Empirical Results (2 pages). Your results and findings of econometric analysis should be presented
    in detail in this section. You may use tables for your presentation (e.g., similar to Table 17.3 of
    Textbook). You should interpret your results properly, using the tools covered in ECON3208.
    Comparing results from different models is a good way to check if your findings are robust or
    insensitive to the variations in models and assumptions. You may also want to present the results of
    relevant tests, which may justify or reject the models and assumptions you use. It is important to
    comment on the merits and drawbacks of your econometric models, and discuss possible violation of
    your main assumptions and biases in your findings.
  6. Conclusions (0.5 page). You may reiterate your main findings here, and comment on possible policy
    implications. You may discuss briefly the remaining issues that you are unable to resolve, and you
    may comment on how you would like to tackle them.
  7. References (0.5 page). You should list your textbook (if it is used) and articles you have read and
    used as references.
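
For the Data section, a first look at the key variables might be sketched in Stata as below. This is only a minimal sketch: it assumes KTDATA.DTA is in the working directory and that the outcome and advice variables are named drinks and advice, so check the actual names with describe before running it.

```stata
* Minimal sketch; the variable names drinks and advice are assumptions
use KTDATA.DTA, clear
describe                      // variable names, labels and storage types
summarize drinks, detail      // mean, percentiles and skewness of the outcome
count if drinks == 0          // number of observations at the zero corner
tabulate advice               // distribution of the binary advice variable
```

The count of zeros and the tabulation of advice speak directly to the two data features named in Section 4 (drinks being nonnegative with many zeros, and advice being binary).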
    Econometric Analysis
    (a) A goal of this project is for you to explore and apply the knowledge and tools you have learned so far
    (up to the end of Ch17.2) in a research project. You should be able to comment on the strengths and
    weaknesses of your models and methods.
    (b) You should briefly explain why some variables are included in, and others are excluded from, an
    equation. Always pay attention to endogeneity: Is there endogeneity? Do I have valid instruments?
    Can I test the validity of instruments? Does endogeneity make a difference?
    (c) You should start with linear models. While not perfect, linear models can be regarded as a linear
    approximation to the true model. They can also serve as a benchmark for comparisons. In particular, we
    understand well how endogeneity is handled in linear models.
    (d) The method we use to test for endogeneity (see Ch15.5a) can also be used to estimate the regression
    coefficients in the presence of endogeneity. This approach, known as the “control function” method (see
    p10-13 of Slides-W2-1b and p13 of Slides-W4-2b), can be extended to nonlinear models. Suppose we
    want to use $(x, \mathbf{z}_1)$ to explain $y$, where $\mathbf{z}_1$ is exogenous and $x$ is possibly endogenous. Note that $\mathbf{z}_1$
    may involve two or more variables (i.e., it can be a vector). You can think of $y = \mathit{drinks}$ and
    $x = \mathit{advice}$ in this context.
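    Assume that the reduced-form equation for $x$ can be either linear with $x = \mathbf{z}_1\boldsymbol{\pi}_1 + \mathbf{z}_2\boldsymbol{\pi}_2 + v$, or
    probit with $x = \Phi(\mathbf{z}_1\boldsymbol{\pi}_1 + \mathbf{z}_2\boldsymbol{\pi}_2) + v$. Here, $(\mathbf{z}_1, \mathbf{z}_2)$ are exogenous, $(\boldsymbol{\pi}_1, \boldsymbol{\pi}_2)$ are parameters, $\Phi(\cdot)$ is
    the standard normal CDF, and $v$ is an error term with $E(v \mid \mathbf{z}_1, \mathbf{z}_2) = 0$. Note that $\mathbf{z}_2$ may involve two
    or more variables (i.e., it can be a vector).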
    Further, assume that the structural equation for $y$ can be either linear with $y = x\gamma + \mathbf{z}_1\boldsymbol{\beta} + u$, or
    Tobit with $y = \max\{0,\ x\gamma + \mathbf{z}_1\boldsymbol{\beta} + u\}$. For Tobit, $u$ is an error term that is conditionally normal with
    $u = \theta v + e$, $E(e \mid v, \mathbf{z}_1, \mathbf{z}_2) = 0$, $e \sim N(0, \sigma^2)$, and $(\gamma, \boldsymbol{\beta}, \theta)$ are parameters. The structure of $u$ here
    takes into account the possible correlation between $u$ and $v$. The parameter $\theta$ can be used to test
    whether $x$ is exogenous (when $\theta = 0$, $u$ and $v$ are uncorrelated) or endogenous (when $\theta \neq 0$, $u$ and $v$
    are correlated).
    It follows that the structural equation for $y$ can be expressed as $y = x\gamma + \mathbf{z}_1\boldsymbol{\beta} + v\theta + e$ for the
    linear model, and $y = \max\{0,\ x\gamma + \mathbf{z}_1\boldsymbol{\beta} + v\theta + e\}$ for the Tobit model, where $e$ is normally
    distributed and uncorrelated with $(x, \mathbf{z}_1, \mathbf{z}_2)$. Hence, if we were able to observe $v$, OLS estimation
    would be applicable to the linear model and maximum likelihood estimation would be applicable
    to the Tobit model.
    As we do not observe $v$, we use a two-step approach (the control function approach). If the models are
    correct, $(\gamma, \boldsymbol{\beta}, \theta)$ can be consistently estimated in two steps:
    (i) estimate the reduced-form equation for $x$, either the linear model $x = \mathbf{z}_1\boldsymbol{\pi}_1 + \mathbf{z}_2\boldsymbol{\pi}_2 + v$ or
    the probit model $x = \Phi(\mathbf{z}_1\boldsymbol{\pi}_1 + \mathbf{z}_2\boldsymbol{\pi}_2) + v$, and save the residual $\hat{v}$;
    (ii) estimate the structural equation (either linear or Tobit), replacing $v$ by $\hat{v}$.
    However, the standard errors from Step (ii) can be incorrect because Step (ii) treats the first-step
    residual as if it were the true error and so ignores the first-step estimation error. As we did not
    cover how to correct such standard errors, you may treat the standard errors from Step (ii) as good
    approximations to the true standard errors, and acknowledge this in your report.
    For brevity, the above presentation does not include an intercept in the models. In your report,
    however, all models should include an intercept.
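
The two-step control function approach in (d) might be sketched in Stata as follows. This is a minimal sketch, not a prescribed solution: y, x, z1 and z2 are placeholder names (as in the command list below), the residual names vhat_lin and vhat_pr and the fitted-value name phat are hypothetical, and ll(0) is assumed as the censoring point because the Tobit model above is written as max{0, ·}. All three estimation commands include an intercept by default.

```stata
* Linear reduced form + linear structural equation
regress x z1 z2                 // Step (i): reduced form for x
predict vhat_lin, residuals     // save first-step residual
regress y x z1 vhat_lin         // Step (ii): add residual as a control function
test vhat_lin                   // H0: theta = 0, i.e. x is exogenous

* Probit reduced form + Tobit structural equation
probit x z1 z2                  // Step (i): probit reduced form for binary x
predict phat, pr                // fitted probability Phi(z1*pi1 + z2*pi2)
generate vhat_pr = x - phat     // residual v-hat = x - fitted probability
tobit y x z1 vhat_pr, ll(0)     // Step (ii): Tobit with the residual added
test vhat_pr                    // H0: theta = 0, i.e. x is exogenous
```

The standard errors and test statistics reported in Step (ii) are subject to the caveat noted above, since they do not account for the first-step estimation.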
    Stata Commands
    For Stata commands, you may consult the Stata do-files (from Week 2 to Week 5) deposited in the
    “Tutorials” folder on Moodle. You may also consult the do-files for Assignments 1 and 2. The following
    points should also be useful.
    • OLS estimation of linear model
    regress x z1 z2
    predict xhat //Save fitted values
    predict vhat, residuals //Save residuals in vhat
    test z2 //Test null hypothesis that coef on z2 is zero
    • 2SLS estimation of linear model
    ivregress 2sls y z1 (x=z2 z3) //2SLS using z2 and z3 as instruments for x
    predict yhat //Save 2SLS fitted values
    predict uhat, residuals //Save residuals in uhat
    • Probit estimation
    probit x z1 z2
    predict xhat //Save fitted values in xhat
    generate vhat=x-xhat //Save residuals in vhat
    • Tobit estimation
    tobit y x z1, ll(0) //Tobit with y left-censored at zero
    predict yhat, ystar(0,.) //Save fitted values in yhat
    margins, dydx(x) predict(ys(0,.)) //Find partial effect of x
    correlate y yhat //Correlation between y and fitted values (stores r(rho))
    display r(rho)^2 //Display R-squared
    • Tobit estimation: Prefix a binary regressor x with “i.”
    tobit y i.x z1, ll(0) //Tobit with binary x, y left-censored at zero
    margins, dydx(i.x) predict(ys(0,.)) //Find partial effect of binary x
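
The endogeneity questions raised in point (b) of “Econometric Analysis” can also be examined with Stata's built-in postestimation commands after 2SLS. The sketch below is offered as one possible check, not as the course's required method: y, x, z1, z2 and z3 are the same placeholder names used above, and these commands may go beyond what the tutorial do-files cover.

```stata
* Minimal sketch with placeholder variable names
ivregress 2sls y z1 (x = z2 z3)   // 2SLS with two instruments for x
estat firststage                  // first-stage F statistic for the instruments
estat endogenous                  // Durbin and Wu-Hausman tests of H0: x is exogenous
estat overid                      // Sargan and Basmann tests of overidentifying restrictions
```

Note that estat overid requires more instruments than endogenous regressors, which is why two instruments (z2 and z3) appear in this example.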