Introduction to Econometrics I Project
Using STATA, perform a regression analysis of the data provided.
The dependent variables deaths, heart, and liver can each be regressed on alcohol, giving simple regression examples. The conventional wisdom is that wine is good for the heart but not for the liver, and this is apparent in the regressions. Because the number of observations is small, this data set is well suited to illustrating how the OLS estimates and statistics are calculated.
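To illustrate the hand calculation mentioned above, the simple-regression formulas can be sketched in plain Python. This is a minimal sketch, not how STATA computes its output internally. The function name `simple_ols` and the toy numbers are hypothetical; they are not the deaths/heart/liver data.

```python
import math


def simple_ols(x, y):
    """OLS for y = a + b*x + u using the textbook hand-calculation formulas.

    Returns (intercept, slope, se_intercept, se_slope, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)                        # total variation in x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # co-variation of x and y
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ssr = sum(e ** 2 for e in residuals)      # sum of squared residuals
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    sigma2 = ssr / (n - 2)                    # unbiased error-variance estimate (2 parameters)
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept, se_slope, 1 - ssr / sst


# Toy numbers for illustration only (not the alcohol data set).
b0, b1, se0, se1, r2 = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"intercept={b0:.3f} slope={b1:.3f} se(slope)={se1:.3f} t={b1 / se1:.2f} R2={r2:.3f}")
```

For these toy numbers the intercept is approximately 2.2, the slope approximately 0.6 and the R-squared approximately 0.6. The t statistic for the slope is the slope divided by its standard error.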
You will be working on a project where you use multivariate regression analysis to analyze
economic data. You will be responsible for determining the research question, formulating the
regression model, finding the relevant data and papers, performing the analysis and discussing the
results. Chapter 19 in Wooldridge’s “Introductory Econometrics” has many useful examples and
suggestions for carrying out an empirical project.
Technical Details: Format
Your final project must be uploaded to the designated folder on Brightspace.
Please submit an electronic copy of your first draft for a format check by.
If any important part of the paper is missing or is not properly presented, you will receive an email
within 3-4 days.
1. Cover page. The cover page should be structured as follows:
Name
B00#
Date
Project Title
Prepared for Introduction to Econometrics I Project
2. Length. The total length, including figures, tables and references, should not exceed
12 pages.
3. Font size and spacing. The text should be double-spaced, in 12-point font.
4. Equations. Use an equation editor (such as the one built into MS Word) to specify your model(s),
and number all equations in your text sequentially (1, 2, etc.).
Proposed Outline
1. Introduction
In this section, describe the research question and explain why it is important. Focus on the
dependent variable. Briefly describe what you will do in each section of the project, without
going into detail.
2. Literature review
Provide a short review of journal articles and/or books that are closely related to your project.
Include the complete reference for each reviewed study in the reference section.
3. Methodology
This section must discuss in detail what you will do in this project. You should state the
questions that you will answer and how you plan to answer them. For example, you might write that
you will investigate the effect of education and experience on wages, and that this will be done by
estimating a multivariate linear model by OLS. If there is a similar paper in the literature, you
must explain how your work differs from it. Is the difference in the methodology? Do you include
more independent variables in your analysis? Do you use a different estimation technique? Do you
have a different data set?
Remember to write the regression that you plan to run using the following format:
$wage_i = \alpha + \beta\, edu_i + u_i$
Focus on the independent variables. For each independent variable, explain why you have included
it in the model and whether you expect it to have a positive or negative impact on the dependent
variable.
4. Description of the data
In this section, you describe your data set in detail: the variables, their nature (continuous,
categorical, or binary 0/1), time period that they span, the number of observations, and the source
of the data.
Summary statistics should be provided either in tables or figures, depending on the type of data.
The full range of summary statistics (mean/variance/min/max/skewness/kurtosis) can be provided
for continuous variables. Binary or categorical variables can be reported using frequency tables or
pie charts.
Provide some discussion of the descriptive statistics of the dependent and independent variables.
If you notice some patterns in your data, interesting or strange, mention them here. You can also
include some preliminary analysis about the relationship between variables of interest using
scatterplots between pairs of variables.
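A scatterplot can be drawn in STATA with, for example, `scatter y x`. The Pearson correlation coefficient complements it by summarizing the strength of the linear relationship between a pair of variables (STATA's `correlate` command reports it). Below is a minimal Python sketch; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least 2 paired observations")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)
```

In a simple regression, the squared correlation between the dependent and independent variable equals the R-squared. For the toy data x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], the correlation is about 0.775 and its square is 0.6.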
5. Results
In Section 3 you explained your methodology. In this section, estimate the models using your data
and report the results. The regression outputs and specification tests must be provided and
discussed. In class you will learn how to estimate the models and how to conduct inference
(i.e., test hypotheses about the values of the model's parameters based on the OLS estimates).
Use what you have learned to estimate your models and draw inferences. In this section, you will
also discuss the model specification and potential biases. You may consider additional independent
variables that help explain the dependent variable, or different nonlinear transformations of
existing variables.
If you have regressed the same dependent variable on different sets of independent variables, you
should discuss which resulting model is better in terms of goodness of fit ($R^2$ and adjusted $R^2$).
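Both measures can be computed from the sums of squares in the regression output: $R^2 = 1 - SSR/SST$ and adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$, where $n$ is the number of observations and $k$ is the number of independent variables (excluding the intercept). Below is a minimal Python sketch with hypothetical function names.

```python
def r_squared(ssr, sst):
    """R^2 from the residual (SSR) and total (SST) sums of squares."""
    return 1 - ssr / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2; k counts the independent variables, excluding the intercept."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```

On the same sample, adding a regressor can never lower $R^2$, but it lowers adjusted $R^2$ when the gain in fit is small. In a hypothetical case with $n = 50$, moving from $k = 1$ with $R^2 = 0.600$ to $k = 2$ with $R^2 = 0.605$ reduces adjusted $R^2$ from about 0.592 to about 0.588.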
Keep in mind that your results are reliable only if the OLS assumptions are satisfied. After
running the regressions, you should test for functional form misspecification and heteroskedasticity.
Note: Detailed instructions for this section, as well as relevant STATA commands, will be posted
later on Brightspace.
6. Conclusion
This is the final section of your project. Summarize what you have done. In one or at most two
paragraphs, state the questions that you wished to answer and your main findings (which
independent variables affect the dependent variable, and the magnitude of each effect). How can
these results be used for policy making or other practical purposes? You may also suggest
directions for further research (e.g., including other variables, considering different functional
forms, or using different estimators).
7. References
You should list all cited studies that are related to your research questions following The Chicago
Manual of Style, as in this example:
Zivot E., and D. Andrews, (1992), Further Evidence on the Great Crash, the Oil-Price Shock, and the
Unit-Root Hypothesis, Journal of Business & Economic Statistics, 10, 251-270.
Appendix
Including tables and figures in the main text may lead to some difficulties regarding the layout of
your work. Instead, you may place all your tables and figures at the end of the file. All tables and
figures should be labeled (e.g. Table 1, Figure 5, etc.) and must have a title. When you discuss the
results in the text, use table and figure numbers to refer to them. In Section 5, refer to relevant
tables and figures when discussing the results as follows:
“The results of running the regression. ……. can be found in Table 2. The parameter estimate for
education is statistically insignificant….”
If you move all tables and figures to the Appendix, the Appendix should have two separate
sections: one for tables and one for figures. Tables and figures should not be copy-pasted from the
software output. You should create your own tables and graphs.
Instructions for Section 5 “Results”
What independent variables should you include in your models?
Sometimes you have a theory that determines which independent variables you should include in
your model (for example, you may try to quantify the parameters in a Cobb-Douglas production
function, where the independent variables are labor and physical capital). In other cases, you do
not have such a theory, but you have a large set of independent variables that you believe can help
explain your dependent variable, and you are unsure which of them to include. In that case, you can
proceed in the following way:
Begin with all independent variables that you think are relevant and that are not nearly
multicollinear. If applicable, you may consider cross products (interactions) of these variables
(you need to justify why you did so).
Estimate your model with all of them.
Check the output. Some of the parameter estimates may be statistically insignificant.
Test the hypothesis that they are jointly insignificant (F-test). If you cannot reject it, remove
the independent variables associated with these parameters. Otherwise, test the hypotheses
related to subgroups of these coefficients.
Estimate your model again and repeat the hypothesis testing. The remaining coefficient estimates
should not change considerably (especially in sign) compared with the previous model; if they do,
you may have an omitted variable bias. Check again whether the OLS assumptions are satisfied.
This procedure is called the general-to-specific approach.
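The joint F-test in this procedure compares the sum of squared residuals of the restricted model (candidate variables removed) with that of the unrestricted model: $F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}$. Here $q$ is the number of restrictions and $k$ is the number of independent variables in the unrestricted model. With default (non-robust) standard errors, STATA's `test` command after `regress` reports this statistic and its p-value. Below is a minimal Python sketch with a hypothetical function name. It computes only the statistic, which you then compare with the $F_{q,\,n-k-1}$ critical value.

```python
def exclusion_f_stat(ssr_restricted, ssr_unrestricted, q, n, k):
    """F statistic for H0: the q excluded coefficients are jointly zero.

    k is the number of independent variables in the unrestricted model
    (excluding the intercept), so the denominator has n - k - 1 degrees of freedom.
    """
    df_denom = n - k - 1
    if q <= 0 or df_denom <= 0:
        raise ValueError("need q > 0 and n > k + 1")
    if ssr_restricted < ssr_unrestricted:
        raise ValueError("restricted SSR cannot be smaller than unrestricted SSR")
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_denom)
```

Take hypothetical values $SSR_r = 120$, $SSR_{ur} = 100$, $q = 2$, $n = 50$ and $k = 5$. The statistic is 4.4. The 5% critical value of $F_{2,44}$ is roughly 3.2, so joint insignificance would be rejected and the variables kept.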
In Section 5:
1. Estimate your regression model as discussed above.
2. What is the adjusted R-squared? Do you consider it large enough? Could you add more
independent variables? If you decide to explore including more independent variables in the
“updated” model, please use the general-to-specific approach.
3. Perform misspecification testing (RESET, Breusch-Pagan and White’s tests) in the
following order:
If functional form is found to be a problem when using RESET, change the
specification by applying logarithms to suitable variables or by adding squared terms
of some of the independent variables.
Estimate the “updated” model and check again for functional form using RESET.
Select the model that appears least misspecified.
Check the “updated” model for heteroskedasticity (B-P and White’s tests).
If heteroskedasticity is not an issue, you are ready to discuss your regression
output.
If heteroskedasticity is found, re-estimate your “updated” model using
heteroskedasticity-robust standard errors.
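Note: In misspecification tests, the null hypothesis is that there is no problem with the
specification (for RESET, that the functional form is correct; for the Breusch-Pagan and White
tests, that the errors are homoskedastic).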
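Reproducing the tests above by hand can help in reading the STATA output. Below is a minimal NumPy sketch that assumes the textbook (Wooldridge) forms. RESET adds the squared and cubed fitted values and tests them jointly with an F-test. The Breusch-Pagan statistic is the LM form, $n \cdot R^2$ from regressing the squared residuals on the independent variables. The robust standard errors are the HC1 form, assumed here to correspond to STATA's `vce(robust)` option. STATA's `estat ovtest` and `estat hettest` use somewhat different defaults (for example, more powers of the fitted values, or the fitted values instead of the regressors), so their numbers need not match. All function names are hypothetical.

```python
import numpy as np


def _prep(X, y):
    """Convert inputs to float arrays; X becomes an (n, k) matrix."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    return X, y


def _ols(X, y):
    """OLS of y on [1, X]; returns coefficients, fitted values and residuals."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    fitted = Xc @ beta
    return beta, fitted, y - fitted


def reset_f(X, y):
    """RESET F statistic: add yhat^2 and yhat^3 and test them jointly (q = 2)."""
    X, y = _prep(X, y)
    _, fitted, resid_r = _ols(X, y)
    X_aug = np.column_stack([X, fitted ** 2, fitted ** 3])
    _, _, resid_ur = _ols(X_aug, y)
    ssr_r, ssr_ur = resid_r @ resid_r, resid_ur @ resid_ur
    df_denom = len(y) - X_aug.shape[1] - 1
    return ((ssr_r - ssr_ur) / 2) / (ssr_ur / df_denom)


def breusch_pagan_lm(X, y):
    """LM = n * R^2 from regressing squared OLS residuals on the regressors.

    Under H0 (homoskedasticity) it is approximately chi-square with k degrees of freedom.
    """
    X, y = _prep(X, y)
    _, _, resid = _ols(X, y)
    u2 = resid ** 2
    _, _, aux_resid = _ols(X, u2)
    r2_aux = 1 - (aux_resid @ aux_resid) / np.sum((u2 - u2.mean()) ** 2)
    return len(y) * r2_aux


def robust_se_hc1(X, y):
    """Heteroskedasticity-robust (HC1) standard errors, intercept first."""
    X, y = _prep(X, y)
    Xc = np.column_stack([np.ones(len(y)), X])
    _, _, resid = _ols(X, y)
    n, k = Xc.shape
    bread = np.linalg.inv(Xc.T @ Xc)
    meat = Xc.T @ (Xc * resid[:, None] ** 2)  # sum of u_i^2 * x_i x_i'
    cov = bread @ meat @ bread * n / (n - k)  # HC1 degrees-of-freedom scaling
    return np.sqrt(np.diag(cov))
```

White's test extends the Breusch-Pagan auxiliary regression with the squares and cross products of the independent variables. In STATA it is available as `estat imtest, white`.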
4. Discuss the regression output.
Report the result of the goodness-of-fit test and the adjusted R-squared from the
STATA regression output.
Individual parameter estimates: statistical interpretation
o If you found a variable to be significant, report it and mention the significance level.
In the regression output, the null hypothesis of each t-test is that the individual parameter
is equal to zero (statistically insignificant).
o If a variable of interest is insignificant, report it as well (do not delete it from the
model). This means that, based on your data set, you have found no statistical evidence
of an effect of this independent variable on the dependent variable.
o Reminder: if you have two or more independent variables that are individually
insignificant and you consider removing them from the specification, check
whether they are jointly insignificant. If the F-test of this restriction does not reject
the null, you can remove them from the model.
Individual parameter estimates: economic interpretation
o You can now interpret the parameter estimates in economic terms, i.e., what effects
the independent variables have on the dependent variable. Do you think that the
effects are large?
o Provide an economic interpretation for both statistically significant and insignificant
parameter estimates, keeping in mind that an insignificant estimate means the data provide
no statistical evidence of an effect.
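Economic interpretation depends on the functional form. Logarithms and squared terms are the transformations suggested for fixing functional form problems. For those specifications, the standard textbook approximations for the effect of a change in $x$ on $y$ (holding other variables fixed) are:

```latex
\begin{align*}
\text{level-level:}\quad & y = \beta_0 + \beta_1 x + u && \Delta y = \beta_1 \,\Delta x \\
\text{log-level:}\quad & \log(y) = \beta_0 + \beta_1 x + u && \%\Delta y \approx (100\,\beta_1)\,\Delta x \\
\text{level-log:}\quad & y = \beta_0 + \beta_1 \log(x) + u && \Delta y \approx (\beta_1/100)\,\%\Delta x \\
\text{log-log:}\quad & \log(y) = \beta_0 + \beta_1 \log(x) + u && \%\Delta y \approx \beta_1\,\%\Delta x \\
\text{quadratic:}\quad & y = \beta_0 + \beta_1 x + \beta_2 x^2 + u && \Delta y \approx (\beta_1 + 2\beta_2 x)\,\Delta x
\end{align*}
```

The log-level approximation is accurate only when $\beta_1 \Delta x$ is small. The exact percentage change is $100\,[\exp(\beta_1 \Delta x) - 1]$.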
Important: whenever you perform a hypothesis test, interpret the p-value in relation to that test's null hypothesis:
a. p-value<1%: reject the null hypothesis at the 1% significance level
b. 1%<p-value<5%: reject the null hypothesis at the 5% significance level
c. 5%<p-value<10%: reject the null hypothesis at the 10% significance level
d. p-value>10%: do not reject the null hypothesis.
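The decision rule above can be written as a small helper, for example to check your write-up against the p-values in the output. Below is a minimal Python sketch with a hypothetical function name. The rule above leaves open what happens when a p-value exactly equals a threshold; this sketch assumes such a p-value is not significant at that threshold.

```python
def significance_statement(p_value):
    """Translate a p-value into the reject / do-not-reject wording of rules a-d."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    for level in (0.01, 0.05, 0.10):
        if p_value < level:
            # Report the smallest conventional level at which H0 is rejected.
            return f"reject the null hypothesis at the {level:.0%} significance level"
    return "do not reject the null hypothesis"
```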
Project Rubric

Category      Component                                                 Approximate weight (%)
Technical     Model description                                         10
              Data description                                          15
              Selection of independent variables (including F-tests)    15
              Misspecification testing                                  10
              Statistical significance (discussion)                     10
Qualitative   Motivation of the research question                       10
              Literature review                                         10
              Economic significance of the results (discussion)         10
              Layout / format                                           10