Effects of Seasonality on The Corporate Housing Market in Seattle Presentation

Description


Unformatted Attachment Preview

Final Project Guideline
1/14
Read this at least four times
Now
When writing proposal
When preparing presentation
When writing final paper
Read very carefully. Lots of information here.
2/14
Goal of the course/final paper
Putting econometric theory to practice!
Think about social science questions
Answer a research question using econometric methods
Criticize yourself
You need
a research question
dataset
econometric method to analyze the dataset to answer the question
3/14
What to do
You need to address an original question, or use an innovative econometric
method. It’s hard for an undergrad to do the latter, although you are more
than welcome to try!
Note:
Pure replication of an existing paper is NOT OK.
It’s totally fine even if you fail to find a significant correlation between key
variables. If you succeed, this is really great!
I will NEVER deduct points just because you fail to establish statistical
significance between variables.
4/14
Example research questions
“What is the effect of alcohol taxes on rates of sexually transmitted diseases?:
Evidence from the border counties”
“What are the effects of the opening of a Wal-mart store on county-level
retail employment and earnings?”
“Do consumers optimize usage of their existing vehicle fleet in response to a
fuel price shock?”
“What are the effects of mitigating lead-paint hazards on public health
outcomes?: Evidence of state-level abatement mandates”
It doesn’t have to be an economic question.
5/14
Good sources for ideas
Any answerable economic question is good.
Blogs
Free exchange (The Economist)
Tyler Cowen: marginalrevolution.com
Dubner-Levitt: freakonomics.com
Mankiw: gregmankiw.blogspot.com
Krugman: krugman.blogs.nytimes.com
Freakonometrics: http://freakonometrics.hypotheses.org/
Existing papers: Many journals (Econometrica, JPE, AER, etc.) require
authors to post their data online for replicability purposes. You can simply use
such a dataset if you cannot find a good public data source. But, remember that
you need to answer a different question with that dataset.
If you have a hard time locating a good dataset, take a look at datasets on
“Kaggle.com”. Some are just fake datasets. But it’s fine to use them for this
course’s purpose.
6/14
IMPORTANT: Types of questions for project (1)
Best format: ”the (quantitative) effect of A on B”
Examples
the effect of weather on bike sharing demand
the effect of board gender diversity on the company’s financial performance
the effect of gender, race, and education on the level of income
Don’t do the above examples. Yet, it’s OK to pick one and change slightly.
7/14
IMPORTANT: Types of questions for project (2)
NOT a good idea to have a question like:
What are the factors that affect B?
What is the most important factor that explains B?
These types of questions are hard to answer at this level. Statisticians do have
an OK answer for these questions, but they are outside the scope of this
course. Machine learning techniques are helpful to answer these questions.
Take ECON 484 if you are interested.
8/14
Presentation of results
Do not present too many results. Pick the strongest one. Report four main
findings maximum. Of course, IN THE PAPER, you may want to run
many different regressions as robustness checks, and carefully compare them
(overselling is not good).
Explain which specification is used, which specification is NOT used, and why
you do so.
Your conclusion/message (in the presentation) must be simple, which makes
it powerful. Do not use technical terms in your message.
9/14
Timeline for project (1)
IMPORTANT!!
I never accept overdue submissions. Plan to submit on Canvas a day or two
earlier than the deadline, so there’s no “accident”.
Any late submission results in ZERO points. Make sure to know the deadline
in advance.
If you have a good reason, let me know IN WRITING (that is, email; use
djeun@uw.edu) at least 2 WEEKS PRIOR TO THE DEADLINE. I will give you
permission or not depending on the situation.
10/14
Timeline for project (2)
IMPORTANT!!
Submit Proposal (1-2 pages) on Canvas by 1/23 11pm PST.
Present your proposal (3-5 min, 1-5 slides) in class: (tentative) 1/24 and 1/26
(Semi-) final presentation (10-15min, 4-10 slides) in class
3 min out of this 10-15 min should be reserved for questions, answers,
comments, etc.
First few minutes should be used for introducing questions and background so
your classmates can understand what you’ve been doing.
Then, focus on econometric methods, finding, discussion.
(tentative) 2/23, 2/28, 3/2, 3/7, 3/9
Submit final paper (5 (min)-20 (max) pages including figures/tables/graphs)
on Canvas by 3/16 11pm.
Ideally, use one page on discussion about methodology (pros/cons, assumption,
limitation).
11/14
Proposal and final paper
Proposal: 1-2 pages. Must include:
Questions
Description of the dataset to be used
Proposed methodology (e.g., show your regression equation)
It’s totally OK if final paper topic is different from proposal topic.
Final paper: 5 (min)-20 (max) pages including all figures, graphs, and tables.
Must be written in a typical research paper style. If you wrote a paper
for a similar class before this quarter, you cannot use that paper for this
project. However, the paper can be written to also satisfy a requirement of another
class in THIS quarter. In this case, you need to have permission from the
other instructor.
I will upload model proposals and model papers from previous classes.
12/14
Grade
Read syllabus very carefully.
When it comes to the project:
Proposal
Proposal in-class presentation
(Semi-)final in-class presentation
Final paper
Don’t ever be absent on presentation dates even if you don’t present that day.
Read the attendance policy in syllabus.
13/14
EFFORT is the most important for grade
I understand it’s your first or second experience working on your own
econometric project. So, I will try to appreciate your effort.
If you want to show me your failed effort, use a footnote in the final paper (even
if you already talked to me about it in person). For example, you might have
changed your topic because you failed to get a good dataset to answer the
question. In this case, you can write in the first footnote about how you tried
to get the dataset, which part of the dataset was missing/unsatisfactory, and why
you had to change the topic.
14/14
Elevating Affordable Housing
Abstract
I use DEMs of King County to examine the relationship between elevation and the presence of
affordable housing. I derive two indicators of elevation, the mean slope and standard deviation of
slope, in order to characterize elevation in select geographic regions. I find that elevation is indeed
significant in predicting affordable housing.
I Introduction
In 2015, 32.3% of all households in the United States were spending more than 30% of household income on housing, and so were considered cost burdened [3]. Seattle itself is facing record high home sales and shows no signs of slowing its growth [1]. Although rising rents, home prices, and other housing expenses have not gone unnoticed by many, they may not be the only cause of this deficit of affordable housing. Geographic features play a role in both the availability of land for residential development and the costs of residential development. How does geography, particularly elevation, affect affordable housing?
This study was partly inspired by Albert Saiz’s “The Geographic Determinants of Housing Supply.” Saiz examined the relationship between geographic restraints, such as steep land and bodies of water, and the housing supply of a region. He finds that the presence of geographical restraints not only curtails residential development, but is also highly correlated with the elasticity of the housing supply [2]. I hope to expand upon Saiz’s focus on urban development by analyzing the effect geographic restraints have on affordable housing.
II Data
I chose 2015 statistics from King County due to its relative homogeneity. In particular, I gathered data on its almost 400 census tracts. These census tracts are geographic regions, defined by the United States for the purpose of a census, allowing for accurate data collection.
All demographic and social variable data were taken from the U.S. Census Bureau [3]. The elevation data come from the Washington State Geospatial Data Archive, made available through the University of Washington. All elevation data were in the form of 10 meter Digital Elevation Models (DEMs) for King County surveyed in 2010 [4]. Using satellite imagery, elevation was measured every 10 meters and stored in a grid. I was able to lay this grid over King County and perform statistical operations for elevation with respect to the bounds of each census tract. Although I was able to gather data for every census tract, there are several disadvantages to my elevation data.
Firstly, census tracts are drawn to reflect population rather than geographic area. Census tracts within downtown Seattle and other urban areas are small and may not have enough elevation points to obtain accurate indicators of elevation. In future studies I might exclude observations smaller than a certain area or gather data for other counties. Secondly, because of data formatting, the exact shape of King County and its census tracts is correct, but distance and area are not measured in conventional units (because of my lack of experience with GIS data I was unable to determine the exact conversion from their projection to sq. miles). I am still
able to control for the area and density of a census tract, but it would be useless to try to interpret
the exact effect a unit increase in area or density
has on my dependent variable. Thirdly, the elevation data include bodies of water, which may
skew any averages I take over a particular census
tract. Lastly, 10 meters is more than 30 feet and,
especially in more urban areas, this discretization
may be too coarse to accurately capture changes in
elevation.
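To make the grid operations described above concrete, here is a minimal sketch of how slope statistics might be computed for one tract from a 10 meter elevation grid. It is an illustration under stated assumptions, not the procedure used in this study: it assumes square cells, a rectangular tract, and central differences for the gradient, and the names `slope_degrees` and `tract_indicators` are hypothetical. The GIS software actually used is not stated in the paper.

```python
import math
from statistics import mean, pstdev

CELL_SIZE_M = 10.0  # assumption: square 10 m cells, as in the DEM described above


def slope_degrees(grid, cell=CELL_SIZE_M):
    """Return slopes (in degrees) for the interior cells of an elevation grid.

    grid is a list of rows of elevations in meters. Central differences
    approximate the gradient; edge cells are skipped for simplicity.
    """
    slopes = []
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[r]) - 1):
            dz_dx = (grid[r][c + 1] - grid[r][c - 1]) / (2 * cell)
            dz_dy = (grid[r + 1][c] - grid[r - 1][c]) / (2 * cell)
            slopes.append(math.degrees(math.atan(math.hypot(dz_dx, dz_dy))))
    return slopes


def tract_indicators(grid):
    """Mean slope and standard deviation of slope for one (hypothetical) tract."""
    slopes = slope_degrees(grid)
    return mean(slopes), pstdev(slopes)


# Toy 4x4 tract: a uniform ramp rising 1 m per 10 m cell toward the east.
ramp = [[float(c) for c in range(4)] for _ in range(4)]
mean_slope, sd_slope = tract_indicators(ramp)
print(round(mean_slope, 3), round(sd_slope, 3))
```

On the uniform ramp every interior cell has the same slope, so the standard deviation of slope is zero even though the mean slope is not. Two tracts with the same mean slope can therefore still differ in how varied their terrain is.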
I then created two measures of elevation: the mean and the standard deviation of slope.
Mean is the mean degree change in elevation, or the mean slope of the census tract. Building houses on hills is often more expensive, resulting in higher prices and lowering overall affordability. The mean slope in an area will be a good estimate of the average slope houses are built on. Standard deviation measures how much the slope varies within the census tract. This is to differentiate census tracts that may have the same mean slope but completely different landscapes. If these indicators were used separately, it would be difficult to differentiate census tracts, which would lead to biased results.
III Experimental Design
There were three ways I could have designed this
study. The “gold standard” design would have
been to observe housing affordability through a
natural experiment: some major regrading project
performed independently of housing needs. Although this would have eliminated some selection
and endogeneity bias, it is difficult to find information on regrading projects, especially ones done
independently of housing. In addition, any major regrade would most likely involve tearing down
houses and constructing new ones, both of which
are lengthy tasks. It would have been a challenge
to ensure that the same or even homogeneous populations lived in the same area before and after the
project.
The second design would have been to use time
series data and observe housing affordability and
elevation changes over time. However, it is difficult
to acquire accurate, frequent, and historical elevation data. Gathering elevation data is a time-consuming
and costly process, and because elevation does not change quickly, elevation surveys are performed infrequently. It would
also have been difficult to adjust for migration and
other omitted variables.
Due to these various implementation hurdles I
chose to study cross-sectional data. Combining the
latest complete elevation survey with demographic
data I was able to observe how differences in elevation contribute to differences in affordable housing.
IV Statistical Methods
I examine five different models. Model 1 and
Model 2 estimate the effectiveness of these two elevation indicators separately. Model 3 estimates
the effectiveness of these two elevation indicators
together. Model 4 attempts to control for some of
the unobservable differences between census tracts
by adding the distance from the center of each census tract to downtown Seattle. I argue that census tracts closer to downtown Seattle share characteristics with one another that differ from those of
census tracts farther away. These omitted characteristics are likely correlated with distance, so including distance
helps control for them. Model 5 omits some of the
larger and more distant census tracts. These census
tracts are large and have fairly low population and
housing densities. They simply share too few similarities with the majority of my observations and
are very likely contributing to skewed estimates.
Each model controls for the same variables.
These include the median value of households and
median income measured in 2015 dollars, the total
population of seniors ages 65 and up, the population density, the unemployment rate, the poverty
rate, the percentage of people ages 18 and up holding a bachelor’s degree, the total number of housing
units, and the median number of rooms per household.
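As a rough illustration of the kind of regression behind Model 3, the following sketch fits ordinary least squares of a cost-burdened share on the two slope indicators. It is deliberately simplified: the control variables are left out, the six "tracts" are synthetic and noise-free, the slope coefficients are set to the Model 3 point estimates purely for illustration, and the intercept of 0.30 is made up. The helper names `solve` and `ols` are hypothetical, and the paper does not say which software produced its estimates.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """OLS coefficients via the normal equations (X'X) b = X'y; an intercept is added."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(k)] for a in range(k)]
    xty = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free tracts: (mean slope, std. dev. of slope) in degrees.
tracts = [(0.5, 0.4), (1.0, 1.2), (1.5, 0.9), (2.5, 2.0), (4.0, 1.5), (6.0, 3.5)]
TRUE_B = (0.30, 0.013, -0.014)  # intercept is made up; slopes mirror Model 3
share = [TRUE_B[0] + TRUE_B[1] * m + TRUE_B[2] * s for m, s in tracts]

b0, b_mean, b_sd = ols(tracts, share)
print(round(b0, 3), round(b_mean, 3), round(b_sd, 3))
```

Because the synthetic data contain no noise, the fit recovers the coefficients it was built from. With real data the estimates would carry sampling error, which is why standard errors matter when reading the results.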
Table 1: Descriptive Statistics for Selected Variables
Variable                   Obs   Mean       Std. Dev.   Min    Max
Cost burdened (%)          396   .309       .068        .162   .566
Mean slope                 396   1.517      .918        .085   8.721
Standard deviation slope   396   1.425      .732        .276   5.780
Vacancy rate (%)           396   5.748      3.82        0      40.4
Housing units              396   2196.306   824.8647    126    5390
Note: Mean and standard deviation slope are measured in degrees. Housing units is measured in
1000s. The standard deviation of mean slope is not the standard deviation of slope.
Table 2: Estimated Impact of Elevation on Affordable Housing for Selected Variables
Variable          Model (1)    Model (2)    Model (3)    Model (4)    Model (5)
Mean slope        .003         —            .013         .013         .016
                  (.003)                    (.006)       (.005)       (.007)
Std. dev. slope   —            ≈0           -.014        -.013        -.014
                               (.003)       (.007)       (.007)       (.007)
Vacancy rate      -.003        -.003        -.003        -.003        -.003
                  (.0008)      (.0008)      (.0008)      (.0008)      (.0008)
Housing units     -1.18e-5     -1.15e-5     -1.13e-5     -1.04e-5     -1.11e-5
                  (4.33e-6)    (4.37e-6)    (4.31e-6)    (4.47e-6)    (4.45e-6)
Median rooms      -.008        -.007        -.008        -.007        -.009
                  (.003)       (.003)       (.003)       (.003)       (.004)
Distance          —            —            —            -8.31e-8     -2.51e-8
                                                         (9.72e-8)    (1.04e-7)
Note: Mean and standard deviation slope are measured in degrees. Housing units is
measured in 1000s. Because of the format of my elevation data I do not know the units of distance.
V Regression Results
The results of the regressions are presented in Table 2. I find there is a significant relationship between elevation and affordable housing.
Although the mean slope and standard deviation of slope are highly correlated with each other, Model 1 and Model 2 show that these indicators of elevation are statistically and economically insignificant when tested individually. Future studies of the impact of elevation should avoid including only one of these variables. Not only does it become difficult to differentiate landscapes, but these indicators have opposite effects on housing.
Model 3 shows that the mean slope is positively correlated with the percentage of cost burdened households. Not only is it statistically significant at α = .05, but a unit increase in mean slope, an increase of one degree, raises the percentage of cost burdened households by 1.3 percentage points. On the other hand, standard deviation of slope is statistically significant at α = .05 and negatively correlated with the percentage of cost burdened households. A unit increase in the standard deviation of slope, an increase of one degree, lowers the percentage of cost burdened households by 1.4 percentage points.
Model 4 shows that there are no significant changes when controlling for distance from downtown Seattle. There are two explanations. One, distance is a poor indicator for any omitted variables. If the differences between urban and rural census tracts are not correlated with distance from downtown Seattle, we should either substitute distance with another variable that may be correlated or attempt to gather data on these omitted variables. Two, both the significance and coefficients of mean slope and standard deviation of slope were unaffected, which could indicate that elevation affects housing independently of these omitted variables.
Model 5 shows that there are no significant
changes when outliers are taken out of the regression. This may provide more evidence that elevation affects affordable housing independently of any omitted variables. Alternatively, these omitted census tracts may simply have been larger without differing
significantly in the context of affordable
housing.
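The significance and magnitude claims for Model 3 can be checked with a little arithmetic. The sketch below takes the coefficients and standard errors from Table 2 and the sample standard deviations from Table 1, then computes t-statistics, approximate two-sided p-values, and the change in the cost-burdened share for a one-standard-deviation change in each indicator. Using a standard normal approximation for the p-values is my assumption (reasonable with 396 observations); the paper does not state how its significance levels were computed.

```python
import math

# (coefficient, standard error) pairs copied from Table 2, Model 3.
MODEL_3 = {"mean slope": (0.013, 0.006), "std. dev. slope": (-0.014, 0.007)}
# Sample standard deviations of each regressor, copied from Table 1.
SAMPLE_SD = {"mean slope": 0.918, "std. dev. slope": 0.732}


def two_sided_p(t):
    """Two-sided p-value under a standard normal approximation (assumed here)."""
    return math.erfc(abs(t) / math.sqrt(2))


summary = {}
for name, (coef, se) in MODEL_3.items():
    t = coef / se
    summary[name] = {
        "t": t,
        "p": two_sided_p(t),
        # Change in the cost-burdened share for a one-standard-deviation change.
        "one_sd_effect": coef * SAMPLE_SD[name],
    }
    print(name, round(t, 2), round(summary[name]["one_sd_effect"], 4))
```

Both t-statistics are close to 2 in absolute value, so both coefficients clear the α = .05 threshold only narrowly under this approximation. A one-standard-deviation change in either indicator moves the cost-burdened share by roughly one percentage point.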
VI Discussion
The coefficient for the mean of slope lends itself to an intuitive interpretation. As the mean slope of a census tract increases, the available space for residential development decreases and the average slope a house will be built on becomes steeper. These contribute to a lower housing supply, but not necessarily to a lower housing demand, resulting in higher housing prices and a decrease in overall housing affordability. The mean slope of an area also affects transportation, plumbing, irrigation, and other indirect determinants of affordable housing. As transportation costs increase, they may limit job possibilities and restrict other areas of spending. Rural areas more dependent on agriculture spend more money to irrigate steeper lands.
The coefficient for the standard deviation of slope is interesting in determining what shapes of landscapes are more suitable to affordable housing. The negative correlation tells us that as the slope of a census tract becomes more heterogeneous, the percentage of cost burdened households actually decreases. This may be a result of steep areas, such as mountains, ravines, and cliffs, being offset by flat areas, such as valleys and plains. Though mountains and ravines completely obstruct any housing, development can still take place in the nearby valley without additional costs incurred by the nearby steep slope. As the standard deviation of slope increases, it is likely that the amount of flat land increases as some areas of a steep region flatten and others steepen, decreasing housing and other costs and creating an effect opposite to that of the mean slope.
Altogether, it seems mean slope affects the average slope houses are built on, and standard deviation of slope affects the amount of flat land available for development.
While it is unknown exactly how, the data show that elevation is indeed significant when estimating housing affordability. Throughout these census tracts, mean slope and standard deviation of slope had standard deviations of around .8. With the relatively large magnitude of both coefficients, there seem to be significant gains to be had in addressing housing affordability. These results have important implications for future studies and local city planning.
City planners might focus residential development on flat regions or provide incentives for developers to regrade the land before construction begins. Low income housing and urban development may be focused towards flat land. Funding may also simply go towards regrading projects.
However, policy makers would benefit from understanding the exact pathway from elevation to affordable housing. It might simply be that elevation increases the cost of construction, which then increases the cost of housing. However, Saiz found that geographic restraints were correlated with the elasticity of housing supply. It may be that by constraining the amount of land suitable for development, elevation decreases the housing supply, which also raises housing prices. More data could be gathered on development cost, elasticity of the housing supply, transportation costs, and other theorized ways elevation affects housing. Testing whether these results hold for smaller or larger geographic boundaries, such as blocks or counties, may also provide some insight into the effectiveness of these estimates.
I find it interesting that every model found housing units, median rooms, and vacancy rate to be significant in estimating the percentage of cost burdened households. All three of these variables are related to the elasticity of housing supply of a region. More tests could be performed to see if the relationship between elevation and these indicators of elasticity of housing supply produces results for King County similar to those of Saiz’s study. In turn, Saiz included several other geographic features in addition to steep sloped terrain. It would be interesting to determine whether elevation is unique in estimating housing affordability or if adding other constraints such as bodies of water has the same or opposite effect.
VII Econometric Concerns
There are several areas of this study that can be improved upon in order to produce more accurate estimates.
First of all, there are several flaws in the geographic component of this study. As noted earlier, census tracts become very small the closer they are to city centers. This, combined with the large 10 meter discretization, might provide fewer elevation points than necessary to create an accurate slope map of these census tracts.
Although these census tracts are primarily drawn to reflect populations, they are influenced by geographic factors. Many census tracts follow rivers, mountains, and other boundaries that are related to elevation. These DEMs also take measurements for bodies of water. In most cases, elevation values taken from bodies of water should be excluded from our data because they are mostly independent of elevation but may have an effect on available residential development land and affordable housing. The easiest fix might be to control for the percentage of dry land in a census tract.
An implicit assumption in this model is that the elevation of the entire census tract affects housing. In reality, especially in larger census tracts, housing is rarely spread out across the entire area. However, taking elevation data strictly from residential areas omits the geographic constraints created by steep slopes. A possible solution may be to observe census tracts with similar housing unit densities. Ultimately it falls on how we theorize elevation will affect housing.
In addition, improvements could be made to the demographic variables. Housing affordability is estimated by the percentage of cost burdened households. However, our definition of cost burdened, spending more than 30% of income on housing, includes high income families and others who are not generally considered to be under the same financial burden as other cost burdened homeowners.
These problems arise more from data collection and model specification and can be fixed if researchers are able to gather the right data.
Secondly, there was no randomization of elevation and homeowners/renters, resulting in some selection bias. If houses built on steep slopes are indeed more expensive, then over time there may be a migration of higher income homeowners to these steeper regions. Income is one of the main determinants of housing affordability, and it is likely that these higher income homeowners can better afford their housing. The same holds for lower income individuals. If housing is cheaper in flat regions, there may be a migration of lower income homeowners to these flat regions. We would observe that housing affordability in flat regions is lower even though housing is less expensive.
Another source of selection bias is the tendency of urban areas to be flatter than rural areas. Higher income or skilled individuals are more likely to live in the city, and therefore in flat regions, in order to maximize their possible earnings. These individuals are already more likely to afford their housing. On the other hand, these same individuals may choose to move out of the city and into these steep sloped regions, and are still more likely to afford their housing. Instead of the elevation having an effect on affordable housing, it may just be income and skill.
Regardless, these would bias our estimates of the true effect elevation has on affordable housing.
As mentioned earlier, a natural experiment of some kind would be effective in addressing this bias. If the elevation of an area was changed without altering the housing, and relatively homogeneous populations occupied the area before and after the redevelopment, housing affordability could be measured before and after. By assuming that the homeowners before and after were homogeneous, we would control for any variation between income brackets and other omitted variables and eliminate this selection bias.
Thirdly, this model assumes that as the elevation of an area changes it brings about changes
to housing affordability, but the reverse is just as
likely. Residential development almost always flattens rather than steepens the land, which may result in reverse causality bias. The additional costs of developing on steep sloped terrain are greater for larger
buildings such as multi-family units and other
forms of affordable housing. Therefore, before constructing affordable housing, developers may regrade the land. We would then observe that
affordable housing creates flatter areas, rather than
flatter areas affecting housing affordability.
Likewise, developers may actually seek to develop hillsides to charge more for unobstructed
views. In return for these views, houses would sell
for more and may present a larger financial burden
for the homeowner, especially if geographic restraints have limited available housing options. In
order to prevent these biases, we would have to observe regrading projects performed independently
of any residential development.
Lastly, there are likely many omitted variables.
One such variable, particularly hard to find data
for, is public and private money spent on the development of affordable housing. The more funding
that goes towards building multi-family and other
cheaper housing options, the more likely we are to
observe flat land. Data could also be collected on
average rainfall. More rainfall will gradually flatten
out the land, and at the same time help production
in the agriculture sector, helping families to afford
their housing.
The only solution for any possible omitted variable bias is to collect data either on these omitted
variables or on another variable that is closely
correlated with them.
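A small simulation can illustrate why an omitted variable that is correlated with a regressor biases the estimate. Everything here is made up for illustration: the effect sizes, the sample size, and the way the omitted variable `z` drives the regressor `x`. None of it comes from the King County data. Subtracting the known contribution of `z` from the outcome is a shortcut that stands in for controlling for `z` in a full regression.

```python
import random


def slope(x, y):
    """Simple-regression slope of y on x: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
TRUE_EFFECT, OMITTED_EFFECT = 1.0, 2.0  # made-up values for illustration

z = [rng.gauss(0, 1) for _ in range(n)]   # omitted variable
x = [zi + rng.gauss(0, 1) for zi in z]    # regressor correlated with z
y = [TRUE_EFFECT * xi + OMITTED_EFFECT * zi + rng.gauss(0, 1)
     for xi, zi in zip(x, z)]

short = slope(x, y)  # regression that omits z
# Removing z's known contribution from y stands in for controlling for z.
controlled = slope(x, [yi - OMITTED_EFFECT * zi for yi, zi in zip(y, z)])
print(round(short, 2), round(controlled, 2))
```

In this setup the regression that omits `z` lands near 2 rather than the true effect of 1, because `x` picks up part of the effect of `z`. Once the contribution of `z` is accounted for, the estimate returns to about 1.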
VIII Conclusion
This paper set out to expand upon the work
of Albert Saiz. Saiz examined the relationship between geographic constraints and housing supply
and, inspired by his findings, I chose to examine in
particular how elevation affects affordable housing.
I began by collecting data on various regions
of King County. I then compared how the elevation in each region affected the percentage of cost
burdened households. To do so I had to characterize the entire topographic plane into two different
point values: mean of slope and standard deviation
of slope. I find that these indicators, and in general elevation, are indeed significant in predicting
affordable housing. Despite likely sources of bias in
my model, I believe these findings will add to the
existing literature and allow future studies to further delve into the relationship between geography
and housing.
References
[1] Rosenburg, Mike, Seattle and Eastside home prices, after brief slowdown, surge to record highs. The
Seattle Times 2017. http://www.seattletimes.com/business/real-estate/seattle-and-eastside-homeprices-after-brief-slowdown-surge-to-record-highs/
[2] Saiz, Albert, The Geographic Determinants of Housing Supply. The Quarterly Journal of Economics
(2010) 125 (3): 1253-1296. http://qje.oxfordjournals.org/content/125/3/1253.abstract
[3] US Census Bureau. [Online]. Available: http://factfinder.census.gov/. Accessed: Feb. 2016.
[4] “Washington 10-meter DEMS.” Washington State – GIS Data, 21 Sept. 2010,
http://gis.ess.washington.edu/data/raster/tenmeter/byquad/index.html
ECON 483 FINAL PROJECT
Income Differences across Genders, Races, and Education Levels
Introduction
This research project is intended to answer the question “What are the effects of different
genders, races, education levels and areas of occupation on people’s income?”, and to investigate
how income differs across genders, races, and education levels. The data are from the
Panel Study of Income Dynamics (PSID) 2013 [4], a longitudinal panel survey of American
families, conducted by the Survey Research Center at the University of Michigan. Robust
Ordinary Least Squares regression will be used to estimate the effect of earning a bachelor’s
degree, being female, and so on. Survey data on genders, races, and education levels are
converted from categorical data to binary data (dummy variables).
There is a large literature investigating similar topics. For example, in the paper Gender
Wage Gap and Its Associated Factors: An Examination of Traditional Gender Ideology,
Education, and Occupation [1], the author reiterates the existence of a myriad of social
factors that influence the wage gap, and illuminates the importance of traditional ideology as
a predictor of wage using logistic regression. The paper by Jean Heiwege, Sectoral
Shifts and Interindustry Wage Differentials [2], divides all jobs into a large number
of categories and investigates the wage difference over time across categories. The paper
Education, Occupation, and Wage Differences Between White and Black Men [3] also
analyzes in depth the relationship between race, education, and wage.
Income differences across genders and races are important indications of gender and racial
inequality or potential discrimination. Income differences across different education levels
are crucial knowledge for advising, career development, long-run decision making, and many
other applications.
1
Data
The raw data came from the Panel Study of Income Dynamics (PSID) 2013 [4]. This survey
data contains in total 9063 observations and 5257 variables. New variables are generated
based on the answer to some of the questions in the survey. Observations are dropped due
to missing values, and refusals to answer certain questions.
The descriptive statistics are shown in Table 1. Variables are divided into 5 different
categories, general information (with no prefix, “info” for short later in this paper), gender
(with g_ prefix), race (with r_ prefix), education (with e_ prefix), and area of occupation
(with j_ prefix).
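The conversion from categorical survey answers to prefixed dummy variables can be sketched as follows. This is an illustration only: the function `make_dummies` and the sample answers are hypothetical, and the actual PSID codes and the software steps used to build the variables are not shown in this paper.

```python
def make_dummies(values, prefix):
    """Turn one categorical column into 0/1 columns named prefix + category."""
    categories = sorted(set(values))
    return {
        f"{prefix}{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


# Hypothetical survey answers; the real PSID codes are not shown in the paper.
education = ["bachelor", "high_school", "master", "bachelor"]
dummies = make_dummies(education, "e_")
print(dummies["e_bachelor"])  # [1, 0, 0, 1]
```

When such dummies enter a regression with an intercept, one category per group is normally left out as the reference level, since including all of them would make the regressors perfectly collinear.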
Table 1. Descriptive Statistics
. sum log_wage age working retired g_female r_white r_black r_indian_or_ak r_asian r_hawaiia
> torate j_management j_business j_financial j_architecture j_life_phy_social j_community j_
> s

    Variable |        Obs        Mean   Std. Dev.        Min        Max
-------------+---------------------------------------------------------
    log_wage |      6,051    10.33069    1.116473   3.688879   15.65606
         age |      7,559    45.32028    16.70362         17        100
     working |      7,559    .7502315    .4329076          0          1
     retired |      7,559    .1580897    .3648492          0          1
    g_female |      7,559    .3022887      .45928          0          1
-------------+---------------------------------------------------------
     r_white |      7,559    .6043127    .4890302          0          1
     r_black |      7,559    .3475327    .4762182          0          1
r_indian_o~k |      7,559    .0070115    .0834462          0          1
     r_asian |      7,559    .0115095      .10667          0          1
  r_hawaiian |      7,559    .0006615    .0257121          0          1
-------------+---------------------------------------------------------
       e_GED |      7,559    .0777881    .2678555          0          1
e_high_sch~l |      7,559      .45932    .4983754          0          1
 e_associate |      7,559    .0719672    .2584507          0          1
  e_bachelor |      7,559    .1837545    .3873094          0          1
    e_master |      7,559    .0676015    .2510775          0          1
-------------+---------------------------------------------------------
 e_doctorate |      7,559    .0215637    .1452635          0          1
j_management |      7,559    .0646911    .2459963          0          1
  j_business |      7,559    .0170657     .129525          0          1
 j_financial |      7,559    .0105834    .1023366          0          1
j_architec~e |      7,559    .0215637    .1452635          0          1
-------------+---------------------------------------------------------
j_life_phy~l |      7,559    .0091282    .0951108          0          1
 j_community |      7,559    .0124355    .1108264          0          1
     j_legal |      7,559    .0023813    .0487433          0          1
      j_arts |      7,559     .014023    .1175934          0          1
    j_health |      7,559    .0275169    .1635947          0          1
-------------+---------------------------------------------------------
j_protective |      7,559    .0267231    .1612837          0          1
      j_food |      7,559    .0400847    .1961708          0          1
  j_building |      7,559    .0256648    .1581436          0          1
     j_sales |      7,559    .0535785     .225199          0          1
*Note: Variable names may be shortened due to space constraint. Variable list can be found in the Appendix Table 3.
.
Besides log_wage and age, all other variables are binary dummy variables: 1 indicates
“true”, and 0 indicates “false”. The correlation