Comparing housing prices over time is difficult because housing is heterogeneous, and the characteristics that affect prices can themselves change over time. The typical solution to this problem is to use a hedonic price index, which models the relationship between housing characteristics and price and uses this model to control for changes in these characteristics over time.
Although there are several approaches to building a hedonic price index, the double imputation approach is usually seen as the best. Every presentation of this method that I have seen calculates the index by fitting two regression models and combining the fitted values from these models in a geometric index (e.g., Aizcorbe 2014; ILO et al. 2013; IMF 2020). What I want to show here is that the double imputation index can be recovered from the coefficient in a linear regression. This makes it simpler to calculate the index and provides an easy strategy to estimate conventional standard errors.
Turning two regressions into one
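The hedonic imputation model starts with two linear models, one for transactions in period 0,
\[ \log(p_{i0}) = \alpha_{0} + x_{i0}'\beta_{0} + u_{i0}, \tag{1} \]
and one for transactions in period 1,
\[ \log(p_{i1}) = \alpha_{1} + x_{i1}'\beta_{1} + e_{i1}, \tag{2} \]
where \(p_{it}\) is the price of sale \(i\) in period \(t\), \(\alpha_{t}\) is an intercept, and \(x_{it}\) is a vector of characteristics that confound a change in price over time with a change in the composition of housing.
The Laspeyres hedonic imputation index, \(I\), compares the predicted prices from Equation 2 against those from Equation 1 over the distribution of characteristics in period 0,
\[ I = \exp\left(\alpha_{1} - \alpha_{0} + \bar{x}_{0}'(\beta_{1} - \beta_{0})\right), \tag{3} \]
where \(\bar{x}_{0}\) is the vector of average period 0 characteristics. This is equation 5.19 in ILO et al. (2013).
Equation 1 and Equation 2 can be nested to get a single linear model by interacting every term with a dummy variable \(t\) that equals 1 for transactions in period 1. Subtracting \(\bar{x}_{0}\) from the interaction term then gives the logarithm of the index in Equation 3 as the coefficient on the time dummy variable,
\[ \log(p_{it}) = \alpha_{0} + \left[\alpha_{1} - \alpha_{0} + \bar{x}_{0}'(\beta_{1} - \beta_{0})\right] t + x_{it}'\beta_{0} + t (x_{it} - \bar{x}_{0})'(\beta_{1} - \beta_{0}) + \varepsilon_{it}, \]
where \(\varepsilon_{it} = u_{it} + t(e_{it} - u_{it})\). Setting \(t = 0\) recovers Equation 1 and setting \(t = 1\) recovers Equation 2. The Paasche hedonic imputation index follows along the same lines, replacing \(\bar{x}_{0}\) with \(\bar{x}_{1}\). The Fisher hedonic imputation index is the geometric mean of these two, which is the same as replacing \(\bar{x}_{0}\) with \(\bar{x}_{0} / 2 + \bar{x}_{1} / 2\). Note that this reduces to the classic time-dummy hedonic index when \(\beta_{1} = \beta_{0}\).¹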
An example
Let’s go through an example to see this in action. As the goal is just to give an example, I’ll make up some typical-looking transaction data for property sales with a few characteristics over two periods.
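One way to make up data of this kind is sketched below. This is not the data behind the numbers shown in this post, so index values computed from it will differ. Every parameter value, the sample size, and the seed are assumptions chosen only to give plausible-looking prices; the column names match those used in the models that follow.

```r
# Hypothetical simulated sales; all numeric values below are assumptions.
set.seed(12345)
n <- 500 # sales per period

sales <- data.frame(
  period = rep(0:1, each = n),
  age = round(runif(2 * n, 0, 60)),
  # Period-1 homes are a bit larger on average, so composition changes.
  area = round(rnorm(2 * n, mean = rep(c(140, 150), each = n), sd = 25)),
  amenity_score = sample(1:10, 2 * n, replace = TRUE)
)

# Log price is quadratic in area and linear in the other characteristics,
# with period-specific coefficients so the two hedonic models differ.
sales$price <- with(
  sales,
  exp(
    11 + 0.02 * period -
      (0.006 + 0.001 * period) * age +
      0.012 * area - 0.00002 * area^2 +
      (0.03 + 0.005 * period) * amenity_score +
      rnorm(2 * n, sd = 0.15)
  )
)

head(sales)
```

Letting both the coefficients and the distribution of characteristics shift between periods is what makes the Laspeyres, Paasche, and Fisher indexes differ from one another.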
The two-regression approach for making the Laspeyres-imputation index fits a separate model for each time period and takes the geometric average of the ratio of predicted prices over the distribution of housing characteristics for sales in period 0.
# Split the sales by period: element 1 is period 0, element 2 is period 1.
sales2 <- split(sales, ~period)
mdl1 <- lm(log(price) ~ age + area + I(area^2) + amenity_score, sales2[[1]])
mdl2 <- lm(log(price) ~ age + area + I(area^2) + amenity_score, sales2[[2]])
# Average the predicted log price relatives over the period 0 sales.
laspeyres <- exp(mean(predict(mdl2, sales2[[1]]) - predict(mdl1, sales2[[1]])))
laspeyres
[1] 0.9367763
The Paasche-imputation index does the same thing, just over the distribution of characteristics for sales in period 1, and the Fisher-imputation index combines the Laspeyres and Paasche indexes.
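The Paasche and Fisher calculations described here can be sketched as follows. The data are a hypothetical stand-in for `sales` (all simulated values are assumptions), and `imputation_index()` is a helper introduced only for this illustration. The Fisher index is taken as the geometric mean of the Laspeyres and Paasche indexes.

```r
# Hypothetical data standing in for `sales`; all values are assumptions.
set.seed(1)
n <- 400
sales <- data.frame(
  period = rep(0:1, each = n),
  age = runif(2 * n, 0, 60),
  area = runif(2 * n, 60, 250),
  amenity_score = sample(1:10, 2 * n, replace = TRUE)
)
sales$price <- with(sales, exp(
  11 + 0.02 * period - (0.006 + 0.001 * period) * age + 0.012 * area -
    0.00002 * area^2 + 0.03 * amenity_score + rnorm(2 * n, sd = 0.15)
))

f <- log(price) ~ age + area + I(area^2) + amenity_score
sales2 <- split(sales, sales$period)
mdl1 <- lm(f, sales2[[1]]) # period 0
mdl2 <- lm(f, sales2[[2]]) # period 1

# Geometric mean of predicted price relatives over a given set of sales.
imputation_index <- function(base) {
  exp(mean(predict(mdl2, base) - predict(mdl1, base)))
}

laspeyres <- imputation_index(sales2[[1]]) # period 0 characteristics
paasche <- imputation_index(sales2[[2]])   # period 1 characteristics
fisher <- sqrt(laspeyres * paasche)
c(laspeyres = laspeyres, paasche = paasche, fisher = fisher)
```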
To make these indexes with one regression we need to remove the appropriate mean from the characteristics variables and interact them with the time dummy in the linear model. In each case the exponential of the coefficient on the time dummy variable gives the same index as using two regressions. For simplicity, I’ll just show the case of the Fisher index.
# Center a variable at the average of its period 0 and period 1 means.
# `period` is looked up in the calling frame, i.e., the model data.
dm <- function(x) {
  m0 <- mean(x[eval(substitute(period == 0), parent.frame())])
  m1 <- mean(x[eval(substitute(period == 1), parent.frame())])
  x - mean(c(m0, m1))
}
mdl_fisher <- lm(
  log(price) ~ period + age + area + I(area^2) + amenity_score +
    period:(dm(age) + dm(area) + dm(area^2) + dm(amenity_score)),
  sales
)
# The exponentiated time-dummy coefficient should match the Fisher index.
all.equal(exp(coef(mdl_fisher)["period"]), fisher, check.attributes = FALSE)
[1] TRUE
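The Laspeyres case can be checked the same way by centering at the period 0 means only. The sketch below uses hypothetical simulated data (all values are assumptions) and builds the centered regressors as a matrix rather than with a helper function; that is a choice made for this illustration, not the approach used above.

```r
# Hypothetical data standing in for `sales`; all values are assumptions.
set.seed(1)
n <- 400
sales <- data.frame(
  period = rep(0:1, each = n),
  age = runif(2 * n, 0, 60),
  area = runif(2 * n, 60, 250),
  amenity_score = sample(1:10, 2 * n, replace = TRUE)
)
sales$price <- with(sales, exp(
  11 + 0.02 * period - (0.006 + 0.001 * period) * age + 0.012 * area -
    0.00002 * area^2 + 0.03 * amenity_score + rnorm(2 * n, sd = 0.15)
))

# Two regressions.
f <- log(price) ~ age + area + I(area^2) + amenity_score
s0 <- sales[sales$period == 0, ]
s1 <- sales[sales$period == 1, ]
mdl1 <- lm(f, s0)
mdl2 <- lm(f, s1)
laspeyres <- exp(mean(predict(mdl2, s0) - predict(mdl1, s0)))

# One regression: center each regressor at its period 0 mean, then interact
# the centered regressors with the time dummy.
X <- model.matrix(~ age + area + I(area^2) + amenity_score, sales)[, -1]
Xc <- sweep(X, 2, colMeans(X[sales$period == 0, ]))
mdl_laspeyres <- lm(log(price) ~ period + X + period:Xc, sales)

all.equal(exp(coef(mdl_laspeyres)[["period"]]), laspeyres)
```

Swapping the period 0 column means for the period 1 means gives the Paasche index in the same manner.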
Representing the Fisher index as the exponential of a coefficient in a linear model also makes it easy to get the (delta method) standard error for the index.²
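A minimal sketch of that calculation, assuming hypothetical simulated data (all values are assumptions) and conventional homoskedastic standard errors from `vcov()`. Because the index is \(\exp(b)\) for the time-dummy coefficient \(b\), the delta method gives a standard error of approximately \(\exp(b)\) times the standard error of \(b\). The centering means are treated as fixed here.

```r
# Hypothetical data standing in for `sales`; all values are assumptions.
set.seed(1)
n <- 400
sales <- data.frame(
  period = rep(0:1, each = n),
  age = runif(2 * n, 0, 60),
  area = runif(2 * n, 60, 250),
  amenity_score = sample(1:10, 2 * n, replace = TRUE)
)
sales$price <- with(sales, exp(
  11 + 0.02 * period - (0.006 + 0.001 * period) * age + 0.012 * area -
    0.00002 * area^2 + 0.03 * amenity_score + rnorm(2 * n, sd = 0.15)
))

# Center the regressors at the average of the two period means (Fisher).
X <- model.matrix(~ age + area + I(area^2) + amenity_score, sales)[, -1]
m <- (colMeans(X[sales$period == 0, ]) + colMeans(X[sales$period == 1, ])) / 2
Xc <- sweep(X, 2, m)
mdl_fisher <- lm(log(price) ~ period + X + period:Xc, sales)

b <- coef(mdl_fisher)[["period"]]
se_b <- sqrt(vcov(mdl_fisher)["period", "period"])

# Delta method: the derivative of exp(b) is exp(b), so
# se(exp(b)) is approximately exp(b) * se(b).
fisher <- exp(b)
se_fisher <- fisher * se_b
c(index = fisher, se = se_fisher)
```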
As noted by Aizcorbe (2014, chap. 3), constructing the hedonic-imputation index with two regressions is more flexible than recovering an index from a regression coefficient. The trick of doing it with one regression works because the model is linear; using a non-linear regression model (however rare) may require fitting two models separately. Explicitly using the fitted values from two models to make price relatives also allows for the use of other index-number formulas, not just geometric ones.
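As an illustration of that last point, the sketch below forms the imputed price relatives for period 0 sales and averages them two ways: geometrically, which gives the Laspeyres-imputation index, and arithmetically, which gives a Carli-type alternative. The data are hypothetical (all simulated values are assumptions), and the arithmetic formula is only one example of a non-geometric choice.

```r
# Hypothetical data standing in for `sales`; all values are assumptions.
set.seed(1)
n <- 400
sales <- data.frame(
  period = rep(0:1, each = n),
  age = runif(2 * n, 0, 60),
  area = runif(2 * n, 60, 250),
  amenity_score = sample(1:10, 2 * n, replace = TRUE)
)
sales$price <- with(sales, exp(
  11 + 0.02 * period - (0.006 + 0.001 * period) * age + 0.012 * area -
    0.00002 * area^2 + 0.03 * amenity_score + rnorm(2 * n, sd = 0.15)
))

f <- log(price) ~ age + area + I(area^2) + amenity_score
s0 <- sales[sales$period == 0, ]
mdl1 <- lm(f, s0)
mdl2 <- lm(f, sales[sales$period == 1, ])

# Imputed price relative for each period 0 sale.
relatives <- exp(predict(mdl2, s0) - predict(mdl1, s0))

geometric <- exp(mean(log(relatives))) # Laspeyres-imputation index
arithmetic <- mean(relatives)          # Carli-type arithmetic alternative
c(geometric = geometric, arithmetic = arithmetic)
```

The arithmetic version is never smaller than the geometric one, and it has no counterpart as a single coefficient in the nested linear model.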
One argument for the two-regression approach that I’ve not seen made is that it’s also more convenient for calculating product contributions.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. MIT Press.
Footnotes
1. This type of regression model comes up in the literature on program evaluation as a way to estimate average treatment effects; see, e.g., Wooldridge (2002, sec. 18.3.1).
2. These standard errors are approximate when the means of the characteristics are calculated from the same sample of data used to fit the model.
@online{martin2025,
author = {Martin, Steve},
title = {Hedonic Imputation with One Regression, Not Two},
date = {2025-10-27},
url = {https://marberts.github.io/blog/posts/2025/hedonics/},
langid = {en}
}