Research note

Bayesian Spatially Varying Coefficient Model for Prague Housing Prices

A Gaussian-process model inspired by "A Bayesian approach to hedonic price analysis". It lets local price dynamics emerge smoothly across Prague while estimating how the effect of apartment size varies by neighborhood.

Gen 0 is my initial modeling attempt. Besides the intercept, only one coefficient (the slope for apartment size) is allowed to vary spatially, because Markov chain Monte Carlo (MCMC) simulation is computationally expensive. This version serves as a tractable foundation for more complex spatial models in later generations.

Model

The model expresses the log-price \(y_i\) of listing \(i\) at location \(\mathbf{s}_i\) as

$$y_i \;=\; \underbrace{\alpha(\mathbf{s}_i)}_{\text{spatial intercept}} \;+\; \underbrace{\beta_a(\mathbf{s}_i)\, a_i}_{\text{spatial slope for size}} \;+\; \underbrace{\mathbf{x}_i^\top \boldsymbol{\beta}}_{\text{global linear effects}} \;+\; \varepsilon_i,$$

$$\alpha(\mathbf{s}) \sim \mathcal{GP}\!\left(0,\, k_\alpha\right), \qquad \beta_a(\mathbf{s}) \sim \mathcal{GP}\!\left(0,\, k_\beta\right), \qquad \varepsilon_i \sim \mathcal{N}(0,\sigma^2).$$

The Gaussian processes \(\alpha(\mathbf{s})\) and \(\beta_a(\mathbf{s})\), with covariance kernels \(k_\alpha\) and \(k_\beta\), capture smooth spatial variation in the baseline price and in the size effect. Both kernels are squared-exponential (RBF), with learned length-scales and half-normal priors on their variance parameters.
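To make the generative structure concrete, here is a minimal NumPy sketch that draws \(\alpha(\mathbf{s})\) and \(\beta_a(\mathbf{s})\) from RBF-kernel GP priors and simulates log-prices from the equation above. It is not the fitted model: the sample size, length-scales, variances, noise level, and global coefficients are all hypothetical values chosen for illustration, and the data are synthetic.

```python
import numpy as np


def rbf_kernel(S1, S2, variance, length_scale):
    """Squared-exponential kernel: variance * exp(-||s - s'||^2 / (2 * length_scale^2))."""
    sq_dist = ((S1[:, None, :] - S2[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def draw_gp(rng, S, variance, length_scale, jitter=1e-6):
    """One draw from a zero-mean GP prior evaluated at the locations S."""
    K = rbf_kernel(S, S, variance, length_scale)
    # A small diagonal jitter keeps the Cholesky factorization numerically stable.
    L = np.linalg.cholesky(K + jitter * np.eye(len(S)))
    return L @ rng.standard_normal(len(S))


rng = np.random.default_rng(0)
n, p = 50, 3                          # hypothetical sizes

S = rng.standard_normal((n, 2))       # synthetic standardized (lat, lon)
a = rng.standard_normal(n)            # synthetic standardized area
X = rng.standard_normal((n, p))       # stand-in for the global covariates
beta = np.array([0.10, -0.05, 0.20])  # hypothetical global coefficients
sigma = 0.1                           # hypothetical noise standard deviation

alpha = draw_gp(rng, S, variance=1.0, length_scale=0.5)   # spatial intercept
beta_a = draw_gp(rng, S, variance=0.25, length_scale=1.0)  # spatial size slope

# y_i = alpha(s_i) + beta_a(s_i) * a_i + x_i^T beta + eps_i
y = alpha + beta_a * a + X @ beta + sigma * rng.standard_normal(n)
```

In a full Bayesian fit the length-scales and variances would be given priors and inferred rather than fixed as they are here.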

Features

  • Target: \(\log(\text{price})\)
  • Spatial inputs: standardized latitude/longitude \((\text{lat}_{std}, \text{lon}_{std})\)
  • Spatially varying slope: standardized area \(a_i=\text{area}_{std}\)
  • Global covariates: floor, elevator, loggia, parking, distance to metro, distance to public transport, disposition, ownership type

The term \(\alpha(\mathbf{s})\) represents the local baseline log-price level, while \(\beta_a(\mathbf{s})\) measures how strongly apartment size contributes to value at a given location. Because area is standardized, \(\beta_a(\mathbf{s})\) is the change in log-price per one standard deviation of area, not per square meter.
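As a worked reading of the slope, assume area was standardized as \(a_i = (\text{area}_i - \overline{\text{area}})/\operatorname{sd}(\text{area})\). The note does not state the exact scaling, so the sample mean and standard deviation used here are assumptions. One extra square meter then shifts \(a_i\) by \(1/\operatorname{sd}(\text{area})\), so with all other terms held fixed

$$\frac{\partial y_i}{\partial\, \text{area}_i} \;=\; \frac{\beta_a(\mathbf{s}_i)}{\operatorname{sd}(\text{area})}, \qquad \frac{\text{price}_i(\text{area}_i + 1)}{\text{price}_i(\text{area}_i)} \;=\; \exp\!\left(\frac{\beta_a(\mathbf{s}_i)}{\operatorname{sd}(\text{area})}\right).$$

Under that assumption, dividing the mapped coefficient by the standard deviation of area converts it into an approximate proportional price change per square meter.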

Benchmark

Figure: Estimated (train) and predicted (test) vs. observed log-prices, with a dashed 45° reference line.

Figure: Spatial maps. Left: posterior mean of the spatial size coefficient \(\beta_a(\mathbf{s})\). Right: posterior mean of the spatial intercept \(\alpha(\mathbf{s})\).

Training & Inference

Interpretation

Conclusion

The current model underfits: residual spread remains substantial even on the training set. In the next iteration we will (i) tune the kernel bandwidth to better capture local structure and (ii) consider allowing additional features to vary spatially. In the spirit of geographically weighted (locally weighted) formulations, the weight of observation \(j\) around a focal location \(\mathbf{s}\) can be written as

$$w_j(\mathbf{s}) \;=\; \exp\!\left(-\frac{d_{sj}}{\gamma}\right),$$

where \(d_{sj}\) is the distance between \(\mathbf{s}\) and observation \(j\), and \(\gamma\) is the bandwidth controlling locality. Smaller \(\gamma\) increases local sensitivity; larger \(\gamma\) smooths more aggressively. Calibrating \(\gamma\) (e.g., via CV/LOO) and enabling spatial variation for additional coefficients (e.g., floor or distance-to-metro) should improve fit.
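The effect of the bandwidth can be checked numerically. The sketch below evaluates \(w_j(\mathbf{s}) = \exp(-d_{sj}/\gamma)\) for a few distances at two bandwidths; the function name `exp_weight`, the distances, and the \(\gamma\) values are hypothetical and use arbitrary distance units.

```python
import math


def exp_weight(distance, gamma):
    """Exponential distance-decay weight w = exp(-distance / gamma)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return math.exp(-distance / gamma)


distances = [0.0, 0.5, 1.0, 2.0]  # hypothetical distances d_sj
for gamma in (0.5, 2.0):          # hypothetical small and large bandwidths
    weights = [exp_weight(d, gamma) for d in distances]
    print(f"gamma={gamma}: " + ", ".join(f"{w:.3f}" for w in weights))
```

At every positive distance the smaller bandwidth gives the lower weight, so distant observations contribute less to the local fit.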
