Research note

Bayesian Spatially Varying Coefficient Model for Prague Housing Prices

This is a compact, interpretable spatial model inspired by "A Bayesian approach to hedonic price analysis". It lets the baseline price and the effect of apartment size change smoothly across Prague, so we can see how neighborhoods differ while keeping the model accessible and explainable.

Gen 0 is my first workable spatial prototype, trading off some flexibility for computational tractability. It provides a clear baseline against which to compare more sophisticated, and more expensive, spatial methods in later generations.

Model

The model expresses the listing outcome $y_i$ at location $\mathbf{s}_i$ as

$$y_i \;=\; \underbrace{\alpha(\mathbf{s}_i)}_{\text{spatial intercept}} \;+\; \underbrace{\beta_a(\mathbf{s}_i)\, a_i}_{\text{spatial slope for size}} \;+\; \underbrace{\mathbf{x}_i^\top \boldsymbol{\beta}}_{\text{global linear effects}} \;+\; \varepsilon_i,$$

$$\alpha(\mathbf{s}) \sim \mathcal{GP}\!\left(0,\, k_\alpha\right), \qquad \beta_a(\mathbf{s}) \sim \mathcal{GP}\!\left(0,\, k_\beta\right), \qquad \varepsilon_i \sim \mathcal{N}(0,\sigma^2).$$

The Gaussian processes $\alpha(\mathbf{s})$ and $\beta_a(\mathbf{s})$, with covariance kernels $k_\alpha$ and $k_\beta$, capture smooth spatial variation in the baseline price and the size effect. Both kernels are squared-exponential (RBF) with learned length-scales and half-normal priors on the variance parameters.
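For reference, a squared-exponential kernel with amplitude $\eta$ and length-scale $\ell$ has the form $k(\mathbf{s},\mathbf{s}') = \eta^2 \exp\!\left(-\lVert \mathbf{s}-\mathbf{s}'\rVert^2 / (2\ell^2)\right)$. The NumPy sketch below simulates data from the model equation under that kernel. It is an illustration only, not the PyMC code behind this note: the synthetic coordinates, the parameter values, and the helper names `rbf_kernel` and `draw_gp` are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(S1, S2, eta, ell):
    """Squared-exponential kernel: eta^2 * exp(-||s - s'||^2 / (2 * ell^2))."""
    diff = S1[:, None, :] - S2[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    return eta ** 2 * np.exp(-0.5 * sq_dist / ell ** 2)

def draw_gp(S, eta, ell, rng, jitter=1e-6):
    """Draw one zero-mean GP sample at locations S via a Cholesky factor."""
    K = rbf_kernel(S, S, eta, ell) + jitter * np.eye(len(S))  # jitter for stability
    return np.linalg.cholesky(K) @ rng.standard_normal(len(S))

rng = np.random.default_rng(0)
n = 200
S = rng.standard_normal((n, 2))          # stand-in for (lat_std, lon_std)
a = rng.standard_normal(n)               # stand-in for standardized area
X = rng.standard_normal((n, 3))          # stand-in for three global covariates
beta = np.array([0.10, -0.05, 0.20])     # assumed global coefficients
sigma = 0.1                              # assumed noise standard deviation

alpha = draw_gp(S, eta=0.5, ell=1.0, rng=rng)    # spatial intercept alpha(s_i)
beta_a = draw_gp(S, eta=0.3, ell=1.0, rng=rng)   # spatial size slope beta_a(s_i)

# y_i = alpha(s_i) + beta_a(s_i) * a_i + x_i^T beta + eps_i
y = alpha + beta_a * a + X @ beta + sigma * rng.standard_normal(n)
```

Shrinking `ell` in this sketch makes the simulated fields vary over shorter distances, which is the behaviour the learned length-scales control in the fitted model.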

Features

Target: listing price, modeled on the log scale (the outcome $y_i$ in the equation above).
Spatial inputs: standardized latitude and longitude $(\text{lat}_{std}, \text{lon}_{std})$
Spatially varying slope: standardized area $a_i=\text{area}_{std}$
Global covariates: floor, elevator, loggia, parking, distance to metro, distance to public transport, disposition, ownership type

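The term $\alpha(\mathbf{s})$ represents the local baseline price level, while $\beta_a(\mathbf{s})$ measures how strongly apartment size contributes to value at each location. Because area enters the model standardized, $\beta_a(\mathbf{s})$ is the change in the outcome per one standard deviation of area rather than per square meter.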

Benchmark

Figure: Observed vs. estimated and predicted prices (log-transformed). Estimated (train) and predicted (test) prices versus observed prices, plotted on the log scale for clarity.

Figure: Spatial maps of the size coefficient $\beta_a(\mathbf{s})$ and the intercept $\alpha(\mathbf{s})$. Left: posterior mean of the spatial size coefficient $\beta_a(\mathbf{s})$. Right: posterior mean of the spatial intercept $\alpha(\mathbf{s})$.

Training & Inference

Bayesian inference is performed using PyMC with the NumPyro (JAX) backend and the NUTS sampler.
Latent Gaussian processes model both the intercept and the size slope, with RBF covariance kernels.
Train/test split: 80/20 with a fixed seed for reproducibility, with coordinates standardized consistently across both sets.
Posterior mean fields are evaluated on a dense grid over Prague for visualization.
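The data-handling steps above can be sketched in NumPy as follows. The note does not say how the standardization is kept consistent, so this sketch assumes one common convention: compute the mean and standard deviation on the training coordinates only and reuse them for the test set and the visualization grid. The seed, the rough Prague bounding box, the grid resolution, and the helper names `split_indices`, `standardize`, and `make_grid` are likewise illustrative assumptions.

```python
import numpy as np

def split_indices(n, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx) from a seeded random permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(round(test_frac * n))
    return perm[n_test:], perm[:n_test]

def standardize(coords, mean, std):
    """Apply a fixed (mean, std) so every set shares one coordinate scale."""
    return (coords - mean) / std

def make_grid(lat_range, lon_range, n_per_axis):
    """Dense (lat, lon) grid as an array of shape (n_per_axis**2, 2)."""
    lat = np.linspace(lat_range[0], lat_range[1], n_per_axis)
    lon = np.linspace(lon_range[0], lon_range[1], n_per_axis)
    lat_g, lon_g = np.meshgrid(lat, lon, indexing="ij")
    return np.column_stack([lat_g.ravel(), lon_g.ravel()])

# Synthetic listing coordinates inside an assumed, approximate bounding box.
lat_range, lon_range = (49.95, 50.17), (14.25, 14.70)
rng = np.random.default_rng(0)
coords = np.column_stack([rng.uniform(*lat_range, 100),
                          rng.uniform(*lon_range, 100)])

train_idx, test_idx = split_indices(len(coords))

# Statistics come from the training set only, then are reused everywhere.
mean = coords[train_idx].mean(axis=0)
std = coords[train_idx].std(axis=0)

train_std = standardize(coords[train_idx], mean, std)
test_std = standardize(coords[test_idx], mean, std)
grid_std = standardize(make_grid(lat_range, lon_range, 50), mean, std)
```

Reusing the training statistics keeps the learned length-scales meaningful when the fields are evaluated at test locations or on the grid.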

Interpretation

Spatial intercept $\alpha(\mathbf{s})$: expected log-price baseline across the city.
Size coefficient $\beta_a(\mathbf{s})$: marginal contribution of apartment size in each neighborhood, expressed per one standard deviation of area because $a_i$ is standardized.
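Since area enters the model standardized (see Features), reading $\beta_a(\mathbf{s})$ in per-m² terms requires dividing by the standard deviation of area. The short sketch below shows that conversion, assuming a log-price outcome and the usual standardization $a_i = (\text{area}_i - \text{mean}) / \text{sd}$. The numbers `beta_local` and `area_sd` are made up for illustration and are not results from this model.

```python
import math

def slope_per_m2(beta_std, area_sd):
    """Change in log-price per extra square meter, given a per-SD slope."""
    return beta_std / area_sd

def pct_price_change_per_m2(beta_std, area_sd):
    """Approximate relative price change per extra square meter."""
    return math.exp(slope_per_m2(beta_std, area_sd)) - 1.0

beta_local = 0.35   # hypothetical posterior mean of beta_a(s) at one location
area_sd = 25.0      # hypothetical standard deviation of area, in square meters

per_m2 = slope_per_m2(beta_local, area_sd)
pct = pct_price_change_per_m2(beta_local, area_sd)
print(f"log-price change per m2: {per_m2:.4f}")
print(f"relative price change per m2: {pct:.2%}")
```

With these made-up inputs, one extra square meter corresponds to a price increase of roughly 1.4% at that location.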

Conclusion

The current model underfits: residual spread remains substantial even on the training set. In the next iteration we will (i) tune the kernel bandwidth to better capture local structure and (ii) consider allowing additional features to vary spatially. In the spirit of geographically weighted (locally weighted) formulations, the weight of observation $j$ around a focal location $\mathbf{s}$ can be written as

$$w_j(\mathbf{s}) \;=\; \exp\!\left(-\frac{d_{sj}}{\gamma}\right),$$

where $d_{sj}$ is the distance between $\mathbf{s}$ and observation $j$, and $\gamma$ is the bandwidth controlling locality. Smaller $\gamma$ increases local sensitivity; larger $\gamma$ smooths more aggressively. Calibrating $\gamma$ (e.g., via CV/LOO) and enabling spatial variation for additional coefficients (e.g., floor or distance-to-metro) should improve fit.
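To make the role of $\gamma$ concrete, the sketch below computes the exponential weights for a small and a large bandwidth, then picks $\gamma$ from a candidate list by leave-one-out error. It rests on several assumptions: Euclidean distance on standardized coordinates, synthetic data, and a locally weighted mean standing in for a full local regression. The function names and the candidate values are illustrative only.

```python
import numpy as np

def exp_weights(focal, points, gamma):
    """w_j(s) = exp(-d_sj / gamma), with Euclidean distance d_sj."""
    d = np.linalg.norm(points - focal, axis=1)
    return np.exp(-d / gamma)

def effective_sample_size(w):
    """Kish effective sample size: how many observations carry real weight."""
    return w.sum() ** 2 / (w ** 2).sum()

def loo_mse(points, y, gamma):
    """Leave-one-out squared error of a locally weighted mean predictor."""
    idx = np.arange(len(y))
    errors = []
    for i in idx:
        mask = idx != i                                   # hold out observation i
        w = exp_weights(points[i], points[mask], gamma)
        pred = (w * y[mask]).sum() / w.sum()
        errors.append((y[i] - pred) ** 2)
    return float(np.mean(errors))

# Synthetic data: a smooth spatial signal plus noise.
rng = np.random.default_rng(0)
points = rng.standard_normal((150, 2))
y = np.sin(points[:, 0]) + 0.5 * points[:, 1] + 0.1 * rng.standard_normal(150)

# Smaller gamma concentrates weight on nearby observations.
focal = np.zeros(2)
ess_small = effective_sample_size(exp_weights(focal, points, gamma=0.2))
ess_large = effective_sample_size(exp_weights(focal, points, gamma=2.0))

# Choose gamma by leave-one-out error over a few candidates.
gammas = [0.1, 0.3, 1.0, 3.0]
scores = {g: loo_mse(points, y, g) for g in gammas}
best_gamma = min(scores, key=scores.get)
```

The effective sample size grows with $\gamma$, which is one way to see the trade-off between local sensitivity and smoothing described above.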
