Research note

Bayesian Spatially Varying Coefficient Model — Gen 1

Gen 1 builds on the first prototype (Gen 0) by letting every coefficient vary continuously over space, using latent Gaussian processes (GPs) on standardized coordinates. Correlation follows the exponential form \(r(d;\,\phi)=\exp(-d/\phi)\), where \(d\) is the distance between two locations and \(\phi\) is a length scale shared by all coefficients; per-coefficient magnitudes \(\rho_k\) set their relative variability. Inference is performed with PyMC, using NUTS and, when available, the NumPyro/JAX backend for speed and stability.

This richer formulation helps reveal neighborhood-dependent effects, such as how the floor premium or the value of metro proximity changes across the city, while keeping the model interpretable.
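To make the shared length scale concrete, the sketch below evaluates \(r(d;\,\phi)\) at a few distances and solves \(\exp(-d/\phi)=0.05\) for the distance at which correlation becomes small. This "practical range" at the 0.05 level is a common convention, not something the note itself uses; the value \(\phi=0.5\) (in standardized-coordinate units) and the function names are hypothetical.

```python
import math

def exp_corr(d: float, phi: float) -> float:
    """Exponential correlation r(d; phi) = exp(-d / phi)."""
    return math.exp(-d / phi)

def practical_range(phi: float, level: float = 0.05) -> float:
    """Distance where exp(-d / phi) equals `level`, i.e. d = -phi * ln(level)."""
    return -phi * math.log(level)

phi = 0.5  # hypothetical length scale, standardized-coordinate units
for d in (0.0, 0.25, 0.5, 1.0):
    print(f"d={d:.2f}  r={exp_corr(d, phi):.3f}")  # r = 1.000, 0.607, 0.368, 0.135
print(f"practical range ~ {practical_range(phi):.3f}")  # 1.498
```

At \(d=\phi\) the correlation is \(e^{-1}\approx 0.37\), and it falls to 0.05 at \(d=\phi\ln 20\approx 3\phi\), so \(\phi\) directly controls how far a coefficient's local deviations extend.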

Model

The listing outcome \(y_i\) for observation \(i\) at location \(\mathbf{s}_i\) is modeled as a spatially varying linear combination of features:

$$y_i \;=\; \sum_{k=0}^{K-1} \beta_k(\mathbf{s}_i)\, x_{ik} \;+\; \varepsilon_i,\qquad \varepsilon_i \sim \mathcal{N}(0,\sigma^2),$$
$$\beta_k(\mathbf{s}) \sim \mathcal{GP}\!\big(0,\; \rho_k^2\, r(\|\mathbf{s}-\mathbf{s}'\|;\,\phi)\big), \quad r(d;\,\phi) \;=\; \exp\!\Big(-\tfrac{d}{\phi}\Big).$$

Each coefficient, including the intercept (for which \(x_{i0}=1\)), has its own latent GP; all of them share the exponential length scale \(\phi\), while \(\rho_k\) scales the variability of coefficient \(k\). Coordinates are standardized for numerical stability, so \(\phi\) is expressed in standardized-coordinate units.
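A minimal generative sketch of the model above, in plain Python so it runs without PyMC: for each coefficient it builds the covariance \(\rho_k^2\exp(-\|\mathbf{s}_i-\mathbf{s}_j\|/\phi)\) over the observed locations, draws \(\beta_k(\mathbf{s}_i)\) from the GP prior through a Cholesky factor, and adds Gaussian noise to form \(y_i\). It only simulates from the prior; the locations, the values of \(\phi\), \(\rho_k\) and \(\sigma\), and all function names are hypothetical, and this is not the project's PyMC model.

```python
import math
import random

def exp_cov(coords, phi, rho, jitter=1e-9):
    """rho^2 * exp(-||s_i - s_j|| / phi) for every pair of 2-D locations, plus diagonal jitter."""
    n = len(coords)
    cov = [[rho ** 2 * math.exp(-math.dist(coords[i], coords[j]) / phi) for j in range(n)]
           for i in range(n)]
    for i in range(n):
        cov[i][i] += jitter  # tiny nugget so the Cholesky factorization stays stable
    return cov

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def sample_gp(coords, phi, rho, rng):
    """One prior draw of beta_k at `coords`: L z with z ~ N(0, I), so Cov = L L^T."""
    low = cholesky(exp_cov(coords, phi, rho))
    z = [rng.gauss(0.0, 1.0) for _ in coords]
    return [sum(low[i][m] * z[m] for m in range(i + 1)) for i in range(len(coords))]

def simulate_svc(coords, x, phi, rhos, sigma, rng):
    """y_i = sum_k beta_k(s_i) x_ik + eps_i; one independent GP per coefficient, shared phi."""
    betas = [sample_gp(coords, phi, rho_k, rng) for rho_k in rhos]  # betas[k][i]
    y = [sum(betas[k][i] * x[i][k] for k in range(len(rhos))) + rng.gauss(0.0, sigma)
         for i in range(len(coords))]
    return betas, y

# Hypothetical settings: 20 random standardized locations, intercept + one feature.
rng = random.Random(0)
coords = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20)]
x = [[1.0, rng.gauss(0.0, 1.0)] for _ in coords]  # x_i0 = 1 (intercept)
betas, y = simulate_svc(coords, x, phi=0.5, rhos=[1.0, 0.3], sigma=0.1, rng=rng)
print(len(betas), len(y))  # 2 20
```

In the note's setup \(\phi\), \(\rho_k\) and \(\sigma\) are inferred with PyMC rather than fixed as here. The small diagonal jitter is a common numerical safeguard, not part of the stated model.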

Features

  • Target: price, modeled on the log scale where helpful, in which case \(y_i\) in the equation above is log price.
  • Spatial inputs: standardized coordinates \((\text{lat}_{\text{std}}, \text{lon}_{\text{std}})\).
  • Design (all coefficients vary spatially): intercept, area\(_{\text{std}}\), disposition, ownership, loggia size, parking, floor\(_{\text{std}}\), elevator, distance to public transport\(_{\text{std}}\), distance to metro, time trend.

Compared with Gen 0, where only the size coefficient varied, Gen 1 lets the data learn a location-dependent effect for every covariate.
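The preprocessing implied by the feature list can be sketched as follows. It assumes "standardized" means z-scoring with training-set means and population standard deviations, that categorical features such as disposition and ownership already arrive as single numeric codes, and that all column names are hypothetical. Coordinates and the design row (a leading 1 for the intercept, eleven columns in total) are produced together so the same training statistics are reused at prediction time.

```python
import statistics

# Hypothetical column names, in the order the note lists them.
FEATURES = [
    "area", "disposition", "ownership", "loggia_size", "parking", "floor",
    "elevator", "dist_public_transport", "dist_metro", "time_trend",
]
STANDARDIZED = {"area", "floor", "dist_public_transport"}  # the columns marked _std

def fit_stats(rows, names):
    """Training mean and population standard deviation for each named column."""
    return {
        n: (statistics.mean([r[n] for r in rows]), statistics.pstdev([r[n] for r in rows]))
        for n in names
    }

def zscore(value, stat):
    mean, sd = stat
    return (value - mean) / sd

def prepare(listing, stats):
    """Return standardized (lat, lon) and the design row [1, x_1, ..., x_10]."""
    coords = (zscore(listing["lat"], stats["lat"]), zscore(listing["lon"], stats["lon"]))
    row = [1.0]  # intercept column, x_i0 = 1
    for name in FEATURES:
        value = float(listing[name])
        row.append(zscore(value, stats[name]) if name in STANDARDIZED else value)
    return coords, row

# Two hypothetical training listings, for illustration only.
train = [
    {"lat": 0.0, "lon": 0.0, "area": 40, "disposition": 1, "ownership": 1, "loggia_size": 0,
     "parking": 0, "floor": 2, "elevator": 1, "dist_public_transport": 100,
     "dist_metro": 800, "time_trend": 0},
    {"lat": 2.0, "lon": 4.0, "area": 80, "disposition": 3, "ownership": 0, "loggia_size": 5,
     "parking": 1, "floor": 6, "elevator": 1, "dist_public_transport": 300,
     "dist_metro": 1500, "time_trend": 1},
]
stats = fit_stats(train, ["lat", "lon", *STANDARDIZED])
coords, row = prepare(train[1], stats)
print(coords, len(row))  # (1.0, 1.0) 11
```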

Benchmark

Observed vs. predicted prices (log-transformed), Gen 1
Gen 1 observed vs. predicted values on the test set, shown on the log scale for interpretability. Numeric summaries are in the saved metrics.txt.
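The note does not say which metrics are saved; purely as an illustration, the sketch below computes RMSE, MAE and \(R^2\) from observed and predicted values that are already on the log scale. The metric choice, function name and example prices are assumptions.

```python
import math

def log_scale_metrics(log_obs, log_pred):
    """RMSE, MAE and R^2 between observed and predicted values already on the log scale."""
    n = len(log_obs)
    resid = [o - p for o, p in zip(log_obs, log_pred)]
    sse = sum(r * r for r in resid)
    mean_obs = sum(log_obs) / n
    sst = sum((o - mean_obs) ** 2 for o in log_obs)
    return {
        "rmse_log": math.sqrt(sse / n),
        "mae_log": sum(abs(r) for r in resid) / n,
        "r2_log": 1.0 - sse / sst,
    }

# Hypothetical prices, converted to the log scale before scoring.
observed = [math.log(p) for p in (100.0, 200.0, 400.0)]
predicted = [math.log(p) for p in (110.0, 190.0, 400.0)]
print(log_scale_metrics(observed, predicted))
```

On the log scale a residual of 0.1 corresponds to a price ratio of \(e^{0.1}\approx 1.105\), roughly a 10% error, which makes errors comparable across cheap and expensive listings.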

Example spatial coefficient maps, Gen 1
Example posterior mean fields of spatial coefficients (e.g., intercept and size effect). Darker tones indicate larger magnitude.
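Each map is, at every location, the average of posterior draws of \(\beta_k(\mathbf{s})\). The sketch below assumes the draws for one coefficient have already been extracted as nested lists indexed [draw][location] (how the project pulls them out of the PyMC trace is not described), then maps the mean to a 0-to-1 shade by relative magnitude to mirror the caption's "darker means larger magnitude" convention. Function names and data are hypothetical.

```python
def posterior_mean_field(draws):
    """Average coefficient draws over the draw axis: draws[d][i] -> mean at location i."""
    n_draws = len(draws)
    return [sum(draw[i] for draw in draws) / n_draws for i in range(len(draws[0]))]

def shade(field):
    """Map values to [0, 1] as |value| / max|value|, so larger magnitude -> darker tone."""
    peak = max(abs(v) for v in field)
    return [abs(v) / peak if peak > 0 else 0.0 for v in field]

# Hypothetical draws of one coefficient field: 2 posterior draws x 3 locations.
draws = [[1.0, -2.0, 0.0], [3.0, -4.0, 0.0]]
field = posterior_mean_field(draws)
print(field)  # [2.0, -3.0, 0.0]
print(shade(field))
```

Shading by magnitude discards the sign, so a diverging color scale is the usual alternative when positive and negative effects both matter.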

Training & Inference

Tools

Model fitting and diagnostics are driven by project scripts, using PyMC for inference and the NumPyro/JAX backend when available for faster execution. Artifacts such as the trace, posterior summaries, plots and metrics.txt are saved to the experiment directory for inspection.
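For readers reproducing the layout without the project scripts, a minimal standard-library sketch of writing and reading back a metrics.txt file in an experiment directory. The `name: value` format, the function names and the directory name are assumptions; the note does not describe the scripts' actual output format, and saving the trace and plots is not covered.

```python
import tempfile
from pathlib import Path

def save_metrics(exp_dir, metrics):
    """Write metrics as sorted 'name: value' lines to <exp_dir>/metrics.txt; return the path."""
    out_dir = Path(exp_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "metrics.txt"
    lines = [f"{name}: {value:.6g}" for name, value in sorted(metrics.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path

def load_metrics(path):
    """Parse 'name: value' lines back into a dict of floats."""
    result = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name, _, value = line.partition(":")
        result[name.strip()] = float(value)
    return result

# Usage with a throwaway directory and hypothetical metric values.
with tempfile.TemporaryDirectory() as tmp:
    path = save_metrics(Path(tmp) / "gen1", {"rmse_log": 0.125, "r2_log": 0.75})
    print(path.read_text(encoding="utf-8"))
```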

Notes & Next Steps
