Research note

Bayesian Spatially Varying Coefficient Model — Gen 1

Gen 1 builds on the first prototype (Gen 0) by letting every coefficient change smoothly over space, using latent Gaussian processes on standardized coordinates. Spatial correlation follows the exponential form $r(d;\,\phi)=\exp(-d/\phi)$ with a shared length scale $\phi$, and a per-coefficient magnitude $\rho_k$ sets how strongly each coefficient varies. Inference is performed with PyMC using NUTS, with the NumPyro/JAX backend used when available for speed and stability.

This richer formulation helps reveal neighborhood-dependent effects, such as how the floor premium or the effect of metro proximity changes across the city, while keeping the model interpretable.

Model

The listing outcome $y_i$ for observation $i$ at location $\mathbf{s}_i$ is modeled as a spatially varying linear combination of features:

$$y_i \;=\; \sum_{k=0}^{K-1} \beta_k(\mathbf{s}_i)\, x_{ik} \;+\; \varepsilon_i,\qquad \varepsilon_i \sim \mathcal{N}(0,\sigma^2),$$

$$\beta_k(\mathbf{s}) \sim \mathcal{GP}\!\big(0,\; \rho_k^2\, r(\|\mathbf{s}-\mathbf{s}'\|;\,\phi)\big), \qquad r(d;\,\phi) \;=\; \exp(-d/\phi).$$

Each coefficient—including the intercept—has its own latent GP sharing the same exponential length scale $\phi$, while $\rho_k$ scales per-coefficient variability. Coordinates are standardized for numerical stability.
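To make the generative structure concrete, the following is a minimal NumPy simulation sketch of the model above, not the project's fitting code. It assumes the $K$ coefficient processes are independent of one another, and the helper name `exp_corr`, the sizes, and the values of `phi`, `rho`, and `sigma` are illustrative choices that do not come from the experiment.

```python
import numpy as np

def exp_corr(coords, phi):
    """Exponential correlation r(d; phi) = exp(-d / phi) for all pairs of rows."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-dist / phi)

rng = np.random.default_rng(0)
n, K = 200, 3                          # illustrative sizes (assumption)
coords = rng.standard_normal((n, 2))   # stands in for standardized (lat, lon)
# Column 0 is the intercept, so x_{i0} = 1; the rest are standardized features.
X = np.column_stack([np.ones(n), rng.standard_normal((n, K - 1))])

phi = 0.5                              # shared length scale (assumed value)
rho = np.array([1.0, 0.5, 0.2])        # per-coefficient scales rho_k (assumed)
sigma = 0.1                            # observation noise sd (assumed)

R = exp_corr(coords, phi)
# A small jitter on the diagonal keeps the Cholesky factorization stable.
L = np.linalg.cholesky(R + 1e-8 * np.eye(n))

# Non-centered draw: beta[:, k] = rho_k * L @ z[:, k], so beta_k ~ N(0, rho_k^2 R).
z = rng.standard_normal((n, K))
beta = (L @ z) * rho

# y_i = sum_k beta_k(s_i) * x_ik + eps_i
y = (X * beta).sum(axis=1) + sigma * rng.standard_normal(n)
```

One point worth checking when porting this to PyMC: its built-in `pm.gp.cov.Exponential` is, to my knowledge, documented with a factor of 2 in the denominator, $\exp(-d/(2\ell))$, so its `ls` would correspond to $\phi/2$ in the notation used here. That should be verified against the installed version.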

Features

Target: listing price, modeled on the log scale where helpful; this is $y_i$ in the model equation above.
Spatial inputs: standardized $(\text{lat}_{std}, \text{lon}_{std})$
Design (all vary spatially): intercept, area$_{std}$, disposition, ownership, loggia size, parking, floor$_{std}$, elevator, distance to public transport$_{std}$, distance to metro, time trend

Compared to Gen 0 (where only the size coefficient varied), Gen 1 lets the data learn location-dependent effects for every covariate, e.g., floor premium or metro proximity varying by neighborhood.

Benchmark

Figure: Observed vs. predicted prices on the test set (Gen 1), shown on the log scale for interpretability. See the saved metrics for numeric summaries.

Figure: Example spatial coefficient maps (Gen 1), showing posterior mean fields of spatial coefficients (e.g., intercept and size effect). Darker tones indicate larger magnitude.

Training & Inference

Bayesian inference with PyMC (NUTS); NumPyro/JAX backend used when available.
Exponential covariance $r(d;\phi)=\exp(-d/\phi)$ with shared length scale $\phi$; per-coefficient scales $\rho_k$.
80/20 train–test split with a fixed seed; coordinates and key numeric features are standardized.
Artifacts saved: trace.nc, posterior_summary.csv, obs_vs_pred.png, metrics.txt.
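The split and standardization steps listed above could look roughly like the sketch below. It assumes that standardization statistics are computed on the training rows only and then reused for the test rows, which the note does not state. The function names, the seed value, and the synthetic coordinates are all hypothetical.

```python
import numpy as np

def train_test_split_idx(n, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx) from a seeded random permutation."""
    rng = np.random.default_rng(seed)   # fixed seed makes the split reproducible
    perm = rng.permutation(n)
    n_test = int(round(test_frac * n))
    return perm[n_test:], perm[:n_test]

def standardize(train, test):
    """Standardize both sets using the training mean and standard deviation."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    return (train - mean) / std, (test - mean) / std

# Synthetic raw coordinates, purely for illustration (not project data).
rng = np.random.default_rng(0)
latlon = np.column_stack([
    50.0 + 0.1 * rng.standard_normal(100),
    14.0 + 0.1 * rng.standard_normal(100),
])

train_idx, test_idx = train_test_split_idx(len(latlon))
coords_train, coords_test = standardize(latlon[train_idx], latlon[test_idx])
```

Reusing the training statistics for the test rows keeps both sets on the same coordinate scale, so a length scale $\phi$ learned on the training set means the same distance at prediction time.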

Tools

Model fitting and diagnostics are driven by project scripts, using PyMC for inference and the NumPyro/JAX backend when available for faster execution. Artifacts such as the trace, posterior summaries, plots and metrics.txt are saved to the experiment directory for inspection.

Notes & Next Steps

Shared $\phi$ simplifies computation and encourages comparable smoothness across coefficients; per-feature $\phi_k$ is a possible extension.
Model complexity increases posterior coupling; careful diagnostics (ESS, $\hat{R}$) and re-centering/standardization remain important.
Future: hierarchical shrinkage on $\rho_k$, alternative kernels (Matérn), and sparse/inducing-point approximations for scalability.
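As a reminder of what the $\hat{R}$ diagnostic mentioned above measures, here is a minimal sketch of the classic split $\hat{R}$ (between-chain vs. within-chain variance) for one scalar parameter. This is not the project's diagnostic code. Libraries such as ArviZ provide rank-normalized variants that are generally preferable in practice; the function name and the synthetic chains below are illustrative.

```python
import numpy as np

def split_rhat(draws):
    """Classic split R-hat for draws of shape (chains, iterations)."""
    draws = np.asarray(draws, dtype=float)
    m, n = draws.shape
    half = n // 2
    # Split each chain in half so that within-chain drift is also detected.
    halves = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    W = halves.var(axis=1, ddof=1).mean()           # mean within-chain variance
    B = half * halves.mean(axis=1).var(ddof=1)      # between-chain variance
    var_plus = (half - 1) / half * W + B / half     # pooled variance estimate
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(1)
good = rng.standard_normal((4, 1000))                 # well-mixed chains
bad = good + np.array([[0.0], [0.0], [3.0], [3.0]])   # two chains stuck elsewhere

rhat_good = split_rhat(good)
rhat_bad = split_rhat(bad)
```

Values close to 1 indicate that chains agree. In a model with one latent field per coefficient, checking $\hat{R}$ for $\phi$, $\sigma$, and each $\rho_k$ is a reasonable first pass before inspecting the fields themselves.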
