
Principled Argo Modeling using Vecchia-based Gaussian Processes


Nian Liu, Jian Cao
[stat.CO,stat.ME]

Argo is an international program that collects temperature and salinity observations in the upper two kilometers of the global ocean. Most existing approaches for modeling Argo temperature rely on spatial partitioning, where data are locally modeled by first estimating a prescribed mean structure and then fitting Gaussian processes (GPs) to the mean-subtracted anomalies. Such strategies introduce challenges in designing suitable mean structures and defining domain partitions, often resulting in ad hoc modeling choices. In this work, we propose a one-stop Gaussian process regression framework with a generic spatio-temporal covariance function to jointly model Argo temperature data across broad spatial domains. Our fully data-driven approach achieves superior predictive performance compared with methods that require domain partitioning or parametric regression. To ensure scalability over large spatial regions, we employ the Vecchia approximation, which reduces the computational complexity from cubic to quasi-linear in the number of observations while preserving predictive accuracy. Using Argo data from January to March over the years 2007-2016, the same dataset used in prior benchmark studies, we demonstrate that our approach provides a principled, scalable, and interpretable tool for large-scale oceanographic analysis.
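As a rough illustration of the Vecchia approximation mentioned above, the sketch below replaces the joint Gaussian density with a product of univariate conditionals, each conditioned on at most m earlier points in the ordering. Everything in it is an assumption made for illustration and is not taken from the paper: the zero mean, the squared-exponential kernel (standing in for the spatio-temporal covariance), the given point ordering, the nugget value, and the function names `sq_exp_kernel` and `vecchia_loglik`. The brute-force neighbor search used here is itself quadratic in the number of observations; a scalable implementation would use a spatial index.

```python
import numpy as np


def sq_exp_kernel(a, b, variance=1.0, length_scale=1.0):
    """Squared-exponential covariance between rows of a and rows of b (illustrative choice)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def vecchia_loglik(x, y, m=10, nugget=1e-6, variance=1.0, length_scale=1.0):
    """Vecchia approximation to the zero-mean GP log-likelihood.

    Observation i is conditioned on its (up to) m nearest neighbors among
    observations 0..i-1, so each term needs only an m x m linear solve.
    """
    n = x.shape[0]
    prior_var = variance + nugget  # marginal variance of a single observation
    total = 0.0
    for i in range(n):
        if i == 0:
            mean, var = 0.0, prior_var
        else:
            # Brute-force nearest neighbors among earlier points (hypothetical shortcut).
            dist = np.linalg.norm(x[:i] - x[i], axis=1)
            nbrs = np.argsort(dist)[:m]
            k_nn = sq_exp_kernel(x[nbrs], x[nbrs], variance, length_scale)
            k_nn = k_nn + nugget * np.eye(len(nbrs))
            k_in = sq_exp_kernel(x[i:i + 1], x[nbrs], variance, length_scale)[0]
            w = np.linalg.solve(k_nn, k_in)
            mean = w @ y[nbrs]           # conditional mean given neighbors
            var = prior_var - w @ k_in   # conditional variance given neighbors
        total += -0.5 * (np.log(2.0 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total
```

When m is at least n - 1, every point is conditioned on all earlier points and the product of conditionals equals the exact joint density; smaller m trades accuracy for cost, with each of the n terms requiring work that depends on m rather than n.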
