Survival analysis under label shift
Yuxiang Zong, Yanyuan Ma, Ingrid Van Keilegom
[stat.ME]
Let $P$ denote the source population with complete data on the covariate $\mathbf{Z}$ and the response $T$, and let $Q$ denote the target population, where only the covariate $\mathbf{Z}$ is available. We consider a setting with both label shift and label censoring. Label shift assumes that the marginal distribution of $T$ differs between $P$ and $Q$, while the conditional distribution of $\mathbf{Z}$ given $T$ remains the same. Label censoring refers to the case where the response $T$ in $P$ is subject to random censoring. Our goal is to leverage information from the label-shifted and label-censored source population $P$ to conduct statistical inference in the target population $Q$. We propose a parametric model for $T$ given $\mathbf{Z}$ in $Q$ and estimate the model parameters by maximizing an approximate likelihood. This allows for statistical inference in $Q$ and accommodates a range of classical survival models. Under the label shift assumption, the likelihood depends not only on the unknown parameters but also on the unknown distributions of $T$ in $P$ and of $\mathbf{Z}$ in $Q$, both of which we estimate nonparametrically. The asymptotic properties of the estimator are rigorously established, and the effectiveness of the method is demonstrated through simulations and a real data application. This work is the first to combine survival analysis with label shift, offering a new research direction in this emerging topic.
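A short derivation may help show why the distribution of $T$ in $P$ and the distribution of $\mathbf{Z}$ in $Q$ both enter the likelihood. This is only a sketch based on Bayes' rule: the symbols $g$, $p_T$, $q_T$, $q_{\mathbf{Z}}$, $q_{T\mid\mathbf{Z}}$ and $\boldsymbol\beta$ are notation assumed here for illustration and are not taken from the paper, censoring is ignored, and $q_T$ is assumed positive on the support of $p_T$. Write $g(\mathbf{z}\mid t)$ for the conditional density of $\mathbf{Z}$ given $T$ shared by $P$ and $Q$, and $q_{T\mid\mathbf{Z}}(t\mid\mathbf{z};\boldsymbol\beta)$ for the parametric model in $Q$.

$$
\begin{aligned}
q_T(t;\boldsymbol\beta) &= \int q_{T\mid\mathbf{Z}}(t\mid\mathbf{z};\boldsymbol\beta)\, q_{\mathbf{Z}}(\mathbf{z})\, d\mathbf{z}, \\
g(\mathbf{z}\mid t) &= \frac{q_{T\mid\mathbf{Z}}(t\mid\mathbf{z};\boldsymbol\beta)\, q_{\mathbf{Z}}(\mathbf{z})}{q_T(t;\boldsymbol\beta)}, \\
p_{T\mid\mathbf{Z}}(t\mid\mathbf{z}) &= \frac{g(\mathbf{z}\mid t)\, p_T(t)}{\int g(\mathbf{z}\mid s)\, p_T(s)\, ds}
= \frac{q_{T\mid\mathbf{Z}}(t\mid\mathbf{z};\boldsymbol\beta)\, p_T(t) / q_T(t;\boldsymbol\beta)}{\int q_{T\mid\mathbf{Z}}(s\mid\mathbf{z};\boldsymbol\beta)\, p_T(s) / q_T(s;\boldsymbol\beta)\, ds}.
\end{aligned}
$$

The factor $q_{\mathbf{Z}}(\mathbf{z})$ cancels between numerator and denominator in the last line, but $q_{\mathbf{Z}}$ still enters through $q_T(\cdot\,;\boldsymbol\beta)$. The conditional density of $T$ given $\mathbf{Z}$ in the source population therefore involves $\boldsymbol\beta$, $p_T$ and $q_{\mathbf{Z}}$ together. Under independent right censoring, a censored observation would typically contribute the corresponding conditional survival function rather than the density; the exact form of the authors' approximate likelihood is not reproduced here.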