Perturbation-aware representation learning for in vivo genetic screens


Date

2025-10-15

Publication Type

Working Paper

ETH Bibliography

yes

Abstract

CRISPR-based genetic perturbation screens paired with single-cell transcriptomic readouts (Perturb-seq) offer a powerful tool for interrogating biological systems. Yet the resulting datasets are heterogeneous, particularly in vivo, and the cell-level perturbation labels in current use reflect only CRISPR guide RNA exposure rather than perturbation state. Moreover, many perturbations have a minimal effect on gene expression. For perturbations that do alter the transcriptomic state of cells, intracellular guide RNA abundance exhibits a dose-response association with perturbation efficacy. We combine (i) per-perturbation, expression-only classifiers trained with a non-negative negative–unlabeled (nnNU) risk, which yield calibrated scores reflecting the perturbation state of single cells, and (ii) a monotone guide-abundance prior, to obtain a per-cell pseudo-posterior that supports both assignment of perturbation probability and selection of affected gene features. To obtain a low-dimensional representation that allows accurate reconstruction of gene-level marginals for counterfactual decoding, we train an autoencoder with a quantile–hurdle reconstruction loss and feature-weighted emphasis on perturbation-affected genes. The result is a perturbation-aware latent embedding amenable to downstream trajectory modeling (e.g., optimal transport or flow matching) and a principled probability of perturbation for each non-control cell, derived jointly from its guide counts and transcriptome.
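
The abstract does not spell out the nnNU objective, so the following is only a minimal sketch of what a non-negative negative–unlabeled risk could look like, obtained by mirroring the standard non-negative positive–unlabeled estimator. It assumes that control cells act as labeled negatives, that guide-exposed cells are unlabeled, and that `pi_neg` (the fraction of unaffected cells among the unlabeled ones) is known. The sigmoid surrogate loss and all function names are illustrative choices, not taken from the paper.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_loss(score: float, label: int) -> float:
    """Surrogate loss l(z, y) = sigmoid(-y * z) for y in {-1, +1}."""
    return sigmoid(-label * score)


def mean_loss(scores: Sequence[float], label: int) -> float:
    """Average surrogate loss when every score is treated as class `label`."""
    return sum(sigmoid_loss(s, label) for s in scores) / len(scores)


def nn_nu_risk(
    scores_neg: Sequence[float],
    scores_unl: Sequence[float],
    pi_neg: float,
) -> float:
    """Non-negative negative-unlabeled risk (hypothetical formulation).

    scores_neg: classifier scores for labeled negatives (assumed: control cells)
    scores_unl: classifier scores for unlabeled cells (assumed: guide-exposed)
    pi_neg:     assumed fraction of true negatives among the unlabeled cells
    """
    risk_neg_as_neg = mean_loss(scores_neg, -1)
    risk_unl_as_pos = mean_loss(scores_unl, +1)
    risk_neg_as_pos = mean_loss(scores_neg, +1)
    # The positive-class risk is estimated as "unlabeled minus negatives".
    # Clamping it at zero keeps the empirical risk from going negative.
    positive_part = max(0.0, risk_unl_as_pos - pi_neg * risk_neg_as_pos)
    return pi_neg * risk_neg_as_neg + positive_part
```

With an uninformative classifier (all scores zero) and `pi_neg = 0.5`, every surrogate loss equals 0.5 and the risk evaluates to 0.5. The clamp matters when a flexible model drives the unlabeled term below the negative correction, which would otherwise make the estimate negative.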

Publication status

published

Publisher

Cold Spring Harbor Laboratory

Edition / version

v1

Organisational unit

02150 - Dep. Informatik / Dep. of Computer Science
