Advancing multi-environment genomic prediction with explainable deep learning in apple


METADATA ONLY

Date

2024

Publication Type

Other Conference Item

ETH Bibliography

yes

Abstract

Multi-environment genomic prediction is a useful tool in plant breeding that helps estimate the breeding values of genotypes across diverse environments. For accurate prediction, methods must integrate phenotypic, genotypic, and environmental data effectively. The heterogeneous structure of these data makes their joint analysis challenging, but the modularity of deep learning methods makes them well suited to this complexity. Here, we present an explainable multimodal deep learning method to perform genomic prediction on a multi-year and multi-environment apple REFPOP dataset of eleven quantitative traits. To implement the modelling approach, genotypic data were subjected to feature selection to reduce their dimensionality and improve training performance, while environmental data were processed as daily mean values. To make effective use of the environmental time-series data, our model employed long short-term memory (LSTM) layers, alongside dense layers for the other data inputs. The different data types were processed through separate multi-layer streams within the architecture and concatenated just before the final regression output layer. In five-fold cross-validation repeated five times, the proposed method outperformed its statistical counterparts for three of the eleven traits in the dataset: harvest date, titratable acidity and red over colour, with increases in predictive ability, measured as Pearson’s correlation coefficient r, of 0.05, 0.08 and 0.09, respectively. For the remaining eight traits, performance was similar to that of the statistical models. Furthermore, we incorporated an approach to explain the model predictions based on Shapley additive explanations, commonly referred to as SHAP values. Using this approach, we were able to pinpoint the most important genetic variants as well as the time frames during which environmental variables influence trait predictions.
Given the increasing amount of data generated in every field, our results provide a framework for integrating differently structured data and producing accurate and interpretable predictions with deep learning-based multi-environment genomic prediction models.
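
The evaluation protocol named in the abstract (five-fold cross-validation repeated five times, with predictive ability measured as Pearson's r between observed and predicted values) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data, the one-predictor least-squares placeholder model, and the choice to average r over all test folds are assumptions made only for illustration. The abstract does not state how the fold-wise correlations were aggregated.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def repeated_kfold_indices(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # every n_folds-th shuffled index
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            yield repeat, fold, train_idx, test_idx


def fit_simple_regression(x, y):
    """Hypothetical placeholder model: least squares with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return lambda value: intercept + slope * value


def predictive_ability(x, y, n_folds=5, n_repeats=5, seed=0):
    """Return (mean Pearson r over all test folds, number of folds scored)."""
    scores = []
    for _, _, train_idx, test_idx in repeated_kfold_indices(
            len(x), n_folds, n_repeats, seed):
        model = fit_simple_regression([x[i] for i in train_idx],
                                      [y[i] for i in train_idx])
        observed = [y[i] for i in test_idx]
        predicted = [model(x[i]) for i in test_idx]
        scores.append(pearson_r(observed, predicted))
    return sum(scores) / len(scores), len(scores)


# Toy data (assumption): one predictor with a noisy linear effect on the trait.
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(100)]
y = [2.0 * value + data_rng.gauss(0, 0.5) for value in x]

mean_r, n_scores = predictive_ability(x, y)
print(f"mean r over {n_scores} test folds: {mean_r:.3f}")
```

In a comparison such as the one reported, the same splits would be reused for every model so that differences in r reflect the models rather than the partitions. Fixing `seed` serves that purpose here.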

Publication status

published

Book title

“Global Challenges for Crop Improvement” - Book of Abstracts

Pages / Article No.

76

Publisher

Leibniz Institute of Plant Genetics and Crop Plant Research

Event

EUCARPIA 22nd General Congress

Organisational unit

03969 - Studer, Bruno / Studer, Bruno

Notes

Conference lecture held on August 22, 2024.
