Advancing multi-environment genomic prediction with explainable deep learning in apple
METADATA ONLY
Date
2024
Publication Type
Other Conference Item
ETH Bibliography
yes
Abstract
Multi-environment genomic prediction is a useful tool in plant breeding that helps estimate the breeding values of genotypes across diverse environments. Accurate prediction requires methods that integrate phenotypic, genotypic, and environmental data effectively, yet the heterogeneous structure of these data makes their analysis challenging. Deep learning methods are well suited to this complexity because of their modularity. Here, we present an explainable multimodal deep learning method for genomic prediction on a multi-year, multi-environment apple REFPOP dataset comprising eleven quantitative traits. In the modelling approach, genotypic data were subjected to feature selection to reduce dimensionality and improve training performance, whereas environmental data were processed as daily mean values. To make effective use of the environmental time-series data, the model employed long short-term memory (LSTM) layers, alongside dense layers for the other data inputs. The different data types were processed through separate multi-layer streams within the architecture and concatenated just before the final regression output layer. In five-fold cross-validation repeated five times, the proposed method outperformed its statistical counterparts for three of the eleven traits in the dataset: harvest date, titratable acidity, and red over colour, with increases in predictive ability, measured as Pearson’s correlation coefficient r, of 0.05, 0.08, and 0.09, respectively. For the remaining eight traits, performance was similar to that of the statistical models. Furthermore, we incorporated an approach to explain the model predictions based on Shapley additive explanations, commonly referred to as SHAP values. Using this approach, we were able to pinpoint the most important genetic variants as well as the relevant time frames during which environmental variables influence trait predictions.
Given the increasing amount of data generated in every field, our results provide a framework for integrating differently structured data and producing accurate, interpretable predictions with deep learning-based multi-environment genomic prediction models.
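
The evaluation protocol named in the abstract (five-fold cross-validation repeated five times, with predictive ability measured as Pearson's r between observed and predicted values) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: a single-feature least-squares fit stands in for the deep learning model, the data are synthetic, and every function name is hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fit_line(x, y):
    """Placeholder model (assumption): ordinary least squares on one feature."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return lambda value: intercept + slope * value


def repeated_kfold_predictive_ability(x, y, k=5, repeats=5, seed=0):
    """Mean Pearson r over repeats * k held-out folds, plus the fold count."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)  # a fresh random partition for each repeat
        folds = [indices[i::k] for i in range(k)]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            model = fit_line([x[i] for i in train_idx],
                             [y[i] for i in train_idx])
            predicted = [model(x[i]) for i in test_idx]
            observed = [y[i] for i in test_idx]
            scores.append(pearson_r(observed, predicted))
    return statistics.fmean(scores), len(scores)


# Synthetic toy data (not from the study): one predictor plus Gaussian noise.
rng = random.Random(42)
x = [rng.uniform(0.0, 10.0) for _ in range(100)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
mean_r, n_scores = repeated_kfold_predictive_ability(x, y)
```

Averaging r over all 25 held-out folds gives one predictive-ability value per model and trait. Differences such as those reported in the abstract would then be differences between two such averages, computed on identical fold partitions so that the comparison is paired.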
Publication status
published
Book title
“Global Challenges for Crop Improvement” - Book of Abstracts
Pages / Article No.
76
Publisher
Leibniz Institute of Plant Genetics and Crop Plant Research
Event
EUCARPIA 22nd General Congress
Organisational unit
03969 - Studer, Bruno / Studer, Bruno
Notes
Conference lecture held on August 22, 2024.