Proteogenomic heterogeneity of localized 1 human prostate cancer progression 2

36 Tumor-specific genomic aberrations are routinely determined by high throughput genomic 37 measurements. However, it is unclear how complex genome alterations affect molecular networks 38 through changing protein levels, and consequently biochemical states of tumor tissues. Here, we 39 investigated how tumor heterogeneity evolves during prostate cancer progression. In this study, we 40 performed proteogenomic analyses of 105 prostate samples, consisting of both benign prostatic 41 hyperplasia regions and malignant tumors, from 39 prostate cancer (PCa) patients. Exome sequencing, 42 copy number analysis, RNA sequencing and quantitative proteomic data were integrated using a network43 based approach and related to clinical and histopathological features. In general, the number and 44 magnitude of alterations (DNA, RNA and protein) correlated with histopathological tumor grades. 45 Although common sets of proteins were affected in high-grade tumors, the extent to which these proteins 46 changed their concentrations varied considerably across tumors. Our multi-layer network integration 47 identified a sub-network consisting of nine genes whose activity positively correlated with increasingly 48 aggressive tumor phenotypes. Importantly, although the effects on individual gene members were barely 49 detectable, together the perturbation of this sub-network was predictive for recurrence-free survival time. 50 The multi-omics profiling of multiple tumor sites from the same patients revealed cases of likely shared 51 clonal origins as well as the occasional co-existence of multiple clonally independent tumors in the same 52 prostate. Overall, this study revealed molecular networks with remarkably convergent alterations across 53 tumor sites and patients, but it also exposed a diversity of network effects: we could not identify a single 54 sub-network that was perturbed in all high-grade tumor regions. 55 56 57 Proteogenomic heterogeneity of prostate cancer progression Page 3 / 37

Introduction 58 Prostate cancer (PCa) represents one of the most common neoplasm among men with almost 59 1,300,000 new cases and 360,000 deaths in 2018 1 accounting for 15% of all cancers diagnosed. PCa is the 60 fifth leading cause of cancer death in men and represents 6.6% of total cancer mortality in men 1 . Despite 61 earlier detection and new treatments, the lifetime risk to die of PCa has remained stable at approximately 62 3% since 1980. (National Cancer Institute SEER data: https://seer.cancer.gov/statfacts/html/prost.html). 63 In many patients, PCa is indolent and slow growing. The challenge is to identify those patients who are 64 unlikely to experience significant progression while offering radical therapy to those who are at risk. PCa compared to other cancers 6,11 . A major complicating factor is that around 80% of PCas are multifocal 74 and harbor multiple spatially and often morphologically distinct tumor foci 12,13 . Several recent studies have 75 suggested that the majority of topographically distinct tumor foci appear to arise independently and show 76 few or no overlap in driver gene alterations [14][15][16] . Therefore, a given prostate gland can harbor clonally 77 independent PCas. 78 To allow for a more functional assessment of the biochemical state of PCa, it is necessary to go 79 beyond genomic alterations and comprehensively catalogue cancer specific genomic, transcriptomic and 80 proteomic alterations in an integrated manner [17][18][19] . Such an approach will provide critical information for 81 basic and translational research and could result into clinically relevant markers. While hundreds of PCa 82 genomes and transcriptomes have been profiled to date 20 , little is known about the PCa proteome. 83 Although recent work has emphasized the need for integrated multi-omics profiling of PCa, we still lack 84 understanding about how genomic changes impact on mRNA and protein levels [17][18][19] . Especially the 85 complex relationship between tumor grade, tumor progression and multi-layered molecular network 86 changes remains largely elusive. 87 For example, previous work has shown that copy number changes may alter transcript levels of 88 many genes, whereas the respective protein levels remain relatively stable 21 . Indeed, there is compelling 89 evidence across multiple tumor types that many genomic alterations are 'buffered' at the protein level 90 and are hence mostly clinically inconsequential 22,23 . To better understand the evolution of PCa and to 91 identify core networks perturbed by genomic alterations and thus central for the tumor phenotype, it is 92 therefore essential to investigate the transmission of CNAs to the transcriptomic and proteomic level. 93 To this end, it is important to decipher which genomic alterations impact PCa proteomes, which 94 of those proteomic alterations are functionally relevant, and how molecular networks are perturbed at 95 the protein level across tumors. 96 To address these open questions, we performed a multi-omics profiling of radical prostatectomy 97

112
Proteogenomic analysis of the sample cohort identifies known PCa biomarkers.

113
In this study, we analyzed 39 PCa patients (Supplementary Fig. 1) belonging to three groups who 114 underwent laparoscopic robotic-assisted RP. The patients were from the PCa Outcomes Cohort (ProCOC) 115 study 26 27 . Tumor areas were graded using the ISUP (International Society of Urological Pathology) grade 116 groups 28 , which range from ISUP grade group G1 (least aggressive) to G5 (most aggressive). The more 117 advanced grade groups G4 and G5 are considered jointly (G4/5). The cohort tested included 12 low-grade 118 (G1), 17 intermediate-(G2 and G3), and 10 high-grade (G4/5) patients (Fig. 1a, Supplementary Fig. 1,  119 Supplementary Table 1). For low-grade PCa patients, we selected two representative regions, one of 120 benign prostatic hyperplasia (BPH) and one of malignant tumor (TA). Since PCa often presents as a 121 multifocal disease with heterogeneous grading within each prostate specimen 25 we analyzed two different 122 Molecular perturbations correlate with tumor grade.

149
As a first step towards a cross-layer analysis, we asked if high-grade PCa would be generally 150 affected by stronger alterations (compared to low-grade PCa) at the genome, transcriptome, and 151 proteome layer 30 . Thus, we devised molecular perturbation scores that quantified the number of affected 152 genes/proteins and the extent to which these genes/proteins were altered in the tumor specimens 153 compared to their benign controls (see Methods for details). Higher-grade tumors (G3 and G4/5) exhibited 154 significantly higher molecular perturbation scores than lower-grade tumors (G1 and G2). Those differences 155 were statistically significant in all but one case (P value < 0.05, one-sided Wilcoxon rank sum test, Fig. 2). 156 The CNA perturbation magnitude exhibited the highest correlation with the PCa grading, confirming prior 157 studies documenting the tight association between CNA, histopathological grade and risk of 158 progression 4,5,31 . Previous work suggested that copy number changes are to some extent buffered at the 159 protein level 32 . Interestingly, we observed that proteins known to be part of protein complexes were 160 significantly less strongly correlated with the fold changes (FCs) of their coding mRNAs than proteins not 161 known to be part of protein complexes (P value < 2.6e-11, one-sided t-test, Supplementary Fig. 6). This 162 result is consistent with the concept that protein complex stoichiometry contributes to the buffering of 163 mRNA changes at the level of proteins 22,33-35 . Thus, molecular patterns in high-grade PCa are more strongly 164 perturbed at all layers and the effects of genomic variation are progressively but non-uniformly attenuated 165 along the axis of gene expression. 166 Inter-patient heterogeneity decreases along protein biosynthesis.

167
Our analysis of CNA profiles (above and Supplementary Text) already revealed many shared CNAs 168 across patients, suggesting that such common CNAs might represent genomic driver changes. We 169 therefore investigated if such a convergence towards common molecular endpoints could also be 170 observed at the transcript and protein level. To address this question, we first computed a reference 171 molecular signature that is characteristic of the molecular perturbations of tumors in a given grade group. 172 These 'centroid vectors' were obtained by computing the average tumor-to-benign FCs across all samples 173 within a grade group. Consistent with the observation above, we found that the average effect sizes 174 (averaged absolute centroid FCs) were increasing with the grade group for all three layers (CNA, mRNA, 175 and protein; Supplementary Table 6). Next, we compared each individual sample within a group against 176 the matching centroid of the same group. For the quantification of the similarity between a tumor sample 177 and the corresponding centroid we used four similarity/distance measures: Pearson correlation, Mutual 178 Information (MI) 36 , Manhattan distance and Euclidean distance. While the first two measures (i.e. Pearson correlation and MI) quantify the degree to which the tumor sample and centroid vector co-vary, the other 180 two measures (i.e. Manhattan and Euclidean distance) also take into account the magnitude of the FCs in 181 the two vectors. To illustrate this difference, imagine two patients having perturbations of the same 182 genes/proteins, but one of them exhibiting overall two-fold greater FCs (i.e. all FCs are increased by a 183 factor of two compared to the other patient). In such a scenario Pearson correlation and MI would yield 184 identical results for the two patients when compared to the centroid, whereas Manhattan and Euclidean 185 distance would identify them as different. Using the Pearson correlation and MI, we found that high-grade 186 PCa (G4/5) were more similar to their respective centroid than low-grade PCa (G1) to their centroid (Fig.  187 3). This effect was particularly pronounced for protein-level changes. This is consistent with the notion 188 that protein levels (and not mRNA levels) are subjected to stronger selection. Interestingly, when the 189 Euclidean distance and the Manhattan distance were used to characterize tumor similarity, we found that 190 the high-grade tumors were more dissimilar to each other than the low-grade tumors (Supplementary 191 To further corroborate the notion of common endpoints, we identified the 10 proteins with the 199 largest average absolute FCs across all tumor specimens (Supplementary Table 6). Among them was PSA 200 (KLK3), and several other well established PCa-associated proteins like AGR2 37 , MDH2 38 , MFAP4 39 and 201 FABP5 40 . We observed that for some of these top 10 proteins, FCs were more extreme in the higher-grade 202 tumors (G3 and G4/5) compared to lower-grade tumors (G1 and G2), such as MDH2 and SEPHS1 (up-203 regulation; Fig. 3c). RABL3 was one of the most strongly down-regulated proteins (Fig. 3c), which is a 204 surprising finding as RABL3 is known to be up-regulated in other solid tumors 41,42 . Interestingly, in most 205 cases these proteins were from loci that were not subject to CNAs (Supplementary Fig. 7, Supplementary 206 Table 6). Taken together, these findings suggest that tumor mechanisms in different patients converged 207 on common protein endpoints and that the expression levels of these proteins were progressively more 208 strongly affected during tumor evolution. Thus, we next aimed to identify molecular networks that 209 underlie these alterations and thus indicate altered biochemical states of the respective tissues. 210 Gene networks are regulated on CNA, mRNA and protein level.

211
Although the analyses described above suggested convergent proteomic changes, we did not 212 detect a single mutation that was common to all high-grade tumors, let alone common to all 39 PCa cases 213 tested. It has previously been suggested that mutations affecting different genes could impact common 214 molecular networks if the respective gene products interact at the molecular level 43 . However, previous 215 analyses were mostly restricted to individual molecular layers. For example, it was shown that genes 216 mutated in different patients often cluster together in molecular interaction networks 43 . But, effects of 217 these mutations on transcript and protein levels remained unexplored in this case. Here, we aimed at a 218 multi-layer network analysis, involving the genome, transcriptome and proteome. We applied network 219 analysis at each layer separately and the layers were combined afterwards. To identify sub-networks 220 commonly perturbed, we mapped our data onto the STRING gene interaction network 44 , and employed 221 network propagation 45,46 separately to the CNA, transcriptome and proteome data for each of the tumor 222 samples. We excluded point mutations from this analysis as their frequency was too low in our cohort. By 223 combining published molecular interactome data with a network propagation algorithm 43,46 , we aimed to 224 'enrich' network regions with many perturbed genes/proteins. We reasoned that the convergent 225 consequences of genomic variants on common network regions would be indicative of specific 226 biochemical functions that are important for the tumor biology. We therefore identified genes/proteins 227 in network regions that showed a higher score (or a lower score) in high-grade (G4/5) relative to lower-228 grade (G1) tumor groups at all three levels ( Fig. 4a, b; Methods). This analysis identified sub-networks 229 consisting of over-and under-expressed genes (relative to the benign controls). We found 57 amplified 230 genes (Supplementary Table 6) for which transcripts and proteins were often over-expressed in high-231 grade PCa (Fig. 4a) and 21 genes with copy number loss (Supplementary Table 6) for which transcripts 232 and proteins were often down-regulated compared to lower-grade tumors (Fig. 4b). 233 Among the up-regulated network nodes, we observed genes modulating the stability of 234 chromatin, such as chromatin-binding protein Chromobox 1 (CBX1) 47 , SET Domain Bifurcated 1 235 (SETDB1) 48 , a function linking to H3K27me3 and H3K9me3 in chromatin, and CBX3 (known as HP1-γ) 49 . 236 SETDB1 is an oncogene in melanoma 50 and has also been found to be over-expressed in PCa and cell lines 237 51 . Further, we found genes involved in DNA damage repair, such as SMG7 52 and ATR 53 , and PRKCZ 54 , which 238 had already been suggested as a biomarker prognostic for survival in PCa 55 . Multiple actin related proteins 239 including ARPC1B 56 , ARPC5 57 , ACTL6A 58 , and CFL1 59 , which are markers for aggressive cancers, were part 240 of the up-regulated network nodes. Moreover, the up-regulated genes contained proteins related to the 241 cell cycle like BANF1 and proteins interacting with the centrosome including LAMTOR1 and RAB7A that 242 had already been associated with PCa 60 . Finally, several signaling molecules with known roles in PCa were 243 up-regulated, such as the transcription factor Yin Yang 1 (YY1) 61 Table 6). The parallel DNA amplification of these genes is therefore the result of multiple 276 independent CNA events, while the signal on any single gene alone was too weak to be significant in 277 isolation. In some but not all cases, the amplifications led to a small, but consistent increase in mRNA 278 expression of the amplified gene loci (Fig. 4d). Unfortunately, only three out of the nine proteins were 279 detected in our proteomics experiments (Fig. 4e) Aarhus 77 ), together comprising a total of 513 patients with known clinical outcome. We found, that over-294 expression of genes from Network Component 1 was a significant predictor of reduced RFS in the TCGA 295 cohort (P value = 8.8e-4, log-rank test), which was the cohort with the largest number of patients. In the 296 other two cohorts we observed the same trend, although the difference in RFS was not statistically 297 significant (P value = 0.11 and 0.093 for MSKCC, and Aarhus; Fig. 4f). 298 In conclusion, our findings suggest that relatively weak but broad CNAs of entire network 299 components are associated with high-grade tumors and that the presence of some of these perturbations 300 in lower-grade tumors may be predictive of the future development of a more aggressive phenotype. 301 Analysis of distinct tumor nodules defines intra-patient heterogeneity (TA1 versus TA2 302 comparison).

304
suggest that different tumor areas from the same patient shared several mutations. Such common 305 signatures are expected if different tumor nodules originate from a common clone. If this was true, we 306 would expect mutational signatures to be more similar between different nodules from the same patient 307 than between patients, even though mutated genes may be shared across patients. To compare the intra-308 and inter-patient molecular heterogeneity at the levels of CNAs, transcript, and protein FCs, we computed 309 the Pearson correlation between tumor area 1 (TA1) and its paired tumor area 2 (TA2) for each layer and 310 all of the 27 patients with two characterized tumor areas (25 for the mRNA, see Methods and 311 Supplementary Text). As a control, we also computed all pairwise Pearson correlations between the 312 samples within each of the grade groups (i.e. inter-patient correlation). As expected, paired TA1 and TA2 313 from the same patient were on average more strongly correlated to each other compared to samples from 314 different patients within the same grade group. This finding was consistent for all omics layers (Fig. 5a), 315 and was more pronounced at the CNA and mRNA layers compared to the protein layer. 316 Next, we tested whether a high correlation at the level of CNA also implies a high correlation at 317 the level of mRNA and proteins. We tested this idea by 'correlating the correlations', i.e. we correlated the 318 TA1-TA2 correlation of CNA profiles with the correlation between the mRNA and protein profiles of the 319 same tumor areas (Fig. 5b). Indeed, a higher correlation of two tumor areas at the level of CNA correlated 320 significantly with a higher correlation at the level of mRNA (r=0.49, P value=0.014). In other words, 321 knowing how similar two tumor areas of a patient are at the CNA level supports a prediction of their 322 similarity at the mRNA level (and conversely). Although the correlation between protein and CNA was not 323 statistically significant, it followed the same trend (r=0.35, P value=0.076). 324 Comparing molecular similarity across omics layers allowed us to identify specific types of patients. 325 The patients H2, H4, M13 had highly correlated tumor areas at all three layers (upper right corner in all 326 scatterplots of Fig. 5b). Likely, the tumor areas of these patients have a common clonal origin 327 ( Supplementary Fig. 3). In contrast, patients M12 and M14 had weakly correlated tumor areas at all levels 328 (bottom left corner in all scatterplots of Fig. 5b). These tumor nodules either have independent clonal 329 origins or they diverged at an earlier stage during tumor evolution (Supplementary Fig. 3) 16 . For example, 330 in the case of patient M12 large parts of the genome were not affected by CNAs in the benign sample as 331 well as in TA1 and TA2. However, as shown on Supplementary Fig. 3, a large region was amplified in TA1, 332 whereas the same region was deleted in TA2. This is consistent with a scenario in which TA1 and TA2 show 333 parallel evolution. A third class of patients is exemplified by the patients M9 and M17, who showed a high 334 correlation between their tumor areas on the CNA and mRNA levels, but not on the protein level. Yet other 335 patterns were apparent in patients M6 and H3. They showed very similar protein patterns in the two 336 tumor areas, but not on the other two layers. We confirmed that protein-level similarity correlated with 337 similar histological characteristics of the tumor areas. Supplementary Fig. 9 shows formalin-fixed paraffin-338 embedded (FFPE) tissue microarray images (duplicates) from the analyzed tumor nodules (TA1 and TA2, 339 diameter 0.6 mm), further underlining the hypothesis that ultimately protein-level alterations are 340 responsible for common cellular phenotypes. Although we cannot fully exclude the possibility that some 341 of these results were affected by technical noise in the data, our findings suggest that transcript alterations 342 can frequently be buffered at the level of proteins (patients M9, M17, Supplementary Fig. 6) and that 343 convergent evolutionary processes may lead to the alteration of common proteins (patients M6, H3). We 344 also note that our findings are specific to the two tumor areas available in this study and could be different In our study, samples were generated from small, less than 1 mm diameter punches in immediate 357 spatial proximity in the tumor and subsequently profiled at all three 'omics layers' (DNA, RNA, proteome). 358 Due to the large spatial heterogeneity of PCa 14,25 , this design -which is so far uncommon for studies 359 profiling multiple layers from tumor specimens -was instrumental for increasing the comparability of the 360 various omics layers and thus facilitated the analysis of molecular mechanisms. Our key findings are: (1) 361 we confirmed the importance of CNAs for PCa biology and the alteration of many known PCa-associated 362 genes at the transcript-and protein-level; (2) we revealed a generally elevated molecular alteration of 363 high-grade tumors compared to lower-grade tumors; (3) although our study confirmed large within-and 364 between-patient genomic heterogeneity, (4) we detected molecular networks that were commonly 365 altered at the mRNA and protein-level. The fact that many of those target molecules are known drivers of 366 PCa tumorigenesis, supports the notion that these proteins/transcripts are subject to convergent 367 evolutionary mechanisms. 368 We integrated the three omics layers using a network-based approach as opposed to directly 369 comparing gene perturbations (mutations) to gene products (transcripts and proteins). Using genome data 370 only, it had previously been hypothesized that whereas the identity of specific mutated genes may differ 371 between tumors, those mutations might still affect common molecular networks 43 . In other words, tumor 372 phenotypes are more dependent on which network regions are perturbed rather than which individual 373 genes are mutated. Our study provides experimental evidence that such network effects are indeed 374 propagated to subsequent molecular layers and that this effect propagation may be clinically relevant. 375 Our multi-omics network analysis revealed that high-grade PCa tumors distinguished themselves 376 from low-grade tumors in two aspects. The first is a generally higher heterogeneity and loss of controlled 377 gene regulation, which increased the molecular differences among high-grade tumors. It had previously 378 been shown that gene expression in tumors is often less coordinated than in normal samples 30  subject to an evolutionary process, that progressively 'moves' protein levels towards a more aggressive 414 state. Generally, and at all molecular layers tested, the two paired tumor areas were more similar to each 415 other compared to two samples from the same grade group but different patients, suggesting common 416 evolutionary origins. Although the two tumor areas seemed to mostly originate from the same clone, this 417 was not always the case. In some patients, different nodules exhibited different molecular patterns at all 418 omics layers, suggesting early evolutionary separation. Thus, for the first time, current diagnostic, expert-419 level consensus guidelines 29 are supported by detailed proteogenomic data. Our findings support earlier 420 claims that clonality itself might be a prognostic marker with implications for future, more tumor-specific 421 treatment when targeted therapies become available also for PCa 16,78 . 422 Our study shows that all three molecular layers (genome, transcriptome and proteome) 423 i.e. reads that cover areas outside of the exon amplicons. "Off-target" reads are produced due to 474 inefficient enrichment strategies. In our case on average 28.5% of the total reads were not on target. 475 Briefly, CopywriteR removes low quality and anomalous read pairs, then peaks are called in the respective 476 blood reference, and all reads in this region are discarded. After mapping the reads into bins, those peak 477 regions, in which reads had been removed, were compensated for. Additionally, read counts are corrected 478 based on mappability and GC-content. Finally, a circular binary segmentation is carried out and for each 479 segment the log count ratios between tissue samples and the respective blood sample are reported as 480 copy number gain or loss. The copy number of each gene in each sample was reported based on the log 481 count ratio of the respective segment in which the gene was located. The overall performance of this 482 CNA-calling approach was evaluated by comparing the results of the TA1 (and TA) samples with CNA 483 results obtained by applying the OncoScan Microarray pipeline to FFPE samples from the same tumors 484 (Supplementary Fig. 10). 485

486
OncoScan copy number assays were carried out and analyzed as described previously 88 . Briefly, 487 DNA was extracted from punches of FFPE cancer tissue blocks. Locus-specific molecular inversion probes 488 were hybridized to complementary DNA and gaps were filled in a nucleotide-specific manner. After 489 amplification and cleavage of the probes, the probes were hybridized to the OncoScan assay arrays.  Fig. 11, Supplementary Table 4). In order to correct for this 3' bias, 3 tag counting was 503 performed as described by Sigurgeirsson et al 93 using a tag length of 1,000. After 3' bias correction, three 504 samples still showed a clear 3' bias: the two tumor regions (TA1 and TA2) of the patient M5 and TA2 from 505 patient M8 (Supplementary Fig. 11). These samples were excluded from subsequent analyses. 506 Additionally, the BPH region of the patient M5 was excluded due to the exclusion of both its tumor regions. 507 FeatureCounts 94 was used to determine read counts for all genes annotated in ENSEMBL v75. 508 Genes for which no read was observed in any of the samples in the original data were excluded from the 509 analysis. Further, after 3 tag counting, all genes with without at least 1 read per million in N of the samples 510 were removed. We chose N to be 10 which corresponds to the size of the smallest grade group (G2). In a 511 last reduction step, all genes with more than one transcript were excluded, yielding a final set of 14,281 512 genes. 513 Read count normalization and differential gene expression analysis was performed using the R 514 packages sva 95 and DESeq2 96 . All benign tissues were considered biological replicates and differential 515 gene expression for the individual tumor samples was determined against all benign tissues. Gene 516 expression changes with an adjusted P value < 0.1 were considered significant. 517

518
The 3 tag counting approach for 3' bias correction was used on the RNA-seq dataset 93 . This 519 approach requires changing of the annotation file in two steps: 1) isoform filtering and 2) transcript length 520 restriction. As proposed in 93 for each gene we determined the highest expressed isoform within a set of 521 high quality samples. As high quality samples we used all samples with an mRIN score greater than or 522 equal to 0.02. This set contains 7 benign and 15 tumor samples. Isoform expression was determined using 523 cufflinks 97 . As transcript length we chose 1,000bp. 524

525
FusionCatcher (version 0.99.5a beta) was used to determine gene fusions for all samples. Fusions 526 classified as "probably false positive" are discarded unless they are also classified as "known fusion". 527 PCT assisted sample preparation for SWATH-MS

551
To build a comprehensive library for SWATH data analysis, we analyzed unfractionated prostate 552 tissue digests prepared by the PCT method using Data Dependent Acquisition (DDA) mode in a tripleTOF 553 mass spectrometer over a gradient of 2 hours as described previously 19 . We spiked iRT peptides 51 into 554 each sample to enable retention time calibration among different samples. We then combined these data 555 with the DDA files from the pan-human library project 99 . All together we analyzed 422 DDA files using 556 X!Tandem 52 and OMSSA 53 against three protein sequence databases downloaded on Oct 21, 2016 from 557 UniProt, including the SwissProt database of curated protein sequences (n=20,160), the splicing variant 558 database (n=21,970), and the trembl database (n=135,369). Using each database, we built target-decoy 559 protein sequence database by reversing the target protein sequences. We allowed maximal two missed 560 cleavages for fully tryptic peptides, and 50 p.p.m. for peptide precursor mass error, and 0.1 Da for peptide 561 fragment mass error. Static modification included carbamidomethyl at cysteine, while variable 562 modification included oxidation at methionine. Search results from X!Tandem and OMSSA were further 563 analyzed through Trans-Proteomic Pipeline (TPP, version 4.6.0) 54 using PeptideProphet and iProphet, 564 followed by SWATH assay library building procedures as detailed previously 19,55 . Altogether, we identified 565 167,402 peptide precursors, from which we selected the proteins detected in prostate tissue samples, and 566 built a sample-specific library. SWATH wiff files were converted into mzXML files using ProteoWizard 56 567 msconvert v.3.0.3316, and then mzML files using OpenMS 57 tool FileConverter. OpenSWATH 100 was 568 performed using the tool OpenSWATHWorkflow with input files including the mzXML file, the TraML 569 library file, and TraML file for iRT peptides. 570 Peptide quantification using OpenSWATH

571
To obtain consistent quantification of the SWATH files, we obtained the all annotated b and y 572 fragments from the sp, sv and tr libraries. About ten thousand redundant and low-quality assays were 573 removed. Then we extracted the chromatography of these fragments and MS1 signals using 574 OpenSWATHWorkflow, followed by curation using DIA-expert 101 . Briefly, the chromatography of all 575 fragments and MS1 signals were subject to scrutiny by empirically developed expert rules. A reference 576 sample with best q value by pyprophet was picked up to refined fragments. The peptide precursors are 577 further filtered based on the following criteria: i) remove peptide precursors with a q value higher than 578 1.7783e-06 to achieve a false discovery rate of 0.00977 at peptide level using SWATH2stats 102 ; ii) peptides 579 with a FC higher than 2 between the reference sample and its technical replicate were removed; iii) 580 peptides matching to multiple SwissProt protein sequences were removed. The data matrix was first 581 quantile normalized, log2 transformed, followed by batch correction using the ComBat R package 103 . 582 Finally, for each protein and pair of technical replicates the average value was computed. 583

584
All plots were produced with R. Kaplan-Meier estimators were used for RFS analysis. Differences 585 between survival estimates were evaluated by the log-rank test. 586 Computation of molecular perturbation scores  Table 6) we used the R/Bioconductor package STRINGdb 106 . The gene symbols 613 with no matching STRING identifier were removed, while for those that mapped to multiple STRING 614 identifiers, the first mapping was kept (default choice in the package). From the multiple gene symbols 615 that mapped to the same STRING identifier, the first mapping was kept. The genes that were not present 616 in the network were removed from the datasets, while those that were present in the network but not in 617 the corresponding dataset were initially filled in with 0's. 618 Genes with very small, 'smoothed' (absolute) FCs were filtered out as follows: after the network 619 propagation, only network nodes that had protein measurements themselves or at least one direct 620 neighbor (on the filtered STRING network) with protein measurements were considered in the next steps 621 of this analysis. I.e. network nodes without measured FCs at the protein layer that had no direct neighbor 622 with measured protein values were removed from the subsequent analyses. 623 For significance testing, the one-sided Wilcoxon rank sum test comparing the smoothed FCs 624 between the groups G4/5 and G1 was applied to each network node (after filtering) and layer, once for 625 up-regulation and once for down-regulation. The resulting sub-networks (up-regulated and down-626 regulated) consisted of those genes that were significant (P value below 0.05) at all three layers and all of 627 the edges connecting them on the filtered STRING network. 628 Network Component 1 analysis 629 For each tumor sample at the CNA layer, a one-sided, one-sample t-test has been applied testing 630 if its average FC over the genes of the Network Component 1 (and in particular those that have been 631 measured at the CNA) is significantly greater than 0. Due to the presence of outliers in some samples, the 632 non-parametric, one-sided Wilcoxon signed-rank test has been applied as well yielding very similar results 633 (data not shown). A result is considered to be significant if the corresponding P value is below 0.05. The 634 analysis has been repeated for the mRNA and protein layer. We reduced all three datasets to the nine genes of Network Component 1. In each of the datasets, 649 we computed for each sample an average z-score across the nine genes of Network Component 1 650 (combined risk score). Subsequently, we used these combined risk scores to split the samples of each 651 study into two groups: samples with a combined risk score larger or equal to the median combined risk 652 score of the study were considered as 'altered' and the rest as 'unaltered'. Kaplan-Meier curves were 653 generated for the two groups. 654

655
In order to compute the MI between each individual sample within a group against the matching 656 centroid of the same group, the R package infotheo (http://cran.r-657 project.org/web/packages/infotheo/index.html) has been used. For the MI, the data needs to be discrete. 658 For that, we discretize the tumor-to-benign FCs and centroids by setting 1, if the value is above a threshold 659 thr, -1, if the value is below -thr and 0 else. (thr is equal to 1 for the mRNA and proteins and 0.5 for the 660   to the low-grade tumors. There is a significant down-regulation of the FCs of the depicted genes (colored 950 in blue) after network smoothing in the group G4/5 compared to the group G1 in all three layers (CNA, 951 mRNA and protein). Functional annotation of the sub-networks in (a) and (b) with more than one node is 952 given below. c, Heatmap of the CNA matrix reduced to the Network Component 1 genes. The 953 columns/samples are ordered based on the grade group. The bottom colorbar depicts the effect size of 954 each sample, i.e. its average FC across the genes of the Network Component 1. The next colorbar 955 represents the negative logarithm in base 10 of the P value from the t-test. The above colorbar shows 956 whether the result is significant: significant results are colored in black and the rest in white. The top 957 colorbar depicts the grade group of each sample. A black box has been drawn around the 'interesting' area