TY - JOUR
T1 - Integrated analysis of multiple microarray datasets identifies a reproducible survival predictor in ovarian cancer
AU - Konstantinopoulos, Panagiotis A.
AU - Cannistra, Stephen A.
AU - Fountzilas, Helen
AU - Culhane, Aedin
AU - Pillay, Kamana
AU - Rueda, Bo
AU - Cramer, Daniel
AU - Seiden, Michael
AU - Birrer, Michael
AU - Coukos, George
AU - Zhang, Lin
AU - Quackenbush, John
AU - Spentzos, Dimitrios
PY - 2011
Y1 - 2011
N2 - Background: Public data integration may help overcome challenges in clinical implementation of microarray profiles. We integrated several ovarian cancer datasets to identify a reproducible predictor of survival. Methodology/Principal Findings: Four microarray datasets from different institutions comprising 265 advanced stage tumors were uniformly reprocessed into a single training dataset, also adjusting for inter-laboratory variation ("batch-effect"). Supervised principal component survival analysis was employed to identify prognostic models. Models were independently validated in a 61-patient cohort using a custom array genechip and a publicly available 229-array dataset. Molecular correspondence of high- and low-risk outcome groups between training and validation datasets was demonstrated using Subclass Mapping. Previously established molecular phenotypes in the 2nd validation set were correlated with high and low-risk outcome groups. Functional representational and pathway analysis was used to explore gene networks associated with high and low risk phenotypes. A 19-gene model showed optimal performance in the training set (median OS 31 and 78 months, p<0.01), 1st validation set (median OS 32 months versus not-yet-reached, p = 0.026) and 2nd validation set (median OS 43 versus 61 months, p = 0.013) maintaining independent prognostic power in multivariate analysis. There was strong molecular correspondence of the respective high- and low-risk tumors between training and 1st validation set. Low and high-risk tumors were enriched for favorable and unfavorable molecular subtypes and pathways, previously defined in the public 2nd validation set. Conclusions/Significance: Integration of previously generated cancer microarray datasets may lead to robust and widely applicable survival predictors. These predictors are not simply a compilation of prognostic genes but appear to track true molecular phenotypes of good- and poor-outcome.
UR - http://www.scopus.com/inward/record.url?scp=79953192505&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0018202
DO - 10.1371/journal.pone.0018202
M3 - Article
C2 - 21479231
AN - SCOPUS:79953192505
SN - 1932-6203
VL - 6
SP - e18202
JO - PLoS ONE
JF - PLoS ONE
IS - 3
M1 - e18202
ER -