TY - JOUR
T1 - Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data
AU - Hsu, Lauren L.
AU - Culhane, Aedin C.
N1 - Publisher Copyright: © 2020 Hsu and Culhane.
PY - 2020/6/23
Y1 - 2020/6/23
AB - Integrative, single-cell analyses may provide unprecedented insights into cellular and spatial diversity of the tumor microenvironment. The sparsity, noise, and high dimensionality of these data present unique challenges. Whilst approaches for integrating single-cell data are emerging and are far from being standardized, most data integration, cell clustering, cell trajectory, and analysis pipelines employ a dimension reduction step, frequently principal component analysis (PCA), a matrix factorization method that is relatively fast, and can easily scale to large datasets when used with sparse-matrix representations. In this review, we provide a guide to PCA and related methods. We describe the relationship between PCA and singular value decomposition, the difference between PCA of a correlation and covariance matrix, the impact of scaling, log-transforming, and standardization, and how to recognize a horseshoe or arch effect in a PCA. We describe canonical correlation analysis (CCA), a popular matrix factorization approach for the integration of single-cell data from different platforms or studies. We discuss alternatives to CCA and why additional preprocessing or weighting datasets within the joint decomposition should be considered.
KW - data integration
KW - data preprocessing
KW - matrix factorization
KW - normalization
KW - scRNA-seq
KW - single cell
KW - standardization
UR - http://www.scopus.com/inward/record.url?scp=85087496565&partnerID=8YFLogxK
U2 - 10.3389/fonc.2020.00973
DO - 10.3389/fonc.2020.00973
M3 - Review article
AN - SCOPUS:85087496565
SN - 2234-943X
VL - 10
SP - 973
JO - Frontiers in Oncology
JF - Frontiers in Oncology
M1 - 973
ER -