Inferring differentially active biological processes, from small sample gene expression data sets, with a transfer learning approach to matrix factorization

David Hirst
PhD at NSDB, Aix-Marseille Université
https://www.marseille-medical-genetics.org/a-baudot/

Date(s): 29/11/2021
15h00 - 16h00

Matrix factorization can be applied to RNA gene expression data to identify sets of genes that jointly participate in biological processes. This can help in inferring the extent to which the activity of these processes varies across biological conditions. When a gene expression data set has only a limited number of samples, the effectiveness of matrix factorization is restricted. Therefore, a transfer learning approach to matrix factorization has been proposed. Such an approach involves, for a small target data set, inferring scores associated with a latent space that has been learned from a large heterogeneous learning data set.
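The projection step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the study: plain principal component analysis (via SVD) stands in for the factorization, the data are random simulations, and every name (`learn_latent_space`, `project`, `true_loadings`, the sample counts) is hypothetical.

```python
import numpy as np


def learn_latent_space(X_learn, n_components):
    """Learn gene loadings (components x genes) from a samples x genes matrix.

    Assumption: plain PCA via SVD is used here as a simple stand-in
    for the factorization method.
    """
    gene_means = X_learn.mean(axis=0)
    # Rows of Vt are orthonormal directions in gene space.
    _, _, Vt = np.linalg.svd(X_learn - gene_means, full_matrices=False)
    return gene_means, Vt[:n_components]


def project(X_target, gene_means, loadings):
    """Score target samples on the latent space learned from other data."""
    return (X_target - gene_means) @ loadings.T


rng = np.random.default_rng(0)
n_genes, n_processes = 200, 5

# Hypothetical simulation: each hidden process drives a pattern over genes.
true_loadings = rng.normal(size=(n_processes, n_genes))

# Large learning data set: 500 samples.
learn_activity = rng.normal(size=(500, n_processes))
X_learn = learn_activity @ true_loadings + 0.1 * rng.normal(size=(500, n_genes))

# Small target data set: 3 cases followed by 3 controls,
# with the first process shifted upward in the cases.
target_activity = rng.normal(size=(6, n_processes))
target_activity[:3, 0] += 3.0
X_target = target_activity @ true_loadings + 0.1 * rng.normal(size=(6, n_genes))

# Learn on the large set, then only project the small set.
gene_means, loadings = learn_latent_space(X_learn, n_components=n_processes)
scores = project(X_target, gene_means, loadings)

# Per-component difference in mean score between cases and controls.
group_difference = scores[:3].mean(axis=0) - scores[3:].mean(axis=0)
print(group_difference)
```

The point of the design is that the six target samples are never factorized directly. They are only scored against loadings estimated from the much larger learning set, and group comparisons are then made on those scores.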

In this study, I used simulated data to explore how a transfer learning approach to matrix factorization might improve the detection of differentially active gene sets. The matrix factorization methods I evaluated were sparse principal component analysis, independent component analysis, non-negative matrix factorization and iCluster. In all cases, the transfer learning approach outperformed direct factorization of target data sets with limited numbers of samples.

I then applied matrix factorization to a subset of a large, heterogeneous compendium of RNA-Seq data, to learn a latent space representative of functionally related gene sets. A small RNA-Seq data set, comprising samples from patients with either facioscapulohumeral muscular dystrophy or Bosma arhinia microphthalmia syndrome and from healthy controls, was then projected onto the learned latent space. This approach detected biological processes inferred to be differentially active across disease groups.

Location
I2M Luminy - Ancienne BU, Salle Séminaire CIELL (1st floor)
