TAGC, Aix-Marseille Université
Date(s) : 13/05/2019
11:00 - 12:00
Cis-regulatory elements (CREs) are genomic regions that regulate gene expression by binding proteins called Transcriptional Regulators (TRs). TR binding is mostly studied experimentally via ChIP-Seq, but these experiments produce false positives, and there is no method to discern them. However, TRs are known to co-occur, and many replicate datasets exist. We therefore use common combinations of TRs and/or datasets to identify “atypical” peaks, and we use the ReMap database to learn these correlations.
Each CRE is represented as a 3D tensor of peak presence, with dimensions ‘position’, ‘TR’, and ‘dataset’. We use an autoencoder to perform a lossy compression of each tensor, which keeps common patterns and discards rare elements (atypical peaks). The model views the regions through convolutional filters so that it focuses on the correlations. Each peak receives an anomaly score corresponding to the autoencoder’s reconstruction error.
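The encoding and scoring idea can be illustrated with a minimal sketch. It is not the model presented in the talk: a rank-limited SVD (a linear lossy compression) stands in for the convolutional autoencoder, positions are assumed to be binned, and all function names, TR names, and dataset names are hypothetical.

```python
import numpy as np

def build_cre_tensor(peaks, n_bins, trs, datasets):
    """Encode one CRE as a (position, TR, dataset) tensor of peak presence."""
    tensor = np.zeros((n_bins, len(trs), len(datasets)), dtype=np.float32)
    tr_index = {name: i for i, name in enumerate(trs)}
    ds_index = {name: i for i, name in enumerate(datasets)}
    for start_bin, end_bin, tr, dataset in peaks:
        tensor[start_bin:end_bin, tr_index[tr], ds_index[dataset]] = 1.0
    return tensor

def low_rank_reconstruction(tensors, rank):
    """Assumed stand-in for the autoencoder: keep only the top singular components."""
    flat = tensors.reshape(len(tensors), -1)
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    rebuilt = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return rebuilt.reshape(tensors.shape)

def anomaly_scores(tensors, rebuilt):
    """Absolute reconstruction error, kept only where a peak is present."""
    return np.abs(tensors - rebuilt) * (tensors > 0)

# Toy data: 50 regions share the same A/B pattern; region 0 also has a lone C peak.
trs, datasets, n_bins = ["A", "B", "C"], ["d1", "d2"], 8
typical = [(2, 6, "A", "d1"), (2, 6, "B", "d1")]
regions = [build_cre_tensor(typical, n_bins, trs, datasets) for _ in range(50)]
regions[0] = build_cre_tensor(typical + [(2, 6, "C", "d2")], n_bins, trs, datasets)
tensors = np.stack(regions)

scores = anomaly_scores(tensors, low_rank_reconstruction(tensors, rank=1))
lone_score = scores[0, 2:6, 2, 1].mean()     # the lone C peak in region 0
typical_score = scores[0, 2:6, 0, 0].mean()  # the common A peak in region 0
print(lone_score, typical_score)
```

In this toy setting the single retained component captures the common A/B pattern, so the lone C peak is reconstructed poorly and receives a much higher score than the typical peaks.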
We use artificial data to confirm the model’s ability to discover correlation groups of TRs/datasets and to label lonely/anomalous peaks. Application to ReMap is in progress, currently on a curated subset of the data. To our knowledge, this is the first use of a large-scale meta-analysis to corroborate different ChIP-Seq datasets, using deep learning to integrate them in complex combinations and eliminate atypical peaks.
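As a rough illustration of what such artificial data might look like, the sketch below plants correlation groups among TR/dataset channels and adds independent noise peaks. The generator, its parameters, and the probabilities are assumptions made for this example, not the procedure used in the work; the position axis is dropped for brevity.

```python
import numpy as np

def make_artificial_peaks(n_regions, groups, n_channels,
                          p_group=0.5, p_member=0.9, p_noise=0.05, seed=0):
    """Return (peaks, lonely): boolean matrices of shape (n_regions, n_channels).

    Each channel is one hypothetical (TR, dataset) pair. Channels in the same
    group tend to appear together; noise peaks are drawn independently.
    """
    rng = np.random.default_rng(seed)
    typical = np.zeros((n_regions, n_channels), dtype=bool)
    for members in groups:
        active = rng.random(n_regions) < p_group            # is the group active here?
        present = rng.random((n_regions, len(members))) < p_member
        typical[:, members] = active[:, None] & present
    noise = rng.random((n_regions, n_channels)) < p_noise
    lonely = noise & ~typical   # peaks that come only from the noise draw
    return typical | noise, lonely

# Two planted groups (channels 0-2 and 3-4); channel 5 only ever gets noise peaks.
peaks, lonely = make_artificial_peaks(500, groups=[[0, 1, 2], [3, 4]], n_channels=6)
corr = np.corrcoef(peaks.astype(float), rowvar=False)
print(np.round(corr, 2))
```

Channels from the same planted group should show clearly higher pairwise correlation than channels from different groups, and the `lonely` mask gives a ground truth against which anomaly scores can be compared.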