Date(s) - 26/04/2019
14:00 - 15:00
It has been shown that while a single genomic data source might not be sufficiently informative, fusing several complementary genomic data sources delivers more accurate predictions. In this regard, genomic data fusion has garnered much interest across biological research communities. Consequently, finding efficient and effective techniques for fusing heterogeneous biological data sources has gained growing attention over the past few years.
Kernel methods are a particularly attractive class of techniques for data fusion. We investigate using the geometric mean of kernel matrices, rather than their arithmetic mean, for kernel data fusion. Although computing geometric means of matrices is challenging, the approach opens an intriguing research direction in data fusion. We will discuss applications of geometric kernel data fusion to protein fold recognition and gene prioritization.
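To make the contrast between the two means concrete, here is a minimal sketch for the two-matrix case, assuming the standard geometric mean of symmetric positive-definite matrices, A # B = A^(1/2) (A^(-1/2) B A^(-1/2))^(1/2) A^(1/2). The function names, the optional `ridge` term, and the toy matrices are illustrative assumptions, not the implementation used in the work presented.

```python
import numpy as np


def spd_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power
    through its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals**power) @ eigvecs.T


def geometric_mean(a, b, ridge=0.0):
    """Geometric mean A # B of two symmetric positive-definite matrices.

    `ridge` (an assumption of this sketch) adds a small multiple of the
    identity, since kernel matrices are often only positive semi-definite.
    """
    n = a.shape[0]
    a = a + ridge * np.eye(n)
    b = b + ridge * np.eye(n)
    a_half = spd_power(a, 0.5)
    a_inv_half = spd_power(a, -0.5)
    middle = a_inv_half @ b @ a_inv_half
    middle = (middle + middle.T) / 2  # symmetrize against round-off
    return a_half @ spd_power(middle, 0.5) @ a_half


# Two toy 2x2 "kernel" matrices (hypothetical data).
k1 = np.array([[2.0, 1.0], [1.0, 2.0]])
k2 = np.array([[3.0, 0.0], [0.0, 1.0]])

fused_geometric = geometric_mean(k1, k2)
fused_arithmetic = (k1 + k2) / 2
```

Unlike the arithmetic mean, this two-matrix formula does not extend directly to more than two matrices; the usual generalization requires an iterative procedure, which is one reason the computation is considered challenging.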
Our kernel data fusion frameworks offer a significant improvement over multiple kernel learning approaches proposed for protein fold recognition. Furthermore, our kernel-based protein fold recognizers, which were developed by fusing twenty-six different protein features through the geometric mean of their corresponding kernel matrices, improve the state of the art.
Moreover, the experimental results demonstrate that geometric kernel fusion improves the accuracy of state-of-the-art kernel fusion models for prioritizing disease-associated genes. For gene prioritization in particular, we design a geometric kernel data fusion model based on the log-Euclidean mean of kernel matrices, which scales to large data sets. To deliver still more accurate predictions, we also introduce a heuristic weighting scheme for integrating kernel matrices through a weighted log-Euclidean mean.
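As a rough illustration of the log-Euclidean mean, the sketch below computes exp(sum_i w_i log(K_i)) for a list of kernel matrices. It needs only one eigendecomposition per kernel plus one for the final exponential, with no iteration. The weights are supplied by the caller here; the heuristic used to choose them in the work presented is not reproduced, and the function names and the `ridge` default are assumptions of this sketch.

```python
import numpy as np


def spd_apply(mat, func):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * func(eigvals)) @ eigvecs.T


def log_euclidean_mean(kernels, weights=None, ridge=1e-8):
    """Weighted log-Euclidean mean: exp(sum_i w_i * log(K_i)).

    `weights` default to uniform and are normalized to sum to one.
    `ridge` (an assumption of this sketch) keeps the matrix logarithm
    defined when a kernel matrix is only positive semi-definite.
    """
    kernels = [np.asarray(k, dtype=float) for k in kernels]
    n = kernels[0].shape[0]
    if weights is None:
        weights = np.ones(len(kernels))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    log_sum = np.zeros((n, n))
    for w, k in zip(weights, kernels):
        log_sum += w * spd_apply(k + ridge * np.eye(n), np.log)
    return spd_apply(log_sum, np.exp)


# Toy example with diagonal matrices, where the result is the
# entrywise weighted geometric mean of the diagonals.
ka = np.diag([1.0, 4.0])
kb = np.diag([4.0, 16.0])
fused_uniform = log_euclidean_mean([ka, kb], ridge=0.0)
fused_weighted = log_euclidean_mean([ka, kb], weights=[3.0, 1.0], ridge=0.0)
```

With uniform weights and commuting matrices, the log-Euclidean mean coincides with the geometric mean of the previous kind; in general the two differ, and the log-Euclidean form trades that exactness for a closed-form computation.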