Finding Structures in High-Dimensional Biomedical Data

May 23, 2024, 10:00 am – 11:30 am
Friend Center Room 004



Event Description

High-dimensional datasets often contain unlabeled structure that carries useful information. Neighbor embedding algorithms are unsupervised algorithms that identify groups of related data points in order to visualize this structure. They are regarded as nonlinear dimensionality reduction methods, as they allow visualization of high-dimensional data in a low-dimensional space (typically 2-D). A popular neighbor embedding algorithm is Uniform Manifold Approximation and Projection (UMAP). UMAP uses a k-nearest neighbor (k-NN) graph to establish a pairwise metric in the high-dimensional space, which it then uses to align a lower-dimensional representation. This dissertation explores techniques for improving UMAP and for using it to design new algorithms. We analyze the UMAP algorithm to better explain its optimization scheme and cluster formation, enhance the consistency of its embeddings with respect to initialization, and improve its out-of-sample embedding. We then apply UMAP to aligning manifolds and analyzing large biomedical image datasets. In particular, we analyze chest X-rays and show that dimensionality reduction can discover (1) different phenotypes of COVID-19 response and (2) outliers in image datasets.

Overall, the methodologies presented in this dissertation provide tools for analyzing any iterative dimensionality reduction algorithm, demystifying its inner workings, and designing methods for discovering unlabeled patterns.

Adviser: Jason Fleischer