Data is the currency of modern science and engineering. Data-related problems are growing in size, driving demand for scalable learning tools and efficient optimization algorithms. In recent years, the amount of large and complex data generated through mobile devices, social networking websites, and business and medical platforms has increased dramatically. Revealing patterns and drawing insights from these massive, growing data sets has led to significant advances across science and engineering. Yet existing data analytics tools face challenges at this scale: the high dimensionality and sheer volume of the data impose heavy computational, storage, and communication burdens.

I am working on novel approaches to reduce the computational, storage, and communication burden of dealing with large-scale data sets. I focus on revealing the underlying structure of large-scale data using data analysis and machine learning tasks, such as principal component analysis and clustering, that are used throughout quantitative disciplines. The strategy is to apply randomized techniques for dimensionality reduction and sampling, motivated by the classical Johnson-Lindenstrauss lemma. The idea behind this approach is to form a simple low-dimensional representation, or sketch, of the data that preserves key properties of the original data. The work of learning is then done on the lower-dimensional sketches instead of the original data set, leading to substantial savings in memory and computation time.
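As a concrete illustration of the sketching idea described above, the following is a minimal Python sketch (not the author's implementation; all dimensions and names are assumed for illustration) of a Johnson-Lindenstrauss-style Gaussian random projection: high-dimensional points are mapped to a much lower-dimensional space, and pairwise distances are approximately preserved, so downstream tasks can operate on the sketch instead of the full data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): n points in d dimensions, sketched down to k.
n, d, k = 50, 10_000, 512
X = rng.standard_normal((n, d))        # stand-in for a large data set

# Gaussian JL map: entries drawn N(0, 1/k) so that E||S^T x||^2 = ||x||^2.
S = rng.standard_normal((d, k)) / np.sqrt(k)
X_sketch = X @ S                       # low-dimensional sketch, shape (n, k)

# Compare one pairwise distance before and after sketching.
orig_dist = np.linalg.norm(X[0] - X[1])
sketch_dist = np.linalg.norm(X_sketch[0] - X_sketch[1])
distortion = sketch_dist / orig_dist
print(f"distance ratio after sketching: {distortion:.3f}")
```

The printed ratio concentrates near 1 as the sketch dimension k grows, which is exactly the property that lets tasks such as clustering or PCA be run on the k-dimensional sketch rather than the original d-dimensional data.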