Dissertation Defense on Visual Data Science

Friday, May 11, 2018 - 12:00 to 13:30
NCS, Room 120

Title: Organizing High-Dimensional Data through Semantics Discovery

Candidate: Salman Mahmood

Time and place: Friday, May 11, 2018, at noon, NCS 120


High-dimensional data is now ubiquitous, so it is important to structure such data in a way that helps the user understand it. Existing work on organizing high-dimensional data focuses largely on the statistical aspects of the data. We propose the use of semantics to aid understanding. The semantics of a high-dimensional dataset can be estimated from metadata such as attribute labels and other textual descriptions of the dataset. To quantify the semantic distances between data attributes, we use word embeddings.
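As a rough illustration of measuring semantic distance between attribute labels with word embeddings, the sketch below computes cosine distance over tiny hand-made vectors. The vectors, the attribute names, and the function names are all hypothetical; the abstract does not say which embedding model or distance measure the dissertation uses.

```python
import math

# Hypothetical 3-dimensional embeddings for attribute labels. Real word
# embeddings are learned from text and have far more dimensions; these
# values are made up purely for illustration.
TOY_EMBEDDINGS = {
    "income": [0.9, 0.1, 0.0],
    "salary": [0.8, 0.2, 0.1],
    "height": [0.0, 0.9, 0.3],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_distance(label_a, label_b, embeddings=TOY_EMBEDDINGS):
    """Semantic distance between two attribute labels (assumed measure)."""
    return cosine_distance(embeddings[label_a], embeddings[label_b])
```

With these toy vectors, `semantic_distance("income", "salary")` comes out much smaller than `semantic_distance("income", "height")`, which is the kind of ordering a semantics-aware tool would rely on.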

We describe three visual tools that use the semantic aspect of data to aid understanding. First, we present a tool for subspace exploration. Subspace exploration allows the user to view relationships that are hidden in the various subspaces of the original attribute space. Current subspace exploration algorithms return a very large number of subspaces that they deem interesting, and sifting through that much output can overwhelm a user. We use the semantic aspect of the data to limit the number of subspaces by keeping only those that are semantically consistent. A visual tool guides the user through the subspace learning process and supports analysis of the resulting subspaces.

Second, we present a framework, called Taxonomizer, that takes a visual analytics approach to organizing the attributes of a dataset in the form of a taxonomy. It consists of a set of visual tools that start from an automatically computed hierarchical tree whose leaf nodes are the original data attributes. The tree is generated by combining the semantic and statistical aspects of the data. The framework allows users to sculpt a taxonomy for any high-dimensional dataset.
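To make the idea of a tree built from both semantic and statistical information concrete, here is a minimal sketch that blends two distances and runs plain average-linkage agglomerative clustering over attribute names. The weighted-sum blend, the `alpha` parameter, the toy distance tables, and the choice of average linkage are all assumptions for illustration; the abstract does not describe how Taxonomizer actually combines the two aspects or builds its tree.

```python
from itertools import combinations


def blended_distance(semantic, statistical, alpha=0.5):
    """Weighted mix of two distances; alpha is a hypothetical tuning knob."""
    return alpha * semantic + (1.0 - alpha) * statistical


def build_tree(attributes, distance):
    """Average-linkage agglomerative clustering over attribute names.

    Returns a nested tuple whose leaves are the original attribute names.
    """
    # Each cluster is a pair: (subtree, list of leaf names under it).
    clusters = [(name, [name]) for name in attributes]

    def linkage(c1, c2):
        # Mean pairwise distance between the leaves of two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(distance(a, b) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Made-up pairwise distances for three hypothetical attributes.
SEMANTIC = {
    frozenset(("income", "salary")): 0.1,
    frozenset(("income", "height")): 0.9,
    frozenset(("salary", "height")): 0.8,
}
STATISTICAL = {
    frozenset(("income", "salary")): 0.2,
    frozenset(("income", "height")): 0.7,
    frozenset(("salary", "height")): 0.6,
}


def toy_distance(a, b):
    key = frozenset((a, b))
    return blended_distance(SEMANTIC[key], STATISTICAL[key])


tree = build_tree(["income", "salary", "height"], toy_distance)
```

In this toy setup "income" and "salary" are merged first because their blended distance is smallest, and "height" joins at the root. An automatically computed tree of this kind would only be a starting point that the user then reshapes interactively.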

Finally, we present D-BIAS, a visual analytics framework and tool for assessing and mitigating algorithmic bias. D-BIAS leverages human understanding to manipulate the data and mitigate the effects of bias. We use causal analysis and correlation to identify sources of bias and to debias the data. Our visual tool identifies semantic relations between the attributes of the data and uses them to help explain which factors in the dataset contribute to the bias.

In addition, we present an interactive technique that uses the paradigm of exploded views to make small-multiples visualizations more intelligible to users who are unfamiliar with them. Small multiples are effective at showing pieces of data individually, but they do not explain how the different pieces fit together, and newcomers can find them difficult to understand. We use the exploded-view paradigm to create various animation designs for multi-class data. The designs are then compared using the Elo rating scheme.
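For readers unfamiliar with Elo, the sketch below shows the standard pairwise update as it might be applied when two designs are compared head to head. The K-factor of 32, the 400-point scale, and the starting rating of 1500 are conventional defaults assumed here; the abstract does not state the parameters or the comparison protocol used in the study.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if design A was preferred, 0.0 if design B was
    preferred, and 0.5 for a tie. k is an assumed K-factor.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two hypothetical designs start at the same rating; design A wins once.
design_a, design_b = update_elo(1500.0, 1500.0, score_a=1.0)
```

Each comparison moves rating points from the losing design to the winning one, so after many pairwise judgments the ratings give an overall ordering of the designs.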
