Sensory Substitution & Enhancement for Perceiving High Dimensional Data

As humans, we are severely limited in our ability to understand, visualize, or perceive dimensions beyond the three spatial ones. We can also perceive changes in time, which lets us follow 4D changes and patterns, but anything higher than that is deeply unintuitive. I remember first reading about this in Michio Kaku’s Hyperspace, where he draws an analogy between us and fictional 2D beings that can only perceive two dimensions. To these beings everything appears in 2D, so we (3D beings) would look like contours constantly changing shape as we are projected onto the plane they can see.

Today, much of our data is high dimensional – “HD” – (not the same thing as Big Data!), and there is a growing appetite for methods that let us visualize patterns and changes in these HD spaces. Areas such as topological data analysis (TDA) attack these problems using tools from topology, a branch of mathematics that has flourished since the 1930s. Ayasdi, a data startup founded by Gunnar Carlsson of Stanford, builds data-analysis products on top of topological tools; it recently raised $55M in funding and is seeing ~400% growth!

High dimensional data visualization is important and needs more attention from mathematicians and engineers. Interestingly, a recent TED talk this year discussed Sensory Substitution: the process of taking high dimensional real-world signals such as images or audio, mapping them to a much lower dimensional space, and feeding the low dimensional signal to the brain through alternative sensory inputs such as electrodes on the tongue or tactile feedback on the skin. The claim is that, since all perception ultimately happens in the brain, it can learn to “see” or “hear” after a few weeks of training with these new “eyes” or “ears”. It is very inspiring, and it gives a lot of hope to people whose primary senses do not work as they were designed to.

Now imagine that instead of “visualizing” 3D data all the time, we could map dimension #4 to an auditory signal, #5 to a tactile signal, #6 to some other type of signal, and have a human “perceive” the data (here I use perception as the generalization of visualization). How effective are we, as humans, at taking in all this information at once and understanding the patterns in the high dimensional space? At some point we will hit cognitive overload, beyond which the brain cannot make sense of the patterns. But as long as we operate within that limit, it will be very exciting to see how well we understand 6-dimensional data. An added benefit is that mapping very HD data down to 6 dimensions is a whole lot gentler than mapping it down to 3 (see the curse of dimensionality).
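To make the idea a little more concrete, here is a minimal sketch of one way such a mapping could work. Everything in it is an assumption of mine: the choice of PCA for the reduction, the function name, and the particular channels and ranges (pitch, vibration level, pulse rate) are purely illustrative, not a real sensory-substitution pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA

def to_sensory_channels(X):
    """Squash high-dimensional data down to 6 components and assign each
    component to a (hypothetical) perceptual channel: the first three to
    spatial position, the remaining three to sound, touch, and pulsing."""
    Z = PCA(n_components=6).fit_transform(np.asarray(X, dtype=float))
    # Rescale each component to [0, 1] so the channel mappings below are sane.
    Z = (Z - Z.min(axis=0)) / (Z.max(axis=0) - Z.min(axis=0) + 1e-12)
    return {
        "xyz_position":   Z[:, :3],              # dims 1-3: render as a 3D scatter
        "audio_pitch_hz": 200 + 1800 * Z[:, 3],  # dim 4: a tone between 200 and 2000 Hz
        "tactile_level":  Z[:, 4],               # dim 5: vibration intensity in [0, 1]
        "pulse_rate_hz":  1 + 9 * Z[:, 5],       # dim 6: pulses between 1 and 10 per second
    }

# Example: 1000 points living in a 50-dimensional space.
channels = to_sensory_channels(np.random.default_rng(0).normal(size=(1000, 50)))
```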

Going even further, if we perfect the art of data perception in HD spaces, can we also perceive why some machine learning algorithms fail and others don’t? Can we perceive boundaries between classes in massive datasets? The interesting thing about these questions is that I am not sure who could answer them better – a neuroscientist or an engineer!

Learning Under an Extremely Weak Supervision Framework

There’s no doubt that we have a data deluge problem, and at the same time we are going through an AI revolution; it is quite possible that each feeds the other. Solving these issues together will possibly be one of the biggest challenges of the 21st century. I want to talk about one small aspect of one small step towards a solution. But first, a short detour.

Group Testing is a statistical sampling technique that emerged during WWII as a way to rapidly estimate the number of soldiers infected with syphilis. The doctors of the time suggested combining blood samples into pools, each of which could be tested for the infection. The advantage of such a scheme is that every soldier in a pool is cleared if that pool tests negative; if it tests positive, then at least one soldier in the pool is infected. As it turns out, when the rate of infection (the percentage of the population carrying the disease) is very low (~1-5%), probability theory gives us very accurate estimates of that rate from very few pooled tests. Since then, a large community of biostatisticians has studied the problem and applied it to a wide variety of settings, from estimating the spread of crop diseases to HIV. Interestingly, there has also been a lot of work relating group testing to compressive sensing.
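Here is a minimal simulation of that idea, just to show how few pooled tests are needed. The estimator assumes each individual is infected independently at the same rate; the function name and numbers are my own.

```python
import numpy as np

def estimate_prevalence(statuses, pool_size=10):
    """Pool individual infection statuses, observe which pools test positive,
    and invert the pooled-positive rate to estimate the infection rate."""
    n_pools = len(statuses) // pool_size
    pools = statuses[:n_pools * pool_size].reshape(n_pools, pool_size)
    p_pos = pools.any(axis=1).mean()        # a pool is positive if anyone in it is
    # If each person is infected with rate r, P(pool negative) = (1 - r)^pool_size,
    # so solve 1 - p_pos = (1 - r)^pool_size for r.
    return 1.0 - (1.0 - p_pos) ** (1.0 / pool_size)

rng = np.random.default_rng(0)
population = rng.random(10_000) < 0.02      # ~2% of 10,000 soldiers are infected
print(estimate_prevalence(population, pool_size=10))  # close to 0.02, using only 1,000 tests
```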

We recently used this technique as an easy way to obtain “group labels” within a human-in-the-loop system to estimate a classifier’s performance in the absence of labeled data (paper).

Proportion-SVM was proposed a couple of years ago as a very weakly supervised learning setting that lets you learn a classifier when the only supervision available is the ratio of one class to another. It is a fascinating problem, because it reduces the effort required for learning down to estimating proportions! A paper I have been reading solves this problem by initializing all the samples with random labels (in a binary classification setting) and flipping them whenever doing so reduces the cost function. The authors also give an analytic solution that solves the problem exactly after a convex relaxation of the objective. The cost function resembles a typical SVM objective, with an extra term that penalizes the current label proportions for deviating from the proportions provided during training.
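As a rough illustration (not the paper’s exact algorithm; the function names, penalty weight, and retrain-per-flip strategy below are my own simplifications), here is a toy version of that flip heuristic: start from random labels, and accept a flip only if the SVM hinge loss plus a proportion penalty goes down. It retrains an SVM on every candidate flip, so it is only meant for toy-sized data.

```python
import numpy as np
from sklearn.svm import LinearSVC

def cost(X, y, clf, target_prop, C_p=10.0):
    # Hinge loss of the fitted SVM plus a penalty on how far the
    # current label proportion strays from the given proportion.
    margins = y * clf.decision_function(X)
    hinge = np.maximum(0.0, 1.0 - margins).sum()
    return hinge + C_p * abs((y == 1).mean() - target_prop)

def fit_proportion_svm(X, target_prop, n_passes=3, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=len(X))      # random initial labels
    clf = LinearSVC().fit(X, y)
    best = cost(X, y, clf, target_prop)
    for _ in range(n_passes):
        for i in rng.permutation(len(X)):
            y[i] *= -1                        # tentatively flip one label
            if np.unique(y).size < 2:         # keep both classes present
                y[i] *= -1
                continue
            cand = LinearSVC().fit(X, y)
            c = cost(X, y, cand, target_prop)
            if c < best:                      # keep the flip if the cost drops
                best, clf = c, cand
            else:
                y[i] *= -1                    # otherwise revert
    return clf, y
```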

Imbalanced Learning is increasingly relevant in a world deluged by data. More often than not, we have far more of one thing than another. To be concrete, consider spam filtering: for every 100 emails you receive, perhaps 10 are spam that you want your inbox to filter out automatically. In realistic scenarios the numbers can be even more skewed (1:99), so how do we learn in this setting? It is well known that classifiers trained on such skewed datasets perform poorly, because they optimize the cost function almost entirely for the dominant class and ignore the other class, which barely affects the cost (there’s a joke about the wealthiest 1% in there, but I’ll keep this discussion purely academic). There are many strategies that account for and correct this imbalance in training data so as to get the best performance on test data; a nice survey of the approaches can be found in this paper.
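One of the simplest strategies from that literature is re-weighting the rare class. Here is a minimal sketch on synthetic data (all the parameters are chosen just for illustration); the re-weighted model typically recovers far more of the rare class, at the cost of some extra false positives.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic 99:1 problem, roughly like the skewed spam example above.
X, y = make_classification(n_samples=20000, weights=[0.99, 0.01],
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# A plain classifier vs. one that re-weights the rare class.
plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

for name, clf in [("plain", plain), ("class_weight='balanced'", balanced)]:
    print(name)
    print(classification_report(y_te, clf.predict(X_te), digits=3))
```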

Here’s the case I’m making:

Group testing works splendidly when the rate (or proportion) of positives is very low, of the order of 1-5%. As it turns out, much real-world data is skewed in favor of one class, and classifiers trained naively on it generalize poorly. Proportion-SVMs, likewise, do not generalize well when the proportions are skewed.

In the field, one could estimate class proportions very accurately using group testing and then learn a classifier using only that proportion information. This not only reduces the labeling effort, but also offers a path towards deploying intelligent systems in the wild jungle that is the internet. The hurdle to cross first is developing techniques, whether by modifying existing methods or proposing new ones, to learn proportion-classifiers on extremely skewed datasets, taking cues from the theory of imbalanced learning.