Saturday, October 7, 2017

Compressed sensing and the information bottleneck

For those who don't know, my day job is actually in signal processing research and development in the aerospace sector. As I document in my book, I came by economics research via a circuitous route. One subject I worked on for a while (and still do to some extent) is called compressed sensing (Igor Carron's blog is a great way to keep up with the state of the art in that field, and his Google site provides a nice introduction to the subject).

One of the best parts about Igor's blog is that he brings together several lines of research from machine learning, matrix factorization, compressed sensing, and other fields and frequently finds connections between them (they sometimes appear in his regular feature "Sunday Morning Insight").

In that spirit — although more of a Saturday Afternoon Insight — I thought I'd put a thought out there. I've been looking at how the price mechanism relates to the information bottleneck (here, here), but I've also mused about a possible connection between the price mechanism and compressed sensing. Now I think there might be a connection between compressed sensing and the information bottleneck themselves.

In compressed sensing, you are trying to measure a sparse signal (a signal that occupies only a small subset of your space x, like a single point of light in a dark image or a single tone in a wide bandwidth). To do so, you set up your system to make measurements in what is called the dense domain: through some mechanism (a Fourier transform, or random linear combinations, labeled Φ) you make the variable you wish to measure appear throughout the space y. A few random samples of that dense space therefore give you information about your sparse signal, whereas a few random samples of an image with a single bright point would most likely return only dark pixels with no information about the point.
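To make that concrete, here is a minimal sketch (my own toy example, with arbitrary parameters): a sparse signal is measured through a random Gaussian Φ, and then recovered from far fewer measurements than signal samples using orthogonal matching pursuit, one of the standard reconstruction algorithms in compressed sensing.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, k = 256, 64, 4              # signal length, measurements, sparsity
x = np.zeros(n)
support = rng.choice(n, k, replace=False)
x[support] = rng.normal(size=k)   # a k-sparse signal in the space x

Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # random dense measurement matrix
y = Phi @ x                                  # m << n compressed measurements

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily pick the column of Phi most
    correlated with the residual, then least-squares fit on that support."""
    residual = y.copy()
    idx = []
    for _ in range(k):
        corr = np.abs(Phi.T @ residual)
        corr[idx] = 0                        # don't re-pick a chosen column
        idx.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Phi[:, idx], y, rcond=None)
        residual = y - Phi[:, idx] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[idx] = coef
    return x_hat

x_hat = omp(Phi, y, k)
print(np.linalg.norm(x_hat - x))   # recovery error (tiny when OMP finds the true support)
```

The point is just that 64 random linear combinations recover a 256-sample signal exactly, because each measurement "sees" every nonzero entry.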

Is this how the information bottleneck works? We have some domain X in which our signal is only a small part (the set of all images vs. the set of images of cats), and we train a feedforward deep neural network (DNN, h1 → h2 → ... → hm) that creates a new domain Y where our signal information is dense (cat or no cat). Every sample of that domain gives us information about whether an image of a cat is being fed into the DNN (e.g. if the network identifies cats and dogs, an output of "dog" tells us it's not a cat).

In compressed sensing, we usually know some properties of the signal that allow us to construct the dense domain (sparse images of points can be made dense by taking a 2D Fourier transform). However, random linear combinations frequently work as a general-purpose way to make your signal dense in the measurement domain. In training a DNN, are we effectively constructing a useful random projection of the data in the sparse domain? As we push through the information bottleneck, are we compressing the relevant information into a dense domain?
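Here's a toy illustration of the sparse-vs-dense sampling contrast (my own example, parameters arbitrary): directly sampling a few "pixels" of a sparse signal almost surely misses the single nonzero entry, while every output of a random projection carries information about it.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 10_000
x = np.zeros(n)
x[rng.integers(n)] = 1.0          # one bright "pixel" in a dark image

# Sampling the sparse domain directly: 10 random pixels almost surely
# land on dark pixels and tell us nothing about where the spike is.
pixels = x[rng.choice(n, 10, replace=False)]

# Sampling after a random projection: each measurement is a random
# linear combination of all pixels, so every one "sees" the spike.
Phi = rng.normal(size=(10, n))
measurements = Phi @ x            # each entry equals Phi[i, spike]: nonzero

print(np.count_nonzero(pixels), np.count_nonzero(measurements))
```

All ten projected measurements are informative, while the direct pixel samples are (with high probability) all zero.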

The connection between compressed sensing and the structure of a neural net has been noted before (see e.g. here or here); the new part (for me at least) is the recognition of the information bottleneck as a useful tool for understanding compressed sensing: "opening the black box" of compressed sensing.

1. Wow, something I've heard of before! I actually wrote a SBIR Phase I proposal on this topic back in ... maybe 2008 or so? I didn't win... it was a shot in the dark because I had to read up on compressed sensing before writing the proposal. Lol. An RF design engineer helped me come up with the proposed system architecture.

It's funny because I thought the same thing reading your information bottleneck pieces earlier: "I wonder if this has something to do with that proposal I wrote years ago?" ... I'd forgotten the name "compressed sensing." So thanks for reminding me.

1. That was a big time for CS. I actually was in on a review of a Rice University project implementing a CS sensor and wrote a big report about CS while on my government fellowship.

Everybody seemed to want to use it incorrectly. CS is for situations where you either a) are already sampling the dense domain with few measurements, or b) gain a lot of "utility" from a small reduction in samples. The prime example is MRI (it already samples the dense domain, and a 50% reduction in collection time [usually hours] is a big deal).

2. Regarding finding a sparse signal in noise, I also recently (a year ago?) worked on an implementation of a "sparse FFT" in FPGAs. I don't think the papers I based it on mentioned "compressed sensing," but it seemed very similar. The idea was to use much smaller FFTs to do the work of locating sparse bandlimited signals that you'd normally need a large FFT to find (in frequency).
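The core aliasing trick behind that idea can be sketched like this (my own toy version: one noiseless tone and hypothetical parameters; real sparse FFTs add filtering, randomized permutations, and noise handling). Subsampling in time aliases the spectrum modulo the small FFT length, and two coprime small lengths pin the frequency down via the Chinese remainder theorem:

```python
import numpy as np

L1, L2 = 64, 63          # coprime small-FFT lengths
N = L1 * L2              # full transform length: 4032
f_true = 1234            # the unknown tone's frequency bin

n = np.arange(N)
x = np.exp(2j * np.pi * f_true * n / N)   # single complex tone

# Subsampling in time by N//L aliases the spectrum modulo L:
# the L-point FFT of x[::N//L] peaks exactly at bin f_true mod L.
a1 = int(np.argmax(np.abs(np.fft.fft(x[::N // L1]))))   # f_true mod 64
a2 = int(np.argmax(np.abs(np.fft.fft(x[::N // L2]))))   # f_true mod 63

# Chinese remainder theorem (brute force for clarity): residues modulo
# two coprime lengths determine f uniquely in [0, L1*L2).
f_hat = next(f for f in range(N) if f % L1 == a1 and f % L2 == a2)
print(f_hat)   # recovers 1234 using FFTs of size 64 and 63, not 4032
```

Two FFTs of size ~64 locate a tone that a naive approach would find with one FFT of size 4032, which is roughly the economics that makes the sparse FFT attractive in hardware.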

1. One of the things that happened was that the "hype" of compressed sensing started to reach the "trough of disillusionment" and people started using different words -- one of which was sparse reconstruction. The sparse FFT from MIT is actually a compressed sensing approach.