For those that don't know, my day job is actually in signal processing research and development in the aerospace sector. As I document in my book, I came by economics research via a circuitous route. One subject I worked on for awhile (and still do to some extent) is called compressed sensing (Igor Carron's blog is a great way to keep up with the state of the art in that field, and his Google site provides a nice introduction to the subject).
One of the best parts about Igor's blog is that he brings together several lines of research from machine learning, matrix factorization, compressed sensing, and other fields and frequently finds connections between them (they sometimes appear in his regular feature "Sunday Morning Insight").
In that spirit — although more of a Saturday Afternoon Insight — I thought I'd put a thought out there. I've been looking at how the price mechanism relates to the information bottleneck (here, here), but I've also mused about a possible connection between the price mechanism and compressed sensing. I think now there might be a connection between compressed sensing and the information bottleneck.
In compressed sensing, you are trying to measure a sparse signal (a signal that appears in only a sparse subset of your space x, like a point of light in a dark image or a single tone in a wide bandwidth). To do so, you set up your system to make measurements in what is called the dense domain — through some mechanism (Fourier transform, random linear combinations, labeled with Φ) you make the variable you wish to measure appear throughout the space y. Therefore a few random samples of the entire dense space give you information about your sparse signal, whereas a few random samples of an image with a single bright point would likely only return dark pixels with no information about the point.
Is this how the information bottleneck works? We have some domain X in which our signal is just a small part (the set of all images vs the set of images of cats), and we train a feedforward deep neural network (DNN, h1 → h2 → ... hm) that creates a new domain Y where our signal information is dense (cat or no cat). Every sample of that domain tells us information about whether there is an image of a cat being fed into the DNN (i.e. if it identifies cats and dogs, a result of dog tells us it's not a cat).
In compressed sensing, we usually know the some properties about the signal that allow us to construct the dense domain (sparse images of points can be made dense by taking a 2D Fourier transform). However, random linear combinations can frequently function as a way to make your signal dense in your domain. In training a DNN, are we effectively constructing a useful random projection of the data in the sparse domain? As we push through the information bottleneck, are we compressing the relevant information into a dense domain?
The connection between compressed sensing and the structure of a neural net has been noted before (see e.g. here or here), the new part (for me at least) is the recognition of the information bottleneck as a useful tool to understand compressed sensing — "opening the black box" of compressed sensing.