• Naftali (Tali) Tishby נפתלי תשבי   

    Physicist, professor of computer science and computational neuroscientist

    The Ruth and Stan Flinkman Professor of Brain Research

    I work at the interfaces between computer science, physics, and biology, which provide some of the most challenging problems in today’s science and technology. We focus on the organizing computational principles that govern information processing in biology, at all levels. To this end, we employ and develop methods that stem from statistical physics, information theory, and computational learning theory to analyze biological data and to develop biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that may characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations.

    News

    Our Information Bottleneck Theory of Deep Learning has recently been noticed - at last!

    See the Quanta Magazine article on our work and my June 2017 Berlin Deep Learning Workshop talk, which triggered it.

    A longer online talk given at Yandex, Moscow, October 10, 2017.

  • Courses given this year

    I'm teaching only during the fall semester this year.

  • Research Projects

    We work at the interface between computer science, physics, and biology, which provides some of the most challenging problems in today’s science and technology. We focus on the organizing computational principles that govern information processing in biology, at all levels. To this end, we employ and develop methods that stem from statistical physics, information theory, and computational learning theory to analyze biological data and to develop biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that may characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations. An example is the Information Bottleneck method, which provides a general principle for extracting relevant structure in multivariate data, characterizes complex processes, and suggests a general approach to understanding optimal adaptive biological behavior.
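    For reference, the Information Bottleneck principle can be written as a variational problem: compress a source X into a representation T while preserving information about a relevance variable Y, traded off by a parameter β. The sketch below is the standard form of the functional, included only to fix the notation used throughout this page.

        % Information Bottleneck functional (standard form):
        % compress X into T while keeping information about Y, traded off by beta.
        \min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y),
        \qquad \text{subject to the Markov chain } T \leftrightarrow X \leftrightarrow Y .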

    A Deeper Theory of Deep Learning

    Information Bottleneck theory of Deep Neural Networks

    The success of artificial neural networks, in particular Deep Learning (DL), poses a major challenge for learning theory. Over recent years we have developed a fundamental theory of Deep Neural Networks (DNNs), based on a complete correspondence between supervised deep neural networks, trained by Stochastic Gradient Descent (SGD), and the Information Bottleneck framework. This correspondence provides a much needed mathematical theory of Deep Learning, and a "killer application" with a large-scale implementation algorithm for the information bottleneck theory. The essence of our theory is that stochastic gradient descent training, in its popular implementation through error back-propagation, pushes the layers of any deep neural network, one by one, to the information bottleneck optimal tradeoff between sample complexity and accuracy, for large enough problems. This happens in two distinct phases. The first can be called "memorization", where the layers "memorize" the training examples with many irrelevant details with respect to the labels. In the second phase, which starts when the training error essentially saturates, the noise in the gradients pushes the weights of every layer to a Gibbs (maximum entropy) distribution subject to the training error constraint. This causes the layers to "forget" irrelevant details of the inputs, which dramatically improves the generalization ability of the network.
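    As an illustration of the information-plane analysis described above, the minimal Python sketch below estimates the two coordinates, I(X;T) and I(T;Y), for a single hidden layer by naive binning of its activations. It is not the original experimental code: the function names, the binning scheme, and the number of bins are illustrative assumptions, and the inputs and labels are assumed discrete.

        # Minimal sketch (not the original code): place one hidden layer T in the
        # information plane by estimating I(X;T) and I(T;Y) with naive binning.
        # Assumes discrete inputs/labels and a modest dataset size.
        import numpy as np

        def discrete_mutual_information(a, b):
            """I(A;B) in bits for two sequences of (hashable) discrete symbols."""
            n = len(a)
            joint, pa, pb = {}, {}, {}
            for x, y in zip(a, b):
                joint[(x, y)] = joint.get((x, y), 0) + 1
                pa[x] = pa.get(x, 0) + 1
                pb[y] = pb.get(y, 0) + 1
            mi = 0.0
            for (x, y), c in joint.items():
                mi += (c / n) * np.log2(c * n / (pa[x] * pb[y]))
            return mi

        def information_plane_point(inputs, labels, activations, n_bins=30):
            """Return (I(X;T), I(T;Y)) for one layer's activation matrix."""
            # Discretize the layer: bin each unit's activation, then treat the
            # tuple of bin indices of a sample as one discrete symbol t.
            edges = np.linspace(activations.min(), activations.max(), n_bins)
            t_symbols = [tuple(row) for row in np.digitize(activations, edges)]
            x_symbols = [tuple(row) for row in inputs]
            return (discrete_mutual_information(x_symbols, t_symbols),
                    discrete_mutual_information(t_symbols, list(labels)))

    Plotting these two numbers for every layer over the course of training is what exposes the memorization and compression phases described above.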


    Our theory makes the following predictions and raises the following questions, which are also the main research thrusts of this project:

    • The sample complexity and accuracy of a DNN are determined by the encoder and decoder mutual information of the last hidden layer, I(X;T) and I(T;Y). For large enough problems, they achieve the information-theoretic optimal tradeoff, which depends only on the input-label distribution. In that sense, DNNs are optimal learning machines.
    • The convergence time is dominated by diffusion (in a non-convex space!). The compression time is exponentially boosted by the hidden layers!
    • The hidden layers converge to very special points in the information plane (see figure), which depend on the phase transitions (bifurcations) of the information bottleneck theory.
    • How much of this theory is specific to the SGD optimization? 
    • How much of it is relevant for biological learning and "real brains"?
    Figure from the September 21 Quanta Magazine article on our work.

    Information constrained control and learning

    Information flow governs sensing, acting, and control. We develop the theory to understand how.

    We study how information constraints on sensory perception, working memory, and control capacity affect optimal control and reinforcement learning in biological systems. Our basic model is a POMDP, represented by a directed graphical model consisting of world states, W, the organism's memory states, M, local observations, O, and actions, A. We consider the typical models that achieve a given value (expected future reward) by minimizing the information flow in all adaptable channels under the value constraint. This is equivalent to the simplest organism that achieves a certain value through interactions with its environment. It is also the most robust, or fastest to evolve, organism according to the information bottleneck framework. The optimal performance of the organism is determined by the past-future information bottleneck tradeoff, or by the predictive information of the environment.
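    A schematic way to write this principle, assuming for illustration that the adaptable channels are the sensing channel (O to M) and the action channel (M to A), is the constrained problem below; which channels are counted is a modeling choice and not specified in the text above.

        % Information-constrained control (schematic form): among all policies that
        % reach the required expected value V0, prefer the one with the least
        % information flow through the adaptable channels.
        \min_{\,p(m \mid o, m'),\; p(a \mid m)} \;\; I(O;M) \;+\; I(M;A)
        \qquad \text{subject to} \qquad
        \mathbb{E}\!\left[\sum_{t} r(w_t, a_t)\right] \;\ge\; V_0 .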


    The simplest organism of this type is the Szilard information engine, with a thermal bath as the environment and extracted mechanical work as value. In this case the observation, memory, and action channels have single-bit capacities. We also study how sub-extensivity of the predictive information can explain both discounting of rewards and the emergence of hierarchical internal representations.
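    For concreteness, the sub-extensivity statement invoked here is the standard one from the predictive-information literature: the information that a past window of length T carries about the future grows slower than linearly in T for stationary environments. The notation below is generic and not taken from a specific paper cited on this page.

        % Predictive information of a process and its sub-extensivity
        % (for stationary environments):
        I_{\mathrm{pred}}(T) \;=\; I\big(X_{-T:0}\,;\,X_{0:\infty}\big),
        \qquad \lim_{T \to \infty} \frac{I_{\mathrm{pred}}(T)}{T} \;=\; 0 .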


    Figure taken from Ortega et al. (2016), based on Tishby and Polani (2009).

    The Information Bottleneck approach in Brain Sciences

    Cognitive functions, such as perception, decision making and planning, memory, and language, are dominated by information constraints and can be quantified by the Information Bottleneck framework. Learn how.

    We argue that perception, memory, and cognitive representations of the world (semantics) are governed by information-theoretic tradeoffs between complexity and accuracy, more than by any metabolic or physical constraints. In a recent study we show that color names in different languages can be explained by this principle, as part of an ongoing study on the semantic structure of natural languages, which goes all the way back to our original ideas on distributional representations of words (an early version of word2vec) and the first formalization of the information bottleneck as distributional clustering.
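    The distributional-clustering view of the Information Bottleneck can be made concrete with the self-consistent iterations below. This is a minimal, illustrative Python sketch, not the published implementation; the function name, cluster count, and iteration settings are assumptions, and p_xy is assumed to be a normalized joint distribution of items X and features Y (e.g. colors and their names).

        # Minimal sketch (illustrative): self-consistent Information Bottleneck
        # iterations, the formalization behind distributional clustering.
        # p_xy: normalized joint distribution over (X, Y); beta: tradeoff parameter.
        import numpy as np

        def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0):
            rng = np.random.default_rng(seed)
            eps = 1e-12                                  # keeps the logs finite
            p_x = p_xy.sum(axis=1)                       # p(x)
            p_y_x = p_xy / p_x[:, None]                  # p(y|x)
            q = rng.random((len(p_x), n_clusters))       # random soft assignment
            p_t_x = q / q.sum(axis=1, keepdims=True)     # p(t|x)
            for _ in range(n_iter):
                p_t = p_t_x.T @ p_x                      # p(t) = sum_x p(x) p(t|x)
                # p(y|t) = sum_x p(y|x) p(x|t), with p(x|t) = p(t|x) p(x) / p(t)
                p_y_t = (p_t_x * p_x[:, None]).T @ p_y_x / p_t[:, None]
                # KL divergence D[p(y|x) || p(y|t)] for every (x, t) pair
                log_ratio = np.log(p_y_x[:, None, :] + eps) - np.log(p_y_t[None, :, :] + eps)
                kl = (p_y_x[:, None, :] * log_ratio).sum(axis=2)
                # self-consistent update: p(t|x) proportional to p(t) exp(-beta * KL)
                p_t_x = p_t[None, :] * np.exp(-beta * kl)
                p_t_x /= p_t_x.sum(axis=1, keepdims=True)
            return p_t_x                                 # soft clusters of X

    Sweeping beta from small to large values traces the complexity-accuracy curve referred to above.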


    Figure from Zaslavsky et al. (2017).

  • Lab Alumni: graduate students

    Lab Alumni: Postdocs