Naftali Tishby

**Professor, Hebrew University**

**Office:** Rothberg B412, Safra Campus

**eMail**: tishby@cs.huji.ac.il

**Phone & Fax**: +972-2-54-94569

**Mobile:** +972-525-274698

Naftali (Tali) Tishby נפתלי תשבי

Physicist, professor of computer science and computational neuroscientist

I work at the interfaces between computer science, physics, and biology, which provide some of the most challenging problems in today’s science and technology. We focus on organizing computational __principles__ that govern information processing in biology, at all levels. To this end, we employ and develop methods that stem from statistical physics, information theory and computational learning theory, to analyze biological data and develop biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that may characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations.

News

**Our Information Bottleneck Theory of Deep Learning has recently been noticed - at last!**

**See the Quanta-Magazine article on our work and my June 2017 Berlin Deep Learning Workshop talk which triggered it.**

**A longer online talk given at Yandex, Moscow, October 10, 2017.**

My current Lab

Courses given this year

I'm teaching only during the fall semester this year.

ELSC 76915 (fall 2017-18)

The Information Bottleneck seminar

ELSC 76929 (fall 2017-18)

Research Projects

We work at the interface between computer science, physics, and biology, which provides some of the most challenging problems in today’s science and technology. We focus on organizing computational __principles__ that govern information processing in biology, at all levels. To this end, we employ and develop methods that stem from statistical physics, information theory and computational learning theory, to analyze biological data and develop biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that may characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations. An example is the Information Bottleneck method that provides a general principle for extracting relevant structure in multivariate data, characterizes complex processes, and suggests a general approach for understanding optimal adaptive biological behavior.

A Deeper Theory of Deep Learning

Information Bottleneck theory of Deep Neural Networks

The success of artificial Neural Networks, in particular Deep Learning (DL), poses a major challenge for learning theory. Over recent years we have developed a fundamental theory of Deep Neural Networks (DNN), based on a complete correspondence between supervised Deep Neural Networks, trained by Stochastic Gradient Descent (SGD), and the Information Bottleneck framework. This correspondence provides a - much needed - mathematical theory of Deep Learning, and a "killer application" with a large scale implementation algorithm for the information bottleneck theory. The essence of our theory is that stochastic gradient descent training, in its popular implementation through error back-propagation, pushes the layers of any deep neural network - one by one - to the information bottleneck optimal tradeoff between sample complexity and accuracy, for large enough problems. This happens in two distinct phases. The first can be called "memorization", where the layers "memorize" the training examples with a lot of irrelevant details with respect to the labels. In the second phase, which starts when the training error essentially saturates, the noise in the gradients pushes the weights, for every layer, to a Gibbs - maximum entropy - distribution subject to the training error constraint. This causes the layers to "forget" irrelevant details of the inputs, which dramatically improves the generalization ability of the network.
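For readers who want the tradeoff in symbols, the information bottleneck objective is usually written as the Lagrangian below. This is a sketch in standard notation that this page does not itself define: X is the input, Y the label, T a compressed representation of X (for example a hidden layer), T_1, ..., T_k are successive hidden layers, and β is the tradeoff parameter between compression and accuracy. The second line is the data processing inequality applied to the layers.

```latex
% Information bottleneck: compress X into T while keeping information about Y.
\min_{p(t \mid x)} \; \mathcal{L}\big[p(t \mid x)\big] \;=\; I(X;T) \;-\; \beta\, I(T;Y),
\qquad Y \leftrightarrow X \leftrightarrow T .

% Each hidden layer can only lose information relative to the previous one.
I(X;Y) \;\ge\; I(T_1;Y) \;\ge\; \dots \;\ge\; I(T_k;Y),
\qquad
H(X) \;\ge\; I(X;T_1) \;\ge\; \dots \;\ge\; I(X;T_k).
```

Small β favors compression (low I(X;T)), while large β favors keeping information about the label (high I(T;Y)).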

Our theory makes the following predictions and raises the following open questions, which are also the main research thrusts of this project:

- The sample complexity and accuracy of the DNN are determined by the mutual information of the encoder and decoder of the last hidden layer. For large enough problems they achieve the information theoretic optimal tradeoff, which depends only on the input-label distribution. In that sense DNNs are optimal learning machines.
- The convergence time is dominated by diffusion (in a non-convex space!). The compression time is exponentially boosted by the hidden layers!
- The hidden layers converge to very special points in the information plane (see figure), which depend on the phase transitions (bifurcations) of the information bottleneck theory.
- How much of this theory is specific to the SGD optimization?
- How much of it is relevant for biological learning and "real brains"?
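The "information plane" in the list above places each representation T at the point (I(X;T), I(T;Y)). The following is a minimal sketch of how such a point could be computed for small discrete distributions. The function names, the toy distribution `p_xy`, and the two encoders are illustrative assumptions, not code or data from the lab, and real networks require estimating these quantities from samples, which this sketch does not attempt.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information, in bits, of a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                 # normalize defensively
    p_row = joint.sum(axis=1, keepdims=True)    # marginal of the row variable
    p_col = joint.sum(axis=0, keepdims=True)    # marginal of the column variable
    mask = joint > 0                            # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (p_row @ p_col)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

def information_plane_point(p_xy, p_t_given_x):
    """Return (I(X;T), I(T;Y)) for an encoder p(t|x), assuming Y - X - T is a Markov chain."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_t_given_x = np.asarray(p_t_given_x, dtype=float)
    p_x = p_xy.sum(axis=1)                      # marginal p(x)
    p_xt = p_x[:, None] * p_t_given_x           # joint p(x, t)
    p_ty = p_t_given_x.T @ p_xy                 # joint p(t, y) = sum_x p(t|x) p(x, y)
    return mutual_information(p_xt), mutual_information(p_ty)

# Hypothetical toy problem: four inputs, binary label, inputs 0/1 and 2/3 are
# statistically identical with respect to the label.
p_xy = np.array([[0.225, 0.025],
                 [0.225, 0.025],
                 [0.025, 0.225],
                 [0.025, 0.225]])
identity_encoder = np.eye(4)                    # T = X, no compression
merged_encoder = np.array([[1, 0], [1, 0],      # T merges inputs {0,1} and {2,3}
                           [0, 1], [0, 1]], dtype=float)

print(information_plane_point(p_xy, identity_encoder))
print(information_plane_point(p_xy, merged_encoder))
```

In this toy case the merged encoder moves left in the information plane (lower I(X;T)) without losing any information about the label, which is the kind of "forgetting irrelevant details" the compression phase refers to.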

Figure from the September 21 Quanta-Magazine article on our work.

Information constrained control and learning

Information flow governs sensing-acting and control. We develop the theory to understand how.

We study how information constraints on sensory perception, working memory, and control capacity affect optimal control and reinforcement learning in biological systems. Our basic model is a POMDP, represented by a directed graphical model consisting of world states, W, the organism's memory states, M, local observations, O, and actions, A. We consider such *typical* models that achieve a given *value* (expected future rewards) by minimizing the information flow in all adaptable channels, under the value constraint. This is equivalent to the *simplest organism* that achieves a certain value through interactions with its environment. It is also the most *robust*, or fastest to evolve, organism according to the *information bottleneck* framework. The optimal performance of the organism is determined by the *past-future information bottleneck tradeoff*, or by the *predictive information* of the environment.

The *simplest organism* of this type is the *Szilard information engine*, with a thermal bath as the environment and extracted mechanical work as value. In this case the observation, memory, and action channels have single-bit capacities. We also study how *sub-extensivity* of the predictive information can explain both *discounting* of rewards and the *emergence of hierarchical internal representations*.

Figure taken from Ortega et al. (2016), based on Tishby and Polani (2009).
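To make "minimizing the information flow under a value constraint" concrete, here is a heavily simplified sketch. It assumes a one-step decision problem with fully observed states, not the full POMDP with memory described above. It iterates the standard Blahut-Arimoto-style self-consistent equations from rate-distortion theory, trading expected utility against the information I(S;A) between state and action. The function names, the utility table, and the parameter `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np

def information_limited_policy(p_s, utility, beta, n_iter=200):
    """Blahut-Arimoto-style iteration for a one-step, fully observed problem.

    Iterates pi(a|s) ~ p(a) * exp(beta * U(s, a)) and p(a) = sum_s p(s) pi(a|s),
    the self-consistent equations for trading E[U] against I(S; A).
    """
    p_s = np.asarray(p_s, dtype=float)
    utility = np.asarray(utility, dtype=float)
    n_actions = utility.shape[1]
    p_a = np.full(n_actions, 1.0 / n_actions)       # start from a uniform action marginal
    for _ in range(n_iter):
        # Log-domain update; the clip avoids log(0), the row max avoids overflow.
        logits = np.log(np.maximum(p_a, 1e-300))[None, :] + beta * utility
        logits -= logits.max(axis=1, keepdims=True)
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)         # normalize each row to a distribution
        p_a = p_s @ pi                              # updated action marginal
    return pi, p_a

def policy_information_bits(p_s, pi):
    """I(S; A) in bits for state distribution p_s and policy pi(a|s)."""
    p_s = np.asarray(p_s, dtype=float)
    p_a = p_s @ pi
    joint = p_s[:, None] * pi
    mask = joint > 0
    indep = (p_s[:, None] * p_a[None, :])[mask]
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep)))

# Hypothetical example: two equally likely states, reward 1 for the matching action.
p_s = np.array([0.5, 0.5])
utility = np.eye(2)
for beta in (0.0, 1.0, 5.0):
    pi, _ = information_limited_policy(p_s, utility, beta)
    value = float(np.sum(p_s[:, None] * pi * utility))
    print(beta, value, policy_information_bits(p_s, pi))
```

At `beta = 0` the policy ignores the state and uses no information. As `beta` grows, the policy spends more bits on the state and collects more reward, tracing out a value-information tradeoff curve for this toy problem.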

The Information Bottleneck approach in Brain Sciences

Cognitive functions, such as perception, decision making and planning, memory, and language, are dominated by information constraints and quantified by the Information Bottleneck framework. Learn how.

We argue that perception, memory, and cognitive representations of the world (semantics) are governed by information theoretic tradeoffs between complexity and accuracy, more than by any other metabolic or physical constraints. In a recent study we show that color names in different languages can be explained by this principle, as part of an ongoing study on the semantic structure of natural languages, which goes all the way back to our original ideas on distributional representations of words (an early version of word2vec) and the first formalization of the information bottleneck as distributional clustering.

Figure from Zaslavsky et al. (2017).

Lab Alumni: graduate students

- Roy Fox (PhD 2016)
- Nori Jacoby (PhD 2014, Co-advisor: Merav Ahissar)
- Jonathan Rubin (PhD 2013, Co-advisor: Eli Nelken)
- Sivan Sabato (PhD 2012)
- Asaf Gal (PhD 2012, Co-advisor: Shimon Marom)
- Yuval Tassa (PhD 2010, Co-advisor: Emo Todorov)
- Ohad Shamir (PhD 2010)
- Dan Rosenbaum (MSc 2010)
- Uri Heinemann (MSc 2009)
- Naama Parush (PhD 2009. Co-advisor: Hagai Bergman)
- Yevgeny Seldin (PhD 2009, MSc. 2002)
- Roi Weiss (MSc 2007)
- Eyal Krupka (PhD 2008)
- Meital Rabani (MSc 2007)
- Hani Neuvirth (Co-advisor: Gideon Schreiber)
- Amir Navot (PhD 2006)
- Ran Gilad-Bachrach (PhD 2005)
- Amir Globerson (PhD 2005) (Co-advisor: Eilon Vaadia)
- Yaki Engel (PhD 2005) (Advisor: Ron Meir)
- Shmuel Brody (MSc 2005)
- Amit Rosner (Co-advisor: Udi Shapiro)
- Gill Bejerano (PhD 2003. Co-advisor: Hanah Margalit)
- Gal Chechik (PhD 2003. Co-advisor: Eli Nelken)
- Noam Slonim (PhD 2002)
- Elad Schneidman (PhD 2001. Co-advisor: Idan Segev)
- Adi Schreibman (MSc 2000)
- Shai Fine (MSc 1996, PhD 1999)
- Itay Gat (MSc 1995, PhD 1999. Co-advisor: Moshe Abeles)
- Golan Yona (PhD 1998. Co-advisors: Nati & Michal Linial)
- Lidror Troyansky (PhD 1997)
- Shlomo Dubnov (PhD 1996. Co-advisor: Dalia Cohen)
- Dana Ron (PhD 1995)
- Yoram Singer (PhD 1995)
- Tzvika Svinik (MSc 1994)

Past Courses

- Introduction to Information Processing and Learning, 76915 (Noga Zaslavsky, Fall 2014).
- Music and Brain, 76939 (Roni Granot, Naphtali Wagner, Israel Nelken, Naftali Tishby, Nori Jacoby. Fall 2009).
- Introduction to Linear Systems, 67310 (Tal El-Hai, spring 2010).
- Principled models of Perception-Action-Cycles 76911 (Spring 2009).
- Machine learning seminar 67168 (2009-10).
- Dynamical Systems and Control, 76929 (Fall 2009).
- Intro to Information Theory 67548 (Talya Meltzer, spring 2006)
- Statistical and Computational Learning Theory, 67583 (Ofer Dekel, spring 2006).
- Workshop in Neural Coding (For ICNC students – with data) 76928.
- The learning club

© 2017
