In the Machine Learning Tutorial Series, external guest speakers give tutorial lectures on focused machine learning topics. The target audience is undergraduates, MSc and PhD students, post-docs, and interested faculty members.

All talks will be announced via the **ml-talks mailing list**.

If you are looking for previous tutorials, check out the **ML Tutorials Archive**.

## Autumn 2017

Normally, the talks will be on **Wednesdays, 14:00 – 16:00**.

### Sponsors

### Schedule

| Date | Speaker | Title | Materials |
| --- | --- | --- | --- |
| 2017-11-08 | Cedric Archambeau (Amazon) | Bayesian Optimization with Tree-Structured Dependencies | slides, video |
| 2017-11-15 | Manfred Opper (TU Berlin) | Approximate Probabilistic Inference | video |
| 2017-11-22 | Carl Henrik Ek (U Bristol) | Bayesian Nonparametrics and Priors over Functions | |

### Abstracts

#### Bayesian Optimization with Tree-Structured Dependencies (Cedric Archambeau, 2017-11-08)

Bayesian optimization has been successfully used to optimize complex black-box functions whose evaluations are expensive, and is routinely used for hyperparameter optimization. In many applications, such as deep learning and predictive analytics, the optimization domain is itself complex and structured. In this talk, I will first give an overview of Bayesian optimization and then focus on use cases where this domain exhibits a known dependency structure. The benefit of leveraging this structure is twofold: we explore the search space more efficiently, and posterior inference scales more favorably with the number of observations than Gaussian Process-based approaches published in the literature. We introduce a novel surrogate model for Bayesian optimization which combines independent Gaussian Processes with a linear model that encodes a tree-based dependency structure and can transfer information between overlapping decision sequences. We also design a specialized two-step acquisition function that explores the search space more effectively. Our experiments on synthetic tree-structured functions and the tuning of feedforward neural networks trained on a range of binary classification data sets show that our method compares favorably with competing approaches.

Joint work with Rodolphe Jenatton, Javier Gonzalez and Matthias Seeger.
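As a rough illustration of the basic Bayesian-optimization loop described in the abstract (not the tree-structured method of the talk), here is a minimal sketch using a plain Gaussian-process surrogate and an expected-improvement acquisition function over a fixed candidate grid. All function names, kernel choices, and parameter values below are illustrative assumptions:

```python
import numpy as np
from math import erf

def objective(x):
    # Toy 1-D objective standing in for an expensive black box.
    return np.sin(3 * x) + 0.5 * x

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential covariance between two sets of 1-D points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Standard GP regression equations with a zero prior mean.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    K_inv = np.linalg.inv(K)
    mu = Ks.T @ K_inv @ y_train
    var = np.diag(rbf_kernel(x_query, x_query) - Ks.T @ K_inv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI for minimisation: expected amount by which a candidate
    # improves on the best value observed so far.
    z = (best - mu) / sigma
    pdf = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf

rng = np.random.default_rng(0)
candidates = np.linspace(-2.0, 2.0, 401)
x_obs = rng.uniform(-2.0, 2.0, size=3)   # small initial design
y_obs = objective(x_obs)

for _ in range(10):
    mu, sigma = gp_posterior(x_obs, y_obs, candidates)
    ei = expected_improvement(mu, sigma, y_obs.min())
    x_next = candidates[np.argmax(ei)]   # query the most promising point
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

print(f"best x = {x_obs[np.argmin(y_obs)]:.2f}, best y = {y_obs.min():.2f}")
```

The surrogate model and acquisition function are the two components the talk generalises: the tree-structured method replaces the single GP with a collection of independent GPs tied together by a linear model over the tree.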

#### Approximate Probabilistic Inference (Manfred Opper, 2017-11-15)

Probabilistic, Bayesian models explain the complexity of observed data by a set of hidden, unobserved random variables. Such models allow for the inclusion of uncertainty in predictions and provide a conceptually simple approach to model selection. On the other hand, explicit computations for Bayesian models usually require high-dimensional sums or integrals involving the posterior probability distribution of the unobserved variables. Unfortunately, there are not many models for which these integrals can be calculated exactly. Monte Carlo methods, which lead to asymptotically unbiased approximate results for such computations, can be too time consuming when the number of random variables is large.

To overcome this problem, variational approximations to Bayesian inference are frequently used in the field of machine learning. These methods replace integration/summation by a simpler optimisation problem. The intractable posterior distribution is approximated by a distribution from a simpler family, for which the necessary computations can be performed exactly. The approximate distribution is chosen to be close to the exact one, minimising a certain dissimilarity measure, the so-called Kullback-Leibler divergence.
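The dissimilarity measure mentioned above can be written out explicitly. For an approximating distribution $q(z)$ and a posterior $p(z \mid x)$, the standard identity (added here for reference, not from the abstract) is

```latex
\mathrm{KL}\bigl(q \,\|\, p(\cdot \mid x)\bigr)
  = \int q(z)\,\log\frac{q(z)}{p(z \mid x)}\,\mathrm{d}z
  = \log p(x) \;-\; \mathbb{E}_{q}\!\left[\log \frac{p(x, z)}{q(z)}\right],
```

so minimising the Kullback-Leibler divergence over $q$ is equivalent to maximising the expectation term (the evidence lower bound), which conveniently avoids the intractable normalising constant $\log p(x)$.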

In my lecture I will explain the basic idea behind the variational approach. I will work through a few example studies and discuss a case (related to Gaussian process models) where the number of variables is infinite. I will also discuss the limitations of the variational approach and mention possible extensions.
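A minimal, self-contained sketch of the variational idea (an illustrative example of mine, not from the lecture): approximate a non-Gaussian 1-D "posterior", known only up to its normalising constant, by the Gaussian that minimises the Kullback-Leibler divergence, computed here by numerical quadrature on a grid. The target density and the crude coordinate search are both assumptions chosen for transparency, not efficiency:

```python
import numpy as np

grid = np.linspace(-6.0, 6.0, 2001)
dx = grid[1] - grid[0]

# Unnormalised, skewed target: a Gaussian tilted by a sigmoid.
log_p_unnorm = -0.5 * grid**2 - np.log1p(np.exp(-3.0 * grid))
p = np.exp(log_p_unnorm)
p /= p.sum() * dx  # normalise numerically on the grid

def kl_q_p(m, s):
    # KL(q || p) by quadrature, with q = Normal(m, s).
    q = np.exp(-0.5 * ((grid - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    mask = q > 1e-12  # skip regions where q is numerically zero
    return np.sum(q[mask] * (np.log(q[mask]) - np.log(p[mask])) * dx)

# Crude coordinate search over (mean, std), refined a few times.
best = (0.0, 1.0)
for _ in range(3):
    m0, s0 = best
    ms = np.linspace(m0 - 1.0, m0 + 1.0, 41)
    ss = np.linspace(max(s0 - 0.5, 0.05), s0 + 0.5, 41)
    best = min(((mm, sv) for mm in ms for sv in ss),
               key=lambda t: kl_q_p(*t))

m, s = best
print(f"best Gaussian fit: mean = {m:.2f}, std = {s:.2f}")
```

The point of the sketch is the one made in the abstract: the intractable normalising constant of `p` never needs to be known in closed form; only ratios up to a constant enter the optimisation (here it is normalised numerically purely for the quadrature).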

#### Bayesian Nonparametrics and Priors over Functions (Carl Henrik Ek, 2017-11-22)

The fundamental principle facilitating learning is the capability to make assumptions. The science of machine learning is about developing methodologies that allow us to formulate assumptions as explicit mathematical objects (modelling) and integrate them with observed data (inference). To facilitate learning in more domains we continuously strive to make stronger and stronger assumptions so that we can become more data efficient. One of the most challenging scenarios is that of unsupervised learning, where we aim to explain the data independently of any task. This is a very ill-constrained problem, which requires strong assumptions to provide satisfactory explanations. In this lecture I will focus on Gaussian process priors, which are objects that allow us to specify structure on continuous, infinite parameter spaces. We will discuss the use of these priors for latent variable models and the mathematical tools that are needed to combine our assumptions with data. I will try to provide motivation and intuition behind these models and show why I think they are becoming ever more important as a tool that could provide understanding of the assumptions needed when learning composite functions.
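The "priors over functions" view can be made concrete with a short sketch (an illustrative example, not material from the talk): each draw from a Gaussian process prior is an entire function, and the covariance function encodes the assumption of smoothness. The kernel choice and parameter values below are my own assumptions:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=0.5, variance=1.0):
    # Squared-exponential covariance: nearby inputs get similar values,
    # which encodes a smoothness assumption on the sampled functions.
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 100)

# Covariance over the evaluation points, plus jitter for numerical stability.
K = rbf_kernel(x, x) + 1e-6 * np.eye(len(x))

# Each column is one function drawn from the GP prior,
# sampled via the Cholesky factor of the covariance.
L = np.linalg.cholesky(K)
samples = L @ rng.standard_normal((len(x), 3))
```

Changing the length scale changes the assumed wiggliness of the functions; this is exactly the sense in which the prior is an explicit mathematical object representing an assumption about the unknown function.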