Program / Abstracts
Gareth W. Peters and Pavel V. Shevchenko
Machine Learning Methods and Data Analytics in Risk and Insurance
(main course of the summer school - about 25 hours)
Machine learning and data analytics form an emerging field that is beginning to have a strong influence on the practice of actuarial science. The onset of big data applications in insurance has driven the profession to explore new ways to understand data and modelling. Unlike in Google- and Facebook-type technology applications, where huge databases of labelled data are available, in the insurance context we often have to consider unsupervised learning methods. This course will address core methodology to tackle unsupervised problems of relevance to insurance applications.
- Introduction to unsupervised machine learning in the context of insurance
- Brief statement of the axiomatisation of clustering and of impossibility theorems for clustering
- Preparing data for clustering
- K-means clustering, K-centroids
(with examples in cyber risk and insurance)
- Information-theoretic interpretations and Bregman bias
- Hard vs. soft assignment methods and probabilistic clustering
- Expectation-maximization methods for clustering
- Frequentist and Bayesian-EM variants
- Applications of EM algorithm and variants
(claims reserving examples)
- Feature maps and kernel maps
- Non-linear clustering via kernel k-means
- Families of kernels
- Kernel target alignments and hyper-parameter tuning
- Feature extraction methods
- Probabilistic PCA and robust PPCA factor models
- Mortality modelling examples
- Unsupervised multi-kernel learning
- Classification trees and random forests
(home insurance - aggregators)
- AdaBoost and bagging
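As a rough illustration of the K-means item in the list above, the following is a minimal sketch of Lloyd's algorithm with hard assignments. It is written in Python with the standard library only, purely for illustration; the lectures themselves use R. The function name `kmeans`, the toy data and the choice of initial centroids are assumptions made for this example and are not taken from the course material.

```python
import math

def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm: alternate hard assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: (claim frequency, mean claim size in thousands).
points = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 8.0), (5.2, 7.8), (4.8, 8.2)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
```

The initial centroids are passed in explicitly so that the result is reproducible. In practice the outcome of K-means depends on the initialisation, which is one reason data preparation and repeated starts matter.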
The speakers will use the statistical software R with the editor RStudio during the lectures. Participants can (but do not have to) bring along their own laptop with the most recent version of R installed and either a good R programming editor or an R IDE (e.g., the open source edition of RStudio).
Prof. Dr. Allan Hanbury
Sentiment Analysis in Finance
(special invited lecture - 1 hour on Friday morning)
Some information in the finance domain is in the form of text, such as annual reports for shareholders, or reports submitted to a regulator. Information extracted from this text by sentiment analysis can be used to complement numerical financial information. Sentiment Analysis is usually done by an Explicit Semantics approach, in which manually created financial sentiment lexica are used. A Statistical Semantics approach will be described, which attempts to enrich the lexica through the analysis of large amounts of text. Applications will be shown in the analysis of annual bank reports and of US Form 10-K submissions.
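The lexicon-based (Explicit Semantics) idea described above can be sketched in a few lines. This is only an illustrative toy, not the speaker's method: the word lists `POSITIVE` and `NEGATIVE`, the function `sentiment_score` and the sample sentence are all hypothetical, and real financial sentiment lexica are far larger and manually curated.

```python
import re

# Hypothetical mini-lexicon, invented for this example.
POSITIVE = {"growth", "profit", "improved", "strong"}
NEGATIVE = {"loss", "impairment", "litigation", "weak"}

def sentiment_score(text):
    """Return (positive hits - negative hits) / word count, or 0.0 for empty text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

# Invented sample sentence in the style of an annual report.
report = "Profit improved despite a litigation loss in one unit; growth remained strong."
score = sentiment_score(report)
```

Here four positive and two negative lexicon words are found among twelve words, so the score is 2/12. A statistical approach of the kind mentioned in the abstract would aim to extend such word lists from large text collections rather than rely on manual curation alone.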
Dr. Oleg Szehr
Methods of optimal transport in machine learning
(special invited lecture - 1.5 hours on Wednesday afternoon)
Recently, methods of optimal transport have found wide applications in machine learning. In this talk we review some basics of statistical learning theory and optimal transportation. We present several applications that rely on optimal transportation, ranging from music transcription to the so-called Wasserstein GAN. We discuss the duality method of optimization and show a representer theorem for a Wasserstein loss. The talk concludes with a review of Otto's calculus on the Wasserstein-2 manifold of probability measures. We show how gradient descent in this space can be interpreted as a reinforcement learning process, giving a new perspective on the Jordan-Kinderlehrer-Otto steepest descent scheme.
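For readers new to optimal transport, a small worked example may help. In one dimension, the Wasserstein-1 distance between two empirical samples of equal size is obtained by sorting both samples and averaging the absolute differences of the matched points. The sketch below assumes exactly that special case; the function name `wasserstein1_1d` and the data are hypothetical and are not taken from the talk.

```python
def wasserstein1_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling is monotone, so it matches the
    i-th smallest point of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    matched = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in matched) / len(xs)

# Invented samples: y is x shifted by 5, listed in a different order.
x = [0.0, 1.0, 3.0]
y = [8.0, 5.0, 6.0]
w = wasserstein1_1d(x, y)
```

Because `y` is a pure translation of `x` by 5, every matched pair is 5 apart and the distance equals the size of the shift.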
Prof. Dr. Josef Teichmann
Machine Learning Methods in Finance
(special invited lecture - 3 hours on Thursday afternoon)
We explain in detail two recent applications of machine learning techniques in mathematical finance, namely learning algorithms of a calibration functional and solving a real-world risk management problem (based on joint works with Hans Buehler, Christa Cuchiero, Lukas Gonon, Wahid Khosrawi-Sardroudi, and Ben Wood).