UnivIS
Information system of Kiel University © Config eG
Semester: Summer semester 2024 (SS 2024)

Mathematics of Reinforcement Learning (MathReinf) (060329)

Lecturer
Prof. Dr. Sören Christensen

Details
Lecture, 2 SWS (semester hours per week)
In-person course, language of instruction: English
Time and place: Wed 12:15 - 13:45, HHP6 - R.EG.001; Fri 8:15 - 9:45, HHP6 - R.EG.001
from 17.4.2024 to 31.5.2024

Prerequisites / Organizational information
Beyond the basics of stochastics, no special prior knowledge is required.
Module handbook:
https://www.math.uni-kiel.de/de/studium_und_lehre/studienverlauf-module
Module code: mathAKdS5-01a:
https://www.math.uni-kiel.de/de/studium_und_lehre/studienverlauf-module/module/mathAKdS5-01a.pdf
Module code: mathAKdF-01a:
https://www.math.uni-kiel.de/de/studium_und_lehre/studienverlauf-module/module/mathAKdF-01a,.pdf
Module code: mathAKdW-01a:
https://www.math.uni-kiel.de/de/studium_und_lehre/studienverlauf-module/module/mathAKdW-01a.pdf
Module code: mathAKaNuF-01a, 5 LP (credit points):
https://www.math.uni-kiel.de/de/studium_und_lehre/studienverlauf-module/module/mathAKaNuF-01a.pdf
Target group:
Single-subject Master's in Mathematics
Two-subject Master's in Mathematics
Single-subject Master's in Financial Mathematics
Link to website:
https://lms.uni-kiel.de/url/RepositoryEntry/5434671278

Content
Reinforcement learning refers to a class of machine learning methods in which future decisions are based on past successes and failures. The decision maker is assumed not to know the underlying environment exactly. Such methods play a central role in many modern applications, for example in the training of Google DeepMind's "AlphaZero". The classic "bandit" problem illustrates the basic issues: you are in a casino and, in each round, choose one of many slot machines ("one-armed bandits") without knowing their payoff distributions. At first you will probably just try the machines out ("exploration") and then, after some learning, play the apparently best ones ("exploitation"). The difficulty is that if you play one machine a lot, you learn little about the others and may never find the best machine (the "exploration-exploitation dilemma"). So what should you do? This course introduces the basic mathematical ideas and notation and examines algorithms for solving such problems. We consider mathematical methods for describing important concepts in reinforcement learning, such as bandit problems, Markov decision processes, and deep learning.
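The exploration-exploitation trade-off in the casino example can be illustrated with a small simulation. The sketch below is not taken from the course material; it assumes Bernoulli payoffs and uses the epsilon-greedy rule, one simple and well-known heuristic, chosen here purely for illustration (the listing does not say which algorithms the course covers). The function name `epsilon_greedy` and all parameters are hypothetical.

```python
import random


def epsilon_greedy(success_probs, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy player on Bernoulli slot machines.

    success_probs holds the true win probabilities (an assumption made for
    the simulation); the decision rule never reads them directly and only
    sees the payoffs it collects.
    Returns (pull counts, estimated mean payoffs, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms        # how often each machine was played
    estimates = [0.0] * n_arms   # running average payoff per machine
    total_reward = 0

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Exploration: pick a machine uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the machine that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated payoff: 1 with the machine's win probability, else 0.
        reward = 1 if rng.random() < success_probs[arm] else 0

        # Incremental update of the sample mean for the chosen machine.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward
```

Calling, for example, `epsilon_greedy([0.2, 0.5, 0.7])` returns how often each machine was played and the estimated payoffs. Setting `epsilon=0` removes exploration entirely, which is the situation described above in which the player may lock onto one machine and never find the best one.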

Recommended literature
Lecture notes and additional literature will be provided.

Additional information
Expected number of participants: 15

Associated courses
UE (exercise class): Exercise for Mathematics of Reinforcement Learning (060063)
Lecturer: Prof. Dr. Sören Christensen
Time and place: by arrangement
