Course syllabus for

Scalable Data Science and Distributed Machine Learning
Skalbar data science och distribuerad maskininlärning

EDA080F, 6 credits

Valid from: Autumn 2020
Decided by: Professor Thomas Johansson
Date of decision: 2022-01-19

General information

Division: Computer Science (LTH)
Course type: Third-cycle course
Language of instruction: English

Aim

The student should become familiar with: scalable data processing and partitioning methods such as random forests; scaling up neural networks such as CNNs, RNNs and GANs; and scalable machine learning pipelines for typical decision problems, such as prediction, A/B testing and anomaly detection.

Goals

Knowledge and understanding

For a passing grade the doctoral student must show in assignments that the introduced concepts (see course content) have been understood and can be applied to a given problem.

Competences and skills

For a passing grade the doctoral student must solve given real-world or realistic problems in the respective assignments using the concepts and theories introduced in the course.

Judgement and approach

For a passing grade the doctoral student must be able to determine which method to apply in a given problem context, and be able to determine the quality of a result obtained by applying the taught methods.

Course content

The course is given in three modules. In addition to lectures by the organizers, there will be invited guest speakers from industry.

Module 1 – Introduction to Data Science: Introduction to fault-tolerant distributed file systems and computing. The whole data science process illustrated with industrial case studies. Practical introduction to scalable data processing to ingest, extract, load, transform, and explore (un)structured datasets. Scalable machine learning pipelines to model, train/fit, validate, select, tune, test and predict or estimate in an unsupervised and a supervised setting using nonparametric and partitioning methods such as random forests. Introduction to distributed vertex-programming.

Module 2 – Distributed Deep Learning: Introduction to the theory and implementation of distributed deep learning. Classification and regression using generalised linear models, including different learning, regularization, and hyperparameter tuning techniques. The feedforward deep network as a fundamental network, and the advanced techniques to overcome its main challenges, such as overfitting, vanishing/exploding gradients, and training speed. Various deep neural networks for various kinds of data, for example CNNs for scaling up neural networks to process large images, RNNs for scaling up deep neural models to long temporal sequences, and autoencoders and GANs.

Module 3 – Decision-making with Scalable Algorithms: Theoretical foundations of distributed systems and analysis of their scalable algorithms for sorting, joining, streaming, sketching, optimising and computing in numerical linear algebra, with applications in scalable machine learning pipelines for typical decision problems (e.g. prediction, A/B testing, anomaly detection) with various types of data (e.g. time-indexed, space-time-indexed and network-indexed). Privacy-aware decisions with sanitized (cleaned, imputed, anonymised) datasets and data streams. Practical applications of these algorithms on real-world examples (e.g. mobility, social media, machine sensors and logs). Illustration via industrial use cases.

In the first course module, we aim to ensure that all students understand the basic concepts and tools in deep learning.

Course literature

Specific material and literature are announced and distributed in connection with the course instances.

Types of instruction

Type of instruction: Lectures. Lectures are given module by module in block sessions.

Examination

Examination format: Written assignments. Hand-ins (assignments) can include practical parts.
Grading scale: Fail, pass
Examiner:

Admission details

Course occasion information

Contact and other information

Course coordinator: Elin A. Topp <elin_a.topp@cs.lth.se>
