OSIRIS - Course catalogue INFOMDA2 2022

Course: INFOMDA2

Battling the curse of dimensionality

Course information

Course code		INFOMDA2
Credits (EC)		7.5

Course goals

At the end of this course, students are able to apply and interpret the theories, principles, methods and techniques related to contemporary data science and understand and explain different approaches to data analysis:

1. apply data visualization and dimension reduction techniques to high-dimensional data sets
2. apply, implement, understand and explain methods and techniques associated with advanced data modeling, including regularized regression, principal components, correspondence analysis, neural networks, clustering, time series, text mining and deep learning
3. evaluate the performance of these techniques with appropriate performance measures
4. select appropriate techniques to solve specific data science problems
5. justify and explain the choice of techniques used to investigate data problems
6. implement and understand generic data science tools, such as model evaluation, visualization and validation techniques
7. interpret and evaluate the results of analyses and explain these techniques in simple terms to a broad audience
8. understand and explain the principles of high-dimensional data visualization and the grammar of graphics
9. construct appropriate visualizations for each data analysis technique in R

In this course, skills and knowledge are evaluated on these separate occasions:

The exam and the assignments evaluate knowledge of methodological and statistical concepts (learning goals 1b, 1c, 1f, 1g), as well as the application of these concepts to research scenarios (learning goals 1d and 1e). During the exam, students need to interpret, evaluate and explain statistical software output and results (learning goal 1g).
The practical lab and the assignments test whether the student has sufficient skills to solve analysis problems and apply the relevant methodology to real-life data sets (learning goals 1a, 1b, 1c, 1f, 1h and 1i).

Assessment
Digital exam (50% of the final mark) and two assignments (25% each). Minimum grade for passing: 5.5.

Digital exam: aspects of student academic development assessed
•  Information study and analysis
•  Synthesizing and structuring of information
•  Knowledge leverage in a wider context
•  Numeric skills
•  Research preparation / set-up
•  Research reports - written
•  Mastering of methods and techniques

In order to pass the course, all practical exercises must be submitted and must be sufficient.

Pre-requisites
The period 1 course INFOMDA1 (or equivalent) serves as a sufficient entry requirement for this course.
Any questions regarding enrollment in this course should be directed to the course coordinator (not the profile coordinator).

Content

The ever-growing influx of data allows us to develop, interpret and apply an increasing set of learning techniques.
However, with this increase in data comes a challenge: how to make sense of the data and identify the components that really matter in our modeling efforts.
This course gives a detailed and modern overview of statistical learning with a specific focus on high-dimensional data.
In this course we emphasize the tools that are useful in solving and interpreting modern-day analysis problems.
Many of these tools are essential building blocks that are often encountered in statistical learning.
We also consider the state of the art in handling machine learning problems. We will not only discuss the theoretical underpinnings of supervised learning, but also focus on the skills and experience needed to rapidly apply these techniques to new problems.

During this course, participants will actively learn how to apply the main statistical methods in data analysis and how to use machine learning algorithms and visualization techniques, especially on high-dimensional data problems.
The course has a strongly practical focus: rather than concentrating on the mathematics and background of the techniques discussed, you will gain hands-on experience in applying them to real data and interpreting the results.
This course provides an in-depth discussion of contemporary statistical learning and visualization. Topics include:

Feature selection and regularization
Low-dimensional visualization
Dimension reduction for prediction
Neural networks
Clustering
Model-based clustering
Time series
Text mining and natural language processing

Students will learn to integrate these techniques into the way they think about analysis problems.
This course better equips students for a further career (e.g. junior researcher or research assistant) or for further education in research, such as a (research) master's program or a PhD.

The course is part of the master's profile "Applied Data Science". Students who have registered for the profile will receive preference when registering for this course.
Other interested (non-profile) students from any faculty are also welcome in this course, provided they are master's students who meet the prerequisites.

Course form

Lectures. Class session preparation: the assigned literature must be studied before each lecture.
Practicals. Every week an individual exercise must be completed and submitted.

Literature
Excerpts from these freely available texts:

James, Witten, Hastie & Tibshirani (2015). "An introduction to statistical learning with applications in R". New York: Springer. http://www-bcf.usc.edu/~gareth/ISL/
Silge, J., & Robinson, D. (2017). "Text mining with R: A tidy approach". O'Reilly Media, Inc. https://www.tidytextmining.com/
Jurafsky, D., & Martin, J. H. (2021). "Speech and Language Processing". 3rd ed. draft, https://web.stanford.edu/~jurafsky/slp3/
Hastie, Tibshirani & Wainwright (2015). "Statistical learning with sparsity". New York: Springer. https://web.stanford.edu/~hastie/StatLearnSparsity/
