OSIRIS - Course offerings INFOMDA1 2021

Course module: INFOMDA1

Supervised learning and visualization

Course info

Course code		INFOMDA1
EC		7.5

Course goals

At the end of this course, students are able to apply and interpret the theories, principles, methods and techniques related to contemporary data science, and understand and explain different approaches to data analysis:

1a. apply data wrangling and preprocessing techniques to tidy data sets
1b. apply, implement, understand and explain methods and techniques that are associated with statistical learning, including regression, trees, clustering, classification techniques and learning ensembles in R
1c. evaluate the performance of these techniques with appropriate performance measures
1d. select appropriate techniques to solve specific data science problems
1e. motivate and explain the choice of techniques to investigate data problems
1f. implement and understand generic data science tools, such as bootstrapping, cross-validation, bagging, boosting and error evaluation in R
1g. interpret and evaluate the results of analyses and explain these techniques in simple terminology to a broad audience
1h. understand and explain the basic principles of data visualization and the grammar of graphics
1i. construct appropriate visualizations for each data analysis technique in R

In this course, skills and knowledge are evaluated on these separate occasions:

The exam and the assignments evaluate knowledge of methodological and statistical concepts (learning goals 1b, 1c, 1f, 1g), as well as the application of these concepts to research scenarios (learning goals 1d and 1e). During the exam, students need to interpret, evaluate and explain statistical software output and results (learning goal 1g).
The practical lab and the assignments test whether students have sufficient skills to solve analysis problems and execute the relevant methodology on real-life data sets (learning goals 1a, 1b, 1c, 1f, 1h and 1i).

Assessment
Digital exam (50% of the final mark) and two assignments (25% each). Minimum grade for passing: 5.5.

Digital exam: Assessment. Aspects of student academic development:
•  Information study and analysis
•  Synthesizing and structuring of information
•  Knowledge leverage in a wider context
•  Numeric skills
•  Research preparation / set-up
•  Research reports - written
•  Mastering of methods and techniques

To pass the course, all practical exercises must be submitted and graded as sufficient.

Pre-requisites and general enrollment information - please read before registering

This course is a part of the GSNS Profile "Applied Data Science".

If you have chosen to register for the profile, you will receive preference when registering for this course.

To register for this course as a profile student and receive placement preference, you must use the special registration form before June.

For the form and for more information about the procedure see https://students.uu.nl/en/science/academics/applied-data-science.

Other interested (non-profile) students from any Faculty are also welcome in this course, provided you are a UU master student who meets the prerequisites.

Any questions regarding the ADS profile and registration for the profile should be directed to the ADS profile coordinator (not the course coordinator).

Content

Supervised learning is such an integral part of contemporary data science that you will most likely use it dozens of times a day without knowing it. In this class you will learn about the most effective supervised learning techniques and acquire the skills to put them to work for you.
We will not only discuss the theoretical underpinnings of supervised learning, but also focus on the skills and experience needed to rapidly apply these techniques to new problems.

During this course, participants will actively learn how to apply the main statistical methods in data analysis and how to use machine learning algorithms and visualization techniques.
The course is strongly practical: rather than dwelling on the mathematics and background of the techniques discussed, you will gain hands-on experience applying them to real data and interpreting the results.
This course provides a broad introduction to supervised learning and visualization. Topics include:

Data manipulation and data wrangling with R
Data visualization
Exploratory data analysis
Regression and classification
Non-linear modeling
Bagging, boosting and ensemble learning

Students will learn to incorporate these techniques into their way of thinking about analysis problems. This course better equips students for a further career (e.g. junior researcher or research assistant) or for further education in research, such as a (research) Master program or a PhD.

Course form

Lectures. Class session preparation: the assigned literature must be studied before each lecture.
Practicals. Every week an individual exercise must be completed and submitted.

Literature
Excerpts from these freely available texts:

James, Witten, Hastie & Tibshirani (2015). "An introduction to statistical learning with applications in R." New York: Springer. http://www-bcf.usc.edu/~gareth/ISL/
Wickham (2016). "R for Data Science". O’Reilly. http://r4ds.had.co.nz/
Van Buuren (2018). "Flexible imputation of missing data". https://stefvanbuuren.name/fimd/
