Course module: 201900026
ADS: Fundamental techniques in data science with R
Course code: 201900026
ECTS Credits: 7.5
Category / Level: 2 (Bachelor Elaborating)
Course type: Course
Language of instruction: English
Offered by: Faculty of Social Sciences; Methods and Statistics
Contact person: dr. G. Vink
E-mail: g.vink@uu.nl
Lecturers
Course contact
dr. G. Vink
Teaching period: 2 (09/11/2020 to 05/02/2021)
Teaching period in which the course begins: 2
Time slot: C (MON afternoon, TUE afternoon, THU morning)
Study mode: Full-time
Enrolment period: from 02/06/2020 18:00 up to and including 28/06/2020
Enrolling through OSIRIS: Yes
Enrolment open to students taking subsidiary courses: Yes
Pre-enrolment: No
Post-registration: Yes
Post-registration open: from 28/10/2020 18:00 up to and including 01/11/2020
Waiting list: No
Course goals
At the end of this course, students are able to:
 
1. apply and interpret the basic methodological and statistical concepts that are associated with doing predictive and/or inferential research;
a. explain concepts from inferential statistics, such as probability, inference and modeling, and apply them in practice.
b. make an informed choice for research designs that are suitable for regression analyses.
c. apply and explain the choice for techniques to investigate data problems.
d. apply and explain the concepts of linearity and non-linearity.
e. interpret statistical software output and report software output following APA reporting guidelines.
f. explain and conceptualize statistical inference and its relation to statistical theory.
g. perform the different steps in solving basic regression analysis problems and report on these steps.
 
2. apply and interpret important techniques in linear and logistic regression analysis;
a. perform, interpret and evaluate quantitative (causal) analyses on data with the statistical software platform R.
b. perform analyses in statistical software.
 
Relation between assessment and objective
In this course, skills and knowledge are evaluated on three separate occasions:
  1. The exam evaluates knowledge of methodological and statistical concepts (learning goals 1a, 1d, 1f), as well as the application of these concepts to research scenarios (learning goals 1b and 1c). During the exam, students also need to interpret statistical software output (learning goal 1e).
  2. The practical lab tests whether students have sufficient skills to solve basic analysis problems and execute quantitative analyses on real-life data sets (learning goals 2a and 2b).
  3. The work groups focus on applying the newly gained knowledge and skills through a series of motivating real-world case studies aimed at solving relevant data analysis problems and reporting on the steps taken to obtain a solution (learning goal 1g).
Content
Regression techniques are widely used to quantify the relationship between two or more variables. Investigating such relationships is very common in data science, and linear and logistic regression have proven to be powerful techniques for doing so. However, it is essential to understand how and when it is appropriate to apply these regression techniques. In this course, students will learn how to do that with the statistical software package R.
 
This course gives students a new set of tools to explore the issues and problems that many people care about. The course helps students get acquainted with the principles of analytical data science and with linear and logistic regression, and introduces the basics of statistical learning. These techniques will be presented in the context of estimation, testing and prediction. Students will learn to adopt these techniques in their way of thinking about statistical inference, which will help them to quantify the uncertainty and measure the accuracy of statistical estimates. Students will develop fundamental R programming skills, gain experience with the tidyverse, visualize data with ggplot2 and perform basic data wrangling with dplyr. This course makes students better equipped for a further career (e.g. junior researcher or research assistant) or for further education in research, such as a (research) Master's program or a PhD.

In nine weeks you will learn the basics of data handling with R and the details of regression techniques in the context of statistical inference, as well as their connection to research philosophy. Every lecture treats a different theoretical aspect. Each lecture is followed by a computer lab exercise that connects the statistical theory to practice, and by a workgroup meeting in which you will work on solving motivating real-world case studies.

Note that you need to register for this course during the FSW registration periods (page is in Dutch). Note also that if you are not an FSS student, the registration period may differ from your usual one. This course is part of the minor Applied Data Science. If you also want to register for this minor, you can do so via OSIRIS Student.

Students who cannot comply with the entrance requirements mentioned are advised to take the pre-course for the ADS minor, ADS: Basis van Onderzoeksmethoden en Statistiek (code 201900025, taught in Dutch). Students who cannot comply with the entrance requirements but believe they have the necessary background and skills are asked to provide further information on their eligibility. The course coordinator will decide on their eligibility.
Competencies
-
Entry requirements
-
Prerequisite knowledge
You should be familiar with the basic principles of applied statistics (up to regression). Some familiarity with interpreting basic statistical software output (e.g. from SAS, Stata or SPSS) is required. Some familiarity with a scripting or programming language, such as SPSS syntax, Python or (preferably) R, is desirable but not necessary.
Required materials
Literature
Excerpt from the freely available text: James, Witten, Hastie & Tibshirani (2015). An introduction to statistical learning with applications in R. New York: Springer. http://www-bcf.usc.edu/~gareth/ISL/
Literature
Excerpt from the freely available text: Wickham (2016). R for Data Science. O’Reilly. http://r4ds.had.co.nz/
Software
All software used (RStudio, R) is open source and freely available online, as is the mandatory literature.
Recommended materials
Literature
Additional literature and references are provided during the course
Instructional formats
Lecture

Class session preparation
The assigned literature must be studied before each lecture

Practical

General remarks
An individual exercise must be completed every week in order to prepare for the workgroup.

Small-group session

Class session preparation
The computer practical exercise must be completed before each workgroup meeting

Tests
Assignment 1: Linear regression
Test weight: 25
Minimum grade: 5.5

Assessment
Assignment 1 on Linear Regression.

Aspects of student academic development
Information study and analysis
Collaboration, working in a team
Research preparation / set-up
Material / data analysis and processing
Research reports - written
Mastering of methods and techniques

Assignment 2: Logistic regression
Test weight: 25
Minimum grade: 5.5

Assessment
Assignment 2 on Logistic Regression

Aspects of student academic development
Information study and analysis
Collaboration, working in a team
Research preparation / set-up
Material / data analysis and processing
Research reports - written
Mastering of methods and techniques

Digital Exam
Test weight: 50
Minimum grade: 5.5

Assessment
Exam

Aspects of student academic development
Information study and analysis
Synthesizing and structuring of information
Knowledge leverage in a wider context
Numeric skills
Research preparation / set-up
Research reports - written
Mastering of methods and techniques
