Data science and society
Credits (ECTS): 7.5
Category / Level: M (Master)
Course type: Course
Offered by: Faculty of Science; Graduate School of Natural Sciences
Contact person: dr. M.R. Spruit
Telephone: +31 30 2533708
dr. A.A.A. Qahtan, PhD
Course contact person
dr. M.R. Spruit
1-GS (01-09-2020 to 06-11-2020)
Timeslot: C (Mon midday/late afternoon, Tue afternoon, Thu morning)
Remark: Please read HERE for the latest information.
Course enrolment open: from 02-06-2020 to 28-06-2020
Enrolment procedure: Osiris Student
Enrolment via OSIRIS: Yes
Enrolment for subsidiary students: Yes
Late enrolment open: from 17-08-2020 to 14-09-2020
Placement procedure: administration of the teaching institute
This is the mandatory first course of the Business Informatics (MBI) programme as well as of the Applied Data Science profile. As such, its primary objective is to inspire you and introduce you to the exciting domain of Applied Data Science. At the end of this course, you will be able to:
  1. Understand the role of data science and its societal impact
  2. Recognise the knowledge discovery processes in applied data science
  3. Identify trends and developments in big data technologies
  4. Apply selected big data technologies to solve real-world problems
  5. Analyse unstructured data using natural language processing techniques
  6. Understand the need for self-service data science
The short url for the official course page is:
The official course schedule overview is available at:


The final grade will be determined based on the following course components:
[A] Mid-term exam
[B] End-term exam
[C] Optional bonus (or penalty) for extraordinary (or poor) participation/performance

Grade = [A]*0.50 + [B]*0.50 + [C]

Note that the minimum grade for each of these exams is 5.0. If your grade for one of the exams is between 4.0 and 5.5, you can repair that specific exam during the “second chance” session. It is not possible to repair both exams. You need a final grade of 6.0 or higher to PASS the course.
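
The grade formula and the pass rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, no rounding is applied because the course description does not specify any, and the repair session and the assignment criterion are not modelled.

```python
def final_grade(mid_term: float, end_term: float, bonus: float = 0.0) -> float:
    """Grade = [A]*0.50 + [B]*0.50 + [C], as stated in the course description."""
    return mid_term * 0.50 + end_term * 0.50 + bonus


def passes_course(mid_term: float, end_term: float, bonus: float = 0.0) -> bool:
    """Check the two stated conditions: each exam >= 5.0 and final grade >= 6.0.

    Assumption: grades are compared without rounding.
    """
    exams_ok = mid_term >= 5.0 and end_term >= 5.0
    return exams_ok and final_grade(mid_term, end_term, bonus) >= 6.0


if __name__ == "__main__":
    # A strong end-term exam does not compensate for a mid-term below 5.0.
    print(final_grade(4.5, 9.0), passes_course(4.5, 9.0))
```

In this sketch a mid-term of 4.5 and an end-term of 9.0 average above 6.0, yet the course is not passed, because the per-exam minimum is checked separately from the final grade.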

All course materials are examined, including all lecture slides, assignments and weekly readings.

In order to qualify for the Repair Exam, ALL grade components need to be 5.0 or higher, and you also need to have PASSed at least 65% of the assignments.


Applied Data Science

The first course topic that we cover is Applied Data Science (ADS) as positioned in (Braschler et al., 2019) and defined in (Spruit & Lytras, 2018) as “the knowledge discovery process in which analytic systems are designed and evaluated to improve the daily practices of domain experts”. As this is the core theme of the course, we cover the need for data scientists (e.g. Davenport & Patil, 2012) and relate this novel topic to the well-known domain of knowledge discovery processes (Chapman et al., 2000). We refer to standardised NIST definitions (Pritzker & May, 2015) to properly ground our ADS perspective.

Data Analytics

Data analytics is the multidisciplinary field which aims to make sense of data and observations from everyday life. Its data-driven approach to problem solving includes various methods and techniques. In this theme we focus on discussing why certain approaches work, what common mistakes are made, and so on, using (Lazer et al., 2014; Broniatowski et al., 2014) as a running example. We will also discuss data analytics tasks from both statistical and machine learning perspectives.

Big Data & Cloud Computing

The original trigger for this course was the inability of researchers to analyse datasets which were simply too big to process on a laptop. On the one hand, they can use someone else’s bigger computer (e.g. Cloud Computing); on the other hand, they can employ data analysis techniques that are designed to be limitlessly scalable. The prime example of such an analysis technique is MapReduce, which we will discuss both from the original Hadoop perspective (Dean & Ghemawat, 2008) and from the perspective of its successors within the increasingly popular Spark environment (Chambers & Zacharia, 2018). Furthermore, we also note the more philosophical implications of Big Data technologies using (Ambrose, 2015). How do we know that we know? What are the epistemological implications of Big Data analyses for the theory of knowledge? Would a historical perspective be helpful?
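
To give a first impression of the MapReduce programming model mentioned above, here is a minimal word-count sketch in plain Python. It is not course material and does not use the Hadoop or Spark APIs: all function names are hypothetical, and everything runs in a single process, so it only illustrates the map, shuffle and reduce steps, not the distributed execution that makes the approach scalable.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big compute"]))
```

Because each call to map_phase depends only on its own document and each call to reduce_phase only on its own key, a real framework can spread these calls over many machines; the sketch keeps them in one loop for readability.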

Natural Language Processing

We introduce the field of Natural Language Processing (NLP) as a key technology within data science and artificial intelligence. NLP applications are found wherever people communicate, including web search, scientific papers, emails, customer service, language translation, and clinical reports. Recently, deep learning approaches have obtained very high performance across many different NLP tasks. For decades, however, NLP was mostly based on symbolic approaches. Current NLP research aims to meaningfully integrate these two paradigms to better understand human language. Therefore, we will first introduce you to some classical linguistic theories before moving on to more recent neural network-based NLP approaches, based on (Clark et al., 2013). Furthermore, the computational experiment assignment will allow you to experiment in more depth with a state-of-the-art approach within this fast-moving field of NLP.

Automated Machine Learning

As identified in (Spruit & Jagesar, 2016), one of the major challenges in correctly applying Machine Learning techniques in Applied Data Science projects is the so-called Selection vs Configuration dilemma. It is often quite hard to select the best algorithm for a given data analysis task, and even harder to properly configure its (hyper-)parameters, even for data scientists. One promising solution might be Automated Machine Learning (Hutter et al., 2019). AutoML promises to reduce the human effort necessary for applying machine learning, improve the performance of machine learning algorithms, and improve the reproducibility and fairness of scientific studies.

Self-Service Data Science

In the Do-It-Yourself week you will work individually on an NLP computational experiment and experience the course vision of self-service data science. The assignment has many variations in datasets, language models and techniques.

Societal Impact

You decide which popular Data Science book with societal impact you read and pitch!

Other Trends

In the final lecture we will introduce other interesting data science techniques and developments which we could not cover in the course, but which may be worth investigating in a later course or research project.

Course form

This Corona edition of our course is structured somewhat differently. We do keep the twice-a-week lecture slots, in MS Teams streaming format. However, these sessions will mostly start with an interactive multiple-choice quiz, which is just for fun and to informally test your current knowledge, followed by a general Q&A session for any remaining questions. These sessions will be recorded, and it is not mandatory to attend any lectures.

Regular lecture materials will be provided as videos that can be viewed at any time. This is why we will have regular quizzes to test, and help you check, whether you actually watched and read all materials. The workshop sessions will also take place online, in a standard asynchronous discussion channel format on MS Teams. Our TA and SAs will try to answer any queries as soon as possible in the Technical Support channel.

Throughout the course, you are given a number of individual (mostly quite small) assignments. The answers to the assignments are to be submitted to the appropriate channel in our DSS 2020 Teams group before the stated deadline (mostly one week after release). There will be no deadline extensions, so be sure to submit on time. These assignments will be assessed but not graded: you either PASS or FAIL. If you have FAILed 20 percent or more of the total number of assignments, you will have FAILed the course due to the 'inspanningsverplichting' (course effort) criterion. However, if you did PASS at least 65% of the assignments, you will be given the opportunity to do the REPAIR assignment (which is a relatively big assignment).

For example, with 16 assignments, you will need to PASS 13/16 (~81%) assignments. If you have either 11 or 12 PASSes, you qualify for the substantial REPAIR assignment. Should you PASS only 10 (~63%) or fewer assignments, then you have FAILed the course without a second chance.
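
The thresholds above can be checked for any number of assignments with a small Python sketch. The function name and the returned labels are hypothetical, and the sketch assumes the percentages are applied exactly (no rounding), which reproduces the 16-assignment example.

```python
def assignment_outcome(passed: int, total: int) -> str:
    """Classify an assignment record as 'PASS', 'REPAIR' or 'FAIL'.

    Uses integer arithmetic so the 20% and 65% thresholds are compared exactly.
    """
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("expected total > 0 and 0 <= passed <= total")
    failed = total - passed
    # FAILing 20 percent or more of the assignments breaks the effort criterion.
    if failed * 100 < 20 * total:
        return "PASS"
    # PASSing at least 65% still qualifies for the REPAIR assignment.
    if passed * 100 >= 65 * total:
        return "REPAIR"
    return "FAIL"


if __name__ == "__main__":
    for passed in (13, 12, 11, 10):
        print(passed, assignment_outcome(passed, 16))
```

With 16 assignments, 13 PASSes mean 3 FAILs (below 20%), 11 or 12 PASSes fall between the two thresholds, and 10 PASSes (62.5%) fall below the 65% needed for the REPAIR assignment.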

To help you complete the assignments, this class is also supported by the DataCamp learning platform for Python, SQL and more, through a combination of short expert videos and hands-on-the-keyboard exercises.


We provide PDFs for most if not all required literature.

Even though this course is not a programming course, you are required to write various data analysis scripts. Therefore, if you don't have any script programming experience yet, it is advisable to familiarise yourself beforehand by taking an online introductory Python programming course.
Required materials
Igual, L., & Seguí, S. (2017). Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications. Switzerland: Springer.
We provide literature on all topics as listed in the official course documentation.
Cost of materials: 0.00

This course has two lectures per week; the slides will be made available afterwards on our DSS Teams group.


Throughout the course, you are given a number of individual assignments. The answers to the assignments are to be submitted to the appropriate channel in our DSS Teams group.

Minimum grade: -
