After this course the student knows how several well-known data mining algorithms work, how and when they can be applied, and how the resulting models and patterns should be interpreted.
Furthermore, the student understands general problems of data analysis, such as overfitting, the curse of dimensionality, and model selection.
Finally, the student gains hands-on experience in programming and applying data mining algorithms through practical assignments.
The course is graded through a written exam, two practical assignments and homework exercises.
With grades P1 and P2 for the practical assignments, grade E for the written exam, and grade H for the homework exercises, the final grade F is computed as follows:
F = 0.5 * E + 0.3 * P1 + 0.2 * P2 + 0.1 * H.
To pass the course, each practical assignment must have a grade of at least 6 and the written exam a grade of at least 5. There are four homework exercise sets; ten percent of their average grade H counts as a bonus. There is no minimum requirement for the homework exercises.
The final grade F is rounded and capped as follows:
- F is rounded to the nearest tenth of a point if F >= 6.0.
- F is rounded to the nearest whole point if F < 6.0.
- The maximum final grade is 10.
Participation in the repair test requires that the original final grade is 4 or 5, or AANV.
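The grading rules above can be sketched in a few lines of Python. This is only an illustration, not an official calculator: the function name `final_grade` is hypothetical, and the sketch assumes that ties round half up and that the choice between tenths and whole points is made on the unrounded F. The course description does not specify either of these. The second return value only reports whether the minimum grades for the practical assignments and the written exam are met.

```python
from decimal import Decimal, ROUND_HALF_UP


def final_grade(exam, p1, p2, homework):
    """Return (final grade F, minimum requirements met) for the given grades."""
    # Decimal avoids binary floating-point surprises such as 6.55 rounding down.
    e, a, b, h = (Decimal(str(g)) for g in (exam, p1, p2, homework))

    # Weighted sum from the formula; the 0.1 * H term is the homework bonus.
    raw = (Decimal("0.5") * e + Decimal("0.3") * a
           + Decimal("0.2") * b + Decimal("0.1") * h)

    # Assumption: ties round half up, and the step is chosen from the unrounded F.
    step = Decimal("0.1") if raw >= Decimal("6.0") else Decimal("1")
    rounded = raw.quantize(step, rounding=ROUND_HALF_UP)

    # The maximum final grade is 10 (the bonus can push the sum above it).
    capped = min(rounded, Decimal("10"))

    # Minimum requirements: each practical at least 6, written exam at least 5.
    requirements_met = a >= 6 and b >= 6 and e >= 5
    return float(capped), requirements_met


print(final_grade(7, 8, 6.5, 9))  # (8.1, True)
```

With E = 7, P1 = 8, P2 = 6.5 and H = 9, the weighted sum is 3.5 + 2.4 + 1.3 + 0.9 = 8.1, which is already a multiple of a tenth and lies below the cap of 10.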
It is required that the student has:
- Knowledge of algorithms and data structures, at the level of the bachelor course INFODS Datastructuren.
- Successfully completed a serious programming course, such as the bachelor course INFOIMP Imperatief Programmeren. Experience with using packages in R or Python is not sufficient.
- Knowledge of probability and statistics, at the level of INFOB3OMI Onderzoeksmethoden voor Informatica.
- Knowledge of linear algebra, as treated in the bachelor course INFOGR Graphics.
This course is aimed at students of the Computing Science (COSC) master's program.
Topics covered include (content can vary somewhat from year to year):
- Classification Tree Algorithms, Bagging and Random Forests
- Graphical Models (including Bayesian Networks)
- Frequent Pattern Mining
- Text Mining
- Social Network Mining
Lectures and Computer Lab.
Selected book chapters, articles, and lecture notes.