At the end of the course, the student can:
- read and understand a paper in the current computational and systems biology literature,
- identify the parts of a paper that concern data generation and the algorithms used to analyse these data, and critique the computational approaches taken,
- list and describe several high-throughput data types and the computer algorithms used to analyse them, and motivate why a certain algorithm is suitable for a certain data type,
- apply the algorithms discussed in this course to toy problems, and derive and design adaptations of these algorithms for new data types,
- draw biologically meaningful conclusions from results obtained with an analysis algorithm,
- understand and explain the basics of unsupervised machine learning (ML) and the specifics of k-means, hierarchical and spectral clustering,
- understand and explain the basics of supervised machine learning, including concepts such as cross-validation and overtraining, and the specifics of probabilistic, k-NN and random forest classifiers,
- understand and explain the basics of dimension reduction and the specifics of PCA, NMF and t-SNE,
- understand and explain the basics of Hidden Markov Models and their application to (epi)genomic data,
- understand and explain the basics of sequence analysis and alignment and the specifics of dynamic programming, variant calling and modern next-generation sequencing analysis.
Period (from-till): 18 June 2018 - 22 June 2018
Name, faculty/department, participation (%) in course
Dr. Jeroen de Ridder, UMC University, 60%
Dr. Alexander Schoenhuth, Utrecht University, 40%
Extended course description (for Osiris):
Bioinformatics is at the heart of much modern genomics research, and encompasses the application of statistics and computer science to (large-scale) biomolecular datasets. In essence, bioinformatics is about smart ways of extracting knowledge from the enormous amounts of data that can be generated using modern measurement techniques. For instance, it plays an important role in finding the genetic origins of various diseases, such as cancer, diabetes or Alzheimer's disease.
In this course we will study some key examples of bioinformatics analyses, i.e. data analytics and computational algorithms, by reading a set of selected papers that present significant biological conclusions. Rather than relying on the teachers to lecture on the methodologies, students are encouraged to read, study and comprehend the available course material themselves. Some lectures will be provided to ensure the basic concepts are clear.
Schedule: The course runs for five days from 9.00 till approximately 17.00. Each day will start with a lecture, followed by two rounds of paper discussions that go into depth on the computational approaches taken.
- Unsupervised learning, hierarchical and k-means clustering, spectral clustering
- Supervised learning, cross-validation, overtraining, Bayes classifier, random forest classifier
- Dimension reduction, PCA, NMF, t-SNE
- Hidden Markov Models, forward-backward algorithm, Viterbi algorithm
- Sequence alignment, Dynamic programming
- Read mapping techniques
- Sequence data indexes, such as Burrows-Wheeler Transform
- Genome assembly basics, de Bruijn graphs, overlap graphs
- Hash-based techniques, for example for overlap detection
Literature/study material used:
Provided course materials (slides) will be made available through our online learning platform: elearning.ubc.uu.nl
Please register online on the CS&D website: www.CSnD.nl/courses.
Bioinformatics Profile students will have priority when this course is followed as a part of their profile.
Thereafter, registration is on a first-come, first-served basis until the maximum number of 20 participants is reached.
Mandatory for students in own Master’s programme:
Optional for students in other GSLS Master’s programme:
Basic knowledge of Linear Algebra and Statistics.