Course module: B-MBIOBMLB
Basic Machine Learning for Bioinformatics
Course info
Course code: B-MBIOBMLB
EC: 3
Course goals
At the end of the course, the student:
-understands and can explain the difference between unsupervised and supervised learning.
-can write their own cost function and gradient descent function for supervised learning problems.
-can write their own implementation of linear regression, logistic regression, and basic (dense) neural networks.
-can use the basics of linear algebra to vectorise these implementations for speed and readability.
-understands the concept of overtraining and the bias/variance tradeoff, and can use cross-validation, regularisation and learning curves to ensure that learned predictors generalise.
-can pick good hyperparameters for learning algorithms.
-can write their own implementation of k-nearest neighbour and hierarchical clustering, and perform dimensionality reduction using PCA.
-knows the different metrics for classifier performance (ROC-AUC, accuracy, precision and recall, etc.), how to calculate them, and when to use them.
-knows the basics of working with scikit-learn, a modern, comprehensive framework for machine learning in Python.
-knows the basics of visualisation for machine learning (ROC curves, learning curves, etc.).
-has applied the above knowledge to a group project based on real biological data.
 
Content
Modern biology is largely a data-driven enterprise. We collect genomic information on thousands of patients and matched controls to find genomic causes of illness using GWAS. We routinely measure the expression of (tens of) thousands of genes at different time points and under different experimental conditions to understand what makes a system tick. And with the rise of single-cell omics, datasets are larger and more specific than ever before. Our minds are formidable pattern recognition devices, but they are biased in various ways and not equipped for these huge datasets. How can we use all this data to build good predictive models, or automatically order data so that we can gain new insights?

Enter machine learning: a term that calls forth visions of AI overlords for some, but is far more pedestrian in most of its applications (yet I, for one, welcome our AI overlords, should they happen to read this). In this course, we start with the basics: what is the difference between supervised and unsupervised learning (and what do we want the computer to do in each case), how do we formulate a cost function that the computer can optimise on its own given training data, and how do we then optimise it iteratively? With these basics of cost functions and gradient descent (or more elaborate optimisation methods) under our belt, we spend the first week implementing several well-known algorithms ourselves, using only Python, numpy and pandas, to gain an in-depth understanding. In the second week we move on to the modern scikit-learn library, which does the heavy lifting for you. We top it off with a group project on a biological dataset, in which your team tries to build the best classifier for that dataset. Along the way we look at clustering and dimensionality reduction, and gain a cursory knowledge of linear algebra, the language in which machine learning algorithms are formulated and in which you will formulate your own implementations.
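
To give a flavour of the first week, the idea of a cost function optimised by gradient descent can be sketched for linear regression as below. This is a minimal illustration and not course material: the function names, the toy data, the learning rate and the number of iterations are all assumptions chosen for the example.

```python
import numpy as np


def cost(X, y, theta):
    """Halved mean squared error of the linear model X @ theta."""
    residuals = X @ theta - y
    return float(residuals @ residuals) / (2 * len(y))


def gradient_descent(X, y, learning_rate=0.5, n_iterations=1000):
    """Minimise the cost by repeatedly stepping against its gradient."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        # Vectorised gradient of the cost with respect to theta.
        gradient = X.T @ (X @ theta - y) / len(y)
        theta -= learning_rate * gradient
    return theta


# Toy data, made up for illustration: y = 1 + 2x without noise.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept
y = 1.0 + 2.0 * x

theta = gradient_descent(X, y)
print(theta, cost(X, y, theta))
```

Writing the gradient as a single matrix expression, rather than looping over samples, is the kind of vectorisation that linear algebra makes possible.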

When you are done with this course, you should be well-equipped to independently learn about more complex classifiers (Random Forests, convolutional neural networks, etc.) or unsupervised methods, and to apply ML to real-world biological problems. This course also lays the foundations for the more theoretical and higher-level understanding you’ll gain in Analytics and Algorithms for Omics Data (BMB508219). 
 