A yearly course, part of the BioSB Research School
Lecturers
 dr. ir. Perry Moerland (Amsterdam UMC, location: Academic Medical Center)
 prof. dr. ir. Marcel Reinders (Delft University of Technology)
 prof. dr. Lodewyk Wessels (Netherlands Cancer Institute)
Course coordinator:
The first BioSB Machine Learning for Bioinformatics & Systems Biology Course will be taught October 7-11, 2019, at the Academic Medical Center, Amsterdam. General information on the course can be found here. This page only contains the material used during the course. Note that this material is free for academic use only and should not be redistributed.
Note that the course material is still subject to change before and during the course week.
 To prepare for the course:
 a self-evaluation test (PDF, 90 Kb) on the prerequisite knowledge (probability theory and linear algebra). If you have a lot of trouble answering some of these exercises, consult the textbooks mentioned in the PDF, or:
 a few primers (ZIP/PDF, 4.9 Mb) on these topics.
The lab courses will make extensive use of Matlab. You do not need to be a fluent programmer, but if you have never worked with Matlab before, it may help to obtain a copy (your university may have a campus license) and have a look at the Appendices of the lab course manual (see below). An extensive Matlab primer is also available. During the course, Matlab and all software and data are available on the PCs in the lab, so there is no need to bring your laptop.
 Material used during the lectures:
 Material used during the lab course:
To use the code and data, download the ZIP file, unpack everything into the same directory and run prstartup from the Matlab command prompt. Matlab R2006a or newer is required.
 Additional tools (not required for the course, but perhaps interesting):
 scikit-learn, a free machine learning library for the Python programming language
 R is very popular for solving data analysis problems. Here is a short reference that provides a mapping between Matlab and R commands.
 R packages relevant to some of the topics treated in the course (the functionality is spread over many packages, and this list is far from complete):
 First have a look at mlr, a general machine learning framework for R.
 Then have a look at caret, which also provides a set of functions that streamline the process of creating predictive models.
 e1071: support vector machines and a flexible framework for cross-validation/bootstrapping using the tune function
 MCRestimate: flexible framework for feature selection and cross-validation providing a wrapper for several classifiers (PAM, SVM, random forests, ...). Easily extended with classifiers available in other packages
 MASS: lda, qda
 rpart: decision trees
 stats (installed by default): hierarchical clustering, k-means
 glmnet: lasso, elastic net
 See CRAN Task View: Machine Learning & Statistical Learning and CRAN Task View: Cluster Analysis & Finite Mixture Models for pointers to other packages
 WEKA, a Java-based collection of machine learning algorithms for data mining
 Shogun, a Matlab toolbox focusing on large-scale kernel methods
 GenLab and PRLab (ZIP), a GUI for microarray data analysis, clustering and classification (poorly maintained, use at your own risk!)
 BRB ArrayTools, an Excel-based microarray data analysis package using R in the background
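As an illustration of the kind of workflow several of these tools support, here is a minimal sketch of tuning a support vector machine by cross-validation in Python with scikit-learn, roughly analogous to what the tune function in e1071 offers in R. It is not part of the course material: the iris data set, the parameter grid and the choice of 5 folds are assumptions made purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data set bundled with scikit-learn (an assumption for illustration,
# not one of the data sets used in the course).
X, y = load_iris(return_X_y=True)

# Scaling sits inside the pipeline so that it is re-fitted on the training
# part of every fold and never sees the held-out samples.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical grid of SVM hyperparameters to try (4 x 4 = 16 combinations).
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}

# Stratified 5-fold cross-validation with a fixed seed for reproducibility.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(model, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean cross-validated accuracy: %.3f" % search.best_score_)
```

Because the reported accuracy is also the criterion used to pick the parameters, it tends to be optimistic; an unbiased performance estimate would need an additional outer cross-validation loop or a separate test set.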
Some good material for further reading:
 G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning with Applications in R, also freely available online with R code, slides, videos etc.
 R.O. Duda, P.E. Hart and D.G. Stork, Pattern classification, 2nd ed., 2000. ISBN: 0471056693.
 C. Bishop, Pattern recognition and machine learning, 2007. ISBN: 0387310738.
 T. Hastie, R. Tibshirani and J. Friedman, The elements of statistical learning: data mining, inference, and prediction, 2nd ed., 2009. ISBN: 0387848576.
 D. Barber, Bayesian Reasoning and Machine Learning, also freely available online.
 A. Webb, Statistical pattern recognition, 2nd ed., 2002. ISBN: 0470845147.
 F. van der Heijden, R.P.W. Duin, D. de Ridder and D.M.J. Tax, Classification, parameter estimation and state estimation: an engineering approach using MATLAB, 2004. ISBN: 0470090138.
 A.K. Jain, R.P.W. Duin and J. Mao, Statistical pattern recognition: a review, IEEE Tr. on Pattern Analysis and Machine Intelligence 22(1):4-37, 2000.
 M. Fernández-Delgado, E. Cernadas, S. Barro and D. Amorim, Do we need hundreds of classifiers to solve real world classification problems?, JMLR 15(Oct):3133-3181, 2014.
