At the end of this course, you will be able to:
- Understand the role of data science and its societal impact
- Recognise the knowledge discovery processes in applied data science
- Identify trends and developments in big data technologies
- Apply selected big data technologies to solve real-world problems
The graded deliverables generate the final course grade as follows:
- [A] Book review
- [B] Mid-term exam
- [C] End-term exam
- [D] Optional bonus for extraordinary participation/performance
Grade = [A]*0.10 + [B]*0.40 + [C]*0.50 + [D]
NB: To qualify for the second-chance exam, every grading component must be at least 4.0, and component [A] must have been submitted within the allotted time. The second-chance exam is an extensive market survey report assignment.
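As a quick illustration of the grade formula above, the weighted sum can be sketched in Python. The function name `final_grade` and its parameter names are hypothetical and chosen only for this example; the sketch applies no rounding or capping, because the course description does not specify either.

```python
def final_grade(book_review, midterm, endterm, bonus=0.0):
    """Weighted course grade: [A]*0.10 + [B]*0.40 + [C]*0.50 + [D].

    book_review = [A], midterm = [B], endterm = [C], bonus = [D].
    Assumption: the result is returned as-is, without rounding or capping.
    """
    return 0.10 * book_review + 0.40 * midterm + 0.50 * endterm + bonus


# Example with made-up component grades (not real course data).
print(round(final_grade(8.0, 6.0, 7.0), 2))       # no bonus
print(round(final_grade(8.0, 6.0, 7.0, 0.5), 2))  # with a 0.5 bonus
```

Because the end-term exam carries half of the weight, a one-point change in [C] moves the final grade by 0.5, whereas the same change in the book review [A] moves it by only 0.1.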
Even though this is not a programming course, you are required to write various data analysis scripts. If you do not yet have any scripting experience, you are therefore advised to familiarise yourself beforehand by taking an online introductory Python programming course.
Last year, self-study programming support was offered free of charge through the DataCamp for the Classroom learning platform (DataCamp.com).
- White, T. (2016). Hadoop: The Definitive Guide. Third edition. O'Reilly.
- Chambers, B., & Zaharia, M. (2018). Apache Spark: The Definitive Guide. O'Reilly.
- Spruit, M., & Lytras, M. (2018). Applied Data Science in Patient-centric Healthcare: Adaptive Analytic Systems for Empowering Physicians and Patients. Telematics and Informatics, 35(4), 643–653.
- Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203–1205.
Recommended materials:
- Pritzker, P., & May, W. (2015). NIST Big Data Interoperability Framework (NBDIF): Volume 1: Definitions. NIST Special Publication 1500-1. Final Version 1. National Institute of Standards and Technology.
- Dean, J., & Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113.
- Ghemawat, S., Gobioff, H., & Leung, S. (2003). The Google file system. SIGOPS Operating Systems Review, 37(5), 29–43.
- Davenport, T. H., & Patil, D. J. (2012). Data scientist: The Sexiest Job of the 21st Century. Harvard Business Review, 90(5), 70–76.
- Stair, R., & Reynolds, G. (2012). Fundamentals of Information Systems. Sixth edition. Cengage: Boston, MA. NOTE: Chapters 1 and 3 only (Information Systems in Perspective; Database Systems, Data Centers, and Business Intelligence).
- Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., & Wirth, R. (2000). CRISP-DM 1.0 Step-by-step Data Mining Guide.
Teaching methods:
General: There will be 6 contact hours per week. Regular lectures are given on Tuesdays and Thursdays.
In the first weeks, the lectures focus on the fundamentals of applied data science; in the second half, they introduce current research by various UU/UMCU researchers related to applied data science.
The Thursday lectures are followed by workshop sessions in which we practise with big data tools (especially Hadoop) and collaboratively investigate their societal impact.