In this course, students apply basic statistical and data mining techniques to large-scale datasets, working with both structured and unstructured data. Case studies of large-scale data analysis in practice are reviewed. Topics include summarizing large datasets, analyzing text data, methods of modelling with data, and analyzing data over time. Students use a statistical programming language throughout.
Prerequisites
DATA101N, MATH106N.