DATA210N: Data Wrangling

Class Program
Class Hours: 2, Lab Hours: 2, Credits: 3

Data wrangling is the process of working with large, messy, and diverse datasets to prepare them for analysis. Students will learn techniques for uploading and merging source data; diagnosing data issues that require intervention before analysis (missing data, misformatted data, and other problems such as outliers and corrupt data); fixing those issues (cleaning); and preparing and exporting usable analytical data files. Students will also learn methods for joining fields (such as concatenating text fields), subsetting and filtering data, and large-scale dimensionality reduction techniques (principal component analysis and factor analysis). This class will use spreadsheet and related software tools.

Prerequisites

DATA101N or permission of instructor.