505 Amherst St,

Nashua, NH 03063

P. 603 578-8900

E. nashua@ccsnh.edu

© 2022 Nashua Community College

Data Science

This course is an introduction to the tools and techniques used in data analytics as well as an introduction to the profession of data analytics. Topics covered include descriptive statistics, structures and types of data, data visualization, data mining, and legal and ethical issues. Students are introduced to a statistical programming package and to basic statistical programming in R or Python.
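As a rough illustration of the descriptive statistics mentioned above, the following minimal sketch summarizes a small dataset using only Python's standard `statistics` module. The scores are made-up values chosen for the example and are not course material; the course itself may use R or a different package.

```python
import statistics

# Hypothetical sample: exam scores for a small class (made-up values).
scores = [72, 85, 90, 66, 85, 78, 94]

# Common descriptive statistics: center, spread, and size of the sample.
summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
    "range": max(scores) - min(scores),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean and median describe the center of the data, while the standard deviation and range describe how spread out the values are.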

Data mining is the process of examining data for patterns that yield useful information. This course provides background study for data mining and hands-on practice in data mining techniques. Topics include basic probability, types of data, data preprocessing, cluster analysis, classification methods, and association analysis. Work will be performed in a Python-based visual programming environment that does not require a coding or programming background. The connections among data mining techniques, machine learning, and artificial intelligence will be examined. An important component of this course is completing an applied project and communicating the results to others.

In this course, students will apply basic statistical and data mining techniques to large-scale datasets. Both structured and unstructured data will be used. Case studies of large-scale analysis in practice will be reviewed. Topics include summarizing large datasets, analyzing text data, methods of modelling with data, and analyzing data over time. Students use a statistical programming language.

In this course, students will learn to apply design principles and techniques for visualizing data effectively. Students will develop an understanding of how visual representations are used in the analysis of complex real-world data. Class projects will require students to practice creating and presenting interactive visualizations. A current data visualization tool such as Tableau will be used.

Data wrangling is the process of working with large, messy, and diverse datasets to prepare them for data analysis. Students will learn techniques for uploading and merging source data; diagnosing data issues that require intervention before analysis (missing data, misformatted data, and other unclean data issues such as outliers and corrupt data); fixing data issues (cleaning); and preparing and exporting usable analytical data files. Students will also learn methods of joining fields (such as concatenating text fields), subsetting and filtering data, and large-scale dimensionality reduction techniques (principal components and factor analysis). The Posit integrated development environment will be used, combining elements of the R, Python, and SQL languages.
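To make the wrangling steps above concrete, here is a minimal sketch in plain Python (standard library only). It merges two small tables, flags missing and misformatted values, concatenates text fields, filters rows, and exports the result as CSV text. The records, field names, and the `parse_amount` helper are hypothetical and invented for illustration; they are not taken from the course, which works in the Posit environment, and dimensionality reduction is not shown.

```python
import csv
import io

# Hypothetical source tables (made-up records).
customers = [
    {"id": 1, "first": "Ana", "last": "Lopez", "state": "NH"},
    {"id": 2, "first": "Ben", "last": "Cho", "state": "MA"},
    {"id": 3, "first": "Cy", "last": "Reed", "state": "NH"},
]
orders = [
    {"id": 1, "amount": "120.50"},
    {"id": 2, "amount": ""},      # missing value
    {"id": 3, "amount": "n/a"},   # misformatted value
]

def parse_amount(text):
    """Return a float, or None when the value is missing or misformatted."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

# Merge: join orders onto customers by the shared "id" key.
amount_by_id = {o["id"]: parse_amount(o["amount"]) for o in orders}
merged = [
    {
        "id": c["id"],
        "name": f'{c["first"]} {c["last"]}',  # concatenate text fields
        "state": c["state"],
        "amount": amount_by_id.get(c["id"]),
    }
    for c in customers
]

# Diagnose: which rows need intervention before analysis?
problem_ids = [row["id"] for row in merged if row["amount"] is None]

# Clean and filter: keep only NH rows that have a usable amount.
clean = [r for r in merged if r["amount"] is not None and r["state"] == "NH"]

# Export: write the cleaned rows as CSV (to an in-memory buffer here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "state", "amount"])
writer.writeheader()
writer.writerows(clean)
print(buffer.getvalue())
```

In this sketch unusable rows are simply dropped; in practice the choice between dropping, correcting, or imputing values depends on the dataset and the analysis planned.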

This course provides a basic overview of machine learning concepts, techniques, and procedures. Both supervised and unsupervised machine learning will be discussed. Students will develop an understanding of how machine learning is an integral part of data analytics. They will explore how machine learning supports data-driven decisions and gives computers the ability to learn without being explicitly programmed. Previous programming experience is highly recommended.
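The difference between supervised and unsupervised learning can be sketched with a small example. The code below is an illustration under stated assumptions, not the course's own material: the data are made up, `predict_nearest` is a hypothetical one-nearest-neighbor classifier standing in for supervised learning, and `two_means` is a hypothetical one-dimensional two-cluster k-means standing in for unsupervised learning.

```python
# Supervised: labeled examples (hours studied -> outcome), made-up data.
labeled = [(1.0, "fail"), (2.0, "fail"), (6.0, "pass"), (7.5, "pass")]

def predict_nearest(x, examples):
    """1-nearest-neighbor: return the label of the closest labeled example."""
    _, label = min(examples, key=lambda pair: abs(pair[0] - x))
    return label

# Unsupervised: the same inputs without labels, split into two groups.
def two_means(values, iterations=10):
    """1-D k-means with k=2, starting from the smallest and largest value."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

prediction = predict_nearest(6.5, labeled)
clusters = two_means([x for x, _ in labeled])
print(prediction)
print(clusters)
```

The supervised function needs the labels to make a prediction, while the unsupervised function discovers the two groups from the input values alone.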