Data Wrangling in R
Posted by Superadmin on February 09 2019 05:07:09

Tidy data is a data format that provides a standardized way of organizing data values within a dataset. By applying tidy data principles, statisticians, analysts, and data scientists can spend less time cleaning data and more time on the compelling aspects of data analysis. In this course, learn the principles of tidy data and discover how to create and manipulate tibbles, transforming source data into tidy formats. Instructor Mike Chapple uses the R programming language and the tidyverse packages to teach data wrangling: the data cleaning and data transformation tasks that consume a substantial portion of analysts' time. He wraps up with three hands-on case studies that reinforce the data wrangling principles and tactics covered in this course.

Topics include:

00. Introduction



001 Welcome
002 What you need to know
003 Using the exercise files



01. Tidy Data



004 What is tidy data?
005 Variables, observations, and values
006 Common data problems
007 Using the tidyverse



02. Working with Tibbles



008 Building and printing tibbles
009 Subsetting tibbles
010 Filtering tibbles



03. Importing Data into R



011 What are CSV files?
012 Importing CSV files into R
013 What are TSV files?
014 Importing TSV files into R
015 Importing delimited files into R
016 Importing fixed-width files into R
017 Importing Excel files into R
018 Reading data from databases and the web



04. Data Transformation



019 Wide vs. long datasets
020 Making wide datasets long with gather()
021 Making long datasets wide with spread()
022 Converting data types in R
023 Working with dates and times in R



05. Data Cleaning



024 Detecting outliers
025 Missing and special values in R
026 Breaking apart columns with separate()
027 Combining columns with unite()
028 Manipulating strings in R with stringr



06. Data Wrangling Case Study: Coal Consumption



029 Understanding the coal dataset
030 Reading in the coal dataset
031 Converting the coal dataset from long to wide
032 Segmenting the coal dataset
033 Visualizing the coal dataset



07. Data Wrangling Case Study: Water Quality



034 Understanding the water quality dataset
035 Reading in the water quality dataset
036 Filtering the water quality dataset
037 Water quality data types
038 Correcting data entry errors
039 Identifying and removing outliers
040 Converting temperature from Fahrenheit to Celsius
041 Widening the water quality dataset



08. Data Wrangling Case Study: Social Security Disability Claims



042 Understanding the Social Security Disability dataset
043 Importing the Social Security Disability dataset
044 Making the Social Security Disability dataset long
045 Formatting dates in the Social Security Disability dataset
046 Handling fiscal years in the Social Security Disability dataset
047 Widening the Social Security Disability dataset
048 Visualizing the Social Security Disability dataset
049 Next steps