Become a Data Scientist
Posted by Superadmin on January 22 2019 03:00:18

Whether you're working in IT or simply have an interest in entering this exciting field, this learning path will support you in developing a career in data science. Learn about the fundamental stages of data science work, from Statistics and Systems Engineering to Data Mining and Machine Learning.
  • Build a solid foundational understanding of statistics, which is necessary for any data science-related field.
  • Discover the many categories of job specialization within Data Science.
  • Learn how to source, explore, and communicate with data through graphs and statistics.

01

Bracketology Club: Using March Madness to Learn Data Science with Brian Tonsoni

12m 7s • COURSE
When one pictures the group who bested over 100 sports experts to win the 2016 Bracket Matrix (an online March Madness bracket competition), a classroom full of small-town high school students might not be the first image that comes to mind. It certainly wasn't what social studies teacher Brian Tonsoni expected when he started Delphi Bracketology, a high school club that uses data science to predict which teams the NCAA will select for the Division I Men's Basketball Tournament. But what began as an informal gathering of sports fans soon grew into a collection of champions. In this short film, meet some of the members of this remarkable team, and learn how Tonsoni's informal, project-based approach to learning helped these young bracketologists acquire the kinds of key skills (data science, public speaking, and more) that every teacher hopes to instill in their students.
02

Data Science & Analytics Career Paths & Certifications: First Steps with Jungwoo Ryoo

1h 12m • COURSE
The career opportunities in data science, big data, and data analytics are growing dramatically. If you're interested in changing career paths, determining the right course of study, or deciding if certification is worth your time, this course is for you.

Jungwoo Ryoo is a professor of information science and technology at Penn State. Here he reviews the history of data science and its subfields, explores the marketplaces for these fields, and reveals the five main skills areas: data mining, machine learning, natural language processing (NLP), statistics, and visualization. This leads to a discussion of the five biggest career opportunities, the six leading industry-recognized certifications available, and the most exciting emerging technologies. Along the way, Jungwoo discusses the importance of ethics and professional development, and provides pointers to online resources for learning more.
Topics include:
  • A history of data science
  • Why data analytics is important
  • How data science is used in fraud detection, disease control, network security, and other fields
  • Data science skills
  • Data science roles
  • Data science certifications
  • The future of data science
03

Data Science Foundations: Fundamentals with Barton Poulson

3h 6m • COURSE
Introduction to Data Science provides a comprehensive overview of modern data science: the practice of obtaining, exploring, modeling, and interpreting data. While most only think of the "big subject," big data, there are many more fields and concepts to explore. Here Barton Poulson explores disciplines such as programming, statistics, mathematics, machine learning, data analysis, visualization, and (yes) big data. He explains why data scientists are now in such demand, and the skills required to succeed in different jobs. He shows how to obtain data from legitimate open-source repositories via web APIs and page scraping, and introduces specific technologies (R, Python, and SQL) and techniques (support vector machines and random forests) for analysis. By the end of the course, you should better understand data science's role in making meaningful insights from the complex and large sets of data all around us.
Topics include:
  • Assess the skills required for a career in data science.
  • Evaluate different sources of data, including metrics and APIs.
  • Explore data through graphs and statistics.
  • Discover how data scientists use programming languages such as R, Python, and SQL.
  • Assess the role of mathematics, such as algebra, in data science.
  • Assess the role of applied statistics, such as confidence intervals, in data science.
  • Assess the role of machine learning, such as artificial neural networks, in data science.
  • Define the components of effective data visualization.
04

Statistics Foundations: 1 with Eddie Davila

2h 6m • COURSE
Statistics is not just the realm of data scientists. All types of jobs use statistics. Statistics are important for making decisions, new discoveries, investments, and predictions. Whether the subject is political races, sports rankings, shopping trends, or healthcare advancements, statistics is an instrument for understanding your favorite topic at a deeper level. With these beginner-level lessons, you too can master the terms, formulas, and techniques needed to perform the most common types of statistics.

Professor Eddie Davila covers statistics basics, like calculating averages, medians, modes, and standard deviations. He shows how to use probability and distribution curves to inform decisions, and how to detect false positives and misleading data. Each concept is covered in simple language, with detailed examples that show how statistics are used in real-world scenarios from the worlds of business, sports, education, entertainment, and more. These techniques will help you understand your data, prove theories, and save time, money, and other valuable resources—all by understanding the numbers.
Topics include:
  • Why statistics matter
  • Evaluating your data sets
  • Finding means, medians, and modes
  • Calculating standard deviation
  • Measuring distribution and relative position
  • Understanding probability and multiple-event probability
  • Describing permutations: the order of things
  • Calculating discrete and continuous probability distributions
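
The basic measures named in the topics above (mean, median, mode, standard deviation, and relative position) can be illustrated with a minimal Python sketch. This is not taken from the course: the `scores` list is made-up sample data, and the choice of the sample (n - 1) standard deviation is an assumption made here for illustration.

```python
import statistics

# Hypothetical sample: points scored in seven games (made-up numbers).
scores = [12, 15, 15, 18, 20, 22, 31]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value
stdev = statistics.stdev(scores)    # sample standard deviation (n - 1 denominator)

# z-score: how many standard deviations a value lies from the mean.
z_of_max = (max(scores) - mean) / stdev

print(f"mean={mean}, median={median}, mode={mode}")
print(f"stdev={stdev:.2f}, z-score of {max(scores)} = {z_of_max:.2f}")
```

Here the mean (19) sits above the median (18) because the single high score of 31 pulls the average upward, which is the kind of effect the outlier and relative-position lessons are concerned with.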
05

Learning Data Governance with Jonathan Reichental

41m 4s • COURSE
In the era of big data and data science, most businesses and institutions realize the power of data. Yet far too many fail to appreciate the legal and fiscal responsibilities and liabilities associated with it. The stakes are high, but a well-rounded data governance process can help ensure the consistent quality, availability, integrity, and usability of your data.

Here Dr. Jonathan Reichental explains how to begin to implement a data governance program within any organization. Learn the components of data governance, its strategic value, the roles and responsibilities of stakeholders, and the overall steps that an organization needs to take to manage, monitor, and measure the program. Plus, get guidance on a set of next steps for building skills. As the data science domain grows, so does the demand for data governance expertise. Start here for your first look at this in-demand skill.
Topics include:
  • What is data governance?
  • Why do organizations need data governance?
  • Who owns the data?
  • Designing the data governance process
  • Managing, maintaining, monitoring, and measuring your program
06

Data Science Foundations: Data Mining with Barton Poulson

4h 40m • COURSE
All data science begins with good data. Data mining is a framework for collecting, searching, and filtering raw data in a systematic manner, ensuring you have clean data from the start. It also helps you parse large data sets and get at the most meaningful, useful information. This course, Data Science Foundations: Data Mining, is designed to provide a solid point of entry to all the tools, techniques, and tactical thinking behind data mining.

Barton Poulson covers data sources and types, the languages and software used in data mining (including R and Python), and specific task-based lessons that help you practice the most common data-mining techniques: text mining, data clustering, association analysis, and more. This course is an absolute necessity for those interested in joining the data science workforce, and for those who need to obtain more experience in data mining.
Topics include:
  • Prerequisites for data mining
  • Data mining using R, Python, Orange, and RapidMiner
  • Data reduction
  • Data clustering
  • Anomaly detection
  • Association analysis
  • Regression analysis
  • Sequence mining
  • Text mining
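
As one illustration of the techniques listed above, association analysis is commonly described in terms of the support and confidence of a rule. The following is a minimal sketch in plain Python, not the course's own code: the `transactions` data and the helper names `support` and `confidence` are hypothetical and chosen here only for illustration.

```python
# Hypothetical market-basket data: each set is one transaction (made-up items).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule under test: customers who buy bread also buy butter.
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)

print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, bread and butter appear together in three of five baskets, and three of the four baskets containing bread also contain butter. Dedicated tools such as those named in the topics list search for such rules automatically across many itemsets.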

07

Excel 2016: Managing and Analyzing Data with Dennis Taylor

3h • COURSE
Large amounts of data can become unmanageable fast. But with the data management and analysis features in Excel 2016, you can keep the largest spreadsheets under control. In this course, Dennis Taylor shares easy-to-use commands, features, and functions for maintaining large lists of data in Excel. He covers sorting, adding subtotals, filtering, eliminating duplicate data, and using Excel's Advanced Filter feature and specialized database functions to isolate and analyze data. With these techniques, you'll be able to extract the most important information from your data, in the shortest amount of time.
Topics include:
  • Prepping data for analysis
  • Multiple-key sorting
  • Sorting by rows or by columns
  • Setting single- and multi-level subtotals
  • Using text, numeric, and date filters
  • Creating custom filters
  • Filtering tables using slicers
  • Using Advanced Filter
  • Eliminating duplicate data
  • Using SUMIF and COUNTIF functions for quick data analysis
  • Working with the database functions such as DSUM and DMAX
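
For readers who move between Excel and the Python covered elsewhere in this path, the conditional aggregation that SUMIF and COUNTIF perform can be sketched as follows. This is an analogy rather than Excel's implementation: the `rows` data and the function names `sum_if` and `count_if` are hypothetical, and only an exact-match criterion is shown.

```python
# Hypothetical sales rows as (region, amount) pairs; the data is made up.
rows = [
    ("East", 120.0),
    ("West", 80.0),
    ("East", 45.5),
    ("North", 60.0),
    ("West", 30.0),
]


def sum_if(records, region):
    """Add up the amounts of records whose region matches (like SUMIF)."""
    return sum(amount for r, amount in records if r == region)


def count_if(records, region):
    """Count the records whose region matches (like COUNTIF)."""
    return sum(1 for r, _ in records if r == region)


east_total = sum_if(rows, "East")
east_count = count_if(rows, "East")

print(f"East: total={east_total}, count={east_count}")
```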
08

Data Visualization: Storytelling with Bill Shander

1h 37m • COURSE
We are wired for story. We crave it. Storytelling has played an integral role in our ability to make progress. It should come as no surprise, then, that presenting data and information in story form maximizes the effectiveness of our communication. We can create deeper emotional responses in our audience when we present data in story form.

Join data visualization expert Bill Shander as he guides you through the process of turning "facts and figures" into "story" to engage and fulfill our human expectation for information. This course is intended for anyone who works with data and has to communicate it to others, whether a researcher, a data analyst, a consultant, a marketer, or a journalist. Bill shows you how to think about, and craft, stories from data by examining many compelling stories in detail.
Topics include:
  • Creating a narrative structure for data
  • Applying narrative to data
  • Identifying what you want to say with the data
  • Analyzing what your data is saying
  • Determining what your audience needs to hear
  • Leveraging tables, charts, and visuals
  • Ensuring your narrative provides context and direction

Bracketology Club: Using March Madness to Learn Data Science with Brian Tonsoni

12m 7s • COURSE

Bracketology Club: Using March Madness to Learn Data Science



Data Science & Analytics Career Paths & Certifications: First Steps with Jungwoo Ryoo

1h 12m • COURSE

0. Introduction



01_welcome
02_WhoShould



1. Define Data Science



01_intro
02_BriefHistory
03_Concepts
04_BigDataAnalytics
05_EnablingTechnologies



2. Marketplace



01_marketplace
02_FraudDetection
03_SocialMediaAnalytics
04_DiseaseControl_2017Q4
05_DatingServices
06_Simulations
07_ClimateResearch
08_NetworkSecurity



3. Skills



01_skills
02_DataMiningAndAnalytics
03_MachineLearning
04_NLP
05_Statistics
06_Visualization



4. Roles



01_roles
02_DataScientist
03_BusinessIntelligenceArchitect
04_MachineLearningScientist
05_BusinessAnalyticsSpecialist
06_DataVisualizationDeveloper
06a_salaries



5. Certifications



01_certifications
02_MCSE_BI
03_CCP_DataScientist
04_EMC_DataScienceAssociate
05_Oracle_BusinessIntelligenceCertificate
05a_SAS
05b_CAP



6. Future of Data Science



01_future
02_EmergingTechnologies
03_EmergingCareers
04_Ethics
05_ProfessionalDevelopment



7. Conclusion



01_nextsteps



Data Science Foundations: Fundamentals with Barton Poulson

3h 6m • COURSE

00. Introduction



00_01 - Welcome
00_02 Exercise Files
00_03 What you need to know
00_04 Using knowledge checks



01 What is Data Science



01_01 Demand
01_02 Venn diagram
01_03 Pipeline
01_04 Roles
01_05 Team
01_06 Knowledge check: What is data science



02 Fields of study



02_01 Big Data
02_02 Programming
02_03 Statistics
02_04 Knowledge check: Fields of study



03 Ethics



03_01 Ethical issues
03_02 Knowledge check: Ethics



04 Data Sources



04_01 Metrics
04_02 Existing data
04_03 APIs
04_04 Scraping
04_05 Creating data
04_06 Knowledge check: Data sources



05 Data Exploration



05_01 Exploratory graphs
05_02 Exploratory statistics
05_03 Knowledge check: Data exploration



06 Programming



06_01 Spreadsheets
06_02 R
06_03 Python
06_04 SQL
06_05 Web formats
06_06 Knowledge check: Programming



07 Mathematics



07_01 Algebra
07_02 Systems of equations
07_03 Calculus
07_04 Big O
07_05 Bayes probability
07_06 Knowledge check: Mathematics



08 Applied Statistics



08_01 Hypothesis
08_02 Confidence
08_03 Problems
08_04 Validating
08_05 Knowledge check: Applied statistics



09 Machine Learning



09_01 Decision trees
09_02 Ensembles
09_03 k-nearest neighbors (kNN)
09_04 Naive Bayes classifiers
09_05 Artificial neural networks
09_06 Knowledge check: Machine learning



10 Communicating



10_01 Interpretability
10_02 Actionable insights
10_03 Visualization for presentation
10_04 Reproducible research
10_05 Knowledge check: Communicating



11 Conclusion



11_01 Next steps



Statistics Foundations: 1 with Eddie Davila

2h 6m • COURSE

00. Introduction



00_01_Welcome
00_02_Before
00_03_Exercise



01. The World of Statistics



01_01_Statistics
01_02_Data
01_03_Chart



02. The Center of the Data



02_01_Middle
02_02_Median
02_03_Weighted
02_04_Mode



03. Data Variability



03_01_Range
03_02_Standard
03_03_Deviations
03_04_Outliers



04. Distribution and Relative Position



04_01_Zscore
04_02_Empirical
04_03_Percentiles



05. Probability Explained



05_01_Probability
05_02_Examples
05_03_Types



06. Multiple Event Probability



06_01_TwoEvents
06_02_Conditional
06_03_Independence
06_04_False
06_05_Bayes



07. How Objects Are Arranged



07_01_Permutations
07_02_Combinations



08. Discrete vs. Continuous Probability Distributions



08_01_Discrete



09. Discrete Probability Distributions



09_01_Meandiscrete
09_02_Monetary
09_03_Binomial



10. Continuous Probability Distributions



10_01_Densities
10_02_Bell
10_03_FuzzyCentral
10_04_ZTransform



11. Conclusion



11_01_Conclusion



Learning Data Governance with Jonathan Reichental

41m 4s • COURSE

0. Introduction



00_01_WL30_Welcome



1. What is Data Governance



01_01_Role
01_02_Basics
01_03_Principles
01_04_Focus
01_05_Focus



2. Data Governance Deployment



02_01_Who
02_02_Understanding
02_03_Designing



3. Managing a Data Governance Program



03_01_Managing
03_02_Monitoring
04_01_Summary



Data Science Foundations: Data Mining with Barton Poulson

4h 40m • COURSE

0. Introduction



00_01_Welcome
00_02 Who should watch this course
00_03 Exercise files



1. Preliminaries



01_01_Data mining prerequisites
01_02_Algorithm prerequisites
01_03_Software prerequisites



2. Data Reduction



02_01_Goals of data reduction
02_02_Data for data reduction
02_03_Data reduction in R
02_04 Data reduction in Python
02_05_Data reduction in Orange
02_06_Data reduction in RapidMiner



3. Clustering



03_01_Clustering goals
03_02_Clustering data
03_03_Clustering in R
03_04_Clustering in Python
03_05_Clustering in BigML
03_06_Clustering in Orange



4. Classification



04_01_Classification goals
04_02_Classification data
04_03_Classification in R
04_04_ Classification in Python
04_05_Classification in RapidMiner
04_06_Classification in KNIME



5. Anomaly Detection



05_01_Anomaly detection goals
05_02_Anomaly detection data
05_03_Anomaly detection in R
05_04_ Anomaly detection in Python
05_05_Anomaly detection in BigML
05_06_Anomaly detection in RapidMiner



6. Association Analysis



06_01_Association analysis goals
06_02_Association analysis data
06_03_Association analysis in R
06_04_Association analysis in Python
06_05_Association analysis in Orange
06_06_Association analysis in KNIME



7. Regression Analysis



07_01_Regression analysis goals
07_02_Regression analysis data
07_03_Regression analysis in R
07_04_Regression analysis in Python
07_05_Regression analysis in KNIME
07_06_Regression analysis in RapidMiner



8. Sequential Patterns



08_01_Sequence mining goals
08_02_Sequence mining algorithms
08_03_Sequence mining in R
08_04_Sequence mining in Python
08_05_Sequence mining in BigML: Part 1
08_06_Sequence mining in BigML: Part 2



9. Text Mining



03_01_Text mining goals
03_02_Text mining algorithms
03_03_Text mining in R
03_04_Text mining in Python
03_05_Text mining in RapidMiner



Conclusion



Next steps



Excel 2016: Managing and Analyzing Data with Dennis Taylor

3h • COURSE

0. Introduction



00_01_Welcome
00_02 - Exercise files
01_01 - Structure data for optimum usage



2.1. Sorting Data



02.Sort concepts and Sort menu options
03.Multiple-key sorting
04.Sort from AZ and ZA menu icons
05.Sort based on data order in custom lists
06.Sort by background color or font color
07.Sort left-to-right columns
08.Sort data in random order



3.2. Filtering Data



09.Filter single- and multiple-column text
10.Numeric filters
11.Date filters
12.Text filters
13.Top 10 (value or percent) option
14.Create custom filters
15.Copy and sort filtered lists
16.Recognize standard filtering limitations



Data Visualization: Storytelling with Bill Shander

1h 37m • COURSE

1. Introduction



1. Welcome
2. Need to know
3. Using the exercise files
4. Knowledge checks



2. Why Storytelling



1. Wired for story
2. Storytelling is essential
3. Use story even when you don't
4. Knowledge check



3. Story Structure



1. KWYRWTS
2. Story structure
3. Find the story in your data
4. Sketch and storyboard
5. Knowledge check



4. Story Mechanisms



1. Linear logic
2. Change over time
3. Flow diagrams
4. Compare and contrast
5. Progressive depth
6. Personalization
7. Text
8. Knowledge check



5. Final Touches



1. Labeling
2. Eye candy
3. Repetition
4. Relatability
5. Complexity
6. Knowledge check



6. Conclusion



1. Next steps