Extending Hadoop for Data Science: Streaming, Spark, Storm, and Kafka
Posted by Superadmin on November 15 2020 15:53:54

with Lynn Langit


01_01-Welcome



Extend your Hadoop data science knowledge by learning how to use other Apache data science platforms, libraries, and tools. This course goes beyond the basics of Hadoop MapReduce into other key Apache libraries that bring flexibility to your Hadoop clusters. It covers core Spark, SparkSQL, SparkR, and SparkML. Learn how to scale and visualize your data with interactive Databricks clusters and notebooks and other implementations. This course is designed to help those working in data science, development, or analytics get familiar with attendant technologies.

Topics Include:
  • Relate which file system is typically used with Hadoop.
  • Explain the differences between Apache and commercial Hadoop distributions.
  • Cite how to set up an IDE (VS Code with the Python extension).
  • Relate the value of Databricks community edition.
  • Compare YARN vs. Standalone.
  • Review various streaming options.
  • Recall how to select your programming language.
  • Describe the Databricks environment.

      
Course Contents
  • 01. Introduction
  • 02. Hadoop Core Fundamentals
  • 03. Setting Up a Hadoop Dev Environment
  • 04. Hadoop Batch Processing
  • 05. Fast Hadoop Options
  • 06. Spark Basics
  • 07. Using Spark
  • 08. Spark Libraries
  • 09. Spark Streaming
  • 10. Hadoop Streaming
  • 11. Modern Hadoop Architectures
  • 12. Conclusion
  • Exercise Files

01_02-What you should know



01_03-Using the exercise files



02_01-Modern Hadoop



02_02-File system used with Hadoop



02_03-Apache and commercial Hadoop distributions



02_04-Hadoop libraries



02_05-Hadoop on Google Cloud Platform



02_06-Run Hadoop job on GCP



02_07-Databricks on AWS



03_01-Set up IDE VS Code Python extension



03_02-Sign up for Databricks community edition



03_03-Add Hadoop libraries to your test environment



03_04-Your first cluster on Databricks Community Edition



03_05-Load data into tables



04_01-Processing options



04_02-Prerequisite understanding



04_03-Resource coordinators



04_04-Compare YARN vs. Standalone



05_01-Fast Hadoop use cases



05_02-Big data streaming



05_03-Streaming options



05_04-Apache Spark basics



05_05-Spark use cases



06_01-Apache Spark libraries



06_02-Spark data interfaces



06_03-Select your programming language



06_04-Spark session objects



06_05-Spark shell



07_01-Tour the Databricks environment



07_02-Tour the notebook



07_03-Import and export notebooks



07_04-Calculate pi on Spark



07_05-Run wordcount on Spark with Scala



07_06-Understand wordcount on Spark with Python



07_07-Import data



07_08-Transformations and actions



07_09-Caching and the DAG



08_01-Spark SQL



08_02-SparkR



08_03-Spark ML: Preparing data



08_04-Spark ML: Building the model



08_05-Spark ML: Evaluating the model



08_06-Advanced machine learning on Spark



08_07-MXNet or TensorFlow



08_08-Spark with GraphX



08_09-Spark with ADAM for genomics



09_01-Reexamine streaming pipelines



09_02-Spark streaming



09_03-Streaming ingest services



09_04-Advanced Spark streaming with MLeap



10_01-PubSub on GCP



10_02-Apache Kafka



10_03-Kafka architecture



10_04-Apache Storm



10_05-Storm architecture



11_01-Combine Hadoop libraries and more



11_02-Review batch architecture for ETL



11_03-Spark architecture for interactive analytics



11_04-Spark architecture for genomics



11_05-Spark Streaming architecture for IoT



11_06-Spark Streaming architecture for dynamic prediction



12_01-Next steps



Ex_Files_Extending_Hadoop


