Apache Spark For Java Developers
Posted by Superadmin on November 16 2020 16:44:55

with Richard Chesterwood, Matt Greencroft, Virtual Pair Programmers


1. Welcome



Get started with the amazing Apache Spark parallel computing framework – this course is designed especially for Java Developers. If you’re new to Data Science and want to find out how massive datasets are processed in parallel, then the Java API for Spark is a great way to get started, fast.

All of the fundamentals you need to understand the main operations you can perform in Spark Core, SparkSQL and DataFrames are covered in detail, with easy-to-follow examples. You’ll be able to follow along with all of the examples and run them on your own local development computer.
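To give a flavour of those examples, here is a minimal sketch of a self-contained Spark Core job in Java that runs entirely on a local machine. The class name and data are illustrative only, not taken from the course materials:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;

public class FirstSparkJob {
    public static void main(String[] args) {
        // local[*] runs Spark inside this JVM, using all available cores
        SparkConf conf = new SparkConf().setAppName("firstJob").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Double> values = sc.parallelize(Arrays.asList(3.5, 12.0, 90.2, 20.3));
            // reduce is an action: it triggers execution and returns a single value
            double total = values.reduce((a, b) -> a + b);
            System.out.println("Total: " + total);
        }
    }
}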

Included with the course is a module covering SparkML, an exciting addition to Spark that allows you to apply Machine Learning models to your Big Data! No mathematical experience is necessary!
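For a taste of what SparkML code looks like, here is a rough linear-regression sketch. The CSV file and its "size", "age" and "price" columns are invented for illustration:

import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.ml.regression.LinearRegression;
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LinearRegressionSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("mlSketch").master("local[*]").getOrCreate();

        // Hypothetical CSV with numeric "size" and "age" features and a "price" label
        Dataset<Row> input = spark.read()
                .option("header", true).option("inferSchema", true)
                .csv("src/main/resources/houses.csv");

        // SparkML expects all features gathered into a single vector column
        Dataset<Row> withFeatures = new VectorAssembler()
                .setInputCols(new String[] {"size", "age"})
                .setOutputCol("features")
                .transform(input);

        LinearRegressionModel model = new LinearRegression()
                .setLabelCol("price")
                .fit(withFeatures);

        System.out.println("Intercept: " + model.intercept()
                + ", coefficients: " + model.coefficients());
        spark.close();
    }
}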
And finally, there’s a full three-hour module covering Spark Streaming, where you will get hands-on experience of integrating Spark with Apache Kafka to handle real-time big data streams. We use both the DStream and the Structured Streaming APIs.
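As an illustration of the Structured Streaming half of that module, reading from a Kafka topic looks roughly like this. It assumes a broker on localhost:9092, the spark-sql-kafka connector on the classpath, and a made-up topic name:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaStreamSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("streamingSketch").master("local[*]").getOrCreate();

        // Subscribe to a hypothetical "viewrecords" topic on a local broker
        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "viewrecords")
                .load();

        // Kafka rows arrive as binary key/value pairs; cast to strings for display
        StreamingQuery query = stream
                .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
                .writeStream()
                .format("console")
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}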
Optionally, if you have an AWS account, you’ll see how to deploy your work to a live EMR (Elastic Map Reduce) hardware cluster. If you’re not familiar with AWS you don’t need to follow along with the coding in this part, but the video is still worthwhile to watch.
You’ll be going deep into the internals of Spark and you’ll find out how it optimizes your execution plans. We’ll compare the performance of RDDs vs SparkSQL, and you’ll learn about the major performance pitfalls that could cost live projects a lot of money.
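A simple tool used in that kind of investigation is Spark's own explain output. Here is a sketch against an invented log file and temporary view:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ExplainSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("explainSketch").master("local[*]").getOrCreate();

        // Hypothetical CSV of log messages with a "level" column
        Dataset<Row> logs = spark.read()
                .option("header", true)
                .csv("src/main/resources/biglog.csv");
        logs.createOrReplaceTempView("logging_table");

        Dataset<Row> results = spark.sql(
                "select level, count(1) as total from logging_table group by level");

        // Prints the parsed, analyzed, optimized and physical plans that the
        // Catalyst optimizer produced for this query
        results.explain(true);
        spark.close();
    }
}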
Throughout the course, you’ll be getting plenty of practice with Java 8 lambdas – a great way to learn functional-style Java if you’re new to it.

What you’ll learn
  • Use functional-style Java to define complex data processing jobs
  • Learn the differences between the RDD and DataFrame APIs
  • Use an SQL-style syntax to produce reports against Big Data sets (see the sketch after this list)
  • Use Machine Learning algorithms with Big Data and SparkML
  • Connect Spark to Apache Kafka to process streams of Big Data
  • See how Structured Streaming can be used to build pipelines with Kafka
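To make the SQL-syntax and DataFrame bullets concrete, here is a sketch of the same grouping written both ways. The exams.csv file and its "subject" and "score" columns are invented for illustration:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

public class SqlSyntaxSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("sqlSketch").master("local[*]").getOrCreate();

        Dataset<Row> students = spark.read()
                .option("header", true).option("inferSchema", true)
                .csv("src/main/resources/exams.csv");

        // Option 1: full SQL syntax against a temporary view
        students.createOrReplaceTempView("students");
        Dataset<Row> viaSql = spark.sql(
                "select subject, max(score) as top_score from students group by subject");

        // Option 2: the equivalent query through the DataFrame API
        Dataset<Row> viaApi = students.groupBy(col("subject")).max("score");

        viaSql.show();
        viaApi.show();
        spark.close();
    }
}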
Requirements
  • Java 8 is required for the course. Spark does not currently support Java 9+, and you need Java 8 for the functional lambda syntax
  • Previous knowledge of Java is assumed, but anything beyond the basics is explained
  • Some previous SQL experience will be useful for part of the course, but if you’ve never used it before, this will be a good first experience

Course Contents
01. Introduction
02. Getting Started
03. Reduces on RDDs
04. Mapping and Outputting
05. Tuples
06. PairRDDs
07. FlatMaps and Filters
08. Reading from Disk
09. Keyword Ranking Practical
10. Sorts and Coalesce
11. Deploying to AWS EMR (Optional)
12. Joins
13. Big Data Big Exercise
14. RDD Performance
15. Module 2 - Chapter 1 SparkSQL Introduction
16. SparkSQL Getting Started
17. Datasets
18. The Full SQL Syntax
19. In Memory Data
20. Groupings and Aggregations
21. Date Formatting
22. Multiple Groupings
23. Ordering
24. DataFrames API
25. Pivot Tables
26. More Aggregations
27. Practical Exercise
28. User Defined Functions
29. SparkSQL Performance
30. HashAggregation
31. SparkSQL Performance vs RDDs
32. Module 3 - SparkML for Machine Learning
33. Linear Regression Models
34. Training Data
35. Model Fitting Parameters
36. Feature Selection
37. Non-Numeric Data
38. Pipelines
39. Case Study
40. Logistic Regression
41. Decision Trees
42. K Means Clustering
43. Recommender Systems
44. Module 4 - Spark Streaming and Structured Streaming with Kafka
45. Streaming Chapter 2 - Streaming with Apache Kafka
46. Streaming Chapter 3 - Structured Streaming

Lesson Videos and Resources
2. Downloading the Code.html

3. Module 1 - Introduction

3.1 Practicals.zip

4. Spark Architecture and RDDs

1. Warning - Java 9, 10, 11 is not supported by Spark.html




2. Installing Spark

1. Reduces on RDDs

1. Mapping Operations
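(The lesson video is not reproduced here, but as a rough sketch of the topic: map transforms every element of an RDD through a lambda, and collecting the results brings them back to the driver for printing. The data is invented for illustration.)

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;

public class MappingSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("mapping").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 4, 9, 16));
            // map is a transformation: each input element produces one output element
            JavaRDD<Double> squareRoots = numbers.map(n -> Math.sqrt(n));
            // collect brings the results back to the driver so they can be printed
            squareRoots.collect().forEach(System.out::println);
        }
    }
}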




2. Outputting Results to the Console

3. Counting Big Data Items

4. If you've had a NotSerializableException in Spark




1. RDDs of Objects

2. Tuples and RDDs
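(As a rough sketch of the topic: Java has no built-in pair type, so Spark's Java API borrows Scala's Tuple2 to keep two related values together in each RDD element. The data is invented for illustration.)

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class TuplesSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("tuples").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(9, 2, 5));
            // Each element becomes a (number, squareRoot) pair held in a Tuple2
            JavaRDD<Tuple2<Integer, Double>> withSqrts =
                    numbers.map(n -> new Tuple2<>(n, Math.sqrt(n)));
            withSqrts.collect().forEach(t ->
                    System.out.println(t._1() + " -> " + t._2()));
        }
    }
}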




1. Overview of PairRDDs

Apache Spark For Java Developers

with Richard Chesterwood, Matt Greencroft, Virtual Pair Programmers


2. Building a PairRDD
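
A minimal sketch of the idea behind this lesson, using the standard Spark Java API (the sample data is invented here: log lines of the form "LEVEL: message"). mapToPair turns an ordinary JavaRDD into a JavaPairRDD of key/value Tuple2s, which unlocks the by-key operations used in the rest of the chapter:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;
    import java.util.Arrays;

    public class PairRddSketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("pairRdd").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<String> logs = sc.parallelize(Arrays.asList(
                    "WARN: disk nearly full", "ERROR: out of memory", "WARN: slow response"));

            // mapToPair builds the PairRDD: key = log level, value = 1, ready for counting
            JavaPairRDD<String, Long> pairs =
                    logs.mapToPair(line -> new Tuple2<>(line.split(":")[0], 1L));
        }
    }

The later snippets in this outline continue this sketch and assume the same imports and JavaSparkContext.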



3. Coding a ReduceByKey
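
Continuing the sketch above: reduceByKey merges the values for each key with a function you supply. The function must be associative, because Spark combines partial results within each partition before shuffling them together:

    // Sum the 1s per key to count occurrences of each log level
    JavaPairRDD<String, Long> counts = pairs.reduceByKey((a, b) -> a + b);
    counts.collect().forEach(t -> System.out.println(t._1() + " -> " + t._2()));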



4. Using the Fluent API
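
Because each transformation returns a new RDD, the whole job can be written as one fluent chain instead of a series of named intermediate variables. The same count expressed fluently:

    sc.parallelize(Arrays.asList("WARN: a", "ERROR: b", "WARN: c"))
      .mapToPair(line -> new Tuple2<>(line.split(":")[0], 1L))
      .reduceByKey((a, b) -> a + b)
      .collect()
      .forEach(t -> System.out.println(t._1() + " appears " + t._2() + " times"));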



5. Grouping By Key
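
groupByKey gathers every value for a key into a single Iterable. A sketch, with the standard caveat: all of a key's values are shuffled to one executor, so for simple aggregations reduceByKey is usually the safer, faster choice:

    // Each key maps to an Iterable of all its values
    pairs.groupByKey()
         .collect()
         .forEach(t -> System.out.println(t._1() + " -> " + t._2()));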



1. FlatMaps
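
Where map produces exactly one output element per input, flatMap can produce zero or many: in the Java API the lambda returns an Iterator. The classic example is fanning lines out into words (input strings invented for illustration):

    JavaRDD<String> sentences = sc.parallelize(Arrays.asList("spark is fast", "java is verbose"));

    // One input line becomes many output words
    JavaRDD<String> words = sentences.flatMap(line -> Arrays.asList(line.split(" ")).iterator());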



2. Filters
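
filter keeps only the elements for which a predicate returns true, for example dropping short, uninteresting words from the word list built above:

    // Keep only words long enough to be interesting
    JavaRDD<String> longWords = words.filter(word -> word.length() > 4);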



1. Reading from Disk
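
sc.textFile reads a file into an RDD with one element per line. Two points worth knowing: the path below is a placeholder, and the read is lazy, so nothing is actually loaded until an action (such as count or collect) runs:

    // Hypothetical path; loading happens lazily when an action is triggered
    JavaRDD<String> lines = sc.textFile("src/main/resources/subtitles/input.txt");
    System.out.println(lines.count());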



1. Practical Requirements



2. Worked Solution
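
The exact practical spec isn't reproduced on this page, but a keyword-ranking job of the kind this chapter describes generally has the following shape (the input path and the crude "boring word" filter are assumptions for illustration):

    JavaPairRDD<String, Long> keywordCounts =
            sc.textFile("src/main/resources/subtitles/input.txt")     // hypothetical input
              .flatMap(line -> Arrays.asList(line.toLowerCase().split("[^a-z]+")).iterator())
              .filter(word -> word.length() > 3)                      // crude stop-word filter
              .mapToPair(word -> new Tuple2<>(word, 1L))
              .reduceByKey((a, b) -> a + b);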



3. Worked Solution (continued) with Sorting
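
To rank the keywords, a common trick is to flip each tuple so the count becomes the key, then use sortByKey. Continuing the sketch:

    JavaPairRDD<Long, String> ranked =
            keywordCounts.mapToPair(t -> new Tuple2<>(t._2(), t._1()))
                         .sortByKey(false);   // false = descending

    // take(10) returns an ordered List on the driver
    ranked.take(10).forEach(t -> System.out.println(t._2() + ": " + t._1()));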



1. Why do sorts not work with foreach in Spark
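
The short answer: a sorted RDD is ordered across its partitions, but foreach runs as one task per partition, in parallel, so the printed output from the tasks interleaves and the ordering appears lost (on a real cluster, executor println output never reaches the driver's console at all). A sketch of the contrast:

    // Runs in parallel, one task per partition - output order is not guaranteed
    ranked.foreach(t -> System.out.println(t));

    // take pulls results back to the driver in sorted order, so this prints correctly
    ranked.take(10).forEach(System.out::println);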



2. Why Coalesce is the Wrong Solution
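
The tempting "fix" is to squash the RDD into a single partition so that foreach runs as one task and prints in order. It works on a toy dataset, but it funnels the entire dataset through a single task, throwing away the parallelism that is the whole point of Spark:

    // Anti-pattern: one partition means one task doing all the work
    ranked.coalesce(1).foreach(t -> System.out.println(t));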



3. What is Coalesce used for in Spark
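
coalesce does have a legitimate job: shrinking the partition count after an operation such as a drastic filter has left most partitions nearly empty, so later stages don't schedule swarms of tiny tasks. Unlike repartition, coalesce merges existing partitions without a full shuffle. A sketch (path invented):

    JavaRDD<String> big = sc.textFile("src/main/resources/logs/huge-logfile.txt");
    JavaRDD<String> errorsOnly = big.filter(line -> line.startsWith("ERROR"));

    // Merge the mostly-empty partitions down without a full shuffle
    JavaRDD<String> compact = errorsOnly.coalesce(8);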



1. How to start an EMR Spark Cluster



2. Packing a Spark Jar for EMR
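
Beyond building a jar that bundles your dependencies (a build-tool detail not shown here), the usual Java-side change when preparing a job for a real cluster is to stop hard-coding the master URL, so that spark-submit on EMR can supply it:

    // Local development: master is hard-coded
    SparkConf devConf = new SparkConf().setAppName("keywordRanking").setMaster("local[*]");

    // Cluster build: omit setMaster and let spark-submit decide
    SparkConf clusterConf = new SparkConf().setAppName("keywordRanking");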



3. Running a Spark Job on EMR
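
As a taste of what this lecture covers, here is a minimal sketch (not the course's exact code) of the shape a Spark job typically takes before it can run on EMR: the hard-coded local master is removed so that spark-submit on the cluster can supply one, and input is read from S3 rather than local disk. The bucket path is a placeholder, not a real course resource.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class EmrJobSketch {
    public static void main(String[] args) {
        // No .setMaster("local[*]") here -- on EMR, spark-submit supplies the
        // cluster's master, so hard-coding a local one would break deployment.
        SparkConf conf = new SparkConf().setAppName("emrJob");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Placeholder S3 path -- substitute your own bucket and file.
            long lines = sc.textFile("s3://your-bucket/input.txt").count();
            System.out.println("Line count: " + lines);
        }
    }
}
```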



4. Understanding the Job Progress Output



5. Calculating EMR costs and Terminating the cluster



1. Inner Joins
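
To preview the API this chapter works with, here is a minimal, self-contained sketch (toy data, not the course exercise) of an inner join between two JavaPairRDDs keyed by user id. Only keys present in both RDDs survive the join.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class InnerJoinSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("innerJoin").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // (userId, visitCount) and (userId, userName)
            JavaPairRDD<Integer, Integer> visits = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(4, 18), new Tuple2<>(6, 4), new Tuple2<>(10, 9)));
            JavaPairRDD<Integer, String> users = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, "John"), new Tuple2<>(4, "Bob"), new Tuple2<>(6, "Raquel")));

            // Inner join: only keys 4 and 6 appear in both RDDs, so only they survive.
            JavaPairRDD<Integer, Tuple2<Integer, String>> joined = visits.join(users);
            joined.collect().forEach(System.out::println);
        }
    }
}
```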



2. Left Outer Joins and Optionals
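
A quick illustrative sketch (again toy data) of the two ideas in this lecture's title: leftOuterJoin keeps every key from the left-hand RDD, and the right-hand value comes back wrapped in Spark's own Optional class, which is empty when there was no match.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.Optional;

import scala.Tuple2;

public class LeftOuterJoinSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("leftOuterJoin").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<Integer, Integer> visits = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(4, 18), new Tuple2<>(6, 4), new Tuple2<>(10, 9)));
            JavaPairRDD<Integer, String> users = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, "John"), new Tuple2<>(4, "Bob"), new Tuple2<>(6, "Raquel")));

            // Every key from the left RDD (visits) is kept; the user name is an
            // Optional, empty for key 10 because no user matches it.
            JavaPairRDD<Integer, Tuple2<Integer, Optional<String>>> joined =
                    visits.leftOuterJoin(users);

            joined.collect().forEach(row -> {
                Optional<String> maybeName = row._2()._2();
                String name = maybeName.isPresent() ? maybeName.get() : "unknown";
                System.out.println(row._1() + ": " + name);
            });
        }
    }
}
```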



3. Right Outer Joins
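
The mirror image of the previous sketch: rightOuterJoin keeps every key from the right-hand RDD, and it is now the left-hand value that becomes an Optional. Same illustrative data as before, not the course's.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.Optional;

import scala.Tuple2;

public class RightOuterJoinSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("rightOuterJoin").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<Integer, Integer> visits = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(4, 18), new Tuple2<>(6, 4), new Tuple2<>(10, 9)));
            JavaPairRDD<Integer, String> users = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, "John"), new Tuple2<>(4, "Bob"), new Tuple2<>(6, "Raquel")));

            // Every key from the right RDD (users) is kept; the visit count is an
            // Optional, empty for key 1 because John has no recorded visits.
            JavaPairRDD<Integer, Tuple2<Optional<Integer>, String>> joined =
                    visits.rightOuterJoin(users);

            joined.collect().forEach(row -> {
                int visitCount = row._2()._1().isPresent() ? row._2()._1().get() : 0;
                System.out.println(row._2()._2() + " has " + visitCount + " visits");
            });
        }
    }
}
```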



4. Full Joins and Cartesians
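
A final illustrative sketch for this chapter: fullOuterJoin keeps every key from either side (so both halves of the value pair are Optionals), while cartesian pairs every element of one RDD with every element of the other. Toy data again; note that cartesian output grows multiplicatively, so it can be very expensive on real datasets.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.Optional;

import scala.Tuple2;

public class FullJoinCartesianSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("fullJoin").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<Integer, Integer> visits = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(4, 18), new Tuple2<>(10, 9)));
            JavaPairRDD<Integer, String> users = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, "John"), new Tuple2<>(4, "Bob")));

            // Full outer join: keys 1, 4 and 10 all appear; any missing side
            // shows up as an empty Optional.
            JavaPairRDD<Integer, Tuple2<Optional<Integer>, Optional<String>>> full =
                    visits.fullOuterJoin(users);
            full.collect().forEach(System.out::println);

            // Cartesian: every element paired with every element -- 3 x 2 = 6 pairs.
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3));
            JavaRDD<String> letters = sc.parallelize(Arrays.asList("a", "b"));
            JavaPairRDD<Integer, String> allPairs = numbers.cartesian(letters);
            allPairs.collect().forEach(System.out::println);
        }
    }
}
```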



1. Introducing the Requirements



1.1 Practical Guide.pdf



2. Warmup



3. Main Exercise Requirements



4. Walkthrough - Step 2



5. Walkthrough - Step 3



6. Walkthrough - Step 4



7. Walkthrough - Step 5



8. Walkthrough - Step 6



9. Walkthrough - Step 7



10. Walkthrough - Step 8



11. Walkthrough - Step 9, adding titles and using the Big Data file



1. Transformations and Actions
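
The key idea in this chapter is that transformations are lazy and only actions trigger work on the cluster. Here's a minimal, hedged sketch using the Spark Java API (the class name, app name and data are placeholders, not the course's actual code):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import java.util.Arrays;

    public class TransformationsVsActions {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("demo").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

            // map is a transformation: it only extends the execution plan, nothing runs yet.
            JavaRDD<Integer> doubled = numbers.map(n -> n * 2);

            // count is an action: it triggers the whole plan and returns a result to the driver.
            System.out.println("count = " + doubled.count());

            sc.close();
        }
    }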



2. The DAG and SparkUI
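
As a hedged illustration of the tooling this chapter uses: toDebugString() prints an RDD's lineage, and a local application normally serves the Spark UI on http://localhost:4040 while it runs. Blocking on input at the end is just a trick to keep the UI alive long enough to look at it; all names here are placeholders:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import java.util.Arrays;
    import java.util.Scanner;

    public class DagDemo {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("demo").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
            JavaRDD<Integer> doubled = numbers.map(n -> n * 2);

            // Print the lineage (the logical DAG) that the action below will execute.
            System.out.println(doubled.toDebugString());
            doubled.count();

            // Keep the driver alive so the Spark UI can be inspected in a browser.
            new Scanner(System.in).nextLine();
            sc.close();
        }
    }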



3. Narrow vs Wide Transformations
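
A minimal sketch of the distinction this chapter draws (data and names are illustrative): narrow transformations stay within a partition, while wide ones must bring matching keys together, forcing a shuffle and a new stage.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;
    import java.util.Arrays;

    public class NarrowVsWide {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("demo").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<String> words = sc.parallelize(Arrays.asList("a", "b", "a", "c", "b", "a"));

            // Narrow: each output partition depends on exactly one input
            // partition, so no data moves between executors.
            JavaPairRDD<String, Integer> ones = words.mapToPair(w -> new Tuple2<>(w, 1));

            // Wide: equal keys must meet, which forces a shuffle across the
            // network and starts a new stage in the DAG.
            JavaPairRDD<String, Integer> counts = ones.reduceByKey((a, b) -> a + b);

            counts.collect().forEach(System.out::println);
            sc.close();
        }
    }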



4. Shuffles
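
A small, hedged sketch of how partition counts and shuffles interact (the sizes and counts are arbitrary): repartition always performs a full shuffle of every row, while coalesce can shrink the partition count by merging locally.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import java.util.ArrayList;
    import java.util.List;

    public class ShuffleDemo {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("demo").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            List<Integer> data = new ArrayList<>();
            for (int i = 0; i < 1_000_000; i++) data.add(i);

            JavaRDD<Integer> rdd = sc.parallelize(data);
            System.out.println("before: " + rdd.getNumPartitions());

            // repartition always shuffles every row across the cluster.
            JavaRDD<Integer> reshuffled = rdd.repartition(8);

            // coalesce merges partitions on the same executor, avoiding a full shuffle.
            JavaRDD<Integer> narrowed = reshuffled.coalesce(2);

            System.out.println("after:  " + narrowed.getNumPartitions());
            sc.close();
        }
    }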



5. Dealing with Key Skews
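
One standard remedy for a skewed key is "salting" it across several buckets. A minimal sketch, assuming a salt range of 8 and a '#' separator (both arbitrary illustrative choices, not necessarily the course's approach):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;
    import java.util.Arrays;
    import java.util.concurrent.ThreadLocalRandom;

    public class SaltingDemo {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("demo").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>("hot", 1), new Tuple2<>("hot", 1), new Tuple2<>("rare", 1)));

            // Spread a skewed key over several buckets by appending a random salt.
            JavaPairRDD<String, Integer> salted = pairs.mapToPair(t ->
                    new Tuple2<>(t._1() + "#" + ThreadLocalRandom.current().nextInt(8), t._2()));

            // The first reduce works on many small buckets instead of one huge one...
            JavaPairRDD<String, Integer> partial = salted.reduceByKey(Integer::sum);

            // ...then strip the salt and combine the (now tiny) partial totals.
            JavaPairRDD<String, Integer> totals = partial
                    .mapToPair(t -> new Tuple2<>(t._1().substring(0, t._1().lastIndexOf('#')), t._2()))
                    .reduceByKey(Integer::sum);

            totals.collect().forEach(System.out::println);
            sc.close();
        }
    }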



6. Avoiding groupByKey and using map-side-reduces instead
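
A hedged sketch of the trade-off in this chapter's title (data and names are illustrative): reduceByKey combines values on the map side before the shuffle, whereas groupByKey ships every raw value across the network first.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;
    import java.util.Arrays;

    public class MapSideReduceDemo {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("demo").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>("WARN", 1), new Tuple2<>("ERROR", 1), new Tuple2<>("WARN", 1)));

            // groupByKey ships every single value across the network before
            // anything is combined - a skewed key can overwhelm one executor.
            // pairs.groupByKey();

            // reduceByKey combines values on the map side first, so only one
            // partial total per key per partition is shuffled.
            JavaPairRDD<String, Integer> counts = pairs.reduceByKey((a, b) -> a + b);

            counts.collect().forEach(System.out::println);
            sc.close();
        }
    }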



7. Caching and Persistence
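
A minimal sketch of reusing an RDD across several actions (names and data are placeholders): without persistence, each action recomputes the full lineage from scratch.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;
    import java.util.Arrays;

    public class CachingDemo {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("demo").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<Integer> expensive = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5))
                    .map(n -> n * n); // stand-in for a costly pipeline

            // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY());
            // MEMORY_AND_DISK spills to disk if the RDD doesn't fit in memory.
            expensive.persist(StorageLevel.MEMORY_AND_DISK());

            System.out.println(expensive.count());            // first action materialises and stores it
            System.out.println(expensive.reduce(Integer::sum)); // second action reads the cached copy

            sc.close();
        }
    }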



1. Code for SQLDataFrames Section.html



1.1 biglog.txt



1.2 Code.zip



2. Introducing SparkSQL
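
As a taster, a hedged sketch of the SparkSQL entry point: a SparkSession replaces the JavaSparkContext, and reads produce a schema-carrying Dataset<Row>, i.e. a DataFrame (the CSV path is a placeholder, not necessarily the course's file):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkSqlIntro {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("demo")
                    .master("local[*]")
                    .getOrCreate();

            // Unlike a raw RDD, a Dataset<Row> knows its column names and types.
            Dataset<Row> df = spark.read()
                    .option("header", "true")
                    .csv("src/main/resources/students.csv"); // hypothetical path

            df.printSchema();
            df.show(5);
            spark.close();
        }
    }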



Get started with the amazing Apache Spark parallel computing framework – this course is designed especially for Java Developers. If you’re new to Data Science and want to find out about how massive datasets are processed in parallel, then the Java API for spark is a great way to get started, fast.

All of the fundamentals you need to understand the main operations you can perform in Spark Core, SparkSQL and DataFrames are covered in detail, with easy to follow examples. You’ll be able to follow along with all of the examples, and run them on your own local development computer.

Included with the course is a module covering SparkML, an exciting addition to Spark that allows you to apply Machine Learning models to your Big Data! No mathematical experience is necessary!
And finally, there’s a full 3 hour module covering Spark Streaming, where you will get hands-on experience of integrating Spark with Apache Kafka to handle real-time big data streams. We use both the DStream and the Structured Streaming APIs.
Optionally, if you have an AWS account, you’ll see how to deploy your work to a live EMR (Elastic Map Reduce) hardware cluster. If you’re not familiar with AWS you can skip this video, but it’s still worthwhile to watch rather than following along with the coding.
You’ll be going deep into the internals of Spark and you’ll find out how it optimizes your execution plans. We’ll be comparing the performance of RDDs vs SparkSQL, and you’ll learn about the major performance pitfalls which could save a lot of money for live projects.
Throughout the course, you’ll be getting some great practice with Java 8 Lambdas – a great way to learn functional-style Java if you’re new to it.

What you’ll learn
  • Use functional style Java to define complex data processing jobs
  • Learn the differences between the RDD and DataFrame APIs
  • Use an SQL style syntax to produce reports against Big Data sets
  • Use Machine Learning Algorithms with Big Data and SparkML
  • Connect Spark to Apache Kafka to process Streams of Big Data.
  • See how Structured Streaming can be used to build pipelines with Kafka
Requirements
  • Java 8 is required for the course. Spark does not currently support Java9+, and you need Java 8 for the functional Lambda syntax
  • Previous knowledge of Java is assumed, but anything above the basics is explained
  • Some previous SQL will be useful for part of the course, but if you’ve never used it before this will be a good first experience

      
Course Contents
01 Introduction 02 Getting Started 03 Reduces on RDDs 04 Mapping and Outputting 05 Tuples 06 PairRDDs 07. FlatMaps and Filters 8. Reading from Disk 9. Keyword Ranking Practical 10. Sorts and Coalesce 11. Deploying to AWS EMR (Optional) 12. Joins 13. Big Data Big Exercise 14. RDD Performance 15. Module 2 - Chapter 1 SparkSQL Introduction 16. SparkSQL Getting Started 17. Datasets 18. The Full SQL Syntax 19. In Memory Data 20. Groupings and Aggregations 21. Date Formatting 22. Multiple Groupings 23. Ordering 24. DataFrames API 25. Pivot Tables 26. More Aggregations 27. Practical Exercise 28. User Defined Functions 29. SparkSQL Performance 30. HashAggregation 31. SparkSQL Performance vs RDDs 32. Module 3 - SparkML for Machine Learning 33. Linear Regression Models 34. Training Data 35. Model Fitting Parameters 36. Feature Selection 37. Non-Numeric Data 38. Pipelines 39. Case Study 40. Logistic Regression 41. Decision Trees 42. K Means Clustering 43. Recommender Systems 44. Module 4 -Spark Streaming and Structured Streaming with Kafka 45. Streaming Chapter 2 - Streaming with Apache Kafka 46. Streaming Chapter 3- Structured Streaming



1. SparkSQL Getting Started
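
As a rough companion to this lesson, here's a minimal sketch of starting a SparkSession and loading a CSV file through the Java API. The class name, file path and column layout are illustrative rather than taken from the lesson's own code.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkSqlGettingStarted {
        public static void main(String[] args) {
            // A SparkSession is the entry point for SparkSQL, much as
            // JavaSparkContext is for the RDD API.
            SparkSession spark = SparkSession.builder()
                    .appName("SparkSQL Getting Started")
                    .master("local[*]")   // run locally, using all available cores
                    .getOrCreate();

            // Illustrative path; any CSV with a header row will do.
            Dataset<Row> dataset = spark.read()
                    .option("header", true)
                    .csv("src/main/resources/exams/students.csv");

            dataset.show();   // prints the first 20 rows
            System.out.println("There are " + dataset.count() + " records");

            spark.close();
        }
    }

The later sketches in these notes reuse the spark and dataset variables set up here rather than repeating the boilerplate.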






1. Dataset Basics
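
Loosely, this lesson covers what a Dataset<Row> gives you out of the box. A small sketch, reusing the spark and dataset variables from the first sketch; the column names are illustrative:

    long numberOfRows = dataset.count();   // how many records were loaded
    Row firstRow = dataset.first();        // pull out a single Row

    // Values can be read from a Row by column name or by position.
    String subject = firstRow.getAs("subject").toString();
    int year = Integer.parseInt(firstRow.getAs("year"));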






2. Filters using Expressions
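
One way to filter a Dataset is with a SQL-style expression string. A minimal sketch, under the same illustrative exam-results data as above:

    // filter() accepts a SQL-like expression evaluated against each row.
    Dataset<Row> modernArtResults =
            dataset.filter("subject = 'Modern Art' AND year >= 2007");
    modernArtResults.show();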






3. Filters using Lambdas
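
The same filter can be written as a Java 8 lambda. The cast to FilterFunction<Row> is needed so the compiler can pick the right filter() overload; column names are again illustrative:

    import org.apache.spark.api.java.function.FilterFunction;

    Dataset<Row> modernArtResults = dataset.filter(
            (FilterFunction<Row>) row ->
                    row.getAs("subject").equals("Modern Art")
                            && Integer.parseInt(row.getAs("year")) >= 2007);
    modernArtResults.show();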






4. Filters using Columns
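
A third option is the Column API, built with the static functions.col helper. A sketch under the same assumptions:

    import org.apache.spark.sql.Column;
    import static org.apache.spark.sql.functions.col;

    Column subjectColumn = col("subject");
    Column yearColumn = col("year");

    // Column objects compose with methods such as equalTo, and, geq.
    Dataset<Row> modernArtResults = dataset.filter(
            subjectColumn.equalTo("Modern Art")
                    .and(yearColumn.geq(2007)));
    modernArtResults.show();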






1. Using a Spark Temporary View for SQL
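
Registering a Dataset as a temporary view lets you query it with full SQL text. A minimal sketch; the view name and columns are illustrative:

    // The view name is how the Dataset will be referred to in SQL.
    dataset.createOrReplaceTempView("my_students_table");

    Dataset<Row> results = spark.sql(
            "select distinct(year) from my_students_table order by year desc");
    results.show();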






1. In Memory Data
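
Rather than reading from disk, a Dataset can be built from an in-memory List of Rows plus an explicit schema. A sketch with two made-up log records; the remaining sketches below assume this imagined logging dataset with level and datetime columns:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.Metadata;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    List<Row> inMemory = new ArrayList<>();
    inMemory.add(RowFactory.create("WARN", "2016-12-31 04:19:32"));
    inMemory.add(RowFactory.create("FATAL", "2016-12-31 03:22:34"));

    // The schema names and types the two fields of each Row above.
    StructType schema = new StructType(new StructField[]{
            new StructField("level", DataTypes.StringType, false, Metadata.empty()),
            new StructField("datetime", DataTypes.StringType, false, Metadata.empty())
    });

    Dataset<Row> dataset = spark.createDataFrame(inMemory, schema);
    dataset.createOrReplaceTempView("logging_table");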






1. Groupings and Aggregations
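
With a view in place, a group-by with an aggregation looks just like ordinary SQL. A sketch, assuming the logging_table view built above:

    // One output row per log level, with a count of messages at that level.
    Dataset<Row> results = spark.sql(
            "select level, count(datetime) as total "
                    + "from logging_table group by level");
    results.show();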






1. Date Formatting
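
SparkSQL has a built-in date_format function whose pattern letters follow java.text.SimpleDateFormat. A sketch against the same logging view:

    // 'MMMM' yields the full month name, e.g. "December".
    Dataset<Row> results = spark.sql(
            "select level, date_format(datetime, 'MMMM') as month "
                    + "from logging_table");
    results.show(100);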






1. Multiple Groupings
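
Grouping by more than one expression is a matter of listing each in the group by clause. A sketch under the same assumptions (Spark SQL accepts a select alias such as month in group by):

    Dataset<Row> results = spark.sql(
            "select level, date_format(datetime, 'MMMM') as month, "
                    + "count(1) as total "
                    + "from logging_table group by level, month");
    results.show(100);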






1. Ordering
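
Ordering month names alphabetically is rarely what you want; one common trick is to carry a numeric month in a helper column, order by it, then drop it. A sketch of that idea, not necessarily the lesson's exact query:

    Dataset<Row> results = spark.sql(
            "select level, date_format(datetime, 'MMMM') as month, count(1) as total, "
                    + "first(cast(date_format(datetime, 'M') as int)) as monthnum "
                    + "from logging_table "
                    + "group by level, month "
                    + "order by monthnum, level");

    results = results.drop("monthnum");   // the helper column isn't for display
    results.show(100);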






1. SQL vs DataFrames
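
The same report can be produced without any SQL text at all, using the DataFrame (Dataset<Row>) API directly. A sketch of the SQL-free equivalent of the level count:

    import static org.apache.spark.sql.functions.col;

    Dataset<Row> results = dataset
            .select(col("level"))
            .groupBy(col("level"))
            .count();
    results.show();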






2. DataFrame Grouping
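
Multi-column grouping in the DataFrame API combines select, a derived column and groupBy. A sketch mirroring the SQL version above:

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.date_format;

    Dataset<Row> results = dataset
            .select(col("level"),
                    date_format(col("datetime"), "MMMM").alias("month"))
            .groupBy(col("level"), col("month"))
            .count();
    results.show(100);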






1. How does a Pivot Table work?
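
Loosely, a pivot table keeps one grouping column down the side, spreads a second grouping column across the top, and fills each cell with an aggregation. A toy illustration with made-up values:

    raw rows (level, month)        pivoted: count of rows per cell
    -----------------------        level    January   February
    WARN   January                 WARN           2          1
    WARN   January                 ERROR          1          0
    ERROR  January
    WARN   February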






2. Coding a Pivot Table in Spark
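
In code, groupBy supplies the row axis, pivot the column axis, and the aggregation fills the cells. A sketch against the logging data assumed earlier:

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.date_format;

    Dataset<Row> withMonth = dataset.select(
            col("level"),
            date_format(col("datetime"), "MMMM").alias("month"));

    // Rows are levels, columns are months, cells are counts.
    Dataset<Row> pivoted = withMonth.groupBy("level").pivot("month").count();
    pivoted.show(100);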




Apache Spark For Java Developers

with Richard Chesterwood, Matt Greencroft, Virtual Pair Programmers


1. How to use the agg method in Spark
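A minimal sketch of groupBy followed by agg in the Java API. The exam-style data and column names (subject, year, score) are invented for illustration; any Column-based aggregate from org.apache.spark.sql.functions can be passed to agg in the same way.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;
    import static org.apache.spark.sql.functions.*;

    public class AggSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("agg sketch").master("local[*]").getOrCreate();

            // Invented exam results: subject, year, score.
            StructType schema = new StructType()
                    .add("subject", DataTypes.StringType)
                    .add("year", DataTypes.IntegerType)
                    .add("score", DataTypes.IntegerType);
            List<Row> rows = Arrays.asList(
                    RowFactory.create("Maths", 2005, 64),
                    RowFactory.create("Maths", 2006, 98),
                    RowFactory.create("French", 2005, 81));
            Dataset<Row> results = spark.createDataFrame(rows, schema);

            // agg takes one or more Column expressions and runs them per group.
            results.groupBy(col("subject"))
                   .agg(max(col("score")).alias("max score"),
                        min(col("score")).alias("min score"))
                   .show();

            spark.close();
        }
    }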



1. Building a Pivot Table with Multiple Aggregations
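Pivoting in the DataFrame API: groupBy fixes the rows, pivot spreads one column's values across the columns, and agg can take several aggregations at once, giving one output column per pivot value per aggregation. The exam data below is again invented for the sketch.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;
    import static org.apache.spark.sql.functions.*;

    public class PivotSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("pivot sketch").master("local[*]").getOrCreate();

            StructType schema = new StructType()
                    .add("subject", DataTypes.StringType)
                    .add("year", DataTypes.IntegerType)
                    .add("score", DataTypes.IntegerType);
            List<Row> rows = Arrays.asList(
                    RowFactory.create("Maths", 2005, 64),
                    RowFactory.create("Maths", 2006, 98),
                    RowFactory.create("Maths", 2006, 76),
                    RowFactory.create("French", 2005, 81));
            Dataset<Row> results = spark.createDataFrame(rows, schema);

            // Rows = subject, columns = one per year, cells = two aggregations each.
            results.groupBy("subject")
                   .pivot("year")
                   .agg(round(avg(col("score")), 2).alias("avg"),
                        round(max(col("score")), 2).alias("max"))
                   .show();

            spark.close();
        }
    }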



1. How to use a Lambda to write a UDF in Spark
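Registering a UDF with a Java 8 lambda, as a sketch. Because udf().register is heavily overloaded, the lambda needs a cast to the UDF1 interface so the compiler picks the right overload; the grade-checking rule here is just an invented example.

    import java.util.Arrays;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.types.DataTypes;
    import static org.apache.spark.sql.functions.*;

    public class UdfSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("udf sketch").master("local[*]").getOrCreate();

            // The cast to UDF1 tells the compiler which register overload we mean.
            spark.udf().register("hasPassed",
                    (UDF1<String, Boolean>) grade -> grade.startsWith("A"),
                    DataTypes.BooleanType);

            Dataset<Row> students = spark
                    .createDataset(Arrays.asList("A+", "B", "C"), Encoders.STRING())
                    .toDF("grade");

            // callUDF invokes the registered function by name from the DataFrame API.
            students.withColumn("pass", callUDF("hasPassed", col("grade"))).show();

            spark.close();
        }
    }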



2. Using more than one input parameter in Spark UDF
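For two input parameters you implement UDF2 instead (the interfaces run up to UDF22); everything else stays the same. The pass-mark rule below is invented purely for illustration.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.api.java.UDF2;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;
    import static org.apache.spark.sql.functions.*;

    public class Udf2Sketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("udf2 sketch").master("local[*]").getOrCreate();

            // Two inputs -> UDF2. Invented rule: Biology needs an A, anything else a B.
            spark.udf().register("hasPassed",
                    (UDF2<String, String, Boolean>) (grade, subject) ->
                            "Biology".equals(subject)
                                    ? grade.startsWith("A")
                                    : grade.startsWith("A") || grade.startsWith("B"),
                    DataTypes.BooleanType);

            StructType schema = new StructType()
                    .add("grade", DataTypes.StringType)
                    .add("subject", DataTypes.StringType);
            List<Row> rows = Arrays.asList(
                    RowFactory.create("A", "Biology"),
                    RowFactory.create("B", "Biology"),
                    RowFactory.create("B", "French"));
            Dataset<Row> students = spark.createDataFrame(rows, schema);

            students.withColumn("pass",
                    callUDF("hasPassed", col("grade"), col("subject"))).show();

            spark.close();
        }
    }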



3. Using a UDF in Spark SQL
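Once registered, a UDF can also be called by name inside a SQL string: register a temp view and refer to the function just like a built-in. A sketch, reusing the invented hasPassed function from above:

    import java.util.Arrays;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.types.DataTypes;

    public class UdfInSqlSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("udf in sql").master("local[*]").getOrCreate();

            spark.udf().register("hasPassed",
                    (UDF1<String, Boolean>) grade -> grade.startsWith("A"),
                    DataTypes.BooleanType);

            spark.createDataset(Arrays.asList("A+", "B", "C"), Encoders.STRING())
                 .toDF("grade")
                 .createOrReplaceTempView("students");

            // The registered UDF is available to SparkSQL by name.
            spark.sql("select grade, hasPassed(grade) as pass from students").show();

            spark.close();
        }
    }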



1. Understand the SparkUI for SparkSQL
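The Spark UI only lives as long as the SparkSession, so a handy trick when exploring SparkSQL jobs locally is to block the driver before closing, then browse http://localhost:4040 (the default UI port) to inspect the jobs, stages and SQL tabs. A sketch:

    import java.util.Scanner;
    import org.apache.spark.sql.SparkSession;

    public class SparkUiSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("spark ui sketch").master("local[*]").getOrCreate();

            // Any SparkSQL job will do - we just need something in the SQL tab.
            spark.range(1_000_000)
                 .selectExpr("id % 10 as key")
                 .groupBy("key").count().show();

            // The UI dies with the session, so block until you've had a look.
            System.out.println("Open http://localhost:4040, then press enter to finish");
            new Scanner(System.in).nextLine();

            spark.close();
        }
    }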



2. How does SQL and DataFrame performance compare
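Both routes go through the same Catalyst optimizer, so in principle equivalent queries should perform alike; a naive way to check on your own data is wall-clock timing of each version of the same query. A rough sketch (invented data, and System.currentTimeMillis is crude - run it several times and ignore the warm-up run):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SqlVsDataFrameSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("sql vs dataframe").master("local[*]").getOrCreate();

            Dataset<Row> logs = spark.range(5_000_000).selectExpr(
                    "case when id % 2 = 0 then 'WARN' else 'ERROR' end as level");
            logs.createOrReplaceTempView("logs");

            long start = System.currentTimeMillis();
            spark.sql("select level, count(*) from logs group by level").show();
            System.out.println("SQL: " + (System.currentTimeMillis() - start) + " ms");

            start = System.currentTimeMillis();
            logs.groupBy("level").count().show();
            System.out.println("DataFrame: " + (System.currentTimeMillis() - start) + " ms");

            spark.close();
        }
    }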



3. Update - Setting spark.sql.shuffle.partitions
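spark.sql.shuffle.partitions controls how many partitions SparkSQL produces after a shuffle (a groupBy or a join); the default of 200 is often far too many for a small local run. It can be set when the session is built or changed at runtime - a sketch:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ShufflePartitionsSketch {
        public static void main(String[] args) {
            // Option 1: set it when the session is built (default is 200).
            SparkSession spark = SparkSession.builder()
                    .appName("shuffle partitions").master("local[*]")
                    .config("spark.sql.shuffle.partitions", "12")
                    .getOrCreate();

            // Option 2: change it at runtime; applies to subsequent shuffles.
            spark.conf().set("spark.sql.shuffle.partitions", "8");

            Dataset<Row> counts = spark.range(1_000_000)
                    .selectExpr("id % 100 as key")
                    .groupBy("key").count();
            System.out.println("partitions after the shuffle: "
                    + counts.rdd().getNumPartitions());

            spark.close();
        }
    }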



1. Explaining Execution Plans
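You can ask any Dataset for its execution plan with explain(); passing true also prints the parsed, analysed and optimised logical plans rather than just the physical one. Sketch:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ExplainSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("explain sketch").master("local[*]").getOrCreate();

            Dataset<Row> results = spark.range(1_000_000)
                    .selectExpr("case when id % 2 = 0 then 'WARN' else 'ERROR' end as level")
                    .groupBy("level").count();

            // Physical plan only:
            results.explain();
            // Parsed, analysed, optimised and physical plans:
            results.explain(true);

            spark.close();
        }
    }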



2. How does HashAggregation work
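Whether the physical plan shows HashAggregate or SortAggregate comes down to the aggregation buffer: HashAggregation keeps a mutable, fixed-width buffer per group, so it is only available when the aggregated values are mutable primitive types; otherwise Spark falls back to sorting by the grouping key. A sketch that should show HashAggregate, since sum over a long keeps a mutable numeric buffer:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.*;

    public class HashAggregationSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("hash aggregation").master("local[*]").getOrCreate();

            Dataset<Row> df = spark.range(1_000_000)
                    .selectExpr("id % 10 as key", "id as amount");

            // sum keeps a mutable numeric buffer per key, so the physical plan
            // should show HashAggregate rather than SortAggregate.
            df.groupBy("key").agg(sum(col("amount"))).explain();

            spark.close();
        }
    }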



3. How can I force Spark to use HashAggregation
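If an aggregation over a string column is falling back to SortAggregate, one way to get HashAggregation back is to aggregate something fixed-width instead - for example, turning a string timestamp into a numeric value before taking the max. A sketch (the datetime format, which unix_timestamp parses by default, is an assumption):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.*;

    public class ForceHashAggregationSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("force hash aggregation").master("local[*]").getOrCreate();

            Dataset<Row> logs = spark.range(100_000).selectExpr(
                    "case when id % 2 = 0 then 'WARN' else 'ERROR' end as level",
                    "'2020-11-16 16:44:55' as datetime");

            // String buffer -> expect SortAggregate in the physical plan.
            logs.groupBy("level").agg(max(col("datetime"))).explain();

            // Long buffer -> expect HashAggregate instead.
            logs.groupBy("level")
                .agg(max(unix_timestamp(col("datetime")))).explain();

            spark.close();
        }
    }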



4. SQL vs DataFrames Performance Results



1. SparkSQL Performance vs RDDs
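The comparison is easiest to feel with the same job written both ways: a count-by-key as a classic RDD mapToPair/reduceByKey against a one-line DataFrame groupBy. A sketch with invented data:

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import scala.Tuple2;

    public class RddVsSqlSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("rdd vs sparksql").master("local[*]").getOrCreate();
            JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

            // RDD version: explicit key-value pairs and a reduce.
            sc.parallelize(Arrays.asList("WARN", "ERROR", "WARN"))
              .mapToPair(level -> new Tuple2<>(level, 1L))
              .reduceByKey(Long::sum)
              .collect()
              .forEach(System.out::println);

            // SparkSQL version: the same result, optimised by Catalyst.
            Dataset<Row> logs = spark.createDataset(
                    Arrays.asList("WARN", "ERROR", "WARN"), Encoders.STRING())
                    .toDF("level");
            logs.groupBy("level").count().show();

            spark.close();
        }
    }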



1. Introducing Linear Regression
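SparkML's LinearRegression expects a single vector column of features plus a label column; VectorAssembler does the gluing. A minimal end-to-end sketch on invented house-price-style numbers (column names and figures are made up for illustration):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.ml.feature.VectorAssembler;
    import org.apache.spark.ml.regression.LinearRegression;
    import org.apache.spark.ml.regression.LinearRegressionModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class LinearRegressionSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("linear regression sketch").master("local[*]").getOrCreate();

            StructType schema = new StructType()
                    .add("label", DataTypes.DoubleType)
                    .add("sqFeet", DataTypes.DoubleType)
                    .add("bedrooms", DataTypes.DoubleType);
            List<Row> rows = Arrays.asList(
                    RowFactory.create(240_000.0, 1400.0, 3.0),
                    RowFactory.create(310_000.0, 1800.0, 4.0),
                    RowFactory.create(199_000.0, 1100.0, 2.0));
            Dataset<Row> houses = spark.createDataFrame(rows, schema);

            // Pack the input columns into the single "features" vector column.
            Dataset<Row> input = new VectorAssembler()
                    .setInputCols(new String[]{"sqFeet", "bedrooms"})
                    .setOutputCol("features")
                    .transform(houses)
                    .select("label", "features");

            LinearRegressionModel model = new LinearRegression().fit(input);
            System.out.println("intercept: " + model.intercept()
                    + ", coefficients: " + model.coefficients());

            spark.close();
        }
    }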



1. Welcome to Module 3



1.1 MLCode.zip



2. What is Machine Learning



3. Coming up in this Module - and introducing Kaggle



4. Supervised vs Unsupervised Learning



5. The Model Building Process
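
Every SparkML model is built through the same estimator pattern: prepare a DataFrame holding a "features" column and a "label" column, call fit() on an estimator to produce a model, then use the model to transform new data into predictions. A minimal sketch of that flow using linear regression (the file path is a placeholder, and we assume the data has already been prepared; building the features column is covered in the next chapter):

    import org.apache.spark.ml.regression.LinearRegression;
    import org.apache.spark.ml.regression.LinearRegressionModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
            .appName("ModelBuildingProcess").master("local[*]").getOrCreate();

    // Assumed placeholder: a DataFrame that already has "features" (Vector)
    // and "label" (double) columns.
    Dataset<Row> input = spark.read().parquet("prepared-training-data.parquet");

    LinearRegression lr = new LinearRegression();   // the estimator
    LinearRegressionModel model = lr.fit(input);    // fitting produces a model
    model.transform(input).show();                  // the model appends a "prediction" column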



1. Introducing Linear Regression
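
The underlying model is nothing more exotic than a weighted sum of the inputs. For features x1 through xn, the prediction is

    y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n

where β0 is the intercept and the remaining β values are the coefficients. Fitting the model simply means finding the β values that minimise the squared error between the predictions and the known labels in the training data.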



2. Beginning Coding Linear Regressions
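
In code, the starting point is the LinearRegression estimator from the org.apache.spark.ml.regression package. Out of the box it looks for a Vector column called "features" and a numeric column called "label", though both can be pointed at columns of your own naming. A minimal sketch:

    import org.apache.spark.ml.regression.LinearRegression;

    LinearRegression lr = new LinearRegression()
            .setFeaturesCol("features")   // single Vector column of inputs
            .setLabelCol("label");        // the value we are trying to predict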



3. Assembling a Vector of Features
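
SparkML estimators do not read one column per feature: they expect every input packed into a single Vector column, and VectorAssembler is the transformer that does the packing. A sketch, assuming a csvData DataFrame with invented column names (substitute the columns from your own dataset):

    import org.apache.spark.ml.feature.VectorAssembler;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    VectorAssembler assembler = new VectorAssembler()
            .setInputCols(new String[] {"sqft", "bedrooms", "age"})
            .setOutputCol("features");

    // transform() appends the "features" Vector column; keep just what the model needs.
    Dataset<Row> modelInput = assembler.transform(csvData)
            .select("price", "features")
            .withColumnRenamed("price", "label");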



4. Model Fitting
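
Fitting is then a single call: fit() runs over the prepared DataFrame and returns a LinearRegressionModel holding the learned coefficients. A sketch, continuing from the assumed modelInput DataFrame assembled above:

    import org.apache.spark.ml.regression.LinearRegression;
    import org.apache.spark.ml.regression.LinearRegressionModel;

    LinearRegressionModel model = new LinearRegression().fit(modelInput);

    System.out.println("Coefficients: " + model.coefficients()); // one per input feature
    System.out.println("Intercept: " + model.intercept());

    model.transform(modelInput).show();  // appends a "prediction" column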



1. Training vs Test and Holdout Data



2. Using data from Kaggle
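
Kaggle datasets typically arrive as CSV files with a header row, which Spark can load directly: the header option turns the first row into column names, and inferSchema gives you numeric types rather than strings. A sketch (the file name is a placeholder for whichever dataset you download):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
            .appName("KaggleData").master("local[*]").getOrCreate();

    Dataset<Row> csvData = spark.read()
            .option("header", true)        // first row holds the column names
            .option("inferSchema", true)   // detect numeric column types
            .csv("src/main/resources/kaggle-dataset.csv");

    csvData.printSchema();  // check the inferred types before going further
    csvData.show(10);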



3. Practical Walkthrough



4. Splitting Training Data with Random Splits
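
randomSplit does the partitioning in one call: pass the proportions you want and it returns an array of DataFrames. A sketch of an 80/20 training/test split over the assumed modelInput DataFrame (the seed argument is optional, but makes runs repeatable while you experiment):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    Dataset<Row>[] splits = modelInput.randomSplit(new double[] {0.8, 0.2}, 42);
    Dataset<Row> trainingData = splits[0];  // used to fit the model
    Dataset<Row> testData = splits[1];      // held back for assessing accuracy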



5. Assessing Model Accuracy with R2 and RMSE
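
A fitted model can score any labelled DataFrame, so accuracy is assessed against the held-back test data. R2 is the proportion of variance in the label that the model explains (closer to 1 is better); RMSE is the typical size of a prediction error, in the same units as the label (closer to 0 is better). A sketch, continuing from the assumed trainingData/testData split above; a large gap between the training and test figures is the classic sign of overfitting:

    import org.apache.spark.ml.regression.LinearRegression;
    import org.apache.spark.ml.regression.LinearRegressionModel;

    LinearRegressionModel model = new LinearRegression().fit(trainingData);

    System.out.println("Training R2: " + model.summary().r2());
    System.out.println("Training RMSE: " + model.summary().rootMeanSquaredError());

    System.out.println("Test R2: " + model.evaluate(testData).r2());
    System.out.println("Test RMSE: " + model.evaluate(testData).rootMeanSquaredError());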



1. Setting Linear Regression Parameters
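
Fitting parameters are set on the estimator before calling fit(). The three you will meet first are the iteration cap and the two regularization controls (the values below are arbitrary examples, not recommendations):

    import org.apache.spark.ml.regression.LinearRegression;

    LinearRegression lr = new LinearRegression()
            .setMaxIter(10)            // upper limit on optimizer iterations
            .setRegParam(0.3)          // regularization strength; 0 means none
            .setElasticNetParam(0.8);  // 0 = ridge (L2), 1 = lasso (L1), in between = a mix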



2. Training, Test and Holdout Data
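
To make the idea concrete, here is a minimal sketch using the Java API. The file path and column names are placeholders rather than the course's actual materials; randomSplit divides a Dataset by the given weights.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class TrainingTestHoldout {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("trainingTestHoldout")
                    .master("local[*]")
                    .getOrCreate();

            // Hypothetical CSV input - substitute your own data source.
            Dataset<Row> data = spark.read()
                    .option("header", "true")
                    .option("inferSchema", "true")
                    .csv("src/main/resources/exams.csv");

            // 80/10/10 split; the fixed seed keeps the split reproducible.
            Dataset<Row>[] splits = data.randomSplit(new double[]{0.8, 0.1, 0.1}, 42);
            Dataset<Row> trainingData = splits[0]; // fits the model
            Dataset<Row> testData     = splits[1]; // tunes the fitting parameters
            Dataset<Row> holdoutData  = splits[2]; // touched once, for the final check

            spark.close();
        }
    }

The point of the third set is discipline: the holdout data plays no part in fitting or tuning, so the score it produces is an honest estimate of how the model behaves on genuinely unseen data.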



1. Describing the Features
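
Continuing inside the main method of the sketch above, describe() gives a quick statistical summary (count, mean, stddev, min, max) of candidate feature columns; the column names here are hypothetical.

    // Eyeball each candidate feature before deciding whether to use it.
    data.describe("score", "hoursOfStudy", "previousExams").show();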



2. Correlation of Features
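
Reusing the hypothetical data from the earlier sketch, the stat() functions on a Dataset expose Pearson correlation directly; values near +1 or -1 suggest a strong linear relationship with the label, values near 0 a weak one.

    // Correlation between one candidate feature and the value we want to predict.
    double correlation = data.stat().corr("hoursOfStudy", "score");
    System.out.println("hoursOfStudy vs score: " + correlation);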



3. Identifying and Eliminating Duplicated Features
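
If two candidate features correlate almost perfectly with each other they carry the same signal, and keeping both only adds noise and training time. A small sketch, again with made-up column names:

    // A pair of columns suspected of duplicating each other.
    double r = data.stat().corr("lengthMetres", "lengthFeet");
    if (Math.abs(r) > 0.95) {
        // Near-perfect correlation: keep one column, drop the other.
        data = data.drop("lengthFeet");
    }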



4. Data Preparation
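
A typical preparation step, sketched with the same hypothetical columns: drop rows that are missing values we need, and rename the target column to "label", the name the SparkML algorithms expect by default.

    import static org.apache.spark.sql.functions.col;

    Dataset<Row> prepared = data
            // Rows missing either value would poison the model fit.
            .filter(col("hoursOfStudy").isNotNull().and(col("score").isNotNull()))
            // SparkML looks for a column called "label" unless told otherwise.
            .withColumnRenamed("score", "label");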



1. Using OneHotEncoding
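
A sketch in the Spark 2.x style this course targets: a StringIndexer first turns category strings into numeric indexes, and a OneHotEncoder then turns each index into a sparse 0/1 vector, so the model cannot mistake the arbitrary index ordering for a real ranking. (In Spark 3, OneHotEncoder became an estimator, so you would call fit() before transform().) Column names are hypothetical.

    import org.apache.spark.ml.feature.OneHotEncoder;
    import org.apache.spark.ml.feature.StringIndexer;

    StringIndexer gradeIndexer = new StringIndexer()
            .setInputCol("grade")
            .setOutputCol("gradeIndex");
    Dataset<Row> indexed = gradeIndexer.fit(prepared).transform(prepared);

    OneHotEncoder encoder = new OneHotEncoder()
            .setInputCol("gradeIndex")
            .setOutputCol("gradeVector");
    Dataset<Row> encoded = encoder.transform(indexed);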



2. Understanding Vectors
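
SparkML models consume a single column of type Vector, conventionally called "features"; VectorAssembler packs the chosen columns into that one vector. A sketch reusing the names from the encoding example above:

    import org.apache.spark.ml.feature.VectorAssembler;

    VectorAssembler assembler = new VectorAssembler()
            .setInputCols(new String[]{"hoursOfStudy", "previousExams", "gradeVector"})
            .setOutputCol("features");
    Dataset<Row> withFeatures = assembler.transform(encoded);

    // Each row now carries one (possibly sparse) vector of its feature values.
    withFeatures.select("features", "label").show(5, false);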



1. Pipelines
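
A minimal sketch chaining the hypothetical stages from the previous examples into a single Pipeline. The win is consistency: fitting the pipeline fits every stage in order on the training data, and the resulting PipelineModel replays exactly the same transformations on any data you later feed it.

    import org.apache.spark.ml.Pipeline;
    import org.apache.spark.ml.PipelineModel;
    import org.apache.spark.ml.PipelineStage;
    import org.apache.spark.ml.regression.LinearRegression;

    Pipeline pipeline = new Pipeline().setStages(new PipelineStage[]{
            gradeIndexer, encoder, assembler, new LinearRegression()
    });
    PipelineModel model = pipeline.fit(trainingData);
    Dataset<Row> predictions = model.transform(testData);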



1. Requirements



2. Case Study - Walkthrough Part 1



3. Case Study - Walkthrough Part 2



1. Code for chapters 9-12.html



1.1 MLCodeChapters9-12.zip



2. True/False Negatives and Positives
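
One cheap way to see all four counts is to group the test predictions by the actual and the predicted label. A sketch, assuming a predictions Dataset with 0/1 labels like the one a fitted classifier produces:

    // label=1/prediction=1 rows are true positives, label=0/prediction=1
    // are false positives, and so on: four rows, one per quadrant.
    predictions.groupBy("label", "prediction").count().show();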



3. Coding a Logistic Regression
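
A minimal sketch, assuming the "features" and 0/1 "label" columns prepared as in the earlier examples (the summary accuracy call needs Spark 2.3 or later):

    import org.apache.spark.ml.classification.LogisticRegression;
    import org.apache.spark.ml.classification.LogisticRegressionModel;

    LogisticRegressionModel lrModel = new LogisticRegression().fit(trainingData);
    System.out.println("training accuracy: " + lrModel.summary().accuracy());

    // "probability" holds the raw likelihoods behind each 0/1 prediction.
    lrModel.transform(holdoutData)
            .select("label", "prediction", "probability")
            .show(5, false);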



1. Overview of Decision Trees



2. Building the Model
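
The lesson's working code ships in the course downloads; the snippet below is only an illustrative sketch of the DataFrame-based spark.ml decision-tree API it builds on. The CSV path and the column names (age, income, label) are hypothetical stand-ins.

    import org.apache.spark.ml.classification.DecisionTreeClassificationModel;
    import org.apache.spark.ml.classification.DecisionTreeClassifier;
    import org.apache.spark.ml.feature.VectorAssembler;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DecisionTreeSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("DecisionTreeSketch")
                    .master("local[*]")
                    .getOrCreate();

            // Hypothetical input: a CSV of numeric columns plus a 0/1 label.
            Dataset<Row> raw = spark.read()
                    .option("header", "true")
                    .option("inferSchema", "true")
                    .csv("src/main/resources/customers.csv");

            // spark.ml models want all inputs packed into one vector column.
            Dataset<Row> data = new VectorAssembler()
                    .setInputCols(new String[] {"age", "income"})
                    .setOutputCol("features")
                    .transform(raw);

            // Hold back 20% of the rows for testing the fitted model.
            Dataset<Row>[] splits = data.randomSplit(new double[] {0.8, 0.2});

            DecisionTreeClassificationModel model = new DecisionTreeClassifier()
                    .setLabelCol("label")
                    .setFeaturesCol("features")
                    .fit(splits[0]);

            model.transform(splits[1]).show(); // adds prediction columns
            spark.close();
        }
    }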



3. Interpreting a Decision Tree
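
Spark can print a fitted tree as nested if/else rules, which is the usual way to read one. A minimal sketch, assuming a DecisionTreeClassificationModel like the one fitted in the previous sketch:

    import org.apache.spark.ml.classification.DecisionTreeClassificationModel;

    public class TreeInspection {

        // Print the fitted tree as nested if/else splits, e.g.
        // "If (feature 0 <= 35.5) ... Predict: 1.0", then the per-feature
        // importances (indices follow the VectorAssembler input order).
        static void describe(DecisionTreeClassificationModel model) {
            System.out.println(model.toDebugString());
            System.out.println(model.featureImportances());
        }
    }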



4. Random Forests
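
A random forest is trained with almost the same code as a single tree; the main extra knob is how many trees to grow. A sketch under the same assumptions as above (a "features" vector column and a "label" column):

    import org.apache.spark.ml.classification.RandomForestClassificationModel;
    import org.apache.spark.ml.classification.RandomForestClassifier;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    public class RandomForestSketch {

        // Each of the 20 trees trains on a random slice of the data;
        // classification is by majority vote across the trees.
        static RandomForestClassificationModel train(Dataset<Row> training) {
            return new RandomForestClassifier()
                    .setLabelCol("label")
                    .setFeaturesCol("features")
                    .setNumTrees(20)
                    .fit(training);
        }
    }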



1. K Means Clustering
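
K-Means is unsupervised, so a sketch of it only needs a features column and a choice of k; there is no label involved. The column name and the value of k here are illustrative:

    import org.apache.spark.ml.clustering.KMeans;
    import org.apache.spark.ml.clustering.KMeansModel;
    import org.apache.spark.ml.linalg.Vector;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    public class KMeansSketch {

        // k is chosen by you, not learned from the data.
        static void cluster(Dataset<Row> data) {
            KMeansModel model = new KMeans()
                    .setK(3)
                    .setSeed(42L)          // fixed seed => repeatable runs
                    .setFeaturesCol("features")
                    .fit(data);

            for (Vector centre : model.clusterCenters()) {
                System.out.println(centre);
            }
            model.transform(data).show(); // adds a "prediction" (cluster id) column
        }
    }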



1. Overview and Matrix Factorisation
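
Matrix factorisation approximates the sparse user x item ratings matrix as the product of two low-rank factor matrices, one per user and one per item; predicted ratings for unseen pairs fall out of that product. In Spark this is the ALS estimator. A configuration sketch (the column names and hyperparameter values are hypothetical):

    import org.apache.spark.ml.recommendation.ALS;

    public class AlsSetupSketch {

        // ALS learns two dense low-rank matrices (user factors and item
        // factors) whose product approximates the known ratings and fills
        // in the missing ones.
        static ALS configure() {
            return new ALS()
                    .setUserCol("userId")     // hypothetical column names
                    .setItemCol("courseId")
                    .setRatingCol("rating")
                    .setRank(10)              // number of latent factors
                    .setMaxIter(10)
                    .setRegParam(0.1);
        }
    }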



2. Building the Model
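
Fitting the ALS estimator and asking the resulting model for recommendations is then short. A self-contained sketch, again with hypothetical column names:

    import org.apache.spark.ml.recommendation.ALS;
    import org.apache.spark.ml.recommendation.ALSModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    public class AlsTrainSketch {

        // Fit ALS on (userId, courseId, rating) rows, then ask the model
        // for the top 5 items per user.
        static void trainAndRecommend(Dataset<Row> ratings) {
            ALSModel model = new ALS()
                    .setUserCol("userId")
                    .setItemCol("courseId")
                    .setRatingCol("rating")
                    .fit(ratings);

            Dataset<Row> topFive = model.recommendForAllUsers(5);
            topFive.show(false);
        }
    }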



1. Welcome to Module 4 - Spark Streaming



1.1 Code.zip



2. Streaming Chapter 1 - Introduction to Streaming



3. DStreams
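
A DStream is a sequence of RDDs, one per micro-batch. A minimal runnable sketch that prints whatever arrives on a socket; the host and port are hypothetical (the course's own log source is the LoggingServer.zip download below):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class DStreamSketch {
        public static void main(String[] args) throws InterruptedException {
            // local[*]: the socket receiver permanently occupies one thread,
            // so a streaming job needs more than one.
            SparkConf conf = new SparkConf()
                    .setAppName("DStreamSketch")
                    .setMaster("local[*]");

            // Everything received in each 2-second interval becomes one RDD
            // in the DStream (a micro-batch).
            JavaStreamingContext jssc =
                    new JavaStreamingContext(conf, Durations.seconds(2));

            JavaDStream<String> lines = jssc.socketTextStream("localhost", 8989);
            lines.print();

            jssc.start();            // nothing runs until this point
            jssc.awaitTermination();
        }
    }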



3.1 LoggingServer.zip



4. Starting a Streaming Job
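
Nothing in a DStream pipeline runs until the context is started: transformations declared beforehand only build the plan. A lifecycle sketch (the helper method names are ours, not Spark's):

    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class StreamingLifecycle {

        // The DStream graph is frozen at start(): declare every
        // transformation first, then start, then block the driver thread.
        static void run(JavaStreamingContext jssc) throws InterruptedException {
            jssc.start();
            jssc.awaitTermination();   // or awaitTerminationOrTimeout(millis)
        }

        // Graceful shutdown: let in-flight batches finish (second flag) and
        // also stop the underlying SparkContext (first flag).
        static void shutdown(JavaStreamingContext jssc) {
            jssc.stop(true, true);
        }
    }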



5. Streaming Transformations
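
DStream transformations mirror the familiar RDD operations and run on every micro-batch as it arrives. A sketch that keys log lines by their level; the comma-separated line format is an assumption:

    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import scala.Tuple2;

    public class TransformationSketch {

        // Assumes lines like "WARN,some message": drop empties, then pair
        // each line's level with a count of 1, ready for aggregation.
        static JavaPairDStream<String, Long> byLogLevel(JavaDStream<String> lines) {
            return lines
                    .filter(line -> !line.isEmpty())
                    .mapToPair(line -> new Tuple2<>(line.split(",")[0], 1L));
        }
    }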



6. Streaming Aggregations
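
Aggregations in streaming usually run over a sliding window rather than a single micro-batch. A sketch counting per key over the last 10 minutes, recomputed every 30 seconds; the window sizes are illustrative, and note the variant of reduceByKeyAndWindow that also takes an inverse function additionally needs checkpointing enabled:

    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairDStream;

    public class AggregationSketch {

        // Each emitted result covers a sliding window spanning many
        // micro-batches, not just the most recent one.
        static JavaPairDStream<String, Long> windowedCounts(
                JavaPairDStream<String, Long> pairs) {
            return pairs.reduceByKeyAndWindow(
                    (a, b) -> a + b,
                    Durations.minutes(10),    // window length
                    Durations.seconds(30));   // slide interval
        }
    }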



7. SparkUI for Streaming Jobs
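
As a taster for this lesson: while a streaming job runs, the driver serves the SparkUI, which gains a Streaming tab showing input rate, scheduling delay and batch processing times. A minimal sketch, assuming a local socket source on port 8989 (the port setting and the source are illustrative, not fixed by the course):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class SparkUiDemo {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf()
                    .setAppName("SparkUiDemo")
                    .setMaster("local[*]")
                    .set("spark.ui.port", "4040"); // 4040 is the default; set explicitly for clarity

            JavaStreamingContext sc = new JavaStreamingContext(conf, Durations.seconds(2));
            sc.socketTextStream("localhost", 8989).print(); // any source keeps the job alive

            // While this runs, browse to http://localhost:4040 and open the Streaming tab.
            sc.start();
            sc.awaitTermination();
        }
    }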



8. Windowing Batches
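
The idea in this lesson: a window gathers several consecutive batches into a single RDD, so each output covers a longer span of data than one batch interval. A hedged sketch reusing the assumed socket source from above (the window length must be a multiple of the batch interval):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class WindowingDemo {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("WindowingDemo").setMaster("local[*]");
            JavaStreamingContext sc = new JavaStreamingContext(conf, Durations.seconds(2));

            JavaDStream<String> lines = sc.socketTextStream("localhost", 8989);

            // Each result now covers the last 30 seconds of data (15 two-second batches),
            // recomputed on every batch interval.
            lines.window(Durations.seconds(30)).count().print();

            sc.start();
            sc.awaitTermination();
        }
    }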



1. Overview of Kafka



2. Installing Kafka



3. Using a Kafka Event Simulator



3.1 viewing-figures-generation.zip



4. Integrating Kafka with Spark



5. Using KafkaUtils to access a DStream
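
For a flavour of this lesson, a minimal sketch of opening a DStream over Kafka with the spark-streaming-kafka-0-10 integration. The broker address, group id and the topic name "viewrecords" are illustrative assumptions:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaDStreamDemo {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("KafkaDStreamDemo").setMaster("local[*]");
            JavaStreamingContext sc = new JavaStreamingContext(conf, Durations.seconds(1));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "localhost:9092"); // assumed local broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "spark-course-group");      // assumed group id
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                    sc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(Arrays.asList("viewrecords"), kafkaParams));

            // Pull just the message values out of the Kafka ConsumerRecords.
            stream.map(ConsumerRecord::value).print();

            sc.start();
            sc.awaitTermination();
        }
    }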



6. Writing a Kafka Aggregation
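
Building on the DStream above, an aggregation is an ordinary pair-RDD operation applied batch by batch. A sketch that slots into KafkaDStreamDemo's main method before sc.start() (extra imports needed: scala.Tuple2 and org.apache.spark.streaming.api.java.JavaPairDStream):

    // Count the occurrences of each message value within the current batch.
    JavaPairDStream<String, Long> counts = stream
            .mapToPair(record -> new Tuple2<>(record.value(), 1L))
            .reduceByKey((a, b) -> a + b);

    counts.print();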



7. Adding a Window
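
The windowed version of the same aggregation: reduceByKeyAndWindow replaces reduceByKey, so each result covers the last minute of batches rather than a single batch. A sketch under the same assumptions as above:

    // Aggregate over the last 60 seconds of data rather than one batch interval.
    JavaPairDStream<String, Long> windowedCounts = stream
            .mapToPair(record -> new Tuple2<>(record.value(), 1L))
            .reduceByKeyAndWindow((a, b) -> a + b, Durations.seconds(60));

    windowedCounts.print();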



8. Adding a Slide Interval
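
Adding a slide interval decouples how often the window is recalculated from the batch interval. Continuing the sketch above (both durations must be multiples of the batch interval):

    // Same 60-second window, but only recomputed every 10 seconds
    // instead of on every batch.
    JavaPairDStream<String, Long> sliding = stream
            .mapToPair(record -> new Tuple2<>(record.value(), 1L))
            .reduceByKeyAndWindow((a, b) -> a + b, Durations.seconds(60), Durations.seconds(10));

    sliding.print();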



1. Structured Streaming Overview
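
In Structured Streaming the same Kafka feed appears as an unbounded DataFrame, queried with ordinary SparkSQL operations. A minimal sketch (the broker address and the "viewrecords" topic are again assumptions):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class StructuredStreamingDemo {
        public static void main(String[] args) throws Exception {
            SparkSession session = SparkSession.builder()
                    .master("local[*]")
                    .appName("StructuredStreamingDemo")
                    .getOrCreate();

            // A streaming DataFrame backed by Kafka; nothing runs until start() is called.
            Dataset<Row> df = session.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "localhost:9092")
                    .option("subscribe", "viewrecords")
                    .load();

            // The familiar DataFrame API, applied incrementally to the stream.
            Dataset<Row> results = df.selectExpr("CAST(value AS STRING) AS course")
                    .groupBy("course")
                    .count();

            StreamingQuery query = results.writeStream()
                    .format("console")
                    .outputMode("complete")
                    .start();
            query.awaitTermination();
        }
    }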



2. Data Sinks
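
The sink chosen in writeStream determines where each micro-batch of results lands. Two common options, reusing df and results from the sketch above (the output and checkpoint paths are made-up placeholders):

    // Console sink - handy during development.
    results.writeStream().format("console").outputMode("complete").start();

    // File sink - writes Parquet; it only supports append mode, so it suits
    // non-aggregated queries, and it requires a checkpoint location.
    df.selectExpr("CAST(value AS STRING)")
      .writeStream()
      .format("parquet")
      .option("path", "/tmp/streaming-output")          // assumed path
      .option("checkpointLocation", "/tmp/checkpoints") // assumed path
      .outputMode("append")
      .start();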



3. Structured Streaming Output Modes
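
The output mode controls how much of the result table is emitted on each trigger. A short sketch against the results aggregation above:

    // append:   emit only rows added since the last trigger (the default;
    //           for aggregations it needs a watermark).
    // complete: re-emit the whole result table every trigger (aggregations only).
    // update:   emit only rows whose value changed since the last trigger.
    results.writeStream()
           .format("console")
           .outputMode("update")
           .start();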



4. Windows and Watermarks
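
Windows in Structured Streaming are expressed as a grouping column, and a watermark bounds how much state Spark keeps for late events. A sketch over the Kafka DataFrame df from earlier, using the timestamp column the Kafka source provides (static imports needed: org.apache.spark.sql.functions.col and org.apache.spark.sql.functions.window):

    // 10-minute windows sliding every 5 minutes; events arriving more than
    // 2 minutes late are dropped and their window's state is released.
    Dataset<Row> windowed = df
            .selectExpr("CAST(value AS STRING) AS course", "timestamp")
            .withWatermark("timestamp", "2 minutes")
            .groupBy(window(col("timestamp"), "10 minutes", "5 minutes"), col("course"))
            .count();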



5. What is the Batch Size in Structured Streaming
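In Structured Streaming there is no fixed batch size to configure: by default, Spark starts a new micro-batch as soon as the previous one finishes, and you control the cadence with a trigger instead. As a rough sketch of the kind of code this chapter works with (an illustration rather than the course's own example: the socket source, host and port below are assumptions, using the Spark 2.x Java API):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;
    import org.apache.spark.sql.streaming.Trigger;

    public class TriggerIntervalSketch {

        public static void main(String[] args) throws Exception {
            SparkSession session = SparkSession.builder()
                    .master("local[*]")
                    .appName("triggerIntervalSketch")
                    .getOrCreate();

            // Read a stream of text lines from a local socket
            // (start one first with e.g. `nc -lk 9999`).
            Dataset<Row> lines = session.readStream()
                    .format("socket")
                    .option("host", "localhost")
                    .option("port", 9999)
                    .load();

            // By default a new micro-batch starts as soon as the previous one
            // finishes; a ProcessingTime trigger fixes the interval instead.
            StreamingQuery query = lines.writeStream()
                    .format("console")
                    .outputMode("append")
                    .trigger(Trigger.ProcessingTime("30 seconds"))
                    .start();

            query.awaitTermination();
        }
    }

With the ProcessingTime trigger above, a new batch begins every 30 seconds; remove the trigger(...) line and Spark reverts to its default of running micro-batches back to back.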



6. Kafka Structured Streaming Pipelines
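This chapter connects Structured Streaming to Apache Kafka. As a minimal sketch of such a pipeline (again an illustration rather than the course's own code: it assumes the spark-sql-kafka-0-10 dependency is on the classpath, and the broker address and topic name are placeholders):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class KafkaPipelineSketch {

        public static void main(String[] args) throws Exception {
            SparkSession session = SparkSession.builder()
                    .master("local[*]")
                    .appName("kafkaPipelineSketch")
                    .getOrCreate();

            // Subscribe to a Kafka topic; the broker address and topic name
            // are placeholders for whatever a local Kafka installation uses.
            Dataset<Row> raw = session.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "localhost:9092")
                    .option("subscribe", "viewrecords")
                    .load();

            // Kafka delivers keys and values as binary, so cast the payload
            // to a string, then query the live stream with ordinary SparkSQL.
            raw.selectExpr("CAST(value AS STRING) AS course_name")
               .createOrReplaceTempView("viewing_figures");

            Dataset<Row> results = session.sql(
                    "SELECT course_name, COUNT(1) AS hits "
                  + "FROM viewing_figures GROUP BY course_name");

            // An aggregation over the whole stream needs the complete (or
            // update) output mode; append would be rejected for this query.
            StreamingQuery query = results.writeStream()
                    .format("console")
                    .outputMode("complete")
                    .start();

            query.awaitTermination();
        }
    }

Because the query aggregates across the entire stream, the console sink runs in complete output mode and reprints the full result table after each micro-batch.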


