Apache Spark For Java Developers

with Richard Chesterwood, Matt Greencroft, Virtual Pair Programmers



Get started with the amazing Apache Spark parallel computing framework. This course is designed especially for Java developers. If you’re new to Data Science and want to find out how massive datasets are processed in parallel, then the Java API for Spark is a great way to get started, fast.

All of the fundamentals you need to understand the main operations you can perform in Spark Core, SparkSQL and DataFrames are covered in detail, with easy-to-follow examples. You’ll be able to follow along with all of the examples, and run them on your own local development computer.

Included with the course is a module covering SparkML, an exciting addition to Spark that allows you to apply Machine Learning models to your Big Data! No mathematical experience is necessary!

And finally, there’s a full 3 hour module covering Spark Streaming, where you will get hands-on experience of integrating Spark with Apache Kafka to handle real-time big data streams. We use both the DStream and the Structured Streaming APIs.

Optionally, if you have an AWS account, you’ll see how to deploy your work to a live EMR (Elastic Map Reduce) hardware cluster. If you’re not familiar with AWS you can skip this video, although it is still worth watching even if you don’t follow along with the coding.

You’ll be going deep into the internals of Spark and you’ll find out how it optimizes your execution plans. We’ll be comparing the performance of RDDs vs SparkSQL, and you’ll learn about the major performance pitfalls; avoiding them could save a lot of money on live projects.

Throughout the course, you’ll be getting some great practice with Java 8 Lambdas, which is a great way to learn functional-style Java if you’re new to it.
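
As a rough taste of the functional style mentioned above, here is a minimal sketch using Java 8 lambdas. It deliberately uses only the standard `java.util.stream` API rather than Spark itself, so it runs without any extra dependencies; the class and method names (`LambdaSketch`, `sumOfSquares`, `wordCounts`) are made up for illustration and are not taken from the course material. The map, reduce, flatMap and filter steps are the same kind of operations the course contents list for RDDs, although the Spark API differs in its details.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LambdaSketch {

    // map + reduce: square each value with a lambda, then sum the results
    public static int sumOfSquares(List<Integer> values) {
        return values.stream()
                .map(v -> v * v)
                .reduce(0, Integer::sum);
    }

    // flatMap + filter + grouping: split lines into words and count each word.
    // A TreeMap is used so the result has a predictable (sorted) key order.
    public static Map<String, Long> wordCounts(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3)));   // prints 14
        System.out.println(wordCounts(Arrays.asList("spark java", "java lambdas")));
    }
}
```

Each lambda here is a small function passed as data to the pipeline, which is the habit the course builds on when defining jobs that run in parallel.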

What you’ll learn
  • Use functional style Java to define complex data processing jobs
  • Learn the differences between the RDD and DataFrame APIs
  • Use an SQL style syntax to produce reports against Big Data sets
  • Use Machine Learning Algorithms with Big Data and SparkML
  • Connect Spark to Apache Kafka to process Streams of Big Data
  • See how Structured Streaming can be used to build pipelines with Kafka
Requirements
  • Java 8 is required for the course. Spark does not currently support Java 9+, and you need Java 8 for the functional Lambda syntax
  • Previous knowledge of Java is assumed, but anything above the basics is explained
  • Some previous SQL experience will be useful for part of the course, but if you’ve never used it before this will be a good first experience

      
Course Contents
1. Introduction
2. Getting Started
3. Reduces on RDDs
4. Mapping and Outputting
5. Tuples
6. PairRDDs
7. FlatMaps and Filters
8. Reading from Disk
9. Keyword Ranking Practical
10. Sorts and Coalesce
11. Deploying to AWS EMR (Optional)
12. Joins
13. Big Data Big Exercise
14. RDD Performance
15. Module 2 - Chapter 1 SparkSQL Introduction
16. SparkSQL Getting Started
17. Datasets
18. The Full SQL Syntax
19. In Memory Data
20. Groupings and Aggregations
21. Date Formatting
22. Multiple Groupings
23. Ordering
24. DataFrames API
25. Pivot Tables
26. More Aggregations
27. Practical Exercise
28. User Defined Functions
29. SparkSQL Performance
30. HashAggregation
31. SparkSQL Performance vs RDDs
32. Module 3 - SparkML for Machine Learning
33. Linear Regression Models
34. Training Data
35. Model Fitting Parameters
36. Feature Selection
37. Non-Numeric Data
38. Pipelines
39. Case Study
40. Logistic Regression
41. Decision Trees
42. K Means Clustering
43. Recommender Systems
44. Module 4 - Spark Streaming and Structured Streaming with Kafka
45. Streaming Chapter 2 - Streaming with Apache Kafka
46. Streaming Chapter 3 - Structured Streaming
