Apache Spark MCQ 1

21.       How do you define an RDD?

Ans: A Resilient Distributed Dataset (RDD) is the basic abstraction in Spark. It represents an immutable, partitioned collection of elements that can be operated on in parallel. RDDs are a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. The name breaks down as follows (a minimal creation sketch follows the list):

 

·         Resilient: fault-tolerant, so missing or damaged partitions can be recomputed on node failure with the help of the RDD lineage graph.

·         Distributed: data resides on multiple nodes across a cluster.

·         Dataset: a collection of partitioned data.
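
A minimal sketch of creating an RDD, assuming a running SparkContext named sc (as in spark-shell); the data and partition count are just illustrative:

val rdd = sc.parallelize(1 to 10, numSlices = 4)   // immutable, partitioned collection
println(rdd.getNumPartitions)                      // 4 partitions spread across the cluster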

 

22.       What does lazily evaluated RDD mean?

Ans: Lazily evaluated means the data inside an RDD is not available or transformed until an action is executed, which triggers the execution.

 

23.       How would you control the number of partitions of an RDD?

Ans: You can control the number of partitions of an RDD using the repartition or coalesce operations.
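
A short sketch of both operations, assuming an existing RDD named rdd (for example the one created above); the partition counts are illustrative:

val more  = rdd.repartition(8)    // full shuffle; can increase or decrease the partition count
val fewer = rdd.coalesce(2)       // avoids a full shuffle; typically used to decrease partitions
println(more.getNumPartitions)    // 8
println(fewer.getNumPartitions)   // 2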

 

24.       What are the possible operations on an RDD?

Ans: RDDs support two kinds of operations:

·         transformations - lazy operations that return another RDD.

·         actions - operations that trigger computation and return values.

 

25.       How does an RDD help parallel job processing?

Ans: Spark executes jobs in parallel: RDDs are split into partitions that are processed and written in parallel. Inside a partition, data is processed sequentially.
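
A sketch of partition-level processing, assuming a SparkContext named sc: each partition is handled by one task, and the iterator inside a partition is consumed sequentially.

val nums = sc.parallelize(1 to 8, 4)
val tagged = nums.mapPartitionsWithIndex { (partId, iter) =>
  iter.map(x => s"partition $partId processed $x")   // sequential within the partition
}
tagged.collect().foreach(println)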

 

26.       What is a transformation?

Ans: A transformation is a lazy operation on an RDD that returns another RDD, such as map, flatMap, filter, reduceByKey, join, cogroup, etc. Transformations are not executed immediately, but only after an action has been executed.
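
A sketch of this laziness, assuming a SparkContext named sc: each transformation returns a new RDD immediately, but no job runs yet.

val lines    = sc.parallelize(Seq("a b", "c", "a"))
val words    = lines.flatMap(_.split(" "))   // transformation -> new RDD[String]
val nonEmpty = words.filter(_.nonEmpty)      // transformation -> new RDD[String]
// At this point Spark has only recorded the lineage; nothing has been computed.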

 

27.       How do you define actions?

Ans: An action is an operation that triggers execution of RDD transformations and returns a value to the Spark driver (the user program). Simply put, an action evaluates the RDD lineage graph.

 

You can think of an action as a valve: until an action is fired, the data to be processed does not even enter the pipes, i.e. the transformations. Only actions materialize the entire processing pipeline with real data.
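
A sketch of the valve idea, assuming a SparkContext named sc: the transformations only record lineage, and only the actions at the end launch a job and return values to the driver.

val counts = sc.parallelize(Seq("spark", "rdd", "spark"))
  .map(w => (w, 1))
  .reduceByKey(_ + _)               // transformations only: no job yet
println(counts.count())             // action: triggers execution, returns a Long
counts.collect().foreach(println)   // action: results come back to the driver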

 

 

28.       How can you create an RDD for a text file?

Ans: Use SparkContext.textFile.
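
A minimal sketch, assuming a SparkContext named sc; the path is only an illustration, not taken from the original text:

val lines = sc.textFile("hdfs:///data/input.txt")   // RDD[String], one element per line
println(lines.count())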

 

 

29.       What are Preferred Locations?

Ans: A preferred location (aka locality preference or placement preference) is a block location of an HDFS file where each partition should preferably be computed.

def getPreferredLocations(split: Partition): Seq[String] specifies placement preferences for a partition in an RDD.
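
A sketch of inspecting placement preferences, assuming a SparkContext named sc and a hypothetical HDFS path: the public RDD.preferredLocations method (which delegates to getPreferredLocations) reports, for an HDFS-backed RDD, the hosts holding each partition's blocks.

val lines = sc.textFile("hdfs:///data/input.txt")
lines.partitions.foreach { p =>
  println(s"partition ${p.index} prefers ${lines.preferredLocations(p).mkString(", ")}")
}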

 

30.       What is an RDD Lineage Graph?

Ans: An RDD lineage graph (aka RDD operator graph) is a graph of all the parent RDDs of an RDD. It is built as a result of applying transformations to the RDD. An RDD lineage graph is hence a graph of the transformations that need to be executed after an action has been called.
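
A sketch of printing the lineage graph, assuming a SparkContext named sc: toDebugString renders the chain of parent RDDs and the transformations between them.

val counts = sc.parallelize(Seq("a", "b", "a"))
  .map(w => (w, 1))
  .reduceByKey(_ + _)
println(counts.toDebugString)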

 
