Users Online

· Guests Online: 142

· Members Online: 0

· Total Members: 188
· Newest Member: meenachowdary055

Forum Threads

Newest Threads
No Threads created
Hottest Threads
No Threads created

Latest Articles

Articles Hierarchy

Articles: Hadoop




04. Hadoop - Introduction
Hadoop is an Apache open source framework written in java that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop frame-worked application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from single server to thousands of machines, each offering local computation and storage.

05. Hadoop - Environment Setup
Hadoop is supported by GNU/Linux platform and its flavors. Therefore, we have to install a Linux operating system for setting up Hadoop environment. In case you have an OS other than Linux, you can install a Virtualbox software in it and have Linux inside the Virtualbox.

06. Hadoop - HDFS Overview
Hadoop File System was developed using distributed file system design. It is run on commodity hardware. Unlike other distributed systems, HDFS is highly faulttolerant and designed using low-cost hardware.


08. Hadoop - Command Reference
There are many more commands in "$HADOOP_HOME/bin/hadoop fs" than are demonstrated here, although these basic operations will get you started. Running ./bin/hadoop dfs with no additional arguments will list all the commands that can be run with the FsShell system. Furthermore, $HADOOP_HOME/bin/hadoop fs -help commandName will display a short usage summary for the operation in question, if you are stuck.

09. MapReduce
MapReduce is a framework using which we can write applications to process huge amounts of data, in parallel, on large clusters of commodity hardware in a reliable manner.

10. Hadoop - Streaming
Hadoop streaming is a utility that comes with the Hadoop distribution. This utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.

11. Hadoop Multi Node Cluster
This chapter explains the setup of the Hadoop Multi-Node cluster on a distributed environment. As the whole cluster cannot be demonstrated, we are explaining the Hadoop cluster environment using three systems (one master and two slaves); given below are their IP addresses. Hadoop Master: 192.168.1.15 (hadoop-master) Hadoop Slave: 192.168.1.16 (hadoop-slave-1) Hadoop Slave: 192.168.1.17 (hadoop-slave-2) Follow the steps given below to have Hadoop Multi-Node cluster setup.

12. Hadoop Interview Questions and Answers
Dear readers, these Hadoop Interview Questions have been designed specially to get you acquainted with the nature of questions you may encounter during your interview for the subject of Hadoop. As per my experience good interviewers hardly plan to ask any particular question during your interview, normally questions start with some basic concept of the subject and later they continue based on further discussion and what you answer −

Apache Spark For Java Developers

Apache Spark For Java Developers


Extending Hadoop for Data Science: Streaming, Spark, Storm, and Kafka

Extending Hadoop for Data Science: Streaming, Spark, Storm, and Kafka


Hadoop Questions and Answers Data Flow
Hadoop Questions and Answers – Data Flow This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Data Flow”.
Render time: 1.00 seconds
10,813,564 unique visits