Hadoop is a constantly changing field, so professionals need to upgrade their
skills quickly to meet the requirements of Hadoop-related jobs. If you are
planning to apply for a Hadoop role, it is important to be prepared for the
Hadoop interview questions that may come your way. There are more than 30,000
open Hadoop developer jobs, and candidates should understand each component of
the Hadoop ecosystem in depth.
Hadoop is an open-source, Java-based programming framework that supports the
storage and processing of extremely large data sets in a distributed computing
environment.
Let’s look at some frequently asked Hadoop interview questions:
1. What is the function of the ‘jps’ command?
Answer – It lists the Java processes running on the machine, which makes it a
quick way to check whether the Hadoop daemons are up. On a running cluster its
output includes the name node, secondary name node, data node, job tracker and
task tracker.
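As a rough illustration of how that output can be used, the sketch below parses text in the ‘pid name’ format that jps prints and reports which expected daemons are absent. The sample output, the process IDs and the helper names (parse_jps, missing_daemons) are made up for this example; the daemon names are the Hadoop 1.x ones listed in the answer.

```python
# Daemon names are the Hadoop 1.x ones mentioned in the answer above.
EXPECTED_DAEMONS = {
    "NameNode", "SecondaryNameNode", "DataNode", "JobTracker", "TaskTracker",
}

def parse_jps(output):
    """Map process name -> pid for each 'pid name' line of jps output."""
    processes = {}
    for line in output.strip().splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip lines that carry only a pid
            processes[parts[1]] = int(parts[0])
    return processes

def missing_daemons(output):
    """Return the expected daemons that do not appear in the jps output."""
    return sorted(EXPECTED_DAEMONS - set(parse_jps(output)))

# Illustrative text only, not captured from a real cluster.
SAMPLE_OUTPUT = """
4821 NameNode
4903 DataNode
5012 SecondaryNameNode
5120 JobTracker
5377 Jps
"""

print(missing_daemons(SAMPLE_OUTPUT))  # ['TaskTracker']
```

In this sample the task tracker is the only daemon not listed, so it is the one reported as missing.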
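2. How do you restart the name node?
Answer – Run ‘stop-all.sh’ and then run ‘start-all.sh’. This restarts all the
Hadoop daemons, including the name node. In Hadoop 1.x and 2.x, the name node
alone can be restarted with ‘hadoop-daemon.sh stop namenode’ followed by
‘hadoop-daemon.sh start namenode’.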
3. What are the three modes in which Hadoop can be run?
Answer –
a) Pseudo-distributed mode
b) Standalone mode (local)
c) Fully distributed mode
4. What does ‘/etc/init.d’ do?
Answer – It is the directory that holds the start-up scripts for daemons
(services). These scripts are used to start or stop a daemon or to check its
status. It is Linux-specific and has nothing to do with Hadoop.
5. What is big data?
Answer – It is a collection of data so large and complex that it becomes very
difficult to capture, store, process, retrieve and analyze using on-hand
database management tools or traditional data processing techniques.
6. Why do we need Hadoop?
Answer – Every day a huge volume of unstructured data is dumped into our
systems. The major challenge is not storing these large data sets but
retrieving and analyzing them, especially when the data sits on different
machines at different locations. Hadoop can analyze such data quickly and very
cost-effectively. It uses the concept of Map Reduce, which enables it to divide
a query into small parts and process them in parallel.
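The divide-and-combine idea behind Map Reduce can be sketched in a few lines of plain Python. This is only a single-process illustration, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the map calls run one after another here, whereas on a cluster each chunk would be handled on a different machine.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # Each chunk stands in for an input split handled by a separate machine;
    # in this sketch the map calls simply run sequentially.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

chunks = ["big data needs hadoop", "hadoop splits big jobs"]
print(word_count(chunks))
```

Because each map call only looks at its own chunk, the chunks can be processed independently; only the grouping step needs to see results from all of them.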
7. What is fault tolerance?
Answer – Suppose you have a file stored on a single system and, due to a
technical issue, that file gets destroyed. There is then no chance of getting
the data in that file back. To avoid situations like this, Hadoop provides
fault tolerance in HDFS. When you store a file, its blocks are automatically
replicated on other data nodes (three copies in total by default), so that if
one copy gets corrupted or its node fails you still have access to another copy
of the data.
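A toy model can make the benefit of replication concrete. The sketch below assumes a replication factor of 3 (the HDFS default) and picks replica holders at random; real HDFS uses a rack-aware placement policy, and the names place_replicas and is_readable, as well as the node names, are hypothetical.

```python
import random

def place_replicas(nodes, replication=3, seed=0):
    """Pick distinct nodes to hold copies of one block (random toy policy)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return random.Random(seed).sample(nodes, replication)

def is_readable(replica_nodes, failed_nodes):
    """A block stays readable while at least one replica is on a live node."""
    return any(node not in failed_nodes for node in replica_nodes)

nodes = ["node1", "node2", "node3", "node4", "node5"]
replicas = place_replicas(nodes)

# Losing two of the three replica holders still leaves one good copy.
print(is_readable(replicas, failed_nodes=set(replicas[:2])))  # True
# Only when every replica holder fails does the block become unreadable.
print(is_readable(replicas, failed_nodes=set(replicas)))      # False
```

With a single copy, the first failure would already make the data unreadable, which is the situation replication is meant to avoid.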
8. What do you mean by task tracker?
Answer – The task tracker is a daemon that runs on the data nodes and executes
individual tasks on a slave node. When a client submits a job, the job tracker
initializes the job, divides it into tasks and assigns them to different task
trackers, which perform the Map Reduce tasks.
9. What is a rack?
Answer – A rack is a physical collection of data nodes kept together at a
single location. A single location can have multiple racks, and the data nodes
of a cluster as a whole may be spread across racks in different places.
10. What is the name node?
Answer – It is the master node of HDFS and holds the file system metadata. It
maintains and manages the blocks that are present on the data nodes. In Hadoop
1.x the job tracker commonly runs on this master node as well.
These are 10 of the top Hadoop interview questions you may be asked if you are
planning to apply for a Hadoop developer position.