Monday, 28 November 2016

10 Most Frequently Asked Hadoop Interview Questions

Hadoop is a constantly evolving field that requires professionals to upgrade their skills quickly to meet the requirements of Hadoop-related jobs. If you are planning to apply for a Hadoop role, it is important to prepare for the interview questions that may come your way. With more than 30,000 open Hadoop developer jobs, professionals should be familiar with every component of the Hadoop ecosystem to make sure they have a deep understanding of Hadoop.
Hadoop is an open-source, Java-based programming framework that supports the storage and processing of extremely large data sets in a distributed computing environment. Let’s look at some of the most frequently asked Hadoop interview questions.
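1.      What is the function of the ‘jps’ command?
Answer – jps (the Java Virtual Machine Process Status tool that ships with the JDK) lists the Java processes running on a machine, so it is used to check which Hadoop daemons are up on that machine. Its output shows daemons such as the NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker (in Hadoop 2.x and later, the ResourceManager and NodeManager take the place of the JobTracker and TaskTracker).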
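Because jps prints one "<pid> <name>" pair per line, its output is easy to check from a script. The sketch below is our own illustration, not part of Hadoop: the function names are hypothetical, and the sample output (including the PIDs) is made up to show the format.

```python
# Daemon names as jps prints them (the main class's short name).
HADOOP_DAEMONS = {
    "NameNode", "SecondaryNameNode", "DataNode",  # HDFS
    "JobTracker", "TaskTracker",                  # MapReduce v1 (Hadoop 1.x)
    "ResourceManager", "NodeManager",             # YARN (Hadoop 2.x and later)
}


def running_daemons(jps_output: str) -> dict[str, int]:
    """Map each Hadoop daemon found in jps output to its process id."""
    daemons = {}
    for line in jps_output.splitlines():
        parts = line.split()
        # jps prints "<pid> <name>"; skip blank or unexpected lines
        if len(parts) == 2 and parts[1] in HADOOP_DAEMONS:
            daemons[parts[1]] = int(parts[0])
    return daemons


def missing_daemons(jps_output: str, expected: set[str]) -> set[str]:
    """Return the expected daemons that jps did not list."""
    return expected - set(running_daemons(jps_output))


# Hypothetical jps output from a single HDFS node (PIDs are made up).
sample = """4821 NameNode
4970 DataNode
5133 SecondaryNameNode
5302 Jps
"""
print(running_daemons(sample))
# {'NameNode': 4821, 'DataNode': 4970, 'SecondaryNameNode': 5133}
print(missing_daemons(sample, {"NameNode", "DataNode", "ResourceManager"}))
# {'ResourceManager'}
```

On a real machine you could pass `subprocess.run(["jps"], capture_output=True, text=True).stdout` instead of the sample string. Keep in mind that jps only sees the local machine, so on a multi-node cluster each node reports its own daemons.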

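2.      How do you restart the NameNode?
Answer – Run ‘stop-all.sh’ and then ‘start-all.sh’. This stops and restarts every Hadoop daemon, including the NameNode. To restart only the NameNode, run ‘hadoop-daemon.sh stop namenode’ followed by ‘hadoop-daemon.sh start namenode’.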

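3.      What are the three modes in which Hadoop can run?
Answer – a) Standalone (local) mode: the default; Hadoop runs as a single Java process on the local file system, with no separate daemons.
                  b) Pseudo-distributed mode: all daemons run on a single machine, each in its own Java process.
                  c) Fully distributed mode: the daemons run on separate machines in a multi-node cluster.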
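One visible difference between the modes is the fs.defaultFS property in core-site.xml (fs.default.name in Hadoop 1.x): its default, file:///, gives standalone mode, while the other two modes point it at an HDFS NameNode. The function below is a rough heuristic of our own, not a Hadoop API, and the hostnames and ports are only illustrative.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def guess_mode(fs_default_fs: str) -> str:
    """Rough heuristic: guess Hadoop's run mode from the fs.defaultFS URI.

    A pseudo-distributed node configured with its own hostname instead of
    localhost would be misreported as fully distributed."""
    uri = urlparse(fs_default_fs)
    if uri.scheme == "file":
        return "standalone"          # local file system, no HDFS daemons
    if uri.hostname in LOCAL_HOSTS:
        return "pseudo-distributed"  # NameNode runs on this same machine
    return "fully distributed"       # NameNode on another cluster host


print(guess_mode("file:///"))                          # standalone
print(guess_mode("hdfs://localhost:9000"))             # pseudo-distributed
print(guess_mode("hdfs://namenode.example.com:8020"))  # fully distributed
```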

4.      What does ‘/etc/init.d’ do?
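Answer – ‘/etc/init.d’ is the directory where Linux keeps the init scripts for daemons (services). These scripts are used to start and stop the daemons and to check their status. It is specific to Linux and has nothing to do with Hadoop itself.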

5.      What is big data?
Answer – Big data is a collection of data so large and complex that it becomes very tedious to capture, store, process, retrieve and analyze it using on-hand database management tools or traditional data processing techniques.

6.      Why do we need Hadoop?
Answer – The major challenge is not storing large data sets, even though a huge amount of unstructured data is dumped into our systems every day, but retrieving and analyzing that data. Hadoop can analyze data spread across different machines at different locations quickly and very cost-effectively. It uses the MapReduce model, which divides a query into small parts and processes them in parallel.
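The split-then-process-in-parallel idea behind MapReduce can be sketched in plain Python with the classic word-count example. This is a single-machine illustration only, not Hadoop's Java API: the function names are our own, and a thread pool stands in for the cluster nodes that would run the map tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle/sort: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Each split is mapped independently, so the map calls can run
    # concurrently; on a cluster they would run on different nodes.
    with ThreadPoolExecutor() as pool:
        mapped = list(chain.from_iterable(pool.map(map_phase, splits)))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["the quick fox", "the lazy dog", "The fox"]))
# {'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```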

7.      What is fault tolerance?
Answer – Suppose a file is stored on a single system and, due to a technical issue, that system fails and the file is destroyed. There is then no way to get back the data in the file. To avoid such situations, HDFS provides fault tolerance through replication: when you store a file, each of its blocks is automatically replicated on several DataNodes (three copies by default, controlled by the dfs.replication property), so if one copy is lost or corrupted you still have access to another.
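To make the replication idea concrete, here is a toy model of our own (the class and method names are hypothetical, not Hadoop code). It keeps three copies of a block on distinct DataNodes, which is the HDFS default for dfs.replication, shows that the block stays readable after two nodes fail, and then restores the missing copies, roughly what the NameNode schedules once it notices dead DataNodes through missed heartbeats.

```python
import random

REPLICATION_FACTOR = 3  # HDFS default for dfs.replication


class ToyHDFS:
    """Toy model of block replication; not Hadoop's actual implementation."""

    def __init__(self, datanodes: list[str], seed: int = 0):
        self.live = set(datanodes)
        self.replicas: dict[str, set[str]] = {}  # block id -> nodes holding a copy
        self.rng = random.Random(seed)

    def write_block(self, block_id: str) -> None:
        # Put each copy on a different DataNode.
        self.replicas[block_id] = set(self.rng.sample(sorted(self.live), REPLICATION_FACTOR))

    def fail_node(self, node: str) -> None:
        # A crashed DataNode loses every copy it was holding.
        self.live.discard(node)
        for holders in self.replicas.values():
            holders.discard(node)

    def can_read(self, block_id: str) -> bool:
        return bool(self.replicas[block_id])

    def re_replicate(self) -> None:
        # Copy under-replicated blocks from a surviving replica to other live nodes.
        for holders in self.replicas.values():
            if not holders:
                continue  # every copy is gone; the block is lost
            candidates = sorted(self.live - holders)
            missing = max(0, REPLICATION_FACTOR - len(holders))
            holders.update(self.rng.sample(candidates, min(missing, len(candidates))))


cluster = ToyHDFS(["dn1", "dn2", "dn3", "dn4", "dn5"])
cluster.write_block("blk_1")
first, second, _ = sorted(cluster.replicas["blk_1"])
cluster.fail_node(first)
cluster.fail_node(second)               # two of the three copies are gone
print(cluster.can_read("blk_1"))        # True: one copy survives
cluster.re_replicate()
print(len(cluster.replicas["blk_1"]))   # 3 again
```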

8.      What do you mean by TaskTracker?
Answer – The TaskTracker is a daemon that runs on the DataNodes (slave nodes) in Hadoop 1.x and executes individual tasks on its node. When a client submits a job, the JobTracker initializes the job, divides it into tasks and assigns them to different TaskTrackers, which run the Map and Reduce tasks.
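In Hadoop 1.x, TaskTrackers report their free task slots to the JobTracker through heartbeats, and the JobTracker prefers to schedule a map task on a node that already stores the task's input data. The sketch below is a heavily simplified model of that assignment step: the class, function and slot numbers are hypothetical, and it ignores heartbeats, reduce tasks and task retries.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    host: str
    free_slots: int                                 # free map slots on this node
    tasks: list[str] = field(default_factory=list)


def assign_map_tasks(split_hosts: dict[str, str], trackers: list[TaskTracker]) -> dict[str, str]:
    """Toy JobTracker: hand each input split to a TaskTracker, preferring the
    node that already stores the split's data (a data-local task)."""
    by_host = {t.host: t for t in trackers}
    placement = {}
    for split_id, data_host in split_hosts.items():
        tracker = by_host.get(data_host)
        if tracker is None or tracker.free_slots == 0:
            # Fall back to the tracker with the most free slots.
            tracker = max(trackers, key=lambda t: t.free_slots)
        if tracker.free_slots == 0:
            # A real JobTracker would keep the task queued until a slot frees up.
            raise RuntimeError(f"no free slot for {split_id}")
        tracker.free_slots -= 1
        tracker.tasks.append(split_id)
        placement[split_id] = tracker.host
    return placement


trackers = [TaskTracker("dn1", free_slots=1), TaskTracker("dn2", free_slots=2)]
splits = {"split-0": "dn1", "split-1": "dn1", "split-2": "dn2"}
print(assign_map_tasks(splits, trackers))
# {'split-0': 'dn1', 'split-1': 'dn2', 'split-2': 'dn2'}
```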

9.      What is a rack?
Answer – A rack is a physical collection of DataNodes that are stored together at a single location, typically connected to the same network switch. The DataNodes of a cluster are spread across several racks, which may be physically located at different places, and a single location can hold multiple racks.
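Racks matter because HDFS uses them when placing replicas. Its default policy for three replicas puts the first copy on the writer's node (when the writer is a DataNode), the second on a node in a different rack, and the third on another node in the same rack as the second, so a whole rack can fail without losing the block while cross-rack write traffic stays limited. The function below is our own sketch of that rule, not the HDFS implementation; the rack and node names are made up.

```python
import random


def place_replicas(topology: dict[str, list[str]], writer: str, seed: int = 0) -> list[str]:
    """Sketch of the default three-replica rack-aware placement in HDFS.

    Assumes the writer is itself a DataNode and that at least one other
    rack has two or more nodes."""
    rng = random.Random(seed)
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer  # 1st copy: the writer's own node
    other_racks = [r for r in topology if r != rack_of[writer] and len(topology[r]) >= 2]
    remote_rack = rng.choice(other_racks)
    second = rng.choice(topology[remote_rack])  # 2nd copy: a node on a different rack
    # 3rd copy: a different node on the same rack as the 2nd
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [first, second, third]


topology = {
    "/rack1": ["dn1", "dn2"],
    "/rack2": ["dn3", "dn4"],
    "/rack3": ["dn5", "dn6"],
}
print(place_replicas(topology, writer="dn1"))
```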

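10.      What is the NameNode?
Answer – The NameNode is the master node of HDFS. It holds the file system metadata (the directory tree, which blocks make up each file and which DataNodes hold those blocks) and manages the blocks that are present on the DataNodes, while the data itself is stored on the DataNodes. In Hadoop 1.x clusters the JobTracker often runs on a master machine as well.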
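The split between metadata on the NameNode and data on the DataNodes can be illustrated with a small model. Everything here is hypothetical (class name, block ids, round-robin placement); it assumes the Hadoop 2.x default block size of 128 MB (dfs.blocksize, 64 MB in Hadoop 1.x) and only shows how a file maps to blocks and block locations.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # dfs.blocksize default in Hadoop 2.x and later (64 MB in 1.x)


class ToyNameNode:
    """Toy model of the metadata a NameNode keeps: which blocks make up each
    file and which DataNodes hold them. The block contents live on the DataNodes."""

    def __init__(self):
        self.files: dict[str, list[str]] = {}      # path -> ordered block ids
        self.locations: dict[str, list[str]] = {}  # block id -> DataNodes holding it
        self._next_id = 0

    def add_file(self, path: str, size_bytes: int, datanodes: list[str], replication: int = 3) -> list[str]:
        n_blocks = (size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
        block_ids = []
        for i in range(n_blocks):
            block_id = f"blk_{self._next_id}"  # hypothetical id format
            self._next_id += 1
            # Rotate through the DataNodes; copies are distinct if len(datanodes) >= replication.
            self.locations[block_id] = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
            block_ids.append(block_id)
        self.files[path] = block_ids
        return block_ids


nn = ToyNameNode()
blocks = nn.add_file("/logs/app.log", 300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
print(blocks)                 # ['blk_0', 'blk_1', 'blk_2']: 128 MB + 128 MB + 44 MB
print(nn.locations["blk_1"])  # ['dn2', 'dn3', 'dn4']
```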

These are the top 10 Hadoop interview questions you may be asked if you are applying for a Hadoop developer post.

