What are the steps for Hadoop Configuration, with Examples?

In my previous article I covered the Hadoop architecture and where to use Hadoop. In this article I would like to walk through Hadoop configuration so that you can set Hadoop up step by step. We will start with a snapshot of the Hadoop architecture and then move on to Hadoop configuration in detail.

Hadoop Architecture for Hadoop Configuration :

In this section we will discuss the parts of the Hadoop architecture that matter for configuration. Hadoop is an open-source framework for processing large amounts of data quickly using the principles of distributed computing, with the data spread across the nodes of a cluster. The architecture separates data storage from data processing and uses a master-slave structure for both: HDFS handles the storage and MapReduce handles the processing.

Hadoop architecture

Hadoop uses a master-slave architecture for both data storage and data processing. The NameNode serves as the master node for data storage. A second master, the JobTracker, uses Hadoop MapReduce to monitor and parallelize data processing.
The slaves are the other machines in the Hadoop cluster, which store the data and carry out the computations. Each slave node runs a DataNode for storage and a TaskTracker for processing, and these report to the NameNode and the JobTracker respectively so that the work stays synchronized. In Hadoop 2 and later, YARN's ResourceManager and NodeManager take over the roles of the JobTracker and TaskTracker. This kind of cluster can be installed on-premises or in the cloud.
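To make the split between storage and processing concrete, here is a minimal sketch in plain Python. It is not Hadoop code: the function names are hypothetical, the "nodes" are just Python lists, and the round-robin split only stands in for the way HDFS spreads blocks across data nodes. It shows the map step running on each block independently and the reduce step combining the results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_into_blocks(lines: List[str], num_nodes: int) -> List[List[str]]:
    """Deal lines out round-robin; a stand-in for HDFS placing blocks on data nodes."""
    blocks: List[List[str]] = [[] for _ in range(num_nodes)]
    for i, line in enumerate(lines):
        blocks[i % num_nodes].append(line)
    return blocks


def map_phase(block: Iterable[str]) -> List[Tuple[str, int]]:
    """Runs on one block at a time: emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Group the pairs by word and sum the counts."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines: List[str], num_nodes: int = 3) -> Dict[str, int]:
    """Split the input, map each block separately, then reduce everything."""
    blocks = split_into_blocks(lines, num_nodes)
    mapped = [pair for block in blocks for pair in map_phase(block)]
    return reduce_phase(mapped)
```

In a real cluster the map calls would run on different machines, next to the blocks they read, and the master would only schedule and monitor them.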

Hadoop Configuration :

The Hadoop stack includes numerous services, among them HDFS, YARN, Oozie, MapReduce, Spark, Atlas, Ranger, Zeppelin, Kafka, NiFi, Hive, and HBase. Each service has its own functionality, its own way of working, and its own configuration. Before configuring Hadoop itself, we must take care of the operating system: the first stage is to tune the operating system settings and bring them up to standard, so that the machines can withstand the load of a Hadoop environment. The Hadoop services themselves are configured in the second stage, which is covered below.

Location of configuration files : etc/hadoop/ (relative to the Hadoop installation directory)
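As a small sketch of how a script might locate that directory, the helper below first checks the HADOOP_CONF_DIR environment variable and then falls back to etc/hadoop under HADOOP_HOME. The function name is hypothetical, and the lookup order is an assumption for illustration rather than a description of what Hadoop's own launch scripts do.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hadoop_conf_dir(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the likely Hadoop configuration directory, or None if unknown."""
    env = os.environ if env is None else env
    # An explicit HADOOP_CONF_DIR wins when it is set and non-empty.
    if env.get("HADOOP_CONF_DIR"):
        return Path(env["HADOOP_CONF_DIR"])
    # Otherwise assume the standard layout: <HADOOP_HOME>/etc/hadoop.
    if env.get("HADOOP_HOME"):
        return Path(env["HADOOP_HOME"]) / "etc" / "hadoop"
    return None
```

For example, with HADOOP_HOME set to the hypothetical path /opt/hadoop, the helper returns /opt/hadoop/etc/hadoop.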

Kindly check the following HDFS configuration properties and their uses :

dfs.client.https.need-auth: Determines whether SSL client certificate authentication is required for the connection between client and server.
dfs.client.cached.conn.retry: The number of times the HDFS client will take a socket from the cache. Once this number is exceeded, the HDFS client will attempt to create a new socket.
dfs.https.server.keystore.resource: The resource file from which the SSL server keystore information is extracted.
dfs.client.https.keystore.resource: The resource file from which the SSL client keystore information is extracted for HTTPS connections.
dfs.qjournal.queued-edits.limit.mb: The queue size for quorum journal edits, specified in MB.
dfs.qjournal.select-input-streams.timeout.ms: The timeout for accepting streams from the journal managers, measured in milliseconds.
dfs.qjournal.start-segment.timeout.ms: The quorum timeout for starting a new segment, measured in milliseconds.
dfs.datanode.https.address: The address and port number of the DataNode's secure HTTPS server.
dfs.namenode.https-address: The address and port number of the NameNode's secure HTTPS server.
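Properties like these are written as name/value pairs inside a configuration element, using their full dfs.-prefixed names as they appear in hdfs-site.xml. The sketch below builds such a file with Python's standard library and reads it back. The helper names are hypothetical and every value in HDFS_EXAMPLE is only an illustrative placeholder, so please check the hdfs-default.xml of your Hadoop release for the real defaults.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Illustrative placeholder values only; not recommendations.
HDFS_EXAMPLE: Dict[str, str] = {
    "dfs.client.https.need-auth": "false",
    "dfs.client.cached.conn.retry": "3",
    "dfs.qjournal.queued-edits.limit.mb": "10",
    "dfs.qjournal.select-input-streams.timeout.ms": "20000",
    "dfs.datanode.https.address": "0.0.0.0:9865",
    "dfs.namenode.https-address": "0.0.0.0:9871",
}


def build_site_xml(properties: Dict[str, str]) -> str:
    """Render properties in the <configuration><property>... layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def parse_site_xml(xml_text: str) -> Dict[str, str]:
    """Read name/value pairs back out of a site XML document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }
```

Calling build_site_xml(HDFS_EXAMPLE) returns the XML as a string, which could then be saved as hdfs-site.xml in the configuration directory.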

OOZIE

CATALINA_OPTS: Settings for the Tomcat server that runs Oozie. Java system properties for Oozie should be specified in this variable. It has no default value.
OOZIE_CONFIG_FILE: The Oozie configuration file to load. The default value is oozie-site.xml.
OOZIE_LOGS: The directory in which the Oozie logs are kept. The Oozie server defines this value on its own during installation.
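These Oozie settings are environment variables rather than XML properties. As a rough sketch, the helper below renders a dictionary of settings as shell export lines of the kind usually kept in an environment script such as oozie-env.sh. The function name, the JVM option, and the log path are hypothetical values chosen for illustration.

```python
import shlex
from typing import Mapping

# Hypothetical example values; adjust for your own installation.
OOZIE_EXAMPLE = {
    "CATALINA_OPTS": "-Xmx1024m -Dexample.flag=true",
    "OOZIE_CONFIG_FILE": "oozie-site.xml",
    "OOZIE_LOGS": "/var/log/oozie",
}


def render_env_exports(settings: Mapping[str, str]) -> str:
    """Render each setting as an 'export NAME=value' line, quoting where needed."""
    return "\n".join(
        f"export {name}={shlex.quote(value)}" for name, value in settings.items()
    )
```

shlex.quote only adds quotes when a value contains characters the shell would otherwise interpret, such as the space in the CATALINA_OPTS example.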

YARN

yarn.resource-types: A comma-separated list of additional resource types. It must not include memory (whether in MB or GB) or vcores.
yarn.resource-types.<resource>.units: The default unit for the given resource type.
yarn.resource-types.<resource>.minimum-allocation: The minimum request for the given resource type.
yarn.resource-types.<resource>.maximum-allocation: The maximum request for the given resource type.
yarn.app.mapreduce.am.resource.mb: The memory requested for the application master container, expressed in MB.
yarn.app.mapreduce.am.resource.memory: The memory requested for the application master container, expressed in MB. The default value is 1536.
yarn.app.mapreduce.am.resource.memory-mb: The memory requested for the application master container, expressed in MB. The default value is 1536.
yarn.app.mapreduce.am.resource.cpu-vcores: The number of CPU vcores requested for the application master container. The default value is 1.
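To show how the resource-type properties fit together, the sketch below builds the set of yarn.resource-types entries for one additional resource. The helper names and the resource name resource1 are hypothetical. The two checks (built-in names are rejected, and the minimum may not exceed the maximum) are a simple reading of the descriptions above, not YARN's own validation logic.

```python
from typing import Dict

# Names that may not be listed as additional resource types.
RESERVED_NAMES = {"memory", "memory-mb", "vcores"}


def is_valid_resource_name(name: str) -> bool:
    """An additional resource type needs a non-empty, non-reserved name."""
    return bool(name) and name not in RESERVED_NAMES


def resource_type_properties(
    name: str, units: str, minimum: int, maximum: int
) -> Dict[str, str]:
    """Build the properties that describe one additional resource type."""
    if not is_valid_resource_name(name):
        raise ValueError(f"{name!r} cannot be used as an additional resource type")
    if minimum > maximum:
        raise ValueError("minimum-allocation must not exceed maximum-allocation")
    prefix = f"yarn.resource-types.{name}"
    return {
        "yarn.resource-types": name,
        f"{prefix}.units": units,
        f"{prefix}.minimum-allocation": str(minimum),
        f"{prefix}.maximum-allocation": str(maximum),
    }


# Application master settings, using the default values quoted above.
AM_EXAMPLE = {
    "yarn.app.mapreduce.am.resource.memory-mb": "1536",
    "yarn.app.mapreduce.am.resource.cpu-vcores": "1",
}
```

For instance, resource_type_properties("resource1", "G", 1, 1024) yields four name/value pairs that can be placed in the YARN configuration in the usual property format.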

The properties above are the key settings for configuring Hadoop effectively. I hope you like this article on Hadoop Configuration. If you like it, or if you run into issues with any of these settings, kindly leave a comment in the comments section.

Amit S

Oracle Consultant with vast experience in Oracle BI and PL/SQL development. Amiet is the admin head of this website and contributes tutorials and articles on database technologies. He is responsible for managing the content and front-end of the website.