How To Install And Configure Apache Hadoop On Ubuntu

Apache Hadoop is a Java-based, open-source, freely available software platform for storing and analyzing big datasets on your system clusters. It keeps its data in the Hadoop Distributed File system (HDFS) and processes it utilizing MapReduce. Hadoop has been used in machine learning and data mining techniques. It is also used for managing multiple dedicated servers.

The primary components of Apache Hadoop are:

HDFS: In Apache Hadoop, HDFS is a file system that is distributed over numerous nodes.
MapReduce: It is a framework for developing applications that handle a massive amount of data.
Hadoop Common: It is a set of libraries and utilities that are needed by Hadoop modules.

How to install Apache Hadoop on Ubuntu

First of all, we will open up our Ubuntu terminal by pressing “CTRL+ALT+T”, you can also type “terminal” in the application’s search bar as follows:

The next step is to update the system repositories:

$ sudo apt update

Now we will install Java on our Ubuntu system by writing out the following command in the terminal:

$ sudo apt install openjdk-11-jdk

Enter “y/Y” to permit the installation process to continue:

Now, verify the existence of the installed Java by checking its version:

$ java -version

We will create a separate user for running Apache Hadoop on our system by utilizing the “adduser” command:

$ sudo adduser hadoopuser

Enter the new user’s password, its full name, and other information. Type “y/Y” to confirm that the provided information is correct:

It’s time to switch the current user with the created Hadoop user, which is “hadoopuser” in our case:

$ su – hadoopuser

Now, utilize the below-given command for generating private and public key pairs:

$ ssh-keygen -t rsa

Enter the file address where you want to save the key pair. After this, add a passphrase that you are going to be used in the whole setup of the Hadoop user:

Next, add these key pairs to the ssh authorized_keys:

at ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

As we have stored the generated key pair in the ssh authorized key, now we will change the file permissions to “640” which means that only we as the “owner” of the file will have the read and write permissions, “groups” will only have the read permission. No permission will be granted to “other users”:

$ chmod 640 ~/.ssh/authorized_keys

Now authenticate the localhost by writing out the following command:

$ ssh localhost

Utilize the below-given wget command for installing the Hadoop framework for your system:

$ wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz

Extract the downloaded “hadoop-3.3.0.tar.gz” file with the tar command:

$ tar -xvzf hadoop-3.3.0.tar.gz

You can also rename the extracted directory as we will do by executing the below-given command:

$ mv hadoop-3.3.0 hadoop

Now, configure Java environment variables for setting up Hadoop. For this, we will check out the location of our “JAVA_HOME” variable:

$ dirname $(dirname $(readlink -f $(which java)))

Open the “~/.bashrc” file in your “nano” text editor:

$ nano ~/.bashrc

Add the following paths in the opened “~/.bashrc” file:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/home/hadoopuser/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS=“-Djava.library.path=$HADOOP_HOME/lib/native”

After that, press “CTRL+O” to save the changes we made in the file:

Now, write out the below-given command to activate the “JAVA_HOME” environment variable:

$ source ~/.bashrc

The next thing we have to do is to open up the environment variable file of Hadoop:

$ nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

We have to set our “JAVA_HOME” variable in the Hadoop environment:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

Again, press “CTRL+O” to save the file content:

How to configure Apache Hadoop on Ubuntu

Till this point, we have successfully installed JAVA and Hadoop, created Hadoop users, configured SSH key-based authentication. Now, we will move forward to show you how to configure Apache Hadoop on the Ubuntu system. For this, the step is to create two directories: datanode and namenode, inside the home directory of Hadoop:

$ mkdir -p ~/hadoopdata/hdfs/namenode

$ mkdir -p ~/hadoopdata/hdfs/datanode

We will update the Hadoop “core-site.xml” file by adding our hostname, so firstly, confirm your system hostname by executing this command:

Conclusion

For different big data applications, Apache Hadoop is a freely available platform for managing, storing, and processing data that operates on clustered servers. It is a fault-tolerant distributed file system that allows parallel processing. In Hadoop, the MapReduce model is utilized for storing and extracting data from its nodes. In this article, we have shown you the method for installing and configuring Apache Hadoop on your Ubuntu system.

Discover more from Ubuntu-Server.com

Subscribe to get the latest posts sent to your email.

How to Install and Configure Apache Hadoop on Ubuntu

How to install Apache Hadoop on Ubuntu

How to configure Apache Hadoop on Ubuntu

Conclusion

Like this:

Related

Discover more from Ubuntu-Server.com

Comments

Leave a Reply Cancel reply

How to install Apache Hadoop on Ubuntu

How to configure Apache Hadoop on Ubuntu

Conclusion

Share this:

Like this:

Related

Discover more from Ubuntu-Server.com

Comments

Leave a Reply Cancel reply