22.9 C

How To Install Apache Hadoop / HBase on Ubuntu 18.04

HBase is an open source distributed non-relational database developed under the Apache Software Foundation. It is written in Java & runs on top of Hadoop File Systems (HDFS). HBase is one of the dominant databases when working with big data. It is designed for a quick read & write access to huge amounts of structured data.

Today, we will cover our first guide on the Installation of Hadoop & HBase on Ubuntu 18.04 and it is a HBase Installation on a Single Node Hadoop Cluster. It is done on a barebone Ubuntu 18.04 Virtual Machine with 8GB Ram & 4vCPU

Installing Hadoop on Ubuntu 18.04

Cover these steps to install a Single node Hadoop cluster on Ubuntu 18.04 LTS

Step 1: Update System

To deploy Hadoop & HBase on Ubuntu , update it.

- Advertisement -
sudo apt update
sudo apt -y upgrade
sudo reboot

Step 2: Install Java

Skip this step if you have Installed java.

sudo apt install openjdk-8-jre-headless
sudo apt update

Confirm the Installation of Java by

sabi@Ubuntu:~$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

Set up JAVA_HOME variable.

cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export PATH=$PATH:$JAVA_HOME/bin

Now, update your PATH & settings.

source /etc/profile.d/hadoop_java.sh

Testing Java

sabi@Ubuntu:~$ echo $JAVA_HOME

Step 3: Creating User Account

Move forward to create an Account for Hadoop so we have isolation b/w the Hadoop file system & the Unix file system.

sabi@Ubuntu:~$ sudo adduser hadoop
Adding user hadoop' ... Adding new grouphadoop' (1001) …
Adding new user hadoop' (1001) with grouphadoop' …
Creating home directory /home/hadoop' ... Copying files from/etc/skel' …
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
Full Name []: Sabir Hussain
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
sabi@Ubuntu:~$ sudo usermod -aG sudo hadoop

After adding user, generate SS key pair for the user.

sabi@Ubuntu:~$ sudo su - hadoop
hadoop@Ubuntu:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:f/lEUTkJyr49dZEHr9xZ7wCD4Lg3+ephloHQ8w8GVlY hadoop@Ubuntu
The key's randomart image is:
+---[RSA 2048]----+
| +.E .o +|
| . = …. B.|
| . * . .oo .o=|
| o * .. + +*|
| o S . =o+|
| o O oo.o.|
| = +.oo. .|
| o o . o. |
| .o . |

Allow authorization

Add this user’s key to list of Authorized ssh keys.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

Make sure that you can ssh using added key.

hadoop@Ubuntu:~$ ssh localhost
The authenticity of host 'localhost (' can't be established.
ECDSA key fingerprint is SHA256:jyWPWJLVC9MCHnOAFJjN8c8bwLu0o0U85cWTxHwuHvE.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 5.0.0-37-generic x86_64)
Documentation: https://help.ubuntu.com
Management: https://landscape.canonical.com
Support: https://ubuntu.com/advantage
Canonical Livepatch is available for installation.
Reduce system reboots and improve kernel security. Activate at:
0 packages can be updated.
0 updates are security updates.
Your Hardware Enablement Stack (HWE) is supported until April 2023.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
hadoop@Ubuntu:~$ exit
Connection to localhost closed.

Step 4: Download & Install Hadoop

Go for the latest release of Hadoop & download it.

wget https://www-eu.apache.org/dist/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz

Extract the files.

tar xzvf hadoop-2.10.0.tar.gz

Move resulting directory to /usr/local/hadoop

sudo mv hadoop-2.10.0 /usr/local/hadoop

Set up HADOOP_HOME and add directory with Hadoop binaries to your $PATH

cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export HADOOP_HOME=/usr/local/hadoop

Source file using

source /etc/profile.d/hadoop_java.sh

Confirm your Hadoop version by

hadoop@Ubuntu:~$ hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar

Step 5: Configure Hadoop

Hadoop configurations are located under /usr/local/hadoop/etc/hadoop/

Various files needed to be modified to complete the Installation on Ubuntu 18.04

First of all edit JAVA_HOME in shell script hadoop-env.sh:

$ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Then configure:


The core-site.xml file contains Hadoop cluster information used when starting up. These properties include:

  • The port number used for Hadoop instance
  • The memory allocated for file system
  • The memory limit for data storage
  • The size of Read / Write buffers.

Open core-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following properties in b/w the <configuration> and </configuration> tags.

2. hdfs-site.xml

Configure this file for each host to be used in the cluster. It holds the information of

  • The namenode & datanode paths ol the local filesystem.
  • Value of replication data

I’m using my disk to store Hadoop infrastructure. You can follow this procedure for your secondary disk.

hadoop@Ubuntu:~$ lsblk
loop0 7:0 0 149.9M 1 loop /snap/gnome-3-28-1804/67
loop1 7:1 0 54.4M 1 loop /snap/core18/1066
loop2 7:2 0 4.2M 1 loop /snap/gnome-calculator/544
loop3 7:3 0 14.8M 1 loop /snap/gnome-characters/296
loop4 7:4 0 4M 1 loop /snap/gnome-calculator/406
loop5 7:5 0 3.7M 1 loop /snap/gnome-system-monitor/123
loop6 7:6 0 89.1M 1 loop /snap/core/8268
loop7 7:7 0 14.8M 1 loop /snap/gnome-characters/375
loop8 7:8 0 3.7M 1 loop /snap/gnome-system-monitor/100
loop9 7:9 0 1008K 1 loop /snap/gnome-logs/61
loop10 7:10 0 88.5M 1 loop /snap/core/7270
loop11 7:11 0 156.7M 1 loop /snap/gnome-3-28-1804/110
loop12 7:12 0 956K 1 loop /snap/gnome-logs/81
loop13 7:13 0 44.2M 1 loop /snap/gtk-common-themes/1353
loop14 7:14 0 42.8M 1 loop /snap/gtk-common-themes/1313
sda 8:0 0 20G 0 disk
└─sda1 8:1 0 20G 0 part /
sr0 11:0 1 2G 0 rom

Do partition & mount the disk to /hadoop directory.

1.sudo parted -s -- /dev/sdb mklabel gpt
2.sudo parted -s -a optimal -- /dev/sdb mkpart primary 0% 100%
3.sudo parted -s -- /dev/sdb align-check optimal 1
4.sudo mkfs.xfs /dev/sdb1
5.sudo mkdir /hadoop
echo "/dev/sdb1 /hadoop xfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a


hadoop@Ubuntu:~$ df -hT | grep /dev/sda1
/dev/sda1 ext4 20G 7.4G 12G 40% /

Create directories for namenode & datanode

sudo mkdir -p /hadoop/hdfs/{namenode,datanode}

Now, set ownership to hadoop user & group

sudo chown -R hadoop:hadoop /hadoop

Open the file

sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Then add the below data in b/w <configuration> & </configuration> tags.

3. mapred-site.xml

Use this file to set the MapReduce Framework

sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

Set according to the below

4. yarn-site.xml

It will overwrite the configurations for Hadoop.yarn because it will define resource management & job scheduling logic.

sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

Do similar configuration

Step 6: Validate Hadoop Configuration

Initialize Hadoop Infrastructure store.

sudo su - hadoop
hdfs namenode -format

Test HDFS Configuration

$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [hbase]
hbase: Warning: Permanently added 'hbase' (ECDSA) to the list of known hosts.

In the end, verify the YARN configurations

$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

Hadoop 2.x default web UI ports.

  • NameNode – Default HTTP port is 9870.
  • ResourceManager – Default HTTP port is 8088.
  • MapReduce JobHistory Server – Default HTTP port is 19888.

Check these by typing

ss -tunelp

Access Hadoop Web Dashboard at http://ServerIP:9870

See Hadoop Cluster Overview at http://ServerIP:8080

Let’s create a directory to test

$ hadoop fs -mkdir /test
$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2019-12-29 10:23 /test
Stopping Hadoop Services

Run the following command to stop the Hadoop Services.

$ stop-dfs.sh
$ stop-yarn.sh

See our next article to read How To Install HBase on Ubuntu 18.04

- Advertisement -
Everything Linux, A.I, IT News, DataOps, Open Source and more delivered right to you.
"The best Linux newsletter on the web"


Please enter your comment!
Please enter your name here

Latest article