How To Install Apache Hadoop / HBase on Ubuntu 18.04

sabi — Fri, 17 Jan 2020 13:53:50 +0000

HBase is an open source, distributed, non-relational database developed under the Apache Software Foundation. It is written in Java and runs on top of the Hadoop Distributed File System (HDFS). HBase is one of the dominant databases for big data workloads, and it is designed for fast read and write access to huge amounts of structured data.

Today's guide is the first in a series on installing Hadoop and HBase on Ubuntu 18.04; the HBase installation runs on a single-node Hadoop cluster. Everything here was done on a bare-bones Ubuntu 18.04 virtual machine with 8 GB RAM and 4 vCPUs.

Installing Hadoop on Ubuntu 18.04

Follow these steps to install a single-node Hadoop cluster on Ubuntu 18.04 LTS.

Step 1: Update System

Before deploying Hadoop and HBase on Ubuntu, update the system.

sudo apt update
sudo apt -y upgrade
sudo reboot

Step 2: Install Java

Skip this step if Java is already installed. The JDK package is used here because the JAVA_HOME setup below relies on javac.

sudo apt update
sudo apt install -y openjdk-8-jdk-headless

Confirm the Java installation:

sabi@Ubuntu:~$ java -version
 openjdk version "1.8.0_232"
 OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09)
 OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

Set up the JAVA_HOME variable.

cat <<'EOF' | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export PATH=$PATH:$JAVA_HOME/bin
EOF

Now, update your PATH & settings.

source /etc/profile.d/hadoop_java.sh

Testing Java

sabi@Ubuntu:~$ echo $JAVA_HOME
 /usr/lib/jvm/java-8-openjdk-amd64
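If you want a slightly stronger check than echoing the variable, here is a minimal sketch. The function name check_java_home is my own and not part of any Java or Hadoop tooling; it only confirms that the directory contains executable java and javac binaries.

```shell
# Hypothetical helper (not part of Java or Hadoop): succeed only if the
# given directory looks like a JDK home, i.e. it contains executable
# bin/java and bin/javac. Defaults to $JAVA_HOME when no argument is given.
check_java_home() {
    local home="${1:-${JAVA_HOME:-}}"
    local tool
    if [ -z "$home" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    for tool in java javac; do
        if [ ! -x "$home/bin/$tool" ]; then
            echo "missing or not executable: $home/bin/$tool" >&2
            return 1
        fi
    done
    echo "JAVA_HOME looks valid: $home"
}
```

Run check_java_home with no arguments after sourcing the profile script; a non-zero exit status means the variable points somewhere without a full JDK.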

Step 3: Creating User Account

Next, create a dedicated account for Hadoop so that the Hadoop file system is isolated from the Unix file system.

sabi@Ubuntu:~$ sudo adduser hadoop
 Adding user `hadoop' ...
 Adding new group `hadoop' (1001) ...
 Adding new user `hadoop' (1001) with group `hadoop' ...
 Creating home directory `/home/hadoop' ...
 Copying files from `/etc/skel' ...
 Enter new UNIX password: 
 Retype new UNIX password: 
 passwd: password updated successfully
 Changing the user information for hadoop
 Enter the new value, or press ENTER for the default
     Full Name []: Sabir Hussain
     Room Number []: 
     Work Phone []: 
     Home Phone []: 
     Other []: 
 Is the information correct? [Y/n] y
sabi@Ubuntu:~$ sudo usermod -aG sudo hadoop

After adding the user, generate an SSH key pair for it.

sabi@Ubuntu:~$ sudo su - hadoop
hadoop@Ubuntu:~$ ssh-keygen -t rsa
 Generating public/private rsa key pair.
 Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
 Created directory '/home/hadoop/.ssh'.
 Enter passphrase (empty for no passphrase): 
 Enter same passphrase again: 
 Your identification has been saved in /home/hadoop/.ssh/id_rsa.
 Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
 The key fingerprint is:
 SHA256:f/lEUTkJyr49dZEHr9xZ7wCD4Lg3+ephloHQ8w8GVlY hadoop@Ubuntu
 The key's randomart image is:
 +---[RSA 2048]----+
 |        +.E  .o +|
 |     . = ....  B.|
 |    . * . .oo .o=|
 |     o * ..  + +*|
 |      o S  .  =o+|
 |       o O  oo.o.|
 |        = +.oo. .|
 |       o o . o.  |
 |       .o     .  |
 +----[SHA256]-----+

Allow authorization

Add this user's public key to the list of authorized SSH keys.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
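sshd can ignore authorized_keys when the permissions are too open, so it may be worth confirming the modes before testing the login. The sketch below assumes GNU stat (as shipped with Ubuntu) and checks for the conventional restrictive modes, 700 on the directory and 600 on the file; check_ssh_perms is a name I made up for illustration.

```shell
# Hypothetical helper: verify that an SSH directory and its authorized_keys
# file use the conventional restrictive modes (700 and 600).
# Assumes GNU stat, which supports the -c format option.
check_ssh_perms() {
    local dir="${1:-$HOME/.ssh}"
    local dir_mode file_mode
    dir_mode=$(stat -c '%a' "$dir") || return 1
    file_mode=$(stat -c '%a' "$dir/authorized_keys") || return 1
    if [ "$dir_mode" != "700" ] || [ "$file_mode" != "600" ]; then
        echo "unexpected modes: $dir=$dir_mode authorized_keys=$file_mode" >&2
        return 1
    fi
    echo "permissions OK"
}
```

Calling check_ssh_perms as the hadoop user checks ~/.ssh; pass another directory as the first argument to check a different location.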

Make sure that you can ssh using added key.

hadoop@Ubuntu:~$ ssh localhost
 The authenticity of host 'localhost (127.0.0.1)' can't be established.
 ECDSA key fingerprint is SHA256:jyWPWJLVC9MCHnOAFJjN8c8bwLu0o0U85cWTxHwuHvE.
 Are you sure you want to continue connecting (yes/no)? y
 Please type 'yes' or 'no': yes
 Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
 Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 5.0.0-37-generic x86_64)
 Documentation:  https://help.ubuntu.com
 Management:     https://landscape.canonical.com
 Support:        https://ubuntu.com/advantage
 Canonical Livepatch is available for installation.
 Reduce system reboots and improve kernel security. Activate at:
  https://ubuntu.com/livepatch 
 0 packages can be updated.
 0 updates are security updates.
 Your Hardware Enablement Stack (HWE) is supported until April 2023.
 The programs included with the Ubuntu system are free software;
 the exact distribution terms for each program are described in the
 individual files in /usr/share/doc/*/copyright.
 Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
 applicable law.
 hadoop@Ubuntu:~$ exit
 logout
 Connection to localhost closed.

Step 4: Download & Install Hadoop

Download the Hadoop release. This guide uses version 2.10.0.

wget https://www-eu.apache.org/dist/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
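Before extracting the archive, you may want to compare its SHA-512 digest with the value published for the release. The helper below is a sketch of that comparison; verify_sha512 is a hypothetical name, and the expected digest is something you copy from the Apache download page yourself.

```shell
# Hypothetical helper: compare a file's SHA-512 digest with an expected
# hex string supplied by the caller.
verify_sha512() {
    local file="$1" expected="$2" actual
    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi
    actual=$(sha512sum "$file" | awk '{print $1}')
    # Strip whitespace and lower-case the expected value so digests pasted
    # from a web page still match.
    expected=$(printf '%s' "$expected" | tr -d '[:space:]' | tr 'A-F' 'a-f')
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "checksum OK: $file"
    else
        echo "checksum MISMATCH: $file" >&2
        return 1
    fi
}
```

A call would look like verify_sha512 hadoop-2.10.0.tar.gz "<digest from the download page>", where the second argument is a placeholder for the published value.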

Extract the files.

tar xzvf hadoop-2.10.0.tar.gz

Move resulting directory to /usr/local/hadoop

sudo mv hadoop-2.10.0 /usr/local/hadoop

Set up HADOOP_HOME and add the directories containing the Hadoop binaries to your $PATH.

cat <<'EOF' | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF

Source the file:

source /etc/profile.d/hadoop_java.sh

Confirm your Hadoop version by

hadoop@Ubuntu:~$ hadoop version
 Hadoop 2.10.0
 Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
 Compiled by jhung on 2019-10-22T19:10Z
 Compiled with protoc 2.5.0
 From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
 This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar

Step 5: Configure Hadoop

Hadoop configurations are located under /usr/local/hadoop/etc/hadoop/

Several files need to be modified to complete the installation on Ubuntu 18.04.

First of all, set JAVA_HOME in the shell script hadoop-env.sh to the location of the OpenJDK 8 installation from Step 2:

$ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Then configure:

1. core-site.xml

The core-site.xml file contains Hadoop cluster information used when starting up. These properties include:

The port number used for Hadoop instance
The memory allocated for file system
The memory limit for data storage
The size of Read / Write buffers.

Open core-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

Add the required properties between the <configuration> and </configuration> tags.
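As a rough sketch of what a single-node core-site.xml could contain, the function below prints one with only the fs.defaultFS property set. The address hdfs://localhost:9000 is an assumption on my part (it is the value commonly used for single-node setups), and core_site_xml is a made-up helper name.

```shell
# Hypothetical helper: print a minimal core-site.xml to stdout.
# Assumption: the NameNode is reachable at hdfs://localhost:9000 unless
# another URI is passed as the first argument.
core_site_xml() {
    local fs_uri="${1:-hdfs://localhost:9000}"
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>${fs_uri}</value>
  </property>
</configuration>
EOF
}
```

To use it instead of editing by hand, pipe the output into place with core_site_xml | sudo tee /usr/local/hadoop/etc/hadoop/core-site.xml. Note that this overwrites the existing file.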

2. hdfs-site.xml

Configure this file on each host used in the cluster. It holds the following information:

The namenode & datanode paths on the local filesystem.
The replication factor for the data.

I’m using my primary disk to store the Hadoop data. If you have a secondary disk, you can follow the procedure below for it.

hadoop@Ubuntu:~$ lsblk
 NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
 loop0    7:0    0 149.9M  1 loop /snap/gnome-3-28-1804/67
 loop1    7:1    0  54.4M  1 loop /snap/core18/1066
 loop2    7:2    0   4.2M  1 loop /snap/gnome-calculator/544
 loop3    7:3    0  14.8M  1 loop /snap/gnome-characters/296
 loop4    7:4    0     4M  1 loop /snap/gnome-calculator/406
 loop5    7:5    0   3.7M  1 loop /snap/gnome-system-monitor/123
 loop6    7:6    0  89.1M  1 loop /snap/core/8268
 loop7    7:7    0  14.8M  1 loop /snap/gnome-characters/375
 loop8    7:8    0   3.7M  1 loop /snap/gnome-system-monitor/100
 loop9    7:9    0  1008K  1 loop /snap/gnome-logs/61
 loop10   7:10   0  88.5M  1 loop /snap/core/7270
 loop11   7:11   0 156.7M  1 loop /snap/gnome-3-28-1804/110
 loop12   7:12   0   956K  1 loop /snap/gnome-logs/81
 loop13   7:13   0  44.2M  1 loop /snap/gtk-common-themes/1353
 loop14   7:14   0  42.8M  1 loop /snap/gtk-common-themes/1313
 sda      8:0    0    20G  0 disk 
 └─sda1   8:1    0    20G  0 part /
 sr0     11:0    1     2G  0 rom

Partition the secondary disk (here /dev/sdb) and mount it on the /hadoop directory.

sudo parted -s -- /dev/sdb mklabel gpt
sudo parted -s -a optimal -- /dev/sdb mkpart primary 0% 100%
sudo parted -s -- /dev/sdb align-check optimal 1
sudo mkfs.xfs /dev/sdb1
sudo mkdir /hadoop
echo "/dev/sdb1 /hadoop xfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a

Check:

hadoop@Ubuntu:~$ df -hT | grep /dev/sda1

/dev/sda1      ext4       20G  7.4G   12G  40% /

Create directories for namenode & datanode

sudo mkdir -p /hadoop/hdfs/{namenode,datanode}

Now, set ownership to hadoop user & group

sudo chown -R hadoop:hadoop /hadoop

Open the file

sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Then add the namenode, datanode and replication properties between the <configuration> and </configuration> tags.
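A minimal sketch of a matching hdfs-site.xml, assuming the /hadoop/hdfs/namenode and /hadoop/hdfs/datanode directories created above and a replication factor of 1 (my assumption for a single node). hdfs_site_xml is a hypothetical helper name.

```shell
# Hypothetical helper: print a minimal hdfs-site.xml to stdout.
# Arguments: base directory (default /hadoop/hdfs) and replication
# factor (default 1, an assumed value for a single-node cluster).
hdfs_site_xml() {
    local base="${1:-/hadoop/hdfs}" replication="${2:-1}"
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>${replication}</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${base}/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${base}/datanode</value>
  </property>
</configuration>
EOF
}
```

As with the previous file, hdfs_site_xml | sudo tee /usr/local/hadoop/etc/hadoop/hdfs-site.xml writes it into place and replaces whatever was there.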

3. mapred-site.xml

Use this file to set the MapReduce Framework

sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

Set the MapReduce framework to YARN between the <configuration> and </configuration> tags.
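For reference, a mapred-site.xml that hands MapReduce jobs to YARN only needs the mapreduce.framework.name property. The function below is a sketch that prints such a file; mapred_site_xml is a name I chose for illustration.

```shell
# Hypothetical helper: print a minimal mapred-site.xml to stdout that
# selects YARN as the MapReduce execution framework.
mapred_site_xml() {
    cat <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}
```

Write it into place with mapred_site_xml | sudo tee /usr/local/hadoop/etc/hadoop/mapred-site.xml.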

4. yarn-site.xml

This file overrides the default YARN settings. It defines the resource management and job scheduling logic.

sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

Add the YARN properties between the <configuration> and </configuration> tags in the same way.
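A small sketch of a yarn-site.xml for this kind of setup, assuming the only thing needed is the shuffle service that MapReduce jobs rely on. The helper name yarn_site_xml is hypothetical; the property and value are standard YARN settings.

```shell
# Hypothetical helper: print a minimal yarn-site.xml to stdout that
# enables the MapReduce shuffle auxiliary service on the NodeManager.
yarn_site_xml() {
    cat <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}
```

It can be written into place with yarn_site_xml | sudo tee /usr/local/hadoop/etc/hadoop/yarn-site.xml. Memory and scheduler settings are left at their defaults in this sketch.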

Step 6: Validate Hadoop Configuration

Initialize the Hadoop infrastructure store by formatting the namenode.

sudo su - hadoop
hdfs namenode -format

Test HDFS Configuration

$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [hbase]
hbase: Warning: Permanently added 'hbase' (ECDSA) to the list of known hosts.

Finally, verify the YARN configuration:

$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

Hadoop 2.x default web UI ports:

NameNode – Default HTTP port is 50070.
ResourceManager – Default HTTP port is 8088.
MapReduce JobHistory Server – Default HTTP port is 19888.

Check these by typing

ss -tunelp
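Scanning the full ss output by eye gets tedious, so here is a small sketch that answers "is anything listening on port N?". It reads ss output on standard input; has_listener is a made-up name, and the parsing assumes the usual ss layout where the local address is the third column after the LISTEN state.

```shell
# Hypothetical helper: read ss output on stdin and succeed if a TCP
# socket in LISTEN state is bound to the given port. Works whether or
# not ss prints a leading Netid column.
has_listener() {
    local port="$1"
    awk -v port="$port" '
        {
            for (i = 1; i <= NF; i++)
                if ($i == "LISTEN" && $(i + 3) ~ (":" port "$"))
                    found = 1
        }
        END { exit !found }
    '
}
```

For example, ss -tln | has_listener 8088 && echo "ResourceManager UI is listening" prints the message only when the ResourceManager port is open.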

Access the Hadoop web dashboard (NameNode) at http://ServerIP:50070

See the Hadoop cluster overview (ResourceManager) at http://ServerIP:8088

Let’s create a directory to test

$ hadoop fs -mkdir /test
 $ hadoop fs -ls /
 Found 1 items
 drwxr-xr-x   - hadoop supergroup          0 2019-12-29 10:23 /test

Stopping Hadoop Services

Run the following commands to stop the Hadoop services.

$ stop-dfs.sh
$ stop-yarn.sh

See our next article to learn how to install HBase on Ubuntu 18.04.
