Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

Mar 12, 2021

Introduction

Today, almost every field has to deal with Big Data, which creates a huge demand for processing large volumes of data in a short amount of time.

This is where Hadoop comes in: it can store and process huge amounts of data, or simply, Big Data. A Hadoop cluster is a collection of computers, known as nodes, each with its own storage.

These nodes are networked together to perform computations in parallel on big data sets, working on multiple machines simultaneously. To process any data, the client submits the data and the program to Hadoop. HDFS (Hadoop Distributed File System) is the file system Hadoop uses to store that data across the nodes.

Need for Elasticity

In a Hadoop cluster, datanodes contribute their storage to the cluster, and the namenode keeps track of it. Each datanode keeps storing the data coming from the client side. But if the storage a datanode has contributed fills up, it can't store any more data. For this reason, we need elasticity in storage.

Hadoop itself doesn't provide a way to resize a datanode's contributed storage on the fly. So, to achieve elasticity, we use a Linux storage-management feature known as LVM (Logical Volume Manager).

Task Demo

First, we need a Linux OS to work on. We are using Red Hat Enterprise Linux 8, running on a VM. With this sorted, we open a terminal as the root user.

Now, we need to attach a new hard disk of some size to our VM, from which we will create logical volumes.

The next steps are as follows.

⭐️ Check Status of Disks in the OS

With the “fdisk -l” command, we can check the status of all the attached hard disks.
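As a minimal sketch of this check: the run helper below is a hypothetical wrapper (not part of any tool) that only prints each command unless APPLY=1 is set, so the block is safe to try without root. The lsblk command is an extra we add here for a more compact view; it is not used in the original demo.

```shell
# Hypothetical helper: print each command by default; set APPLY=1 (as root) to run it.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

check_disks() {
    run fdisk -l   # list every attached disk with its partition table
    run lsblk      # tree view of block devices and their mount points
}

check_disks
```

The newly attached disk should show up as an extra device (for example /dev/sdb, though the name depends on your system).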

⭐️ Create a Physical Volume

We need to create the physical volume with the pvcreate command. We can then see the details of the physical volume with the pvdisplay command.
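The two commands can be sketched as follows, assuming the new disk appears as /dev/sdb (a hypothetical device name; check the fdisk output for yours). The run helper and the create_pv function are illustrative names; run prints the commands unless APPLY=1 is set.

```shell
# Hypothetical helper: print each command by default; set APPLY=1 (as root) to run it.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

create_pv() {
    # $1 = disk to initialise (assumed to be empty and unused)
    run pvcreate "$1"    # mark the disk as an LVM physical volume
    run pvdisplay "$1"   # show its size, UUID and allocation status
}

create_pv "${DISK:-/dev/sdb}"
```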

⭐️ Create a Volume Group

Next, we create a volume group from that physical volume. The command is:

vgcreate <vg_name> <physical_volume>

⭐️ Create a Logical Volume

Now, we will create a logical volume of size 5 GB. We can see the details of the created logical volume with the lvdisplay command.

Creating the logical volume:
    lvcreate --size <volume_size> --name <logical_volume_name> <vg_name>

Displaying the volumes (every volume in the group, or a single one):
    lvdisplay <vg_name>
    lvdisplay <vg_name>/<logical_volume_name>

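⭐️ Format the Logical Volume

Before the logical volume can store data, it has to be formatted with a filesystem. Because we later grow it with resize2fs, the filesystem needs to be ext2, ext3, or ext4. Once the volume is formatted, we have to mount it.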
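A minimal sketch of the formatting step, assuming an ext4 filesystem (the post does not state the type; ext4 is our assumption, chosen because resize2fs works with it). The names hadoop_vg and datanode_lv are hypothetical, as are the run and format_lv helpers; run prints the command unless APPLY=1 is set.

```shell
# Hypothetical helper: print each command by default; set APPLY=1 (as root) to run it.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

format_lv() {
    # $1 = volume group name, $2 = logical volume name
    # Assumption: ext4. Formatting erases anything already on the volume.
    run mkfs.ext4 "/dev/$1/$2"
}

format_lv hadoop_vg datanode_lv
```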

⭐️ Mount the Volume on the Datanode Directory

Here, we will mount the logical volume on the directory assigned to the datanode.

Now, this volume is ready to use as the datanode directory. If we check now, the storage shared by our datanode is the same size as the logical volume, i.e., 5 GB.
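The mount step might look like the sketch below. The directory /dn is a hypothetical datanode directory (use the one set in your own hdfs-site.xml), and hadoop_vg and datanode_lv are hypothetical names. The run helper prints the commands unless APPLY=1 is set.

```shell
# Hypothetical helper: print each command by default; set APPLY=1 (as root) to run it.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

mount_datanode_dir() {
    # $1 = volume group, $2 = logical volume, $3 = datanode directory
    run mkdir -p "$3"             # make sure the mount point exists
    run mount "/dev/$1/$2" "$3"   # attach the logical volume to it
    run df -h "$3"                # confirm the size seen by the OS
}

mount_datanode_dir hadoop_vg datanode_lv /dn
```

With the datanode running, hdfs dfsadmin -report (or hadoop dfsadmin -report on Hadoop 1.x) should then list a configured capacity close to the size of the logical volume.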

⭐️ Scaling Up the Shared Datanode Volume

Now, we can increase the shared volume of the datanode on the fly, using the lvextend and resize2fs commands: lvextend grows the logical volume, and resize2fs then grows the filesystem to fill it. Here, we will increase it by 3 GB.

Now, if we check again, our shared datanode volume has increased by 3 GB.
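The two-step resize can be sketched like this, again with the hypothetical names hadoop_vg and datanode_lv, and assuming the volume group still has at least 3 GB free. The run helper prints the commands unless APPLY=1 is set.

```shell
# Hypothetical helper: print each command by default; set APPLY=1 (as root) to run it.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

extend_datanode_lv() {
    # $1 = volume group, $2 = logical volume, $3 = amount to add (e.g. 3G)
    run lvextend --size "+$3" "/dev/$1/$2"   # grow the logical volume
    run resize2fs "/dev/$1/$2"               # grow the ext filesystem to match
}

extend_datanode_lv hadoop_vg datanode_lv 3G
```

The leading + in the size matters: +3G adds 3 GB to the current size, whereas a plain 3G would ask for a total size of 3 GB.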

Finally, we are done.

Now, if we exhaust the free space in the volume group and can no longer extend the logical volume, we can solve this by attaching another hard disk and extending our volume group onto that disk.
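A sketch of that last idea, assuming the second disk appears as /dev/sdc and the volume group is called hadoop_vg (both hypothetical names). The run helper prints the commands unless APPLY=1 is set.

```shell
# Hypothetical helper: print each command by default; set APPLY=1 (as root) to run it.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

add_disk_to_vg() {
    # $1 = existing volume group, $2 = newly attached disk
    run pvcreate "$2"       # turn the new disk into a physical volume
    run vgextend "$1" "$2"  # add it to the volume group
    run vgdisplay "$1"      # the group's free space should now be larger
}

add_disk_to_vg hadoop_vg /dev/sdc
```

Once the volume group has free space again, the logical volume can be grown with lvextend and resize2fs exactly as before.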

Here we come to the end of the demo.

Hope this was helpful.

Thanks for reading.