JRuby code to purge data on Hbase over Hive table…

Problem to Solve:- How to delete/update/query Binary format stored values in a HBase column family column. Hive over HBase table, where we cant use standard API and unable to apply filters on binary values, you can use below solution for programmability.


Find JRuby source code at github location github.com/mkjmkumar/JRuby_HBase_API

This program written in JRuby to purge data using HBase shell and deletes required data applying filter on given binary column.


So you have already heard many advantages of storing data in HBase(specially binary block format) and create Hive table on top of that to query your data. I am not going to explain use case for this, why we required HBase over Hive but simple reason for batter visibility/representation of data in tabular format.

I have come across this scenario few days back when we required to purge HBase data after completion of retention period and we struck to delete data from HBase table using API and filters when that columns is of INT data type from Hive. Below is sample use case:-

There are two type of storage format when for Hive data in HBase:-

1. Binary

2. String

Storing data in Binary block in HBase has its own advantages. Below script to create sample tables in both Hbase and Hive:-


HBase:- create ‘tiny_hbase_table1’, ‘ck’, ‘o’, {NUMREGIONS => 16, SPLITALGO => ‘UniformSplit’}


key INT,
kingdom STRING,
kingdomkey INT,
kongo bigint
STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’
WITH SERDEPROPERTIES (“hbase.columns.mapping” = “:key#b,o:kingdom#s,o:kingdomKey#b,o:kongo#b”)
“hbase.table.name” = “tiny_hbase_table1”,
“hbase.table.default.storage.type” = “binary”

Now we have populated our table from Hive and below are sample records:-

hbase(main):001:0> scan ‘orgdata4’

\x00\x00\x00\x00 — column=o:kingdom, value=23897
\x00\x00\x00\x00 — column=o:kingdomKey, value=\x00\x00\x00\x00
\x00\x00\x00\x00 — column=o:kongo, value=\x00\x00\x00\x00\x00\x00]Y
\x00\x00\x00\x02 — column=o:kingdom, value=2
\x00\x00\x00\x02 — column=o:kingdomKey, value=\x00\x00\x00\x02
\x00\x00\x00\x02 — column=o:kongo, value=\x00\x00\x00\x00\x00\x00\x00\x02
\x00\x00\x00\x0C — column=o:kingdom, value=12
\x00\x00\x00\x0C — column=o:kingdomKey, value=\x00\x00\x00\x0C
\x00\x00\x00\x0C — column=o:kongo, value=\x00\x00\x00\x00\x00\x00\x00\x0C
\x00\x00\x00\x15 — column=o:kingdom, value=21
\x00\x00\x00\x15 — column=o:kingdomKey, value=\x00\x00\x00\x15
\x00\x00\x00\x15 — column=o:kongo, value=\x00\x00\x00\x00\x00\x00\x00\x15
\x00\x00\x00\xC8 — column=o:kingdom, value=200
\x00\x00\x00\xC8 — column=o:kingdomKey, value=\x00\x00\x00\xC8
\x00\x00\x00\xC8 — column=o:kongo, value=\x00\x00\x00\x00\x00\x00\x00\xC8
\x00\x1E\xA2\xE1 — column=o:kingdom, value=2007777
\x00\x1E\xA2\xE1 — column=o:kingdomKey, value=\x00\x1E\xA2\xE1
\x00\x1E\xA2\xE1 — column=o:kongo, value=\x00\x00\x00\x00\x00\x1E\xA2\xE1


You can see INT datatype columns key and kingdomkey are modified to binary format and if would like to filter these values based on HBase filter like

scan ‘orgdata’, {FILTER => “(PrefixFilter (‘\x00\x1E\xA2\xE1’))”}

It will not return any result, and if we want to delete that data based on o:key = 2 simply not possible using HBase CLI.


My Solution is to create a JRuby program and execute that program on HBase CLI like below:-

def purge_qhour_agrt()

var_table = “tiny_hbase_table1”

htable = HTable.new(HBaseConfiguration.new, var_table)

rs = htable.getScanner(Bytes.toBytes(“o”), Bytes.toBytes(“kingdomKey”))

output = ArrayList.new output.add “ROW\t\t\t\t\t\tCOLUMN\+CELL”

rs.each { |r| r.raw.each { |kv|

row = Bytes.toInt(kv.getRow)

fam = kv.getFamily

ql = Bytes.toString(kv.getQualifier)

ts = kv.getTimestamp

val = Bytes.toInt(kv.getValue)

rowval = Bytes.toInt(kv.getRow)

output.add “#{row} #{ql} #{val}”



output.each {|line| puts “#{line}\n”}




Below is the result on screen:-


1000 kingdomKey 99909

1001 kingdomKey 99919

You can you below condition to apply filter and delete based on readable values:-

if val <= purge_before_date and row.include? ‘likeme^’
output.add “#{val} #{row} <<<<<<<<<<<<<<<<<<<<<<<<<<- Candidate for deletion”
deleteall var_table, row

Hope this solve a problem you are facing too. Let me know in case of any query and suggestions.