Problem to Solve:- How to delete/update/query Binary format stored values in a HBase column family column. Hive over HBase table, where we cant use standard API and unable to apply filters on binary values, you can use below solution for programmability.
Find JRuby source code at github location github.com/mkjmkumar/JRuby_HBase_API
This program written in JRuby to purge data using HBase shell and deletes required data applying filter on given binary column.
So you have already heard many advantages of storing data in HBase(specially binary block format) and create Hive table on top of that to query your data. I am not going to explain use case for this, why we required HBase over Hive but simple reason for batter visibility/representation of data in tabular format.
I have come across this scenario few days back when we required to purge HBase data after completion of retention period and we struck to delete data from HBase table using API and filters when that columns is of INT data type from Hive. Below is sample use case:-
There are two type of storage format when for Hive data in HBase:-
1. Binary
2. String
Storing data in Binary block in HBase has its own advantages. Below script to create sample tables in both Hbase and Hive:-
HBase:- create ‘tiny_hbase_table1’, ‘ck’, ‘o’, {NUMREGIONS => 16, SPLITALGO => ‘UniformSplit’}
Hive:- CREATE EXTERNAL TABLE orgdata (
key INT,
kingdom STRING,
kingdomkey INT,
kongo bigint
)
STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’
WITH SERDEPROPERTIES (“hbase.columns.mapping” = “:key#b,o:kingdom#s,o:kingdomKey#b,o:kongo#b”)
TBLPROPERTIES(
“hbase.table.name” = “tiny_hbase_table1”,
“hbase.table.default.storage.type” = “binary”
);
Now we have populated our table from Hive and below are sample records:-
hbase(main):001:0> scan ‘orgdata4’
ROW COLUMN+CELL
\x00\x00\x00\x00 — column=o:kingdom, value=23897
\x00\x00\x00\x00 — column=o:kingdomKey, value=\x00\x00\x00\x00
\x00\x00\x00\x00 — column=o:kongo, value=\x00\x00\x00\x00\x00\x00]Y
\x00\x00\x00\x02 — column=o:kingdom, value=2
\x00\x00\x00\x02 — column=o:kingdomKey, value=\x00\x00\x00\x02
\x00\x00\x00\x02 — column=o:kongo, value=\x00\x00\x00\x00\x00\x00\x00\x02
\x00\x00\x00\x0C — column=o:kingdom, value=12
\x00\x00\x00\x0C — column=o:kingdomKey, value=\x00\x00\x00\x0C
\x00\x00\x00\x0C — column=o:kongo, value=\x00\x00\x00\x00\x00\x00\x00\x0C
\x00\x00\x00\x15 — column=o:kingdom, value=21
\x00\x00\x00\x15 — column=o:kingdomKey, value=\x00\x00\x00\x15
\x00\x00\x00\x15 — column=o:kongo, value=\x00\x00\x00\x00\x00\x00\x00\x15
\x00\x00\x00\xC8 — column=o:kingdom, value=200
\x00\x00\x00\xC8 — column=o:kingdomKey, value=\x00\x00\x00\xC8
\x00\x00\x00\xC8 — column=o:kongo, value=\x00\x00\x00\x00\x00\x00\x00\xC8
\x00\x1E\xA2\xE1 — column=o:kingdom, value=2007777
\x00\x1E\xA2\xE1 — column=o:kingdomKey, value=\x00\x1E\xA2\xE1
\x00\x1E\xA2\xE1 — column=o:kongo, value=\x00\x00\x00\x00\x00\x1E\xA2\xE1
You can see INT datatype columns key and kingdomkey are modified to binary format and if would like to filter these values based on HBase filter like
scan ‘orgdata’, {FILTER => “(PrefixFilter (‘\x00\x1E\xA2\xE1’))”}
It will not return any result, and if we want to delete that data based on o:key = 2 simply not possible using HBase CLI.
My Solution is to create a JRuby program and execute that program on HBase CLI like below:-
def purge_qhour_agrt()
var_table = “tiny_hbase_table1”
htable = HTable.new(HBaseConfiguration.new, var_table)
rs = htable.getScanner(Bytes.toBytes(“o”), Bytes.toBytes(“kingdomKey”))
output = ArrayList.new output.add “ROW\t\t\t\t\t\tCOLUMN\+CELL”
rs.each { |r| r.raw.each { |kv|
row = Bytes.toInt(kv.getRow)
fam = kv.getFamily
ql = Bytes.toString(kv.getQualifier)
ts = kv.getTimestamp
val = Bytes.toInt(kv.getValue)
rowval = Bytes.toInt(kv.getRow)
output.add “#{row} #{ql} #{val}”
}
}
output.each {|line| puts “#{line}\n”}
end
purge_qhour_agrt
Below is the result on screen:-
ROW COLUMN+CELL
1000 kingdomKey 99909
1001 kingdomKey 99919
You can you below condition to apply filter and delete based on readable values:-
if val <= purge_before_date and row.include? ‘likeme^’
output.add “#{val} #{row} <<<<<<<<<<<<<<<<<<<<<<<<<<- Candidate for deletion”
deleteall var_table, row
end
Hope this solve a problem you are facing too. Let me know in case of any query and suggestions.