Backing up and restoring HBase data through snapshots

I. Back up the data

1. Check the existing data

Fetch the existing objects:
curl localhost:50840/snapshot/snapshot -X GET  -H 'Username: work' -H 'Password:123456'
curl localhost:50840/snapshot/hello -X GET  -H 'Username: work' -H 'Password:123456'
The data is there.

2. Create snapshots in HBase

Create a snapshot of each of the three tables: snapshot_data, snapshot_meta and snapshot_index.
Example: hbase> snapshot 'sync_stage:Photo', 'PhotoSnapshot' // snapshots the Photo table in the sync_stage namespace (the table has a single column family, PHOTO) under the name PhotoSnapshot
hbase> snapshot 'default:snapshot_data', 'snapshot_data'
hbase> snapshot 'default:snapshot_meta', 'snapshot_meta'
hbase> snapshot 'default:snapshot_index', 'snapshot_index'
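The three snapshot statements can also be generated and piped into the HBase shell in one run. A minimal sketch, assuming the tables live in the default namespace and each snapshot takes its table's name, as above; snapshot_cmds is a hypothetical helper, not part of HBase:

```shell
#!/usr/bin/env bash
# Prints one HBase shell "snapshot" statement per table name argument.
# Assumption: tables are in the "default" namespace and each snapshot
# is named after its table, as in the statements above.
snapshot_cmds() {
  local table
  for table in "$@"; do
    printf "snapshot 'default:%s', '%s'\n" "$table" "$table"
  done
}

# Usage (feeds the statements to the HBase shell):
#   snapshot_cmds snapshot_data snapshot_meta snapshot_index | hbase shell
```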

3. List the current snapshots

hbase(main):006:0> list_snapshots
SNAPSHOT                             TABLE + CREATION TIME            
 snapshot_data                       snapshot_data (Thu Nov 12 17:57:47 +0800 2020)               
 snapshot_index                      snapshot_index (Thu Nov 12 17:57:59 +0800 2020)                   
 snapshot_meta                       snapshot_meta (Thu Nov 12 17:57:53 +0800 2020)                 
3 row(s) in 0.0370 seconds
=> ["snapshot_data", "snapshot_index", "snapshot_meta"]
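To check the listing from a script rather than by eye, the list_snapshots output can be scanned for each expected name. A minimal sketch, assuming the row layout shown above (leading whitespace, then the snapshot name); has_snapshots is a hypothetical helper and the parsing may need adjusting for other HBase versions:

```shell
#!/usr/bin/env bash
# Reads `list_snapshots` output on stdin and succeeds only if every
# snapshot name passed as an argument appears as a row of the table.
# Assumes rows look like: " snapshot_data   snapshot_data (Thu Nov 12 ...)"
has_snapshots() {
  local out name missing=0
  out="$(cat)"
  for name in "$@"; do
    # A row starts with whitespace, then the snapshot name, then whitespace
    if ! printf '%s\n' "$out" | grep -qE "^[[:space:]]+${name}[[:space:]]"; then
      echo "missing snapshot: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   echo "list_snapshots" | hbase shell 2>/dev/null \
#     | has_snapshots snapshot_data snapshot_meta snapshot_index
```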

4. Back up the snapshots to HDFS

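Export each snapshot to HDFS with ExportSnapshot. ExportSnapshot copies both the snapshot metadata (into .hbase-snapshot/) and the HFiles it references (into archive/) under the target directory; the restore steps below rely on both: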
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot_data -copy-to hdfs://hdp23.bigdata.zll.360es.cn:9000/snapshot/

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot_meta -copy-to hdfs://hdp23.bigdata.zll.360es.cn:9000/snapshot/

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot_index -copy-to hdfs://hdp23.bigdata.zll.360es.cn:9000/snapshot/
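The three exports differ only in the snapshot name, so they can run in a loop that stops at the first failure. A minimal sketch; export_snapshots, run_or_print, BACKUP_URI and DRY_RUN are hypothetical names introduced here, and BACKUP_URI defaults to the HDFS target used above:

```shell
#!/usr/bin/env bash
# Backup target; defaults to the HDFS directory used above.
BACKUP_URI="${BACKUP_URI:-hdfs://hdp23.bigdata.zll.360es.cn:9000/snapshot/}"

# Runs a command, or only prints it when DRY_RUN=1.
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Exports each snapshot to BACKUP_URI and stops at the first failure.
export_snapshots() {
  local snap
  for snap in "$@"; do
    run_or_print hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot "$snap" -copy-to "$BACKUP_URI" \
      || { echo "export of ${snap} failed" >&2; return 1; }
  done
}

# Usage:
#   DRY_RUN=1 export_snapshots snapshot_data snapshot_meta snapshot_index
#   export_snapshots snapshot_data snapshot_meta snapshot_index
```

ExportSnapshot runs as a MapReduce job and also accepts tuning options such as -mappers; check its usage output for the HBase version you run.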

5. Destroy the data

Simulate data loss by deleting the table directories directly from HDFS:
./hadoop fs -rm -f -r /home/hbase/data/default/snapshot_data
./hadoop fs -rm -f -r /home/hbase/data/default/snapshot_index
./hadoop fs -rm -f -r /home/hbase/data/default/snapshot_meta

6. Delete the existing snapshots

delete_snapshot 'snapshot_data'
delete_snapshot 'snapshot_meta'
delete_snapshot 'snapshot_index'

7. Check with curl whether the data still exists

Requesting the bucket with curl now returns no data:
curl localhost:50840/snapshot/hello -X GET  -H 'Username: work' -H 'Password:123456'
curl: (52) Empty reply from server
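The curl checks in this post are read by eye. A minimal sketch of a scripted check, assuming the service answers HTTP 200 when the object exists; object_exists, API_USER and API_PASS are hypothetical names, and the credentials default to the ones above. In the failure case shown, curl itself exits with code 52 (empty reply), which the function also treats as "no data":

```shell
#!/usr/bin/env bash
# Succeeds only if GET on the object URL returns HTTP 200.
# API_USER / API_PASS default to the credentials used in this post.
object_exists() {
  local url="$1" code
  # -s: no progress output; -o /dev/null: discard the body;
  # -w: print only the HTTP status code. GET is curl's default method.
  code="$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Username: ${API_USER:-work}" \
    -H "Password: ${API_PASS:-123456}" \
    "$url")" || return 1   # curl error, e.g. exit 52 "Empty reply from server"
  [ "$code" = 200 ]
}

# Usage:
#   object_exists localhost:50840/snapshot/hello && echo "data present"
```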

II. Restore the data

1. If the /home/hbase/.hbase-snapshot directory exists, delete it

./hadoop fs -rm -f -r /home/hbase/.hbase-snapshot

Note: once it is deleted, HBase no longer has any snapshots.

2. Copy the backed-up snapshot metadata into /home/hbase/

./hadoop fs -cp /snapshot/.hbase-snapshot/ /home/hbase/

3. Check the directory

The HBase root directory now contains a .hbase-snapshot directory:
[work@hdp23 bin]$ ./hadoop fs -ls /home/hbase/                           
Found 8 items
drwxr-xr-x   - work  supergroup          0 2020-11-12 18:15 /home/hbase/.hbase-snapshot
drwxr-xr-x   - hbase supergroup          0 2020-11-12 17:43 /home/hbase/.tmp
drwxr-xr-x   - hbase supergroup          0 2020-11-12 17:51 /home/hbase/MasterProcWALs
drwxr-xr-x   - hbase supergroup          0 2020-11-12 17:43 /home/hbase/WALs
drwxr-xr-x   - hbase supergroup          0 2020-11-12 17:43 /home/hbase/data
-rw-r--r--   3 hbase supergroup         42 2020-11-12 17:43 /home/hbase/hbase.id
-rw-r--r--   3 hbase supergroup          7 2020-11-12 17:43 /home/hbase/hbase.version
drwxr-xr-x   - hbase supergroup          0 2020-11-12 17:43 /home/hbase/oldWALs

4. Copy the actual data files into /home/hbase/archive/data/default/

./hadoop fs -cp /snapshot/archive /home/hbase/

The HBase root directory now also contains an archive directory:
[work@hdp23 bin]$ ./hadoop fs -ls /home/hbase/                  
Found 9 items
drwxr-xr-x   - work  supergroup          0 2020-11-12 18:15 /home/hbase/.hbase-snapshot
drwxr-xr-x   - hbase supergroup          0 2020-11-12 17:43 /home/hbase/.tmp
drwxr-xr-x   - hbase supergroup          0 2020-11-12 17:51 /home/hbase/MasterProcWALs
drwxr-xr-x   - hbase supergroup          0 2020-11-12 17:43 /home/hbase/WALs
drwxr-xr-x   - work  supergroup          0 2020-11-12 18:19 /home/hbase/archive
drwxr-xr-x   - hbase supergroup          0 2020-11-12 17:43 /home/hbase/data
-rw-r--r--   3 hbase supergroup         42 2020-11-12 17:43 /home/hbase/hbase.id
-rw-r--r--   3 hbase supergroup          7 2020-11-12 17:43 /home/hbase/hbase.version
drwxr-xr-x   - hbase supergroup          0 2020-11-12 17:43 /home/hbase/oldWALs
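Restore steps 1, 2 and 4 can be combined into one function. A minimal sketch, assuming the backup was exported to /snapshot as above; stage_backup, HADOOP, BACKUP_DIR and HBASE_ROOT are hypothetical names with defaults taken from this post. It does not handle an archive directory that already exists under the HBase root, which this example cluster did not have:

```shell
#!/usr/bin/env bash
# Defaults taken from this post; override via the environment if needed.
HADOOP="${HADOOP:-./hadoop}"
BACKUP_DIR="${BACKUP_DIR:-/snapshot}"
HBASE_ROOT="${HBASE_ROOT:-/home/hbase}"

# Copies an exported snapshot back under the HBase root directory:
# removes a stale .hbase-snapshot directory if one exists, then copies
# the snapshot metadata and the archived HFiles from the backup.
stage_backup() {
  # "fs -test -d" succeeds only if the path exists and is a directory
  if "$HADOOP" fs -test -d "${HBASE_ROOT}/.hbase-snapshot"; then
    "$HADOOP" fs -rm -f -r "${HBASE_ROOT}/.hbase-snapshot" || return 1
  fi
  "$HADOOP" fs -cp "${BACKUP_DIR}/.hbase-snapshot" "${HBASE_ROOT}/" || return 1
  "$HADOOP" fs -cp "${BACKUP_DIR}/archive" "${HBASE_ROOT}/" || return 1
}

# Usage (run from Hadoop's bin directory, as in this post):
#   stage_backup
```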

5. The snapshots are back

Now list_snapshots shows the snapshots again:
hbase(main):010:0*   list_snapshots
SNAPSHOT                             TABLE + CREATION TIME   
 snapshot_data                       snapshot_data (Thu Nov 12 17:57:47 +0800 2020)   
 snapshot_index                      snapshot_index (Thu Nov 12 17:57:59 +0800 2020)   
 snapshot_meta                       snapshot_meta (Thu Nov 12 17:57:53 +0800 2020)     
3 row(s) in 0.0210 seconds
=> ["snapshot_data", "snapshot_index", "snapshot_meta"]

6. If the tables whose data was deleted still exist, disable and drop them

disable 'table_name'
drop 'table_name'
At this point the HBase data is completely gone.

7. Restore the data from the snapshots

restore_snapshot 'snapshot_data'
restore_snapshot 'snapshot_meta'
restore_snapshot 'snapshot_index'
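Steps 6 and 7 can be generated the same way for all three tables. A minimal sketch, assuming each table has the same name as its snapshot, as in this post; restore_cmds and its --drop flag are hypothetical. The output is meant to be piped into hbase shell; only pass --drop while the tables still exist, since disable fails on a table that is already gone:

```shell
#!/usr/bin/env bash
# Prints HBase shell statements that restore each table from the snapshot
# of the same name. With --drop, each table is disabled and dropped first.
restore_cmds() {
  local drop=0 snap
  if [ "${1:-}" = "--drop" ]; then
    drop=1
    shift
  fi
  for snap in "$@"; do
    if [ "$drop" = 1 ]; then
      printf "disable '%s'\ndrop '%s'\n" "$snap" "$snap"
    fi
    printf "restore_snapshot '%s'\n" "$snap"
  done
}

# Usage:
#   restore_cmds --drop snapshot_data snapshot_meta snapshot_index | hbase shell
```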

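If restore_snapshot fails with the error below (when the table has been dropped, restore_snapshot recreates it by cloning the snapshot, hence "clone snapshot" in the message):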
ERROR: org.apache.hadoop.hbase.snapshot.RestoreSnapshotException: clone snapshot={ ss=snapshot_data table=snapshot_data type=FLUSH } failed because A clone should not have regions to restore
Clear HBase's temporary directory to fix it: ./hadoop fs -rm -r '/home/hbase/.tmp/*'

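If you get a permission error, give the HBase root directory back to the hbase user (chown on HDFS must be run as the HDFS superuser):
./hadoop fs -chown -R hbase:supergroup /home/hbase/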
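The listings above show .hbase-snapshot and archive owned by work while the rest of the HBase root belongs to hbase, which is probably what triggers the permission error. A minimal sketch that lists the offending entries before running chown; not_owned_by is a hypothetical helper and assumes the eight-column hadoop fs -ls output shown above:

```shell
#!/usr/bin/env bash
# Reads `hadoop fs -ls` output on stdin and prints the path of every entry
# whose owner (third column) is not the expected user (default: hbase).
not_owned_by() {
  local owner="${1:-hbase}"
  # Only permission-string lines (starting with d or -) are entries;
  # this skips the "Found N items" header.
  awk -v owner="$owner" '$1 ~ /^[d-]/ && NF >= 8 && $3 != owner { print $NF }'
}

# Usage:
#   ./hadoop fs -ls /home/hbase/ | not_owned_by hbase
#   # then, as the HDFS superuser:
#   ./hadoop fs -chown -R hbase:supergroup /home/hbase/
```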

8. Request the bucket with curl again: the data is back

[work@hdp23 bin]$ curl localhost:50840/snapshot/hello -X GET  -H 'Username: work' -H 'Password:123456'
Hello World

III. References

https://www.cnblogs.com/foxmailed/p/3914117.html