Summary of ODS Layer Key Points

Log table ods_log

1. Raw data
{
"common": { },
"start": { },
"err": { },
"ts": { }
}

2. If the table to be created already exists, drop it first

drop table if exists ods_log;

3. Create an external table

create external table ods_log(
	line string
)

(1) Partition by date

partitioned by(
	`dt` string
)

(2) LZO compression format
On the Hive website, go to:
Documentation -> Language Manual -> File Formats -> LZO Compression -> Table Definition

stored as
	inputformat "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
	outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

(3) Data storage location

location '/warehouse/gmall/ods/ods_log';
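
Steps 2 and 3 are shown as separate fragments, but Hive expects the create statement in one piece. A minimal sketch that assembles the fragments into a single string in a shell script; the function name print_ods_log_ddl and the idea of handing the result to hive -e are illustrative choices, not part of the original notes.

```shell
#!/bin/bash

# print_ods_log_ddl: print the drop + create statements built from steps 2 and 3.
# The quoted heredoc ('EOF') stops the shell from treating the backticks
# around dt as command substitution.
print_ods_log_ddl() {
    cat <<'EOF'
drop table if exists ods_log;
create external table ods_log(
    line string
)
partitioned by(
    `dt` string
)
stored as
    inputformat "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
    outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
location '/warehouse/gmall/ods/ods_log';
EOF
}

ods_log_ddl=$(print_ods_log_ddl)

# Print the statements for review. To run them, pass the string to the Hive CLI:
#   hive -e "$ods_log_ddl"
echo "$ods_log_ddl"
```

Keeping the DDL in one function makes it easy to review the full statement before it is executed.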

4. Load data into the Hive table

load data
inpath '/origin_data/gmall/log/topic_log/2020-06-14'
overwrite
into table
ods_log
partition(dt='2020-06-14');

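overwrite: loads the data in overwrite mode, replacing whatever the target partition already holds

5. Final result
The data under the source path /origin_data/gmall/log/topic_log/2020-06-14 is gone. load data inpath moves files within HDFS rather than copying them, so the original lzo files now sit under /warehouse/gmall/ods/ods_log/dt=2020-06-14/.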
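
The move can be checked by listing both paths after the load. A sketch, assuming the directory layout used in these notes; the helper names partition_dir and show_load_result are made up for illustration, and the script only touches HDFS when a hadoop client is installed.

```shell
#!/bin/bash

# Warehouse root taken from the location clause of ods_log.
warehouse_root=/warehouse/gmall/ods

# partition_dir TABLE DT: print the HDFS directory Hive uses for that partition.
partition_dir() {
    echo "${warehouse_root}/$1/dt=$2"
}

# show_load_result SOURCE_DIR TABLE DT: list the source and the partition directory.
# After a successful load the first listing should no longer show the data files,
# and the second should show the lzo files.
show_load_result() {
    hadoop fs -ls "$1"
    hadoop fs -ls "$(partition_dir "$2" "$3")"
}

# Only query HDFS when the hadoop client is available on this machine.
if command -v hadoop >/dev/null 2>&1; then
    show_load_result /origin_data/gmall/log/topic_log/2020-06-14 ods_log 2020-06-14
else
    echo "hadoop not found; partition directory would be $(partition_dir ods_log 2020-06-14)"
fi
```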

Business data ods_db

mysql_to_hdfs

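#!/bin/bash

# 2. Define variables
sqoop=/opt/module/sqoop/bin/sqoop
# Assumption: the original never sets do_date; it is fixed here to the date used throughout these notes
do_date=2020-06-14

# 3. Business logic
# $1: table name
# $2: SQL
# Note: --num-mappers 1 is added here because a free-form --query needs either
# a single mapper or a --split-by column.
import_data() {
$sqoop import \
--connect jdbc:mysql://hadoop102:3306/gmall \
--username root \
--password 111111 \
--target-dir /origin_data/gmall/db/$1/$do_date \
--delete-target-dir \
--query "$2 and \$CONDITIONS" \
--num-mappers 1 \
--fields-terminated-by "\t" \
--compress \
--compression-codec lzop \
--null-string '\\N' \
--null-non-string '\\N'

# Build the LZO index for the imported files
hadoop jar \
/opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar \
com.hadoop.compression.lzo.DistributedLzoIndexer \
/origin_data/gmall/db/$1/$do_date
}

# import_data appends "and $CONDITIONS" itself, so the query passed in must not repeat it
import_order_info() {
import_data order_info "select * from order_info where 1=1"
}

import_order_info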
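
Sqoop requires the literal token $CONDITIONS in a free-form --query and substitutes its own split predicate for it. Inside double quotes the shell expands an unescaped $CONDITIONS first (normally to an empty string), so the dollar sign has to be escaped with a backslash. The sketch below only prints the two strings to show the difference; the function names are illustrative.

```shell
#!/bin/bash

# Make sure no variable named CONDITIONS leaks in from the environment.
unset CONDITIONS

# Unescaped: the shell expands $CONDITIONS (unset, so empty) before Sqoop sees it.
query_unescaped() {
    echo "$1 and $CONDITIONS"
}

# Escaped: the backslash keeps the token literal, which is what Sqoop expects.
query_escaped() {
    echo "$1 and \$CONDITIONS"
}

base="select * from order_info where 1=1"

# The first line printed ends right after "and"; the token has been swallowed by the shell.
query_unescaped "$base"
# The second line printed ends with the literal text $CONDITIONS.
query_escaped "$base"
```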

hdfs_to_hive

# 1. MySQL: back up / export the table structure
# 2. Create the table in Hive (as a partitioned table)
# 3. Write the Sqoop script

#!/bin/bash

APP=gmall
do_date=2020-06-14
table=order_info
hive=/opt/module/hive/bin/hive

sql="
load data inpath '/origin_data/${APP}/db/${table}/${do_date}' OVERWRITE into table ${APP}.${table} partition(dt='${do_date}');
"

$hive -e "$sql"
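
The script above hard-codes do_date and a single table. A common variation takes the date from the command line, falls back to yesterday, and builds one load statement per table. A sketch under those assumptions: resolve_do_date and build_load_sql are hypothetical helper names, the fallback relies on GNU date (the -d option), and the block only prints the SQL instead of calling Hive.

```shell
#!/bin/bash

APP=gmall

# resolve_do_date [DATE]: use the given date, otherwise yesterday (GNU date syntax).
resolve_do_date() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        date -d '-1 day' +%F
    fi
}

# build_load_sql TABLE DATE: print the load statement for one table and one day.
build_load_sql() {
    echo "load data inpath '/origin_data/${APP}/db/$1/$2' OVERWRITE into table ${APP}.$1 partition(dt='$2');"
}

# First script argument is the date; empty means "yesterday".
do_date=$(resolve_do_date "$1")

# Collect one statement per table; add more table names to the list as needed.
sql=""
for table in order_info; do
    sql="${sql}$(build_load_sql "$table" "$do_date")
"
done

# Print for review. To execute, replace echo with: /opt/module/hive/bin/hive -e "$sql"
echo "$sql"
```

Printing the statements first gives a cheap dry run before any files are moved out of /origin_data.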

Original article: https://blog.csdn.net/qq_20519927/article/details/116713738