java hql 计数_使用hql-统计连续登陆的三天及以上的用户

@

这个问题可以扩展到很多相似的问题:连续几个月充值会员、连续天数有商品卖出、连续打车、连续逾期……

数据提供

用户ID、登入日期

user01,2018-02-28

user01,2018-03-01

user01,2018-03-02

user01,2018-03-04

user01,2018-03-05

user01,2018-03-06

user01,2018-03-07

user02,2018-03-01

user02,2018-03-02

user02,2018-03-03

user02,2018-03-06

输出字段

+---------+--------+-------------+-------------+--+

| uid | times | start_date | end_date |

+---------+--------+-------------+-------------+--+

解法一

先对每个用户的登录日期排序,然后拿第n行的日期,减第n-2行的日期,如果等于2,就说明连续三天登录了。

解法二

开窗,窗囗内部排序然后做差

rownumber() oover

建表

create table wedw_dw.t_login_info(

user_id string COMMENT '用户ID'

,login_date date COMMENT '登录日期'

)

row format delimited fields terminated by ',';

导数据

hdfs dfs -put /test/login.txt /data/hive/test/wedw/dw/t_login_info/

验证数据

select * from wedw_dw.t_login_info;

+----------+-------------+--+

| user_id | login_date |

+----------+-------------+--+

| user01 | 2018-02-28 |

| user01 | 2018-03-01 |

| user01 | 2018-03-02 |

| user01 | 2018-03-04 |

| user01 | 2018-03-05 |

| user01 | 2018-03-06 |

| user01 | 2018-03-07 |

| user02 | 2018-03-01 |

| user02 | 2018-03-02 |

| user02 | 2018-03-03 |

| user02 | 2018-03-06 |

+----------+-------------+--+

解决方案-使用解法二

select

t2.user_id as user_id,

count(1) as times,

min(t2.login_date) as start_date,

max(t2.login_date) as end_date

from

(

select

t1.user_id,

t1.login_date,

date_sub(t1.login_date,rn) as date_diff

from

(

select

user_id,

login_date,

row_number() over(partition by user_id order by login_date asc) as rn

from

wedw_dw.t_login_info

) t1

) t2

group by

t2.user_id, t2.date_diff

having times >= 3;

结果

+----------+--------+-------------+-------------+--+

| user_id | times | start_date | end_date |

+----------+--------+-------------+-------------+--+

| user01 | 3 | 2018-02-28 | 2018-03-02 |

| user01 | 4 | 2018-03-04 | 2018-03-07 |

| user02 | 3 | 2018-03-01 | 2018-03-03 |

+----------+--------+-------------+-------------+--+

思路

先把数据按照用户id分组,根据登录日期排序

select

user_id

,login_date

,row_number() over(partition by user_id order by login_date asc) as rn

from

wedw_dw.t_login_info

+----------+-------------+-----+--+

| user_id | login_date | rn |

+----------+-------------+-----+--+

| user01 | 2018-02-28 | 1 |

| user01 | 2018-03-01 | 2 |

| user01 | 2018-03-02 | 3 |

| user01 | 2018-03-04 | 4 |

| user01 | 2018-03-05 | 5 |

| user01 | 2018-03-06 | 6 |

| user01 | 2018-03-07 | 7 |

| user02 | 2018-03-01 | 1 |

| user02 | 2018-03-02 | 2 |

| user02 | 2018-03-03 | 3 |

| user02 | 2018-03-06 | 4 |

+----------+-------------+-----+--+

用登录日期减去排序数字rn,得到的差值日期如果是相等的,则说明这两天肯定是连续的

select

t1.user_id

,t1.login_date

,date_sub(t1.login_date,rn) as date_diff

from

(

select

user_id

,login_date

,row_number() over(partition by user_id order by login_date asc) as rn

from

wedw_dw.t_login_info

) t1

;

+----------+-------------+-------------+--+

| user_id | login_date | date_diff |

+----------+-------------+-------------+--+

| user01 | 2018-02-28 | 2018-02-27 |

| user01 | 2018-03-01 | 2018-02-27 |

| user01 | 2018-03-02 | 2018-02-27 |

| user01 | 2018-03-04 | 2018-02-28 |

| user01 | 2018-03-05 | 2018-02-28 |

| user01 | 2018-03-06 | 2018-02-28 |

| user01 | 2018-03-07 | 2018-02-28 |

| user02 | 2018-03-01 | 2018-02-28 |

| user02 | 2018-03-02 | 2018-02-28 |

| user02 | 2018-03-03 | 2018-02-28 |

| user02 | 2018-03-06 | 2018-03-02 |

+----------+-------------+-------------+--+

根据user_id和日期差date_diff 分组,最小登录日期即为此次连续登录的开始日期start_date,最大登录日期即为结束日期end_date,登录次数即为分组后的count(1)

select

t2.user_id as user_id

,count(1) as times

,min(t2.login_date) as start_date

,max(t2.login_date) as end_date

from

(

select

t1.user_id

,t1.login_date

,date_sub(t1.login_date,rn) as date_diff

from

(

select

user_id

,login_date

,row_number() over(partition by user_id order by login_date asc) as rn

from

wedw_dw.t_login_info

) t1

) t2

group by

t2.user_id

,t2.date_diff

having times >= 3

;

+----------+--------+-------------+-------------+--+

| user_id | times | start_date | end_date |

+----------+--------+-------------+-------------+--+

| user01 | 3 | 2018-02-28 | 2018-03-02 |

| user01 | 4 | 2018-03-04 | 2018-03-07 |

| user02 | 3 | 2018-03-01 | 2018-03-03 |

+----------+--------+-------------+-------------+--+

本文同步分享在 博客“Sun's Blog-专注于阅读”(CNBlog)。

如有侵权,请联系 support@oschina.cn 删除。

本文参与“OSC源创计划”,欢迎正在阅读的你也加入,一起分享。


版权声明:本文为weixin_29952235原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明。