HiveSQL一天小统计当前时间点状态情况辅助变量+累计变换思路
0 需求描述
1 问题分析
目标:需要求出每个时间段,有客人在住的房间数量。
如果只考虑一人一房,我们可以借助于同时在线人数统计的思路,入住时间加辅助标记记为1,离店时间加辅助标记记为-1,并按照时间进行顺序排序,求当前累计值,具体SQL如下:
类似于公交车上车下车场景,统计公交车某一时刻车上人数,上车加1,下车减1,最终某一时刻人数通过加1减1这种辅助标记来进行累加,累加反应的是某一时间节点处,截止当前状态情况。
-
with data as
-
(
-
select 7 user_id, 2004 room_num, '2021-03-05' in_time, '2021-03-07' out_time
-
union all
-
select 23 user_id, 2010 room_num, '2021-03-05' in_time, '2021-03-06' out_time
-
union all
-
select 7 user_id, 1003 room_num, '2021-03-07' in_time, '2021-03-08' out_time
-
union all
-
select 8 user_id, 2014 room_num, '2021-03-07' in_time, '2021-03-08' out_time
-
union all
-
select 14 user_id, 3001 room_num, '2021-03-07' in_time, '2021-03-10' out_time
-
union all
-
select 18 user_id, 3002 room_num, '2021-03-08' in_time, '2021-03-10' out_time
-
union all
-
select 23 user_id, 3020 room_num, '2021-03-08' in_time, '2021-03-09' out_time
-
union all
-
select 25 user_id, 2006 room_num, '2021-03-09' in_time, '2021-03-12' out_time
-
)
-
-
select *
-
from (select time as start_time, lead(time) over (order by time) as end_time, acc_cnt
-
from (select time, sum(num) over (order by time) as acc_cnt
-
from (
-
select in_time as time, 1 as num
-
from data
-
union all
-
select out_time as time, -1 as num
-
from data
-
) t
-
) t
-
group by time, acc_cnt
-
) t
-
where end_time is not null
但是这里面有个问题,实际场景是一人一房间吗?如果有多个人共住一房间,今天退了一个人,明天又退了一个人,后天的时候才退完,虽然在退,但房间还是有人住的,这种情况,是不是也算?如果考虑这种情况则需要对累加的状态进行调整,此时需要考虑每个房间中截止当前时间的人数情况。
第一步:先求出每个房间当前时间的人数累计值,作为状态判断辅助条件
增加一行数据:
-
union all
-
select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
具体SQL如下:
-
with data as
-
(
-
select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
-
union all
-
select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
-
union all
-
select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
-
union all
-
select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
-
union all
-
select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
-
union all
-
select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
-
union all
-
select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
-
union all
-
select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
-
union all
-
select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
-
)
-
select time
-
, room_num
-
, sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
-
from (
-
select in_time as time, room_num, count(user_id) user_cnt
-
from data
-
group by in_time, room_num
-
union all
-
select out_time as time, room_num, -1 * count(user_id) user_cnt
-
from data
-
group by out_time, room_num
-
) t
第二步:基于累计的每个房间人数的状态变量进行判断:如果房间有人就标记1,没有人时候就标记为-1。标记思路实现:
方法1:case when 判断当user_cnt > 0时标记1,否则标记-1
case when user_cnt > 0 then 1 else -1 end num1
方法2:判断每个房间截止当前人数的最小值,如果大于0 标记为1,否则标记为-1
case when min(user_cnt) over (partition by room_num order by time) > 0 then 1 else -1 end
具体SQL如下:
-
with data as
-
(
-
select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
-
union all
-
select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
-
union all
-
select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
-
union all
-
select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
-
union all
-
select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
-
union all
-
select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
-
union all
-
select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
-
union all
-
select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
-
union all
-
select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
-
)
-
-
select time
-
, room_num
-
, user_cnt
-
, case when user_cnt > 0 then 1 else -1 end num1
-
, case when min(user_cnt) over (partition by room_num order by time) > 0 then 1 else -1 end num2
-
from (select time
-
, room_num
-
, sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
-
from (select in_time as time, room_num, count(user_id) user_cnt
-
from data
-
group by in_time, room_num
-
union all
-
select out_time as time, room_num, -1 * count(user_id) user_cnt
-
from data
-
group by out_time, room_num
-
) t
-
) t
第三步:
基于上述结果,累计num值(其实本质就是通过构造辅助条件对辅助标记值进行变换,累加值变了),反应截止当前时间点 有人入住的房间数量,最终SQL如下:
-
with data as
-
(
-
select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
-
union all
-
select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
-
union all
-
select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
-
union all
-
select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
-
union all
-
select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
-
union all
-
select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
-
union all
-
select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
-
union all
-
select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
-
-- union all
-
-- select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
-
)
-
-
select *
-
from
-
(select time as start_time, lead(time) over (order by time) as end_time, acc_cnt
-
from (select time
-
, sum(case when user_cnt > 0 then 1 else -1 end) over (order by time) acc_cnt
-
from (select time
-
, room_num
-
, sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
-
from (select in_time as time, room_num, count(user_id) user_cnt
-
from data
-
group by in_time, room_num
-
union all
-
select out_time as time, room_num, -1 * count(user_id) user_cnt
-
from data
-
group by out_time, room_num
-
) t
-
) t
-
) t
-
group by time, acc_cnt
-
) t
-
where end_time is not null
最终结果如下:
新增测试数据结果如下:
-
with data as
-
(
-
select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
-
union all
-
select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
-
union all
-
select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
-
union all
-
select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
-
union all
-
select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
-
union all
-
select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
-
union all
-
select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
-
union all
-
select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
-
union all
-
select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
-
)
-
-
select *
-
from
-
(select time as start_time, lead(time) over (order by time) as end_time, acc_cnt
-
from (select time, sum(case when user_cnt > 0 then 1 else -1 end) over (order by time) acc_cnt
-
from (select time
-
, room_num
-
, sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
-
from (select in_time as time, room_num, count(user_id) user_cnt
-
from data
-
group by in_time, room_num
-
union all
-
select out_time as time, room_num, -1 * count(user_id) user_cnt
-
from data
-
group by out_time, room_num
-
) t
-
) t
-
) t
-
group by time, acc_cnt
-
) t
-
where end_time is not null
2 小结
本文总结了一种当前时间点状态统计的思路和方法,对于此类问题主要采用构造辅助计数变量及累加变换思路进行求解。常见的场景有:直播同时在线人数、服务器实时并发数、公家车当前时间段人数、某个仓库的货物积压数量,某一段时间内的同时处于服务过程中的最大订单量等
这篇好文章是转载于:学新通技术网
- 版权申明: 本站部分内容来自互联网,仅供学习及演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,请提供相关证据及您的身份证明,我们将在收到邮件后48小时内删除。
- 本站站名: 学新通技术网
- 本文地址: /boutique/detail/tanhibfgee
-
photoshop保存的图片太大微信发不了怎么办
PHP中文网 06-15 -
word里面弄一个表格后上面的标题会跑到下面怎么办
PHP中文网 06-20 -
photoshop扩展功能面板显示灰色怎么办
PHP中文网 06-14 -
《学习通》视频自动暂停处理方法
HelloWorld317 07-05 -
TikTok加速器哪个好免费的TK加速器推荐
TK小达人 10-01 -
Android 11 保存文件到外部存储,并分享文件
Luke 10-12 -
微信公众号没有声音提示怎么办
PHP中文网 03-31 -
excel下划线不显示怎么办
PHP中文网 06-23 -
微信运动停用后别人还能看到步数吗
PHP中文网 07-22 -
excel打印预览压线压字怎么办
PHP中文网 06-22