• 首页 首页 icon
  • 工具库 工具库 icon
    • IP查询 IP查询 icon
  • 内容库 内容库 icon
    • 快讯库 快讯库 icon
    • 精品库 精品库 icon
    • 问答库 问答库 icon
  • 更多 更多 icon
    • 服务条款 服务条款 icon

HiveSQL一天小统计当前时间点状态情况辅助变量+累计变换思路

武飞扬头像
莫叫石榴姐
帮助1

0 需求描述

学新通

  1 问题分析

目标:需要求出每个时间段,有客人在住的房间数量。

如果只考虑一人一房,我们可以借助于同时在线人数统计的思路,入住时间加辅助标记记为1,离店时间加辅助标记记为-1,并按照时间进行顺序排序,求当前累计值,具体SQL如下:

类似于公交车上车下车场景,统计公交车某一时刻车上人数,上车加1,下车减1,最终某一时刻人数通过加1减1这种辅助标记来进行累加,累加反应的是某一时间节点处,截止当前状态情况。

  1.  
    with data as
  2.  
    (
  3.  
    select 7 user_id, 2004 room_num, '2021-03-05' in_time, '2021-03-07' out_time
  4.  
    union all
  5.  
    select 23 user_id, 2010 room_num, '2021-03-05' in_time, '2021-03-06' out_time
  6.  
    union all
  7.  
    select 7 user_id, 1003 room_num, '2021-03-07' in_time, '2021-03-08' out_time
  8.  
    union all
  9.  
    select 8 user_id, 2014 room_num, '2021-03-07' in_time, '2021-03-08' out_time
  10.  
    union all
  11.  
    select 14 user_id, 3001 room_num, '2021-03-07' in_time, '2021-03-10' out_time
  12.  
    union all
  13.  
    select 18 user_id, 3002 room_num, '2021-03-08' in_time, '2021-03-10' out_time
  14.  
    union all
  15.  
    select 23 user_id, 3020 room_num, '2021-03-08' in_time, '2021-03-09' out_time
  16.  
    union all
  17.  
    select 25 user_id, 2006 room_num, '2021-03-09' in_time, '2021-03-12' out_time
  18.  
    )
  19.  
     
  20.  
    select *
  21.  
    from (select time as start_time, lead(time) over (order by time) as end_time, acc_cnt
  22.  
    from (select time, sum(num) over (order by time) as acc_cnt
  23.  
    from (
  24.  
    select in_time as time, 1 as num
  25.  
    from data
  26.  
    union all
  27.  
    select out_time as time, -1 as num
  28.  
    from data
  29.  
    ) t
  30.  
    ) t
  31.  
    group by time, acc_cnt
  32.  
    ) t
  33.  
    where end_time is not null

 学新通

但是这里面有个问题,实际场景是一人一房间吗?如果有多个人共住一房间,今天退了一个人,明天又退了一个人,后天的时候才退完,虽然在退,但房间还是有人住的,这种情况,是不是也算?如果考虑这种情况则需要对累加的状态进行调整,此时需要考虑每个房间中截止当前时间的人数情况。

第一步:先求出每个房间当前时间的人数累计值,作为状态判断辅助条件

增加一行数据:

  1.  
    union all
  2.  
    select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time

具体SQL如下:

  1.  
    with data as
  2.  
    (
  3.  
    select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
  4.  
    union all
  5.  
    select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
  6.  
    union all
  7.  
    select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
  8.  
    union all
  9.  
    select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
  10.  
    union all
  11.  
    select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
  12.  
    union all
  13.  
    select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
  14.  
    union all
  15.  
    select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
  16.  
    union all
  17.  
    select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
  18.  
    union all
  19.  
    select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
  20.  
    )
  21.  
    select time
  22.  
    , room_num
  23.  
    , sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
  24.  
    from (
  25.  
    select in_time as time, room_num, count(user_id) user_cnt
  26.  
    from data
  27.  
    group by in_time, room_num
  28.  
    union all
  29.  
    select out_time as time, room_num, -1 * count(user_id) user_cnt
  30.  
    from data
  31.  
    group by out_time, room_num
  32.  
    ) t

学新通 

第二步:基于累计的每个房间人数的状态变量进行判断:如果房间有人就标记1,没有人时候就标记为-1。标记思路实现:

方法1:case when 判断当user_cnt > 0时标记1,否则标记-1

case when user_cnt > 0 then 1 else -1 end num1

方法2:判断每个房间截止当前人数的最小值,如果大于0 标记为1,否则标记为-1

case when min(user_cnt) over (partition by room_num order by time) > 0 then 1 else -1 end

具体SQL如下:

  1.  
    with data as
  2.  
    (
  3.  
    select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
  4.  
    union all
  5.  
    select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
  6.  
    union all
  7.  
    select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
  8.  
    union all
  9.  
    select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
  10.  
    union all
  11.  
    select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
  12.  
    union all
  13.  
    select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
  14.  
    union all
  15.  
    select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
  16.  
    union all
  17.  
    select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
  18.  
    union all
  19.  
    select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
  20.  
    )
  21.  
     
  22.  
    select time
  23.  
    , room_num
  24.  
    , user_cnt
  25.  
    , case when user_cnt > 0 then 1 else -1 end num1
  26.  
    , case when min(user_cnt) over (partition by room_num order by time) > 0 then 1 else -1 end num2
  27.  
    from (select time
  28.  
    , room_num
  29.  
    , sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
  30.  
    from (select in_time as time, room_num, count(user_id) user_cnt
  31.  
    from data
  32.  
    group by in_time, room_num
  33.  
    union all
  34.  
    select out_time as time, room_num, -1 * count(user_id) user_cnt
  35.  
    from data
  36.  
    group by out_time, room_num
  37.  
    ) t
  38.  
    ) t

学新通

第三步:

基于上述结果,累计num值(其实本质就是通过构造辅助条件对辅助标记值进行变换,累加值变了),反应截止当前时间点 有人入住的房间数量,最终SQL如下:

  1.  
    with data as
  2.  
    (
  3.  
    select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
  4.  
    union all
  5.  
    select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
  6.  
    union all
  7.  
    select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
  8.  
    union all
  9.  
    select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
  10.  
    union all
  11.  
    select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
  12.  
    union all
  13.  
    select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
  14.  
    union all
  15.  
    select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
  16.  
    union all
  17.  
    select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
  18.  
    -- union all
  19.  
    -- select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
  20.  
    )
  21.  
     
  22.  
    select *
  23.  
    from
  24.  
    (select time as start_time, lead(time) over (order by time) as end_time, acc_cnt
  25.  
    from (select time
  26.  
    , sum(case when user_cnt > 0 then 1 else -1 end) over (order by time) acc_cnt
  27.  
    from (select time
  28.  
    , room_num
  29.  
    , sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
  30.  
    from (select in_time as time, room_num, count(user_id) user_cnt
  31.  
    from data
  32.  
    group by in_time, room_num
  33.  
    union all
  34.  
    select out_time as time, room_num, -1 * count(user_id) user_cnt
  35.  
    from data
  36.  
    group by out_time, room_num
  37.  
    ) t
  38.  
    ) t
  39.  
    ) t
  40.  
    group by time, acc_cnt
  41.  
    ) t
  42.  
    where end_time is not null

最终结果如下:

学新通

 新增测试数据结果如下:

  1.  
    with data as
  2.  
    (
  3.  
    select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
  4.  
    union all
  5.  
    select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
  6.  
    union all
  7.  
    select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
  8.  
    union all
  9.  
    select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
  10.  
    union all
  11.  
    select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
  12.  
    union all
  13.  
    select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
  14.  
    union all
  15.  
    select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
  16.  
    union all
  17.  
    select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
  18.  
    union all
  19.  
    select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
  20.  
    )
  21.  
     
  22.  
    select *
  23.  
    from
  24.  
    (select time as start_time, lead(time) over (order by time) as end_time, acc_cnt
  25.  
    from (select time, sum(case when user_cnt > 0 then 1 else -1 end) over (order by time) acc_cnt
  26.  
    from (select time
  27.  
    , room_num
  28.  
    , sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
  29.  
    from (select in_time as time, room_num, count(user_id) user_cnt
  30.  
    from data
  31.  
    group by in_time, room_num
  32.  
    union all
  33.  
    select out_time as time, room_num, -1 * count(user_id) user_cnt
  34.  
    from data
  35.  
    group by out_time, room_num
  36.  
    ) t
  37.  
    ) t
  38.  
    ) t
  39.  
    group by time, acc_cnt
  40.  
    ) t
  41.  
    where end_time is not null

学新通

 2 小结

本文总结了一种当前时间点状态统计的思路和方法,对于此类问题主要采用构造辅助计数变量及累加变换思路进行求解。常见的场景有:直播同时在线人数、服务器实时并发数、公家车当前时间段人数、某个仓库的货物积压数量,某一段时间内的同时处于服务过程中的最大订单量等

这篇好文章是转载于:学新通技术网

  • 版权申明: 本站部分内容来自互联网,仅供学习及演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,请提供相关证据及您的身份证明,我们将在收到邮件后48小时内删除。
  • 本站站名: 学新通技术网
  • 本文地址: /boutique/detail/tanhibfgee
系列文章
更多 icon
同类精品
更多 icon
继续加载