本文主要是介绍每日HiveSQL_求解运动员最大连胜的次数_15,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
1.现需要从运动员比赛结果表中统计每个运动员最大连胜的次数
需求结果:
2.所用到的表和数据
--表创建
CREATE TABLE athlete_results
(athlete_id INT,match_time TIMESTAMP,result VARCHAR(10) -- 'win', 'lose', 'draw'
);--数据装载
INSERT INTO athlete_results
VALUES (1, '2023-11-01', 'win'),(1, '2023-11-02', 'win'),(1, '2023-11-03', 'lose'),(1, '2023-11-04', 'win'),(1, '2023-11-05', 'draw'),(1, '2023-11-06', 'win'),(1, '2023-11-07', 'win'),(1, '2023-11-08', 'win'),(2, '2023-11-01', 'win'),(2, '2023-11-02', 'lose'),(2, '2023-11-03', 'draw'),(2, '2023-11-04', 'win'),(2, '2023-11-05', 'win'),(2, '2023-11-06', 'win');
3.答案
解析:由结果可知,此结果是对每个运动员进行分组得到的每个运动员的最大连胜数,此题关键在于如何根据特定的分组条件进行分组,得到每个可以代表运动员和比赛结果的分组字段。
思路一:
首先使用row_number()开窗根据运动员id进行分区,日期进行排序得到分组字段rn1,row_number()开窗根据运动员id和比赛结果进行分区,日期进行排序得到分组字段rn2,如下图rn1 - rn2可以唯一表述每个运动员在一次连续胜利、失败、平局的分组字段。
3.1 使用row_number()开两个窗,一个根据运动员id分区,日期升序排序,得到group1,另一个是根据运动员id和比赛结果进行分区,日期升序排序,得到group2
select athlete_id,to_date(match_time) match_time,result,row_number() over (partition by athlete_id order by to_date(match_time)) group1,row_number() over (partition by athlete_id, result order by to_date(match_time)) group2
from athlete_results;
运行结果:
3.2筛选出比赛结果为”win“,然后再根据运动员id和win_group(group1 - group2)分组,count(*)得到运动员每个分组阶段的连胜次数
select athlete_id,group1 - group2 win_group,count(*) win_count
from (select athlete_id,to_date(match_time) match_time,result,row_number() over (partition by athlete_id order by to_date(match_time)) group1,row_number() over (partition by athlete_id, result order by to_date(match_time)) group2from athlete_results) t
where result = 'win'
group by athlete_id, group1 - group2;
运行结果:
3.3最后根据以上运行结果根据运动员id进行分组得到最终结果
select athlete_id,max(win_count) max_win_count
from (select athlete_id,group1 - group2 win_group,count(*) win_countfrom (select athlete_id,to_date(match_time) match_time,result,row_number() over (partition by athlete_id order by to_date(match_time)) group1,row_number() over (partition by athlete_id, result order by to_date(match_time)) group2from athlete_results) twhere result = 'win'group by athlete_id, group1 - group2) t
group by athlete_id;
运行结果:
思路二:
思考打破连续胜利的条件,仔细思考可知,一个运动员每次失败或者平局都会开启一个新的分组,怎么描述这个分组呢?很容易可以想到sum(if())结构,如下图
3.1使用sum(if())开窗得到分组字段win_group
select athlete_id,to_date(match_time) match_day,result,sum(if(result = 'lose' or result = 'draw', 1, 0))over (partition by athlete_id order by to_date(match_time)) win_group
from athlete_results;
运行结果:
3.2根据以上结果筛选出result = ‘win’的结果,之后进行根据运动员id和win_group分组,count(*)得到连胜的次数。
select athlete_id,count(*) win_count
from (select athlete_id,result,sum(if(result = 'lose' or result = 'draw', 1, 0))over (partition by athlete_id order by match_time) win_groupfrom athlete_results) t
where result = 'win'
group by athlete_id, win_group;
运行结果:
3.3根据运动员id进行分组,得到最终结果
select athlete_id,max(win_count) max_win_count
from (select athlete_id,count(*) win_countfrom (select athlete_id,result,sum(if(result = 'lose' or result = 'draw', 1, 0))over (partition by athlete_id order by match_time) win_groupfrom athlete_results) twhere result = 'win'group by athlete_id, win_group) t
group by athlete_id;
运行结果:
这篇关于每日HiveSQL_求解运动员最大连胜的次数_15的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!