本文主要是介绍Hive 不支持 intersect 的解决办法,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
问题
有一个SQL语句要在hive里执行但是hive不支持intersect,所以要使用其他方法代替
思路
intersect就是取交集,可以使用inner join进行连接然后取其中一列并去重
原SQL
select s_store_name,sum(ss_net_profit)from store_sales,date_dim,store,(select ca_zipfrom (SELECT substr(ca_zip,1,5) ca_zipFROM customer_addressWHERE substr(ca_zip,1,5) IN ('27385','58049','58200','16808','21360','32961','18586','79307','15492')intersectselect ca_zipfrom (SELECT substr(ca_zip,1,5) ca_zip,count(*) cntFROM customer_address, customerWHERE ca_address_sk = c_current_addr_sk andc_preferred_cust_flag='Y'group by ca_ziphaving count(*) > 10)A1)A2) V1where ss_store_sk = s_store_skand ss_sold_date_sk = d_date_skand d_qoy = 1 and d_year = 2002and (substr(s_zip,1,2) = substr(V1.ca_zip,1,2))group by s_store_nameorder by s_store_namelimit 100;
更改后的HQL
代码没有格式化,如要测试可以先格式化一下
SELECT s_store_name ,sum(ss_net_profit)
FROM store_sales ,date_dim ,store,( (SELECT DISTINCT A.ca_zip
FROM ((SELECT substr(ca_zip,1,5) ca_zipFROM customer_addressWHERE substr(ca_zip,1,5) IN ('27385','58049','58200','16808','21360','32961','18586','79307','15492'))AINNER JOIN(SELECT substr(ca_zip,1,5) ca_zip,count(*) cntFROM customer_address,customerWHERE ca_address_sk = c_current_addr_skAND c_preferred_cust_flag='Y'GROUP BY ca_zipHAVING count(*) > 10) B on(A.ca_zip=B.ca_zip) ))C )
WHERE ss_store_sk = s_store_skAND ss_sold_date_sk = d_date_skAND d_qoy = 1AND d_year = 2002AND (substr(s_zip,1,2) = substr(C.ca_zip,1,2))
GROUP BY s_store_name
ORDER BY s_store_name LIMIT 100;
欢迎点赞
这篇关于Hive 不支持 intersect 的解决办法的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!