3-KYLIN订单例程

2024-09-06 06:38
文章标签 例程 kylin 订单

本文主要是介绍3-KYLIN订单例程,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

转:http://blog.itpub.net/30089851/viewspace-2122586/

一.Hive订单数据仓库构建

1. 创建事实表并插入数据

DROP TABLE IF EXISTS default.fact_order ;
create table default.fact_order (
  time_key string,
  product_key string,
  salesperson_key string,
  custom_key string,
  quantity_ordered bigint,
  order_dollars bigint,
  cost_dollars bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

load data local inpath '/root/kylinsample/fact_order.txt' overwrite into table default.fact_order;

##load data local inpath '/root/kylinsample/fact_order.txt'  into table default.fact_order;


fact_order.txt

2016-05-01,pd001,sp001,ct001,100,2000,1000
2016-05-01,pd001,sp002,ct002,100,2000,1000
2016-05-01,pd001,sp003,ct002,100,2000,1000
2016-05-01,pd002,sp002,ct002,100,2000,1000
2016-05-01,pd003,sp003,ct001,100,2000,1000
2016-05-01,pd001,sp003,ct001,100,2000,1000
2016-05-01,pd001,sp002,ct001,100,2000,1000
2016-05-01,pd001,sp003,ct002,100,2000,1000
2016-05-01,pd002,sp001,ct001,100,2000,1000
2016-05-01,pd003,sp001,ct001,100,2000,1000
2016-05-01,pd004,sp001,ct001,50,1000,600
2016-05-02,pd001,sp001,ct001,50,1000,600
2016-05-02,pd001,sp002,ct002,100,2000,1000
2016-05-02,pd001,sp003,ct002,100,2000,1000
2016-05-02,pd002,sp001,ct001,50,1000,600
2016-05-02,pd003,sp001,ct001,50,1000,600
2016-05-02,pd004,sp001,ct001,50,1000,600
2016-05-03,pd001,sp001,ct001,50,1000,600
2016-05-03,pd001,sp002,ct002,100,2000,1000
2016-05-03,pd001,sp003,ct002,100,2000,1000
2016-05-04,pd002,sp001,ct001,700,14000,10000
2016-05-04,pd003,sp001,ct001,700,14000,10000
2016-05-04,pd004,sp001,ct001,100,2000,1000
2016-05-05,pd001,sp001,ct001,100,2000,1000
2016-05-05,pd001,sp002,ct002,700,14000,10000
2016-05-05,pd001,sp003,ct002,700,14000,10000
2016-05-05,pd002,sp001,ct001,100,2000,1000
2016-05-05,pd003,sp001,ct001,100,2000,1000
2016-05-05,pd004,sp001,ct001,100,2000,1000
2016-05-06,pd001,sp001,ct001,100,2000,1000
2016-05-06,pd001,sp002,ct002,100,2000,1000
2016-05-06,pd001,sp003,ct002,100,2000,1000
2016-05-07,pd002,sp001,ct001,100,2000,1000
2016-05-07,pd003,sp001,ct001,100,2000,1000
2016-05-07,pd004,sp001,ct001,50,1000,600
2016-05-07,pd002,sp001,ct001,100,2000,1000
2016-05-07,pd003,sp001,ct001,100,2000,1000
2016-05-07,pd004,sp001,ct001,50,1000,600
2016-05-08,pd001,sp001,ct001,50,1000,600
2016-05-08,pd001,sp002,ct002,100,2000,1000
2016-05-08,pd001,sp003,ct002,100,2000,1000
2016-05-08,pd001,sp001,ct001,50,1000,600
2016-05-08,pd001,sp002,ct002,100,2000,1000
2016-05-08,pd001,sp003,ct002,100,2000,1000
2016-05-08,pd001,sp001,ct001,50,1000,600
2016-05-08,pd001,sp002,ct002,100,2000,1000
2016-05-08,pd001,sp003,ct002,100,2000,1000
2016-05-09,pd002,sp001,ct001,50,1000,600
2016-05-09,pd003,sp001,ct001,50,1000,600
2016-05-09,pd004,sp001,ct001,50,1000,600
2016-05-09,pd001,sp001,ct001,50,1000,600
2016-05-09,pd002,sp001,ct001,50,1000,600
2016-05-09,pd003,sp001,ct001,50,1000,600
2016-05-09,pd004,sp001,ct001,50,1000,600
2016-05-09,pd001,sp001,ct001,50,1000,600
2016-05-09,pd001,sp002,ct002,100,2000,1000
2016-05-09,pd004,sp003,ct002,100,2000,1000
2016-05-09,pd002,sp001,ct001,700,14000,10000
2016-05-09,pd003,sp003,ct001,700,14000,10000
2016-05-09,pd004,sp003,ct001,100,2000,1000
2016-05-10,pd001,sp001,ct001,100,2000,1000
2016-05-10,pd001,sp002,ct002,700,14000,10000
2016-05-10,pd001,sp003,ct002,700,14000,10000
2016-05-10,pd002,sp001,ct001,100,2000,1000
2016-05-11,pd003,sp003,ct001,100,2000,1000
2016-05-11,pd004,sp001,ct001,100,2000,1000
2016-05-12,pd001,sp001,ct001,100,2000,1000
2016-05-12,pd004,sp002,ct002,100,2000,1000
2016-05-12,pd001,sp003,ct002,100,2000,1000
2016-05-12,pd001,sp001,ct001,100,2000,1000
2016-05-12,pd004,sp002,ct002,100,2000,1000
2016-05-12,pd001,sp003,ct002,100,2000,1000
2016-05-13,pd002,sp001,ct001,100,2000,1000
2016-05-13,pd003,sp001,ct001,100,2000,1000
2016-05-13,pd004,sp001,ct001,50,1000,600
2016-05-14,pd001,sp001,ct001,50,1000,600
2016-05-14,pd001,sp002,ct002,100,2000,1000
2016-05-14,pd001,sp003,ct002,100,2000,1000
2016-05-15,pd002,sp001,ct001,50,1000,600
2016-05-15,pd003,sp001,ct001,50,1000,600
2016-05-15,pd004,sp001,ct001,50,1000,600
2016-05-15,pd002,sp001,ct001,50,1000,600
2016-05-15,pd003,sp001,ct001,50,1000,600
2016-05-15,pd004,sp001,ct001,50,1000,600
2016-05-15,pd002,sp001,ct001,50,1000,600
2016-05-15,pd003,sp001,ct001,50,1000,600
2016-05-15,pd004,sp001,ct001,50,1000,600
2016-05-16,pd001,sp001,ct001,50,1000,600
2016-05-16,pd001,sp002,ct002,100,2000,1000
2016-05-16,pd001,sp003,ct002,100,2000,1000
2016-05-16,pd001,sp001,ct001,50,1000,600
2016-05-16,pd001,sp002,ct002,100,2000,1000
2016-05-16,pd001,sp003,ct002,100,2000,1000
2016-05-17,pd002,sp001,ct001,700,14000,10000
2016-05-17,pd003,sp001,ct001,700,14000,10000
2016-05-17,pd004,sp001,ct001,100,2000,1000
2016-05-17,pd002,sp001,ct001,700,14000,10000
2016-05-17,pd003,sp001,ct001,700,14000,10000
2016-05-17,pd004,sp001,ct001,100,2000,1000
2016-05-18,pd001,sp001,ct001,100,2000,1000
2016-05-18,pd003,sp002,ct001,700,14000,10000
2016-05-18,pd001,sp003,ct002,700,14000,10000
2016-05-19,pd002,sp001,ct001,100,2000,1000
2016-05-19,pd003,sp001,ct002,100,2000,1000
2016-05-20,pd001,sp001,ct001,100,2000,1000
2016-05-20,pd002,sp002,ct002,100,2000,1000
2016-05-20,pd003,sp003,ct001,100,2000,1000
2016-05-20,pd004,sp001,ct001,100,2000,1000
2016-05-20,pd001,sp002,ct002,100,2000,1000
2016-05-20,pd002,sp001,ct002,100,2000,1000

 
2. 创建天维度表dim_day

DROP TABLE IF EXISTS default.dim_day ;

create table default.dim_day (
  day_key string,
  full_day string,
  month_name string,
  quarter string,
  year string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE; 
load data local inpath '/root/kylinsample/dim_day.txt' overwrite into table default.dim_day;

 

dim_day.txt
  
2016-05-01,2016-05-01,201605,2016q2,2016
2016-05-02,2016-05-02,201605,2016q2,2016
2016-05-03,2016-05-03,201605,2016q2,2016
2016-05-04,2016-05-04,201605,2016q2,2016
2016-05-05,2016-05-05,201605,2016q2,2016
2016-05-06,2016-05-06,201605,2016q2,2016
2016-05-07,2016-05-07,201605,2016q2,2016
2016-05-08,2016-05-08,201605,2016q2,2016
2016-05-09,2016-05-09,201605,2016q2,2016
2016-05-10,2016-05-10,201605,2016q2,2016
2016-05-11,2016-05-11,201605,2016q2,2016
2016-05-12,2016-05-12,201605,2016q2,2016
2016-05-13,2016-05-13,201605,2016q2,2016
2016-05-14,2016-05-14,201605,2016q2,2016
2016-05-15,2016-05-15,201605,2016q2,2016
2016-05-16,2016-05-16,201605,2016q2,2016
2016-05-17,2016-05-17,201605,2016q2,2016
2016-05-18,2016-05-18,201605,2016q2,2016
2016-05-19,2016-05-19,201605,2016q2,2016
2016-05-20,2016-05-20,201605,2016q2,2016

  
3. 创建售卖员的维度表salesperson_dim
 
DROP TABLE IF EXISTS default.dim_salesperson ;
 
create table default.dim_salesperson (
  salesperson_key string,
  salesperson string,
  salesperson_id string,
  region string,
  region_code string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
  
load data local inpath '/root/kylinsample/dim_salesperson.txt' overwrite into table default.dim_salesperson;
  
dim_salesperson.txt
  
sp001,hongbin,sp001,beijing,10086
sp002,hongming,sp002,beijing,10086
sp003,hongmei,sp003,beijing,10086

 

4. 创建客户维度 custom_dim

 
DROP TABLE IF EXISTS default.dim_custom ;
  
create table default.dim_custom (
  custom_key string,
  custom_name string,
  custorm_id string,
  headquarter_states string,
  billing_address string,
  billing_city string,
  billing_state string,
  industry_name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
 
load data local inpath '/root/kylinsample/dim_custom.txt' overwrite into table default.dim_custom;

dim_custom.txt
  
ct001,custom_john,ct001,beijing,zgx-beijing,beijing,beijing,internet                    
ct002,custom_herry,ct002,henan,shlinjie,shangdang,henan,internet     
 
 
5. 创建产品维度表并插入数据
 
DROP TABLE IF EXISTS default.dim_product ;                                              
                                                                                          
create table default.dim_product (                                                      
  product_key string,                                                                 
  product_name string,                                                                
  product_id string,                                                                  
  product_desc string,                                                                
  sku string,                                                                         
  brand string,                                                                       
  brand_code string,                                                                  
  brand_manager string,                                                               
  category string,                                                                    
category_code string                                                                
)                                                                                       
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','                                           
STORED AS TEXTFILE;                                                                     
                       
load data local inpath '/root/kylinsample/dim_product.txt' overwrite into table default.dim_product;      
dim_product.txt
  
pd001,Box-Large,pd001,Box-Large-des,large1.0,brand001,brandcode001,brandmanager001,Packing,cate001
pd002,Box-Medium,pd001,Box-Medium-des,medium1.0,brand001,brandcode001,brandmanager001,Packing,cate001
pd003,Box-small,pd001,Box-small-des,small1.0,brand001,brandcode001,brandmanager001,Packing,cate001
pd004,Evelope,pd001,Evelope_des,large3.0,brand001,brandcode001,brandmanager001,Pens,cate002

 
这样一个星型的结构表在hive中创建完毕, 实际上一个离线的数据仓库已经完成, 它包含一个主题, 即商品订单.


三.Kylin的Project创建与数据同步
  1.单击"Manage Project" 
  2.单击"New Project"
  3.输入"Project Name", WareHouse_01
  4.Submit


  1.选择WareHouse_01,选择"Data Source" tab页
  2.单击"Load Hive Table"
  3.输入需要同步的表
      "DEFAULT.FACT_ORDER,DEFAULT.DIM_DAY,DEFAULT.DIM_PRODUCT,DEFAULT.DIM_SALESPERSON,DEFAULT.DIM_CUSTOM"
  4.Sync

四.Kylin的Model创建
  1.选择"Models" tab页,单击"New Model"
  2."Model Name"输入,WareHouse_01_Model
  3.选择"Fact Table"为 DEFAULT.FACT_ORDER;再 添加Lookup Table;
  4.选取每张表的哪些列字段作为Dimensions
     ID Table Name           Columns
     1.DEFAULT.FACT_ORDER  TIME_KEY PRODUCT_KEY SALESPERSON_KEY CUSTOM_KEY
     2.DEFAULT.DIM_DAY          FULL_DAY
     3.DEFAULT.DIM_PRODUCT  PRODUCT_NAME
     4.DEFAULT.DIM_SALESPERSON  SALESPERSON
     5.DEFAULT.DIM_CUSTOM  CUSTOM_NAME

  5.选取DEFAULT.FACT_ORDER表的哪些列字段作为measures
        QUANTITY_ORDERED ORDER_DOLLARS COST_DOLLARS

  6.a.选取 "Partition Date Column"为DEFAULT.FACT_ORDER.TIME_KEY,格式 yyyy-MM-dd
    b.对于"Filter"条件,由于没有要过滤的条件,故不填写

  7.Save

 

五.Kylin的Cube创建

  1.选择"Models" tab页,单击"New Cube“

  2.Cube Info:
        "Model Name"选择,WareHouse_01_Model
        "Cube Name"输入,cube01

  3.Dismensions:
        单击"Auto Generator",依据情况选择维度的列,全选

  4.Measures:
    a.单击"+Measure",添加要聚合计算的度量,比如 sum(QUANTITY_ORDERED)
    b.Expression: SUM/MIN/MAX/COUNT/COUNT_DISTINCT/TOP_N/RAW
  5.Refresh Setting:
    a.Auto Merge Thresholds,自动合并阈值,7~28 days
    b.Retention Threshold,保留天数,60
    c.Partition Start Date,非常重要,是后面build cube的开始日期

  6.Advanced Setting:
    --Aggregation Groups:
    a.Includes: TIME_KEY ,PRODUCT_KEY ,SALESPERSON_KEY , CUSTOM_KEY
    b.Mandatory Dimensions: TIME_KEY
    c.Hierarchy Dimensions: PRODUCT_KEY ,SALESPERSON_KEY ,CUSTOM_KEY
    d.Joint Dimensions: 无
    
    --Rowkeys:
    TIME_KEY ,PRODUCT_KEY ,SALESPERSON_KEY ,CUSTOM_KEY 4个字段为dict字典编码
 
  7.Configuration Overwrites: 无

  8.Overview:
    保存cube

 

五.Cube Build

  1.选择 cube01,单击”Action”,选择Build

  2.填写End Date,Submit

  3.单击”Monitor”,观察Job

  4.等待Process100% (Any Errors)

  5.返回cube01,查看 cube size 和 Source Records等字段更新

 
六.Hive* kyin 查询对比

  点击(此处)折叠或打开

  1. 1.2016-05-01到2016-05-15期间的每天的订单数量,订单金额,订单成本
  2. Hive: 65.816 s
  3. select fact.time_key, sum(fact.quantity_ordered), sum(fact.order_dollars), sum(fact.cost_dollars) from fact_order as fact 
  4. where fact.time_key >= "2016-05-01" and fact.time_key <= "2016-05-15" 
  5. group by fact.time_key order by fact.time_key;
  6. Kylin: 0.32s-->0.27s 
  7. select fact.time_key, sum(fact.quantity_ordered), sum(fact.order_dollars), sum(fact.cost_dollars) from fact_order as fact 
  8. where fact.time_key between '2016-05-01' and '2016-05-15'
  9. group by fact.time_key order by fact.time_key

 

  点击(此处)折叠或打开

  1. 2.2016-05-01到2016-05-15期间的每天的产品的订单量
  2. Hive: 100.336s
  3. select dday.full_day,dsp.product_name, sum(fact.quantity_ordered) from fact_order as fact 
  4. inner join dim_day as dday on fact.time_key = dday.day_key 
  5. inner join dim_product as dsp on fact.product_key = dsp.product_key 
  6. where dday.full_day >= "2016-05-01" and dday.full_day <= "2016-05-15" 
  7. group by dday.full_day,dsp.product_name
  8. order by dday.full_day,dsp.product_name;
  9. Kylin:0.93s-->0.39s
  10. select dday.full_day,dsp.product_name, sum(fact.quantity_ordered) from fact_order as fact 
  11. inner join dim_day as dday on fact.time_key = dday.day_key 
  12. inner join dim_product as dsp on fact.product_key = dsp.product_key 
  13. where dday.full_day >= '2016-05-01' and dday.full_day <= '2016-05-15' 
  14. group by dday.full_day,dsp.product_name
  15. order by dday.full_day,dsp.product_name

这篇关于3-KYLIN订单例程的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1141282

相关文章

Java的ArrayDeque使用例程

题目要求为: 卡拉兹(Callatz)猜想: 对任何一个自然数n,如果它是偶数,那么把它砍掉一半;如果它是奇数,那么把(3n+1)砍掉一半。这样一直反复砍下去,最后一定在某一步得到n=1。当我们验证卡拉兹猜想的时候,为了避免重复计算,可以记录下递推过程中遇到的每一个数。例如对n=3进行验证的时候,我们需要计算3、5、8、4、2、1,则当我们对n=5、8、4、2进行验证的时候,就可以直接

gcc make cmake例程

main.cpp文件: #include <iostream>#include "utils.h"int main(void) {int a = 1;int b = 2;int c = AddFunc(a, b);std::cout<< c <<std::endl;return 0;} utils.h文件: #pragma onceint AddFunc(int a, int b);

Apache Kylin VS Apache Doris全方位对比

1 系统架构 1.1 What is Kylin1.2 What is Doris2 数据模型 2.1 Kylin的聚合模型2.2 Doris的聚合模型2.3 Kylin Cuboid VS Doris RollUp2.4 Doris的明细模型3 存储引擎4 数据导入5 查询6 精确去重7 元数据8 高性能9 高可用10 可维护性 10.1 部署10.2 运维10.3 客服11 易用性 11.1

Apache Kylin 在美团数十亿数据 OLAP 场景下的实践

大数据技术与架构 点击右侧关注,大数据开发领域最强公众号! 暴走大数据 点击右侧关注,暴走大数据! By  大数据技术与架构 场景描述:美团各业务线存在大量的OLAP分析场景,需要基于Hadoop数十亿级别的数据进行分析,直接响应分析师和城市BD等数千人

Kylin使用Spark构建Cube

Apache Kylin™是一个开源的分布式分析引擎,提供Hadoop/Spark之上的SQL查询接口及多维分析(OLAP)能力以支持超大规模数据,最初由eBay Inc. 开发并贡献至开源社区。它能在亚秒内查询巨大的Hive表。下面是单机安装采坑记,直接上配置和问题解决。找一台干净的机器,把hadoop hive hbase从原有节点分别拷贝一份,主要目的是配置文件,可以不在kylin所在机器

Apache Kylin | 麒麟出没,必有祥瑞

点击上方蓝色字体,选择“设为星标” 回复”资源“获取更多资源 大数据技术与架构 点击右侧关注,大数据开发领域最强公众号! 暴走大数据 点击右侧关注,暴走大数据! 前言 随着移动互联网、物联网等技术的发展,近些年人类所积累的数据正在呈爆炸式的增长,大数据时代已经来临。但是海量数据的收集只是大数据技术的第一步,如何让数据产生价值才是大数据领域的终极目标。Hadoop的出现解决了数据存储问

一站式大数据解决方案分析与设计实践 | BI无缝整合Apache Kylin

点击上方蓝色字体,选择“设为星标” 回复”资源“获取更多资源 本文已收录于Github仓库:《大数据成神之路》  地址:https://github.com/wangzhiwubigdata/God-Of-BigData 研发背景 今天随着移动互联网、物联网、大数据、AI等技术的快速发展,数据已成为所有这些技术背后最重要,也是最具价值的“资产”,同时数据也是每一个商业决策的基石,越来越多的

大数据面试通关手册 | Kylin入门/原理/调优/OLAP解决方案和行业典型应用

Kylin入门/原理/调优/OLAP解决方案和行业典型应用一网打尽。 一:背景历史和使命 背景和历史 现今,大数据行业发展得如火如荼,新技术层出不穷,整个生态欣欣向荣。作为大数据领域最重要的技术的 Apache Hadoop 最初致力于简单的分布式存储,然后在此基础之上实现大规模并行计算,到如今在实时分析、多维分析、交互式分析、机器学习甚至人工智能等方面有了长足的发展。 2013 年年初,在

【PSINS】SINS与航位推算的EKF例程讲解|三维轨迹

文章目录 SINS与航位推算例程构造滤波前滤波主体滤波后处理 运行结果源代码 SINS与航位推算 SINS是捷联惯导,略。 航位推算: 本文讲解的程序是PSINS工具箱(相关文章:PSINS初学指导,https://blog.csdn.net/callmeup/article/details/136459824)的一个函数,在此基础上进行优化。如果要运行,需要先安装工具箱

5-Kylin Java Restful API

最近在做大数据方面的开发, 学习研究了一段时间的kylin系统, 对于前端开发需要使用 RESTful API ,但是官网并没有提供详细的Java  API. 经过几天的看文档,最终写出了 Java 的API,不敢私藏,特分享与大家. 1 import java.io.BufferedReader;2 import java.io.InputStream;3 import jav