Flink Review Notes (13): Flink Advanced and New Features (2): ProcessFunction API and Dual-Stream Join


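Flink Study Notes

Preface: Today is day 13 of learning Flink. I studied two of Flink's advanced features, the ProcessFunction API and dual-stream joins. They mainly address incremental aggregation over streaming data and the widening of fast-changing stream data, that is, merging several changing data sources into one. These notes combine my own experiments, conjectures, and code practice and summarize my understanding. Comments and discussion are welcome!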



Table of Contents

  • Flink Study Notes
    • IV. Flink Advanced and New Features
      • 2. Process Function API
        • 2.1 Process Function categories
        • 2.2 KeyedProcessFunction [key topic]
        • 2.3 ProcessWindowFunction with incremental aggregation
          • 2.3.1 Usage overview
          • 2.3.2 Incremental window aggregation with ReduceFunction
          • 2.3.3 Incremental window aggregation with AggregateFunction
          • 2.3.4 Using per-window state in ProcessWindowFunction
      • 3. Dual-Stream Join
        • 3.1 Interview overview
        • 3.2 Window Join
          • 3.2.1 Tumbling Window Join
          • 3.2.2 Sliding Window Join
          • 3.2.3 Session Window Join
          • 3.2.4 Case demo
        • 3.3 Interval Join
          • 3.3.1 Introduction to Interval Join
          • 3.3.2 Case demo

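IV. Flink Advanced and New Features

2. Process Function API

The transformation operators covered earlier cannot access timestamps or watermarks. A Process Function can read the timestamp and the watermark and can register timers. Flink SQL is implemented with Process Functions.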

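2.1 Process Function categories
  • 1- ProcessFunction: for a DataStream
  • 2- KeyedProcessFunction: for a keyed DataStream
  • 3- CoProcessFunction: for streams combined with connect
  • 4- ProcessJoinFunction: for join operations on streams
  • 5- BroadcastProcessFunction: for broadcast streams
  • 6- KeyedBroadcastProcessFunction: for broadcast on a keyed stream
  • 7- ProcessWindowFunction: for keyed windows; can be combined with incremental aggregation
  • 8- ProcessAllWindowFunction: for full-window processing on non-keyed windows (windowAll)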

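2.2 KeyedProcessFunction [key topic]

KeyedProcessFunction is an extension of ProcessFunction. In its onTimer(…) method it gives access to the key of the timer that fired.

All Process Functions implement the RichFunction interface, so they all provide:

  • open()
  • close()
  • getRuntimeContext()

KeyedProcessFunction additionally provides two methods:

  • processElement: called once for every element
  • onTimer: the callback invoked when a timer fires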
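The relationship between the two methods can be sketched in plain Java. This is only a model under stated assumptions, not the Flink API: the class TimerContractSketch, its simulated clock, and the advanceTo method are all hypothetical names introduced for illustration. The point it shows is that processElement runs once per element and may register a timer, while onTimer runs later, when the clock reaches the registered timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Plain-Java model of the processElement / onTimer contract (hypothetical, not the Flink API). */
final class TimerContractSketch {
    private final TreeSet<Long> timers = new TreeSet<>(); // registered timer timestamps
    final List<String> output = new ArrayList<>();        // what the function emitted
    private long now = 0L;                                // simulated processing time in ms

    /** Called once per element; registers a timer 10 s ahead when the reading exceeds 100. */
    void processElement(int temperature) {
        if (temperature > 100) {
            timers.add(now + 10_000L);
        }
    }

    /** Callback that runs when a registered timer becomes due. */
    void onTimer(long timestamp) {
        output.add("timer fired at " + timestamp);
    }

    /** Advances the simulated clock and fires every timer that is due. */
    void advanceTo(long time) {
        now = time;
        while (!timers.isEmpty() && timers.first() <= now) {
            onTimer(timers.pollFirst());
        }
    }

    public static void main(String[] args) {
        TimerContractSketch f = new TimerContractSketch();
        f.advanceTo(1_000L);
        f.processElement(101);  // registers a timer for 11000
        f.advanceTo(5_000L);    // nothing is due yet
        f.advanceTo(11_000L);   // the timer is due, so onTimer runs
        System.out.println(f.output); // [timer fired at 11000]
    }
}
```

In real Flink code the timer service and the clock are provided by the runtime; the sketch only mirrors the order in which the two callbacks run.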

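Case: in server operations, the temperature of server racks must be monitored in real time. If, within a given period, the temperature exceeds a threshold (100 degrees) and each reported reading is higher than the previous one, an alarm must be raised (the temperature keeps rising).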

package cn.itcast.day12.process;

import org.apache.commons.collections.IteratorUtils;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

import java.text.SimpleDateFormat;

/**
 * @author lql
 * @time 2024-03-08 13:01:05
 * @description Data format: (id, temperature)
 */
public class SystemMonitorDemo {
    public static void main(String[] args) throws Exception {
        // todo 1) Initialize the Flink environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // todo 2) Set the parallelism to 1
        env.setParallelism(1);
        // todo 3) Attach the data source
        DataStreamSource<String> socketTextStream = env.socketTextStream("node1", 9999);
        // todo 4) Convert each input line into a tuple
        SingleOutputStreamOperator<Tuple2<Integer, Integer>> tupleDataStream = socketTextStream.map(new MapFunction<String, Tuple2<Integer, Integer>>() {
            @Override
            public Tuple2<Integer, Integer> map(String line) throws Exception {
                String[] arrayData = line.split(",");
                return Tuple2.of(Integer.parseInt(arrayData[0]), Integer.parseInt(arrayData[1]));
            }
        });
        // todo 5) Key the stream by id
        KeyedStream<Tuple2<Integer, Integer>, Integer> tuple2TupleKeyedStream = tupleDataStream.keyBy(t -> t.f0);
        // todo 6) Apply a custom function that extends the abstract class KeyedProcessFunction
        SingleOutputStreamOperator<String> result = tuple2TupleKeyedStream.process(new MyKeyedProcessFunction());
        // todo 7) Print the output
        result.printToErr();
        // todo 8) Run the job
        env.execute();
    }

    private static class MyKeyedProcessFunction extends KeyedProcessFunction<Integer, Tuple2<Integer, Integer>, String> {
        // State that stores the readings
        private ListState<Tuple2<Integer, Integer>> listState = null;
        // Date formatter
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        // Timestamp of the most recently registered timer
        private Long timeTS = 0L;

        /**
         * Initializes resources.
         */
        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);
            // Create the state object
            this.listState = getRuntimeContext().getListState(new ListStateDescriptor<Tuple2<Integer, Integer>>("listState",
                    TypeInformation.of(new TypeHint<Tuple2<Integer, Integer>>() {})));
            System.out.println("Initializing state object...");
        }

        @Override
        public void close() throws Exception {
            super.close();
        }

        /**
         * Called when a timer fires.
         */
        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
            super.onTimer(timestamp, ctx, out);
            System.out.println("Timer fired...");
            // Copy the state into a list and count its elements
            int stateSize = IteratorUtils.toList(this.listState.get().iterator()).size();
            if (stateSize >= 2) {
                // Emit a record to raise the alarm
                out.collect("Alarm triggered");
            }
            // Clear the stored history
            this.listState.clear();
        }

        /**
         * Processes every element of the stream.
         */
        @Override
        public void processElement(Tuple2<Integer, Integer> integerIntegerTuple2, Context context, Collector<String> collector) throws Exception {
            // Fetch the most recent reading stored in state
            Tuple2<Integer, Integer> lastData = null;
            for (Tuple2<Integer, Integer> tuple : listState.get()) {
                lastData = tuple;
            }
            // Fall back to a default when the state is empty
            if (lastData == null) {
                lastData = Tuple2.of(0, 0);
            }
            System.out.println("Data read from state: " + lastData);
            // Use the short-circuit && rather than the bitwise &
            if (integerIntegerTuple2.f1 > 100 && integerIntegerTuple2.f1 > lastData.f1) {
                System.out.println("Temperature rising... registering timer!");
                // The temperature is above 100 and higher than the previous reading:
                // store the current reading
                listState.add(Tuple2.of(integerIntegerTuple2.f0, integerIntegerTuple2.f1));
                // Register a timer (current processing time + window length = firing time)
                timeTS = context.timerService().currentProcessingTime() + 10000L;
                context.timerService().registerProcessingTimeTimer(timeTS);
            } else {
                if (integerIntegerTuple2.f1 < lastData.f1) {
                    System.out.println("Temperature dropped... deleting timer!");
                    // Delete the timer
                    context.timerService().deleteProcessingTimeTimer(timeTS);
                }
                if (integerIntegerTuple2.f1 < 100) {
                    // Clear the readings stored in state
                    listState.clear();
                }
            }
        }
    }
}

Result:

Input:
1,100
1,101

Output:
Temperature rising... registering timer!
Alarm triggered

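2.3 ProcessWindowFunction with incremental aggregation

Both reduce and aggregate have overloads that take a ProcessWindowFunction as a second argument, which combines incremental aggregation with a window function.

How it works: within a window, elements are aggregated incrementally as they arrive. Just before the window closes, the incremental result is passed to the ProcessWindowFunction as its input for the final, full-window processing step.

Feature: you get incremental aggregation and can still access the window's metadata (such as the start time and state).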


2.3.1 Usage overview
input.keyBy(...).window(...).reduce(incrAggregator: ReduceFunction[IN], function: ProcessWindowFunction[IN, OUT, K, W])

input.keyBy(...).window(...).aggregate(incrAggregator: AggregateFunction[IN, ACC, V], windowFunction: ProcessWindowFunction[V, OUT, K, W])

2.3.2 Incremental window aggregation with ReduceFunction

Data:

{"userID": "user_1", "eventTime": "2020-11-09 10:41:32", "eventType": "browse", "productID": "product_1", "productPrice": 10}
{"userID": "user_1", "eventTime": "2020-11-09 10:41:33", "eventType": "browse", "productID": "product_1", "productPrice": 30}
{"userID": "user_1", "eventTime": "2020-11-09 10:41:34", "eventType": "browse", "productID": "product_1", "productPrice": 20}
{"userID": "user_1", "eventTime": "2020-11-09 10:41:36", "eventType": "browse", "productID": "product_1", "productPrice": 10}
{"userID": "user_1", "eventTime": "2020-11-09 10:41:38", "eventType": "browse", "productID": "product_1", "productPrice": 70}
{"userID": "user_1", "eventTime": "2020-11-09 10:41:40", "eventType": "browse", "productID": "product_1", "productPrice": 20}

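Example: for each user (keyBy), within a period of time (the window size), find the record of the highest-priced product viewed (ReduceFunction), and also obtain the key and the window information.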

package cn.itcast.day12.process;

import com.alibaba.fastjson.JSON;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.apache.commons.collections.IteratorUtils;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.joda.time.DateTime;

import java.text.SimpleDateFormat;
import java.time.Duration;

/**
 * @author lql
 * @time 2024-03-08 17:06:59
 * @description TODO
 */
public class ReduceAndProcessFunction {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        DataStreamSource<String> lines = env.socketTextStream("node1", 9999);

        // todo 3) Parse the incoming JSON into a Java bean
        lines.process(new SocketProcessFunction())
                .assignTimestampsAndWatermarks(WatermarkStrategy.<UserActionLog>forBoundedOutOfOrderness(Duration.ZERO)
                        .withTimestampAssigner(new SerializableTimestampAssigner<UserActionLog>() {
                            @Override
                            public long extractTimestamp(UserActionLog userActionLog, long l) {
                                try {
                                    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                                    return format.parse(userActionLog.getEventTime()).getTime();
                                } catch (Exception e) {
                                    e.printStackTrace();
                                    return 0L;
                                }
                            }
                        }))
                // Key by user
                .keyBy((KeySelector<UserActionLog, String>) UserActionLog::getUserID)
                // Tumbling event-time window
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                // Window function: the record of the highest-priced product each user viewed in the window
                .reduce(
                        // Incremental aggregation
                        new ReduceFunction<UserActionLog>() {
                            @Override
                            public UserActionLog reduce(UserActionLog value1, UserActionLog value2) throws Exception {
                                return value1.getProductPrice() > value2.getProductPrice() ? value1 : value2;
                            }
                        },
                        // Window function: the iterable holds a single element because the data is already aggregated
                        new ProcessWindowFunction<UserActionLog, String, String, TimeWindow>() {
                            @Override
                            public void process(String key, Context context, Iterable<UserActionLog> elements, Collector<String> out) throws Exception {
                                UserActionLog max = elements.iterator().next();
                                System.out.println("Elements in iterable: " + IteratorUtils.toList(elements.iterator()).size());
                                String windowStart = new DateTime(context.window().getStart()).toString("yyyy-MM-dd HH:mm:ss");
                                String windowEnd = new DateTime(context.window().getEnd()).toString("yyyy-MM-dd HH:mm:ss");
                                String record = "key:" + key + "\n" + "Window start: " + windowStart
                                        + "\nWindow end: " + windowEnd
                                        + "\nRecord with the highest-priced product viewed: " + max;
                                out.collect(record);
                            }
                        }
                ).print();

        // todo 4) Run the job
        env.execute();
    }

    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class UserActionLog {
        private String userID;
        private String eventTime;
        private String eventType;
        private String productID;
        private Long productPrice;
    }

    /**
     * Parses the incoming JSON into a Java bean.
     */
    private static class SocketProcessFunction extends ProcessFunction<String, UserActionLog> {
        /**
         * Called for every record.
         */
        @Override
        public void processElement(String s, Context context, Collector<UserActionLog> collector) throws Exception {
            collector.collect(JSON.parseObject(s, UserActionLog.class));
        }
    }
}

Result:

Elements in iterable: 1
key:user_1
Window start: 2020-11-09 10:41:30
Window end: 2020-11-09 10:41:35
Record with the highest-priced product viewed: ReduceAndProcessFunction.UserActionLog(userID=user_1, eventTime=2020-11-09 10:41:33, eventType=browse, productID=product_1, productPrice=30)
Elements in iterable: 1
key:user_1
Window start: 2020-11-09 10:41:35
Window end: 2020-11-09 10:41:40
Record with the highest-priced product viewed: ReduceAndProcessFunction.UserActionLog(userID=user_1, eventTime=2020-11-09 10:41:38, eventType=browse, productID=product_1, productPrice=70)

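Summary:

  • 1- Set the parallelism to 1 first, so that results are observable with a small amount of data
  • 2- The incremental function passed to reduce/aggregate on a window must not be a Rich function. Passing a RichReduceFunction fails with an error such as: ReduceFunction of apply can not be a RichFunction.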
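The window boundaries in the result above (10:41:30 to 10:41:35, then 10:41:35 to 10:41:40) and the single retained record per window can be reproduced with a small plain-Java sketch. It assumes 5-second tumbling windows aligned to the epoch with no offset, and it does not use the Flink API; the class TumblingMaxSketch and its methods are hypothetical names. Only (event time, price) pairs from the sample data are used.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch: one running maximum per 5-second tumbling window (hypothetical helper). */
final class TumblingMaxSketch {
    static final long SIZE_SECONDS = 5;
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // (eventTime, productPrice) pairs taken from the sample data
    static final String[][] EVENTS = {
        {"2020-11-09 10:41:32", "10"}, {"2020-11-09 10:41:33", "30"},
        {"2020-11-09 10:41:34", "20"}, {"2020-11-09 10:41:36", "10"},
        {"2020-11-09 10:41:38", "70"}, {"2020-11-09 10:41:40", "20"},
    };

    /** Start of the tumbling window that contains the given epoch second. */
    static long windowStart(long epochSecond) {
        return epochSecond - Math.floorMod(epochSecond, SIZE_SECONDS);
    }

    /** Updates a single value per window as each event arrives (incremental aggregation). */
    static Map<String, Long> maxPerWindow(String[][] events) {
        Map<String, Long> maxByWindow = new TreeMap<>();
        for (String[] e : events) {
            long ts = LocalDateTime.parse(e[0], FMT).toEpochSecond(ZoneOffset.UTC);
            String start = LocalDateTime.ofEpochSecond(windowStart(ts), 0, ZoneOffset.UTC).format(FMT);
            maxByWindow.merge(start, Long.parseLong(e[1]), Long::max);
        }
        return maxByWindow;
    }

    public static void main(String[] args) {
        System.out.println(maxPerWindow(EVENTS));
    }
}
```

The sketch also produces an entry for the window starting at 10:41:40. The Flink job above prints nothing for that window because no later event has advanced the watermark far enough to close it.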

2.3.3 Incremental window aggregation with AggregateFunction

Data:

{"userID": "user_1", "eventTime": "2020-11-09 10:41:32", "eventType": "browse", "productID": "product_1", "productPrice": 10}
{"userID": "user_1", "eventTime": "2020-11-09 10:41:33", "eventType": "browse", "productID": "product_1", "productPrice": 30}
{"userID": "user_1", "eventTime": "2020-11-09 10:41:34", "eventType": "browse", "productID": "product_1", "productPrice": 20}
{"userID": "user_1", "eventTime": "2020-11-09 10:41:36", "eventType": "browse", "productID": "product_1", "productPrice": 10}
{"userID": "user_1", "eventTime": "2020-11-09 10:41:38", "eventType": "browse", "productID": "product_1", "productPrice": 70}
{"userID": "user_1", "eventTime": "2020-11-09 10:41:40", "eventType": "browse", "productID": "product_1", "productPrice": 20}

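Example: for each user (keyBy), within a period of time (the window size), compute the average price of the products viewed (AggregateFunction), and also obtain the key and the window information.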

package cn.itcast.day12.process;

import com.alibaba.fastjson.JSON;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.apache.commons.collections.IteratorUtils;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

import java.text.SimpleDateFormat;
import java.time.Duration;

/**
 * @author lql
 * @time 2024-03-08 17:59:42
 * @description TODO
 */
public class AggregateAndProcessFunction {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        DataStreamSource<String> lines = env.socketTextStream("node1", 9999);

        // Parse the JSON read from the socket into a Java bean
        // (the class is named KafkaProcessFunction, but the source here is a socket)
        lines.process(new KafkaProcessFunction())
                .assignTimestampsAndWatermarks(WatermarkStrategy.<UserActionLog>forBoundedOutOfOrderness(Duration.ZERO)
                        .withTimestampAssigner(new SerializableTimestampAssigner<UserActionLog>() {
                            @Override
                            public long extractTimestamp(UserActionLog element, long recordTimestamp) {
                                try {
                                    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                                    return format.parse(element.getEventTime()).getTime();
                                } catch (Exception e) {
                                    e.printStackTrace();
                                    return 0L;
                                }
                            }
                        }))
                // Key by user
                .keyBy((KeySelector<UserActionLog, String>) UserActionLog::getUserID)
                // Tumbling event-time window
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                // Window function: average price of the products each user viewed in the window,
                // emitted together with the key and window information
                .aggregate(
                        new AggregateFunction<UserActionLog, Tuple2<Long, Long>, Double>() {
                            // 1. Initial value of the accumulator: (count, sum)
                            @Override
                            public Tuple2<Long, Long> createAccumulator() {
                                return new Tuple2<>(0L, 0L);
                            }

                            // 2. How the accumulator absorbs one input record
                            @Override
                            public Tuple2<Long, Long> add(UserActionLog value, Tuple2<Long, Long> accumulator) {
                                accumulator.f0 += 1;
                                accumulator.f1 += value.getProductPrice();
                                return accumulator;
                            }

                            // 3. How two accumulators are merged
                            @Override
                            public Tuple2<Long, Long> merge(Tuple2<Long, Long> acc1, Tuple2<Long, Long> acc2) {
                                acc1.f0 += acc2.f0;
                                acc1.f1 += acc2.f1;
                                return acc1;
                            }

                            // 4. Final result: sum / count
                            @Override
                            public Double getResult(Tuple2<Long, Long> longLongTuple2) {
                                return longLongTuple2.f1 / (longLongTuple2.f0 * 1.0);
                            }
                        },
                        new ProcessWindowFunction<Double, String, String, TimeWindow>() {
                            @Override
                            public void process(String key, Context context, Iterable<Double> elements, Collector<String> out) throws Exception {
                                Double avg = elements.iterator().next();
                                String windowStart = new DateTime(context.window().getStart(), DateTimeZone.forID("+08:00")).toString("yyyy-MM-dd HH:mm:ss");
                                String windowEnd = new DateTime(context.window().getEnd(), DateTimeZone.forID("+08:00")).toString("yyyy-MM-dd HH:mm:ss");
                                String record = "Key: " + key + " window start: " + windowStart + " window end: " + windowEnd
                                        + " average price of products viewed: " + String.format("%.2f", avg);
                                out.collect(record);
                            }
                        }
                ).print();

        env.execute();
    }

    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class UserActionLog {
        private String userID;
        private String eventTime;
        private String eventType;
        private String productID;
        private Long productPrice;
    }

    /**
     * Parses the incoming JSON into a Java bean.
     */
    private static class KafkaProcessFunction extends ProcessFunction<String, UserActionLog> {
        @Override
        public void processElement(String value, Context ctx, Collector<UserActionLog> out) throws Exception {
            out.collect(JSON.parseObject(value, UserActionLog.class));
        }
    }
}

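Result:

Key: user_1 window start: 2020-11-09 10:41:30 window end: 2020-11-09 10:41:35 average price of products viewed: 20.00
Key: user_1 window start: 2020-11-09 10:41:35 window end: 2020-11-09 10:41:40 average price of products viewed: 40.00

Summary:

  • The key idea in this approach is the accumulator used by aggregate. The ProcessWindowFunction is there mainly to output additional information such as the key and the window metadata.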
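The accumulator idea can be isolated in a few lines of plain Java. This is a sketch that mirrors the four steps of the example (create, add, merge, get result) without depending on Flink; the class AverageAccumulatorSketch and the nested Acc type are hypothetical names. The prices come from the two windows of the sample data.

```java
/** Plain-Java sketch of the average accumulator lifecycle (hypothetical, not the Flink interface). */
final class AverageAccumulatorSketch {
    /** Accumulator holding the element count and the running sum. */
    static final class Acc {
        long count;
        long sum;
    }

    static Acc createAccumulator() {
        return new Acc();
    }

    /** Absorbs one price into the accumulator. */
    static Acc add(long price, Acc acc) {
        acc.count += 1;
        acc.sum += price;
        return acc;
    }

    /** Combines two partial accumulators into one. */
    static Acc merge(Acc a, Acc b) {
        a.count += b.count;
        a.sum += b.sum;
        return a;
    }

    /** Final result: sum divided by count. */
    static double getResult(Acc acc) {
        return acc.sum / (double) acc.count;
    }

    /** Runs the full lifecycle over the prices of one window. */
    static double average(long... prices) {
        Acc acc = createAccumulator();
        for (long p : prices) {
            add(p, acc);
        }
        return getResult(acc);
    }

    public static void main(String[] args) {
        System.out.println(average(10, 30, 20)); // 20.0 (window 10:41:30 to 10:41:35)
        System.out.println(average(10, 70));     // 40.0 (window 10:41:35 to 10:41:40)
    }
}
```

Only the count and the sum are kept per window, so memory use does not grow with the number of elements in the window.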

2.3.4 Using per-window state in ProcessWindowFunction

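Unlike WindowFunction, a ProcessWindowFunction gives access not only to the data in the window but also to two kinds of state:

  • WindowState: per-window state. It is bound to the window and disappears when the window is removed.
  • GlobalState: per-key state. It is bound to the key rather than to a single window, so it can accumulate values across multiple windows.

Data: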

1000,spark,2
5000,spark,2
6000,spark,3
10000,spark,5

Example:

package cn.itcast.day12.process;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.time.Duration;
import java.util.Iterator;

/**
 * @author lql
 * @time 2024-03-08 20:36:12
 * @description TODO
 */
public class WindowStateAndGlobalStateFunctionDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10000);

        // 1000,spark,3
        // 1200,spark,5
        // 2000,hadoop,2
        // The DataStream returned by socketTextStream has parallelism 1
        DataStreamSource<String> lines = env.socketTextStream("node1", 9999);

        SingleOutputStreamOperator<String> dataWithWaterMark = lines.assignTimestampsAndWatermarks(
                WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ZERO)
                        .withTimestampAssigner((line, timestamp) -> Long.parseLong(line.split(",")[0])));

        SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndCount = dataWithWaterMark.map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String value) throws Exception {
                String[] fields = value.split(",");
                return Tuple2.of(fields[1], Integer.parseInt(fields[2]));
            }
        });

        // Call keyBy
        KeyedStream<Tuple2<String, Integer>, String> keyed = wordAndCount.keyBy(t -> t.f0);

        // Non-keyed window: skip keyBy, then call windowAll with a window assigner
        // Keyed window: call keyBy first, then call window with a window assigner
        WindowedStream<Tuple2<String, Integer>, String, TimeWindow> windowed = keyed.window(TumblingEventTimeWindows.of(Time.seconds(5)));

        // Calling sum or reduce directly only aggregates the data inside one window, without adding it to earlier results
        // Requirement: aggregate incrementally inside the window and also accumulate with the historical data
        SingleOutputStreamOperator<String> result = windowed.aggregate(new MyAggFunc(), new MyWindowFunc());
        result.print();
        env.execute();
    }

    private static class MyAggFunc implements AggregateFunction<Tuple2<String, Integer>, Integer, Integer> {
        // Create the initial value
        @Override
        public Integer createAccumulator() {
            return 0;
        }

        // Combine one input record with the initial value or the intermediate result
        @Override
        public Integer add(Tuple2<String, Integer> value, Integer accumulator) {
            return value.f1 + accumulator;
        }

        // The result to return
        @Override
        public Integer getResult(Integer accumulator) {
            return accumulator;
        }

        // Only called for merging windows such as session windows;
        // return the sum instead of null so the function stays correct if it is ever used there
        @Override
        public Integer merge(Integer a, Integer b) {
            return a + b;
        }
    }

    private static class MyWindowFunc extends ProcessWindowFunction<Integer, String, String, TimeWindow> {
        // One descriptor for the per-window state, one for the global (per-key) state
        private transient ReducingStateDescriptor<Integer> windowStateDescriptor;
        private transient ReducingStateDescriptor<Integer> globalStateDescriptor;

        @Override
        public void open(Configuration parameters) throws Exception {
            windowStateDescriptor = new ReducingStateDescriptor<Integer>("window", new ReduceFunction<Integer>() {
                @Override
                public Integer reduce(Integer value1, Integer value2) throws Exception {
                    return value1 + value2;
                }
            }, TypeInformation.of(new TypeHint<Integer>() { }));

            globalStateDescriptor = new ReducingStateDescriptor<Integer>("global", new ReduceFunction<Integer>() {
                @Override
                public Integer reduce(Integer value1, Integer value2) throws Exception {
                    return value1 + value2;
                }
            }, TypeInformation.of(new TypeHint<Integer>() { }));
        }

        @Override
        public void process(String key, Context context, Iterable<Integer> elements, Collector<String> out) throws Exception {
            Integer sum = 0;
            Iterator<Integer> iterator = elements.iterator();
            while (iterator.hasNext()) {
                sum += iterator.next();
            }

            ReducingState<Integer> windowState = context.windowState().getReducingState(windowStateDescriptor);
            ReducingState<Integer> globalState = context.globalState().getReducingState(globalStateDescriptor);

            // Iterate with a lambda; t is each element
            elements.forEach(t -> {
                try {
                    windowState.add(t);
                    globalState.add(t);
                } catch (Exception exception) {
                    exception.printStackTrace();
                }
            });

            out.collect(key + ",window:" + windowState.get() + ",global:" + globalState.get());
        }
    }
}

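Result:

1> spark,window:2,global:2
1> spark,window:5,global:7

Summary:

  • 1- After declaring the two state descriptors, override the open method;
  • 2- In open, create a ReducingStateDescriptor for each state and implement its reduce method to do the accumulation;
  • 3- In process, iterate over the elements and sum them;
  • 4- Most importantly, obtain the state through context.windowState() or context.globalState(). That choice is the main difference!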
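The difference between the two scopes can be modeled with two maps in plain Java. This is a sketch under the assumption that per-window state behaves like a map keyed by (key, window) and global state like a map keyed by key alone; the class WindowVsGlobalStateSketch and its method are hypothetical and do not use the Flink API. The inputs are the per-window sums of the sample data (2 for the first window, 5 for the second).

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of per-window state versus per-key (global) state (hypothetical). */
final class WindowVsGlobalStateSketch {
    private final Map<String, Integer> windowState = new HashMap<>(); // scoped to key + window
    private final Map<String, Integer> globalState = new HashMap<>(); // scoped to key only

    /** Called once per fired window with that window's pre-aggregated sum. */
    String onWindowFired(String key, long windowStart, int windowSum) {
        int w = windowState.merge(key + "@" + windowStart, windowSum, Integer::sum);
        int g = globalState.merge(key, windowSum, Integer::sum);
        return key + ",window:" + w + ",global:" + g;
    }

    public static void main(String[] args) {
        WindowVsGlobalStateSketch s = new WindowVsGlobalStateSketch();
        System.out.println(s.onWindowFired("spark", 0L, 2));     // spark,window:2,global:2
        System.out.println(s.onWindowFired("spark", 5_000L, 5)); // spark,window:5,global:7
    }
}
```

The window-scoped value starts from scratch for every new window, while the key-scoped value keeps growing, which matches the two result lines above.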

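3. Dual-Stream Join

3.1 Interview overview

Joins fall into two broad categories: Window Join and Interval Join.

  • Window Join buffers data in window state and performs the join when the window fires

    • Tumbling Window Join
    • Sliding Window Join
    • Session Window Join
  • Interval Join

    • Also stores data in state before processing it. The difference is that the data in state expires, and the cleanup is triggered by incoming data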

3.2 Window Join
3.2.1 Tumbling Window Join

In a tumbling window join, all elements that share a key and a tumbling window are joined as pairwise combinations and passed to a JoinFunction or FlatJoinFunction.

Note that nothing is emitted for tumbling window [6,7], because the green stream has no elements to join with orange elements ⑥ and ⑦.

Usage template:

import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

...

DataStream<Integer> orangeStream = ...
DataStream<Integer> greenStream = ...

orangeStream.join(greenStream)
    .where(<KeySelector>)
    .equalTo(<KeySelector>)
    .window(TumblingEventTimeWindows.of(Time.milliseconds(2)))
    .apply(new JoinFunction<Integer, Integer, String>() {
        @Override
        public String join(Integer first, Integer second) {
            return first + "," + second;
        }
    });
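The pairing rule of a tumbling window join can be sketched in plain Java. The sketch assumes a single shared key and windows of size 2 aligned at 0, and it does not use the Flink API. The class TumblingWindowJoinSketch and the sample timestamps are hypothetical, chosen so that the orange elements 6 and 7 have no green partner, as in the note above.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of the tumbling window join rule for one key (hypothetical). */
final class TumblingWindowJoinSketch {
    /** Emits every (orange, green) pair whose timestamps fall into the same tumbling window. */
    static List<String> join(long[] orange, long[] green, long size) {
        List<String> out = new ArrayList<>();
        for (long o : orange) {
            for (long g : green) {
                // floorDiv gives the index of the window [index * size, (index + 1) * size)
                if (Math.floorDiv(o, size) == Math.floorDiv(g, size)) {
                    out.add(o + "," + g);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] orange = {0, 1, 2, 3, 4, 5, 6, 7};
        long[] green = {0, 1, 3, 4};
        System.out.println(join(orange, green, 2));
        // [0,0, 0,1, 1,0, 1,1, 2,3, 3,3, 4,4, 5,4]
    }
}
```

No pair contains 6 or 7, because the window covering those timestamps holds no green element. A window join behaves like an inner join in this respect.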

3.2.2 Sliding Window Join

In a sliding window join, all elements that share a key and a sliding window are joined as pairwise combinations and passed to a JoinFunction or FlatJoinFunction.

Note that in window [2,3] orange ② is joined with green ③, but in window [1,2] it is not joined with anything.

Usage template:

import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

...

DataStream<Integer> orangeStream = ...
DataStream<Integer> greenStream = ...

orangeStream.join(greenStream)
    .where(<KeySelector>)
    .equalTo(<KeySelector>)
    .window(SlidingEventTimeWindows.of(Time.milliseconds(2) /* size */, Time.milliseconds(1) /* slide */))
    .apply(new JoinFunction<Integer, Integer, String>() {
        @Override
        public String join(Integer first, Integer second) {
            return first + "," + second;
        }
    });
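Because sliding windows overlap, one element belongs to several windows. The plain-Java sketch below lists the windows that contain a timestamp, assuming windows of the form [start, start + size) aligned at 0 with no offset. The class SlidingWindowAssignSketch is a hypothetical helper, not Flink code.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch: which sliding windows contain a timestamp (hypothetical helper). */
final class SlidingWindowAssignSketch {
    /** Start timestamps of all windows [start, start + size) that contain ts. */
    static List<Long> windowStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, slide); // latest window start at or before ts
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    /** True when two timestamps share at least one sliding window. */
    static boolean shareWindow(long a, long b, long size, long slide) {
        List<Long> common = new ArrayList<>(windowStarts(a, size, slide));
        common.retainAll(windowStarts(b, size, slide));
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(windowStarts(2, 2, 1));   // [2, 1]
        System.out.println(windowStarts(3, 2, 1));   // [3, 2]
        System.out.println(shareWindow(2, 3, 2, 1)); // true: both fall in the window starting at 2
    }
}
```

With size 2 and slide 1, timestamp 2 falls into the windows starting at 1 and at 2, and timestamp 3 into those starting at 2 and at 3. The window starting at 2 is the one written as [2,3] in the note above, which is why orange ② and green ③ are joined there.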

3.2.3 Session Window Join

In a session window join, all elements that have the same key and that, when combined, satisfy the session condition are joined as pairwise combinations and passed to a JoinFunction or FlatJoinFunction.

Note that in the third session the green stream has no elements, so ⑧ and ⑨ are not joined!

Usage template:

import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

...

DataStream<Integer> orangeStream = ...
DataStream<Integer> greenStream = ...

orangeStream.join(greenStream)
    .where(<KeySelector>)
    .equalTo(<KeySelector>)
    .window(EventTimeSessionWindows.withGap(Time.milliseconds(1)))
    .apply(new JoinFunction<Integer, Integer, String>() {
        @Override
        public String join(Integer first, Integer second) {
            return first + "," + second;
        }
    });
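How elements are grouped into sessions can be sketched in plain Java. The sketch works on the sorted timestamps of one key (from both streams combined) and assumes that a new session starts whenever the distance to the previous timestamp is larger than the gap. The exact treatment of a distance equal to the gap is an assumption here, so the sample data avoids that boundary. The class SessionGroupingSketch is hypothetical and does not use the Flink API.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch: grouping sorted timestamps into sessions by a gap (hypothetical helper). */
final class SessionGroupingSketch {
    /** Splits sorted timestamps into sessions; a distance larger than gap starts a new session. */
    static List<List<Long>> sessions(long[] sortedTimestamps, long gap) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > gap) {
                result.add(current);          // close the running session
                current = new ArrayList<>();  // and open a new one
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sessions(new long[]{1, 2, 3, 10, 11, 20}, 3));
        // [[1, 2, 3], [10, 11], [20]]
    }
}
```

A session window join then pairs elements of the two streams only inside one such session, so a session that contains elements from only one stream emits nothing.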

3.2.4 Case demo

Example: two custom sources simulate the data. One source produces order items and the other produces goods data. A window join links the two together.

package cn.itcast.day13.join;

/**
 * @author lql
 * @time 2024-03-09 21:03:00
 * @description Approach:
 * A window join first needs where and equalTo to specify the key used for the join;
 * here the elements of the two streams are joined on goodsId.
 * A 5-second tumbling window is set, and elements are joined only within the same 5-second window.
 * The apply method combines two elements of different types into one element of a new type.
 */

import com.alibaba.fastjson.JSON;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.apache.flink.api.common.eventtime.*;
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.UUID;
import java.util.concurrent.TimeUnit;

/**
 * A worked case:
 * two custom sources simulate the data, one for order items and one for goods.
 * A window join links the two together.
 */
public class JoinDemo01 {
    public static void main(String[] args) throws Exception {
        // todo 1) Initialize the environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // todo 2) Set the parallelism
        env.setParallelism(1);
        // todo 3) Build the data sources
        // Goods stream
        // The RichSourceFunction subclasses below use the raw type, so the element type must be given explicitly here
        SingleOutputStreamOperator<Goods> goodsDataStream = env.addSource(new GoodsSource11(), TypeInformation.of(Goods.class))
                .assignTimestampsAndWatermarks(new GoodsWatermark());
        // Order item stream
        SingleOutputStreamOperator<OrderItem> orderItemDataStream = env.addSource(new OrderItemSource(), TypeInformation.of(OrderItem.class))
                .assignTimestampsAndWatermarks(new OrderDetailWatermark());

        DataStream<FactOrderItem> result = goodsDataStream.join(orderItemDataStream)
                // Key of the first stream
                .where(Goods::getGoodsId)
                // Key of the second stream
                .equalTo(OrderItem::getGoodsId)
                // Add the window
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                .apply(new JoinFunction<Goods, OrderItem, FactOrderItem>() {
                    @Override
                    public FactOrderItem join(Goods goods, OrderItem orderItem) throws Exception {
                        FactOrderItem factOrderItem = new FactOrderItem();
                        factOrderItem.setGoodsId(goods.getGoodsId());
                        factOrderItem.setGoodsName(goods.getGoodsName());
                        factOrderItem.setCount(new BigDecimal(orderItem.getCount()));
                        factOrderItem.setTotalMoney(goods.getGoodsPrice().multiply(new BigDecimal(orderItem.getCount())));
                        return factOrderItem;
                    }
                });

        result.printToErr();
        env.execute();
    }

    // Goods
    @Data
    @NoArgsConstructor
    @AllArgsConstructor
    public static class Goods {
        private String goodsId;
        private String goodsName;
        private BigDecimal goodsPrice;
        public static List<Goods> GOODS_LIST;
        public static Random r;

        static {
            r = new Random();
            GOODS_LIST = new ArrayList<>();
            GOODS_LIST.add(new Goods("1", "小米12", new BigDecimal(4890)));
            GOODS_LIST.add(new Goods("2", "iphone12", new BigDecimal(12000)));
            GOODS_LIST.add(new Goods("3", "MacBookPro", new BigDecimal(15000)));
            GOODS_LIST.add(new Goods("4", "Thinkpad X1", new BigDecimal(9800)));
            GOODS_LIST.add(new Goods("5", "MeiZu One", new BigDecimal(3200)));
            GOODS_LIST.add(new Goods("6", "Mate 40", new BigDecimal(6500)));
        }

        public static Goods randomGoods() {
            int rIndex = r.nextInt(GOODS_LIST.size());
            return GOODS_LIST.get(rIndex);
        }

        @Override
        public String toString() {
            return JSON.toJSONString(this);
        }
    }

    // Order item
    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class OrderItem {
        private String itemId;
        private String goodsId;
        private Integer count;

        @Override
        public String toString() {
            return JSON.toJSONString(this);
        }
    }

    // Join result
    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class FactOrderItem {
        private String goodsId;
        private String goodsName;
        private BigDecimal count;
        private BigDecimal totalMoney;
        private String itemId;

        @Override
        public String toString() {
            return JSON.toJSONString(this);
        }
    }

    // Goods stream source (it plays the role of a dimension table)
    public static class GoodsSource11 extends RichSourceFunction {
        private Boolean isCancel;

        @Override
        public void open(Configuration parameters) throws Exception {
            isCancel = false;
        }

        @Override
        public void run(SourceContext sourceContext) throws Exception {
            while (!isCancel) {
                // stream() turns the list into a Java stream;
                // the lambda collects each goods record in turn
                Goods.GOODS_LIST.stream().forEach(goods -> sourceContext.collect(goods));
                TimeUnit.SECONDS.sleep(1);
            }
        }

        @Override
        public void cancel() {
            isCancel = true;
        }
    }

    // Order item stream source
    public static class OrderItemSource extends RichSourceFunction {
        private Boolean isCancel;
        private Random r;

        @Override
        public void open(Configuration parameters) throws Exception {
            isCancel = false;
            r = new Random();
        }

        @Override
        public void run(SourceContext sourceContext) throws Exception {
            while (!isCancel) {
                Goods goods = Goods.randomGoods();
                OrderItem orderItem = new OrderItem();
                orderItem.setGoodsId(goods.getGoodsId());
                orderItem.setCount(r.nextInt(10) + 1);
                orderItem.setItemId(UUID.randomUUID().toString());
                sourceContext.collect(orderItem);
                // Also emit an order item whose goodsId matches no goods record
                orderItem.setGoodsId("111");
                sourceContext.collect(orderItem);
                TimeUnit.SECONDS.sleep(1);
            }
        }

        @Override
        public void cancel() {
            isCancel = true;
        }
    }

    // No built-in watermark strategy is used here, so both methods are overridden
    /**
     * Watermark strategy for the goods stream.
     */
    private static class GoodsWatermark implements WatermarkStrategy<Goods> {
        // A WatermarkGenerator has two methods to implement; see the section on custom watermarks
        @Override
        public WatermarkGenerator<Goods> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
            return new WatermarkGenerator<Goods>() {
                @Override
                public void onEvent(Goods goods, long eventTimestamp, WatermarkOutput watermarkOutput) {
                    System.out.println("Goods time: " + System.currentTimeMillis());
                    watermarkOutput.emitWatermark(new Watermark(System.currentTimeMillis()));
                }

                @Override
                public void onPeriodicEmit(WatermarkOutput watermarkOutput) {
                    watermarkOutput.emitWatermark(new Watermark(System.currentTimeMillis()));
                }
            };
        }

        @Override
        public TimestampAssigner<Goods> createTimestampAssigner(TimestampAssignerSupplier.Context context) {
            // Every Goods element is assigned the current system time as its timestamp
            return (element, recordTimestamp) -> System.currentTimeMillis();
        }
    }

    /**
     * Watermark strategy for the order item stream.
     */
    public static class OrderDetailWatermark implements WatermarkStrategy<OrderItem> {
        @Override
        public WatermarkGenerator<OrderItem> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
            return new WatermarkGenerator<OrderItem>() {
                @Override
                public void onEvent(OrderItem event, long eventTimestamp, WatermarkOutput output) {
                    System.out.println("Order item time: " + System.currentTimeMillis());
                    output.emitWatermark(new Watermark(System.currentTimeMillis()));
                }

                @Override
                public void onPeriodicEmit(WatermarkOutput output) {
                    output.emitWatermark(new Watermark(System.currentTimeMillis()));
                }
            };
        }

        @Override
        public TimestampAssigner<OrderItem> createTimestampAssigner(TimestampAssignerSupplier.Context context) {
            return (element, recordTimestamp) -> System.currentTimeMillis();
        }
    }
}

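Result (商品数据时间 is the product event time and 订单明细数据时间 is the order-item event time, both printed by the watermark generators):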

订单明细数据时间:1709991872660
商品数据时间:1709991872660
订单明细数据时间:1709991872661
商品数据时间:1709991872661
商品数据时间:1709991872661
商品数据时间:1709991872661
商品数据时间:1709991872661
商品数据时间:1709991872661
订单明细数据时间:1709991873665
商品数据时间:1709991873665
订单明细数据时间:1709991873665
商品数据时间:1709991873665
商品数据时间:1709991873665
商品数据时间:1709991873665
商品数据时间:1709991873665
商品数据时间:1709991873665
订单明细数据时间:1709991874665
商品数据时间:1709991874665
商品数据时间:1709991874665
商品数据时间:1709991874665
商品数据时间:1709991874665
商品数据时间:1709991874665
商品数据时间:1709991874665
订单明细数据时间:1709991874665
{"count":4,"goodsId":"5","goodsName":"MeiZu One","totalMoney":12800}
{"count":4,"goodsId":"5","goodsName":"MeiZu One","totalMoney":12800}
{"count":4,"goodsId":"5","goodsName":"MeiZu One","totalMoney":12800}
{"count":4,"goodsId":"4","goodsName":"Thinkpad X1","totalMoney":39200}
{"count":4,"goodsId":"4","goodsName":"Thinkpad X1","totalMoney":39200}
{"count":4,"goodsId":"4","goodsName":"Thinkpad X1","totalMoney":39200}
{"count":1,"goodsId":"2","goodsName":"iphone12","totalMoney":12000}
{"count":1,"goodsId":"2","goodsName":"iphone12","totalMoney":12000}
{"count":1,"goodsId":"2","goodsName":"iphone12","totalMoney":12000}

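Summary:

  • 1- Take care when defining the Java bean classes that carry the stream records
  • 2- For windowed streams, pay attention to how the watermark generator works and when watermarks are emitted
  • 3- A JoinFunction must override the join method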
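To make the window-join semantics concrete, here is a minimal plain-Java sketch with no Flink dependency. It assumes tumbling windows without an offset and non-negative timestamps. The names `TumblingWindowJoinSketch`, `Event`, `windowStart`, and `joinPair` are hypothetical and only illustrate the idea: every left/right pair that shares a key and a window is handed once to the join callback, which is the role that `JoinFunction.join` plays in Flink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TumblingWindowJoinSketch {
    /** Hypothetical record type: join key, event timestamp in ms, payload. */
    static final class Event {
        final String key;
        final long ts;
        final String value;

        Event(String key, long ts, String value) {
            this.key = key;
            this.ts = ts;
            this.value = value;
        }
    }

    /** Start of the tumbling window containing ts (assumes ts >= 0 and no window offset). */
    static long windowStart(long ts, long sizeMs) {
        return ts - (ts % sizeMs);
    }

    /** Stand-in for JoinFunction.join: invoked once per matching pair. */
    static String joinPair(Event left, Event right) {
        return left.value + "," + right.value;
    }

    /** Inner join: pairs every left and right element with the same key and the same window. */
    static List<String> join(List<Event> left, List<Event> right, long sizeMs) {
        List<String> out = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                boolean sameKey = l.key.equals(r.key);
                boolean sameWindow = windowStart(l.ts, sizeMs) == windowStart(r.ts, sizeMs);
                if (sameKey && sameWindow) {
                    out.add(joinPair(l, r));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> orders = Arrays.asList(
                new Event("5", 1_200, "order-a"),
                new Event("5", 6_100, "order-b"));
        List<Event> goods = Arrays.asList(
                new Event("5", 1_000, "goods-1"),
                new Event("5", 2_000, "goods-2"));
        // With 5-second windows, order-a shares window [0, 5000) with both goods records;
        // order-b falls into [5000, 10000), which contains no goods record.
        System.out.println(join(orders, goods, 5_000)); // [order-a,goods-1, order-a,goods-2]
    }
}
```

Because the join is an inner join per window, an element with no partner in its window (order-b here) produces no output at all.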

3.3 Interval Join
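3.3.1 Introduction to Interval Join
  • An interval join also joins two streams (stream A and stream B) on a common key, with the extra condition that the timestamp of the element from stream B lies within a time interval relative to the timestamp of the element from stream A.

b.timestamp ∈ [a.timestamp + lowerBound; a.timestamp + upperBound]

a.timestamp + lowerBound <= b.timestamp <= a.timestamp + upperBound

In other words, the timestamp of the stream B element must be ≥ the timestamp of the stream A element plus the lower bound (typically negative), and ≤ the timestamp of the stream A element plus the upper bound (typically positive).

Both bounds are inclusive by default; .lowerBoundExclusive() and .upperBoundExclusive() can be applied to change this.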
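The time condition can be checked by hand with a few lines of plain Java. This is only a sketch of the inequality above, not Flink's implementation; the class `IntervalJoinBounds` and its `matches` method are hypothetical names, and the two boolean flags stand in for `.lowerBoundExclusive()` and `.upperBoundExclusive()`.

```java
/** Minimal sketch of the interval-join time condition (plain Java, no Flink dependency). */
final class IntervalJoinBounds {
    private IntervalJoinBounds() {}

    /**
     * Returns true when bTs lies in [aTs + lowerBound, aTs + upperBound],
     * with either end optionally made exclusive.
     */
    static boolean matches(long aTs, long bTs, long lowerBound, long upperBound,
                           boolean lowerExclusive, boolean upperExclusive) {
        long lo = aTs + lowerBound;
        long hi = aTs + upperBound;
        boolean aboveLower = lowerExclusive ? bTs > lo : bTs >= lo;
        boolean belowUpper = upperExclusive ? bTs < hi : bTs <= hi;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        // Bounds of -2 ms and +1 ms, both inclusive (the default): interval is [8, 11] for aTs = 10
        System.out.println(matches(10, 8, -2, 1, false, false));  // true
        System.out.println(matches(10, 11, -2, 1, false, false)); // true
        System.out.println(matches(10, 12, -2, 1, false, false)); // false
        // Same bounds with an exclusive upper end: 11 no longer matches
        System.out.println(matches(10, 11, -2, 1, false, true));  // false
    }
}
```

A negative lower bound lets stream B elements that arrived slightly before the stream A element still join with it.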

Usage template:

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
...
DataStream<Integer> orangeStream = ...
DataStream<Integer> greenStream = ...

orangeStream
    .keyBy(<KeySelector>)
    .intervalJoin(greenStream.keyBy(<KeySelector>))
    .between(Time.milliseconds(-2), Time.milliseconds(1))
    .process(new ProcessJoinFunction<Integer, Integer, String>() {
        @Override
        public void processElement(Integer left, Integer right, Context ctx, Collector<String> out) {
            out.collect(left + "," + right);
        }
    });

3.3.2 Worked Example

Example:

package cn.itcast.day13.join;

import com.alibaba.fastjson.JSON;
import lombok.Data;
import org.apache.flink.api.common.eventtime.*;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.UUID;
import java.util.concurrent.TimeUnit;

/**
 * @author lql
 * @time 2024-03-09 22:27:18
 * @description TODO
 */
public class JoinDemo02 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Build the product stream
        DataStream<Goods> goodsDS = env
                .addSource(new GoodsSource11(), TypeInformation.of(Goods.class))
                .assignTimestampsAndWatermarks(new GoodsWatermark());

        // Build the order-item stream
        DataStream<OrderItem> orderItemDS = env
                .addSource(new OrderItemSource(), TypeInformation.of(OrderItem.class))
                .assignTimestampsAndWatermarks(new OrderItemWatermark());

        // Join the two streams: a product matches an order item when its timestamp
        // lies in [order timestamp - 1 s, order timestamp)
        SingleOutputStreamOperator<FactOrderItem> factOrderItemDS = orderItemDS
                .keyBy(item -> item.getGoodsId())
                .intervalJoin(goodsDS.keyBy(goods -> goods.getGoodsId()))
                .between(Time.seconds(-1), Time.seconds(0))
                .upperBoundExclusive()
                .process(new ProcessJoinFunction<OrderItem, Goods, FactOrderItem>() {
                    @Override
                    public void processElement(OrderItem left, Goods right, Context ctx,
                                               Collector<FactOrderItem> out) throws Exception {
                        FactOrderItem factOrderItem = new FactOrderItem();
                        factOrderItem.setGoodsId(right.getGoodsId());
                        factOrderItem.setGoodsName(right.getGoodsName());
                        factOrderItem.setCount(new BigDecimal(left.getCount()));
                        factOrderItem.setTotalMoney(
                                right.getGoodsPrice().multiply(new BigDecimal(left.getCount())));
                        out.collect(factOrderItem);
                    }
                });

        factOrderItemDS.print();
        env.execute("Interval JOIN");
    }

    // Product class
    @Data
    public static class Goods {
        private String goodsId;
        private String goodsName;
        private BigDecimal goodsPrice;

        public static List<Goods> GOODS_LIST;
        public static Random r;

        static {
            r = new Random();
            GOODS_LIST = new ArrayList<>();
            GOODS_LIST.add(new Goods("1", "小米12", new BigDecimal(4890)));
            GOODS_LIST.add(new Goods("2", "iphone12", new BigDecimal(12000)));
            GOODS_LIST.add(new Goods("3", "MacBookPro", new BigDecimal(15000)));
            GOODS_LIST.add(new Goods("4", "Thinkpad X1", new BigDecimal(9800)));
            GOODS_LIST.add(new Goods("5", "MeiZu One", new BigDecimal(3200)));
            GOODS_LIST.add(new Goods("6", "Mate 40", new BigDecimal(6500)));
        }

        public static Goods randomGoods() {
            int rIndex = r.nextInt(GOODS_LIST.size());
            return GOODS_LIST.get(rIndex);
        }

        public Goods() {
        }

        public Goods(String goodsId, String goodsName, BigDecimal goodsPrice) {
            this.goodsId = goodsId;
            this.goodsName = goodsName;
            this.goodsPrice = goodsPrice;
        }

        @Override
        public String toString() {
            return JSON.toJSONString(this);
        }
    }

    // Order-item class
    @Data
    public static class OrderItem {
        private String itemId;
        private String goodsId;
        private Integer count;

        @Override
        public String toString() {
            return JSON.toJSONString(this);
        }
    }

    // Join result
    @Data
    public static class FactOrderItem {
        private String goodsId;
        private String goodsName;
        private BigDecimal count;
        private BigDecimal totalMoney;

        @Override
        public String toString() {
            return JSON.toJSONString(this);
        }
    }

    // Product stream source (it plays the role of a dimension table)
    public static class GoodsSource11 extends RichSourceFunction<Goods> {
        // volatile: cancel() is called from a different thread than run()
        private volatile boolean isCancel;

        @Override
        public void open(Configuration parameters) throws Exception {
            isCancel = false;
        }

        @Override
        public void run(SourceContext<Goods> sourceContext) throws Exception {
            while (!isCancel) {
                // Re-emit the whole product list once per second
                Goods.GOODS_LIST.stream().forEach(goods -> sourceContext.collect(goods));
                TimeUnit.SECONDS.sleep(1);
            }
        }

        @Override
        public void cancel() {
            isCancel = true;
        }
    }

    // Order-item stream source
    public static class OrderItemSource extends RichSourceFunction<OrderItem> {
        private volatile boolean isCancel;
        private Random r;

        @Override
        public void open(Configuration parameters) throws Exception {
            isCancel = false;
            r = new Random();
        }

        @Override
        public void run(SourceContext<OrderItem> sourceContext) throws Exception {
            while (!isCancel) {
                Goods goods = Goods.randomGoods();
                OrderItem orderItem = new OrderItem();
                orderItem.setGoodsId(goods.getGoodsId());
                orderItem.setCount(r.nextInt(10) + 1);
                orderItem.setItemId(UUID.randomUUID().toString());
                sourceContext.collect(orderItem);
                // Emit the record again with a goodsId that matches no product
                orderItem.setGoodsId("111");
                sourceContext.collect(orderItem);
                TimeUnit.SECONDS.sleep(1);
            }
        }

        @Override
        public void cancel() {
            isCancel = true;
        }
    }

    // Watermark strategy; for simplicity it uses the system time directly
    public static class GoodsWatermark implements WatermarkStrategy<Goods> {
        @Override
        public TimestampAssigner<Goods> createTimestampAssigner(TimestampAssignerSupplier.Context context) {
            return (element, recordTimestamp) -> System.currentTimeMillis();
        }

        @Override
        public WatermarkGenerator<Goods> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
            return new WatermarkGenerator<Goods>() {
                @Override
                public void onEvent(Goods event, long eventTimestamp, WatermarkOutput output) {
                    output.emitWatermark(new Watermark(System.currentTimeMillis()));
                }

                @Override
                public void onPeriodicEmit(WatermarkOutput output) {
                    output.emitWatermark(new Watermark(System.currentTimeMillis()));
                }
            };
        }
    }

    public static class OrderItemWatermark implements WatermarkStrategy<OrderItem> {
        @Override
        public TimestampAssigner<OrderItem> createTimestampAssigner(TimestampAssignerSupplier.Context context) {
            return (element, recordTimestamp) -> System.currentTimeMillis();
        }

        @Override
        public WatermarkGenerator<OrderItem> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
            return new WatermarkGenerator<OrderItem>() {
                @Override
                public void onEvent(OrderItem event, long eventTimestamp, WatermarkOutput output) {
                    output.emitWatermark(new Watermark(System.currentTimeMillis()));
                }

                @Override
                public void onPeriodicEmit(WatermarkOutput output) {
                    output.emitWatermark(new Watermark(System.currentTimeMillis()));
                }
            };
        }
    }
}

Result:

5> {"count":4,"goodsId":"5","goodsName":"MeiZu One","totalMoney":12800}
3> {"count":9,"goodsId":"3","goodsName":"MacBookPro","totalMoney":135000}
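The first result line can be reproduced without a cluster. The sketch below is plain Java with no Flink dependency and with made-up timestamps; `IntervalJoinSketch`, `Order`, `Product`, and `demo` are hypothetical names. It applies the same rule as the demo's interval (lower bound of -1 s inclusive, upper bound of 0 s exclusive) and the same price × count calculation, under the assumption that every record of both streams is available at once.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IntervalJoinSketch {
    /** Hypothetical order record: key, quantity, event timestamp in ms. */
    static final class Order {
        final String goodsId;
        final int count;
        final long ts;

        Order(String goodsId, int count, long ts) {
            this.goodsId = goodsId;
            this.count = count;
            this.ts = ts;
        }
    }

    /** Hypothetical product record: key, name, unit price, event timestamp in ms. */
    static final class Product {
        final String goodsId;
        final String name;
        final BigDecimal price;
        final long ts;

        Product(String goodsId, String name, BigDecimal price, long ts) {
            this.goodsId = goodsId;
            this.name = name;
            this.price = price;
            this.ts = ts;
        }
    }

    /** Joins on goodsId where product.ts lies in [order.ts - 1000, order.ts). */
    static List<String> join(List<Order> orders, List<Product> products) {
        List<String> out = new ArrayList<>();
        for (Order o : orders) {
            for (Product p : products) {
                boolean sameKey = o.goodsId.equals(p.goodsId);
                boolean inInterval = p.ts >= o.ts - 1000 && p.ts < o.ts;
                if (sameKey && inInterval) {
                    BigDecimal total = p.price.multiply(BigDecimal.valueOf(o.count));
                    out.add(p.name + " x" + o.count + " = " + total);
                }
            }
        }
        return out;
    }

    /** Small fixed data set with made-up timestamps. */
    static List<String> demo() {
        List<Product> products = Arrays.asList(
                new Product("5", "MeiZu One", new BigDecimal(3200), 1_000));
        List<Order> orders = Arrays.asList(
                new Order("5", 4, 1_500),    // product ts 1000 is in [500, 1500): joined
                new Order("111", 4, 1_500),  // no product has this key: dropped
                new Order("5", 2, 3_000));   // product ts 1000 is not in [2000, 3000): dropped
        return join(orders, products);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [MeiZu One x4 = 12800]
    }
}
```

The order with goodsId "111" never appears in the output because the join is an inner join: an order item without a matching product inside the interval is silently dropped.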

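Summary:

  • 1- A connect + broadcast join suits data that almost never changes
  • 2- A BroadcastState join suits data that changes, but not very quickly
  • 3- A two-stream join suits streaming data that changes rapidly (stock prices, for example)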

http://www.chinasem.cn/article/792958
