This article introduces the Kafka Consumer APIs. We hope it provides a useful reference for developers working through related programming problems; come learn along with us!
Kafka 0.10.X and earlier
http://kafka.apache.org/0100/documentation.html#impl_consumer
We have 2 levels of consumer APIs. The low-level "simple" API maintains a connection to a single broker and has a close correspondence to the network requests sent to the server. This API is completely stateless, with the offset being passed in on every request, allowing the user to maintain this metadata however they choose.
The high-level API hides the details of brokers from the consumer and allows consuming off the cluster of machines without concern for the underlying topology. It also maintains the state of what has been consumed. The high-level API also provides the ability to subscribe to topics that match a filter expression (i.e., either a whitelist or a blacklist regular expression).
Low-level API
The low-level API is used to implement the high-level API as well as being used directly for some of our offline consumers which have particular requirements around maintaining state.
High-level API
This API is centered around iterators, implemented by the KafkaStream class. Each KafkaStream represents the stream of messages from one or more partitions on one or more servers. Each stream is used for single threaded processing, so the client can provide the number of desired streams in the create call. Thus a stream may represent the merging of multiple server partitions (to correspond to the number of processing threads), but each partition only goes to one stream.
The createMessageStreams call registers the consumer for the topic, which results in rebalancing the consumer/broker assignment. The API encourages creating many topic streams in a single call in order to minimize this rebalancing. The createMessageStreamsByFilter call (additionally) registers watchers to discover new topics that match its filter. Note that each stream that createMessageStreamsByFilter returns may iterate over messages from multiple topics (i.e., if multiple topics are allowed by the filter).
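The old high-level consumer flow described above can be sketched roughly as follows. This is only an illustration, assuming the Kafka 0.10 core jar on the classpath and a reachable ZooKeeper; the topic name `demo-topic`, the group id, and the stream count of 2 are placeholder values:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class OldHighLevelConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder address
        props.put("group.id", "demo-group");              // placeholder group id

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Ask for 2 streams of "demo-topic" in a single call: each stream is
        // meant for one processing thread, and each partition maps to exactly
        // one of the streams.
        Map<String, Integer> topicCountMap = new HashMap<>();
        topicCountMap.put("demo-topic", 2);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

        // Single-threaded iteration over the first stream.
        ConsumerIterator<byte[], byte[]> it = streams.get("demo-topic").get(0).iterator();
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> msg = it.next();
            System.out.println(new String(msg.message()));
        }
    }
}
```

In practice each returned `KafkaStream` would be handed to its own worker thread, which is why the number of streams is requested up front in the `createMessageStreams` call.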
http://kafka.apache.org/0100/documentation.html
Kafka 2.2.X (latest version)
The API is no longer split into a Low-level API and a High-level API.
Using KafkaConsumer to consume data from Kafka is recommended.
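A minimal sketch of the modern `KafkaConsumer` usage, assuming the `kafka-clients` jar on the classpath and a broker at `localhost:9092`; the topic and group names are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BasicConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "demo-group");              // placeholder group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                // poll() drives the group protocol and fetches records.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```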
Offset management
- Offsets - automatic commit
- Offsets - manual commit per consumer
- Offsets - manual commit per partition
- Consumer pulling data from a specified partition
- Offsets managed by the consumer itself
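The manual-commit and assigned-partition cases above can be sketched as follows, again assuming the `kafka-clients` jar and a broker at `localhost:9092`; topic, group, and the starting offset `42L` are placeholder values:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ManualOffsetSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "demo-group");              // placeholder group id
        props.put("enable.auto.commit", "false");         // take over commit control
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Pull from one specified partition, starting at a chosen offset.
            TopicPartition tp = new TopicPartition("demo-topic", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, 42L);

            ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(100));
            for (TopicPartition partition : records.partitions()) {
                List<ConsumerRecord<String, String>> partRecords =
                        records.records(partition);
                for (ConsumerRecord<String, String> record : partRecords) {
                    System.out.println(record.value());
                }
                // Per-partition manual commit: the committed offset is the
                // offset of the next record to be consumed, hence +1.
                long lastOffset = partRecords.get(partRecords.size() - 1).offset();
                consumer.commitSync(Collections.singletonMap(
                        partition, new OffsetAndMetadata(lastOffset + 1)));
            }
        }
    }
}
```

Calling `consumer.commitSync()` with no arguments instead commits the latest offsets for all partitions returned by the last poll, which is the "manual commit per consumer" case.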
Spark Streaming + Kafka 2.4.3 (this integration uses the high-level API)
Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
That concludes this article on the Kafka Consumer APIs. We hope the articles we recommend are helpful to programmers!