This article is a Kafka source-code walkthrough of how the message producer connects to the server; I hope it serves as a useful reference for anyone digging into the same code.
Note: the source code described below is from the latest version; older versions may differ.
1. Finding the source entry point
kafka-console-producer.sh is the script that launches the message producer; starting there, we can locate the source entry point:
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; thenexport KAFKA_HEAP_OPTS="-Xmx512M"
fi
exec $(dirname $0)/kafka-run-class.sh kafka.tools.ConsoleProducer "$@"
From the code above, the entry point is kafka.tools.ConsoleProducer, which is a Scala file.
2. Starting the producer from source for debugging
The best way to read source code is under a debugger, stepping through with breakpoints as you read, so let's first get the environment configured so the program can run.
Since the Kafka server is configured with authentication, the client side also needs the corresponding auth configuration, or the connection to the server will fail. For how to enable authentication, see my earlier article on the subject. We can mirror the arguments that kafka-console-producer.sh passes at runtime and enter them in IDEA's Run/Debug Configurations dialog.
The script invocation:
/kafka/bin/kafka-console-producer.sh --bootstrap-server=127.0.0.1:9092 --topic=notif.test --producer.config=/kafka/config/topic.properties
topic.properties contains:
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
This configuration can also live in producer.properties, as we will see below.
The IDEA Run/Debug Configurations dialog (screenshot not reproduced here):
The parts boxed in red are what differ from an unauthenticated setup; without those two parameters, connecting to the server fails. For the parameters inside client.jaas.conf, refer to the authentication article mentioned above (a typical sketch follows the two properties below). producer.properties is the configuration file that ships with Kafka; we only need to add the following:
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
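For reference, a minimal client.jaas.conf for the PLAIN mechanism typically looks like the sketch below; the username and password are placeholders, not values from my setup (see the authentication article for the real ones):

KafkaClient {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="<your-username>"
    password="<your-password>";
};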
With everything in place, hit Run. If the console shows no errors, the producer started successfully, as below:
At this point I can't help but share an embarrassing moment. I had hardly ever run a program that takes input from the console. I sat at this screen for half an hour convinced the connection had failed, double-checking every piece of configuration for mistakes. Then it occurred to me to check the server-side logs, and it turned out the client had connected all along. I tried typing something below the red log output (see the figure below), and sure enough the consumer received it. How embarrassing!
Kafka folks, a little prompt line above the input area would have been so nice...
3. Tracing the producer's connection to the server
Now that the code runs, the reading journey begins. First, locate the entry function main() in ConsoleProducer.scala, the starting point of any program:
def main(args: Array[String]): Unit = {
  try {
    val config = new ProducerConfig(args)
    val input = System.in
    val producer = new KafkaProducer[Array[Byte], Array[Byte]](producerProps(config))
    try loopReader(producer, newReader(config.readerClass, getReaderProps(config)), input, config.sync)
    finally producer.close()
    Exit.exit(0)
  } catch {
    case e: joptsimple.OptionException =>
      System.err.println(e.getMessage)
      Exit.exit(1)
    case e: Exception =>
      e.printStackTrace()
      Exit.exit(1)
  }
}
As you can see, it calls val producer = new KafkaProducer[Array[Byte], Array[Byte]](producerProps(config)).
KafkaProducer is Java code; look at the constructor it ultimately invokes:
KafkaProducer(ProducerConfig config,
              Serializer<K> keySerializer,
              Serializer<V> valueSerializer,
              ProducerMetadata metadata,
              KafkaClient kafkaClient,
              ProducerInterceptors<K, V> interceptors,
              Time time) {
    try {
        this.producerConfig = config;
        this.time = time;
        // ... (many lines omitted)
        this.errors = this.metrics.sensor("errors");
        this.sender = newSender(logContext, kafkaClient, this.metadata);
        String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
        this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
        this.ioThread.start();
        config.logUnused();
        AppInfoParser.registerAppInfo(JMX_PREFIX, clientId, metrics, time.milliseconds());
        log.debug("Kafka producer started");
    } catch (Throwable t) {
        // call close methods if internal objects are already constructed this is to prevent resource leak. see KAFKA-2121
        close(Duration.ofMillis(0), true);
        // now propagate the exception
        throw new KafkaException("Failed to construct kafka producer", t);
    }
}
Focus on the line this.sender = newSender(logContext, kafkaClient, this.metadata); and step into the newSender() function:
Sender newSender(LogContext logContext, KafkaClient kafkaClient, ProducerMetadata metadata) {
    // ... (some lines omitted)
    KafkaClient client = kafkaClient != null ? kafkaClient : ClientUtils.createNetworkClient(producerConfig,
            this.metrics,
            "producer",
            logContext,
            apiVersions,
            time,
            maxInflightRequests,
            metadata,
            throttleTimeSensor,
            clientTelemetryReporter.map(ClientTelemetryReporter::telemetrySender).orElse(null));
    short acks = Short.parseShort(producerConfig.getString(ProducerConfig.ACKS_CONFIG));
    return new Sender(/* args omitted */);
}
Note this line:
KafkaClient client = kafkaClient != null ? kafkaClient : ClientUtils.createNetworkClient(/* args omitted */);
Nothing has assigned kafkaClient before this point, so the line can be simplified to:
KafkaClient client = ClientUtils.createNetworkClient(/* args omitted */);
Next, examine ClientUtils.createNetworkClient(), which ultimately calls the following method:
public static NetworkClient createNetworkClient(/* params omitted */) {
    ChannelBuilder channelBuilder = null;
    Selector selector = null;
    try {
        channelBuilder = ClientUtils.createChannelBuilder(config, time, logContext);
        selector = new Selector(config.getLong(CommonClientConfigs.CONNECTIONS_MAX_IDLE_MS_CONFIG),
                metrics,
                time,
                metricsGroupPrefix,
                channelBuilder,
                logContext);
        return new NetworkClient(metadataUpdater,
                metadata,
                selector,
                clientId,
                maxInFlightRequestsPerConnection,
                /* remaining args omitted */);
    } catch (Throwable t) {
        closeQuietly(selector, "Selector");
        closeQuietly(channelBuilder, "ChannelBuilder");
        throw new KafkaException("Failed to create new NetworkClient", t);
    }
}
Remember the authentication parameters we added while setting up debugging in step 2? They are processed inside the method above, specifically in this code:
channelBuilder = ClientUtils.createChannelBuilder(config, time, logContext);

public static ChannelBuilder createChannelBuilder(AbstractConfig config, Time time, LogContext logContext) {
    SecurityProtocol securityProtocol = SecurityProtocol.forName(config.getString(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG));
    String clientSaslMechanism = config.getString(SaslConfigs.SASL_MECHANISM);
    return ChannelBuilders.clientChannelBuilder(securityProtocol, JaasContext.Type.CLIENT, config, null,
            clientSaslMechanism, time, true, logContext);
}
ChannelBuilders.clientChannelBuilder() only performs the outer-level validation:
public static ChannelBuilder clientChannelBuilder(/* params omitted */) {
    if (securityProtocol == SecurityProtocol.SASL_PLAINTEXT || securityProtocol == SecurityProtocol.SASL_SSL) {
        if (contextType == null)
            throw new IllegalArgumentException("`contextType` must be non-null if `securityProtocol` is `" + securityProtocol + "`");
        if (clientSaslMechanism == null)
            throw new IllegalArgumentException("`clientSaslMechanism` must be non-null in client mode if `securityProtocol` is `" + securityProtocol + "`");
    }
    return create(securityProtocol, ConnectionMode.CLIENT, contextType, config, listenerName, false, clientSaslMechanism,
            saslHandshakeRequestEnable, null, null, time, logContext, null);
}
The detailed handling lives in ChannelBuilders.create() (for SASL_PLAINTEXT and SASL_SSL, the branch elided below ultimately constructs a SaslChannelBuilder):
private static ChannelBuilder create(/* params omitted */) {
    Map<String, Object> configs = channelBuilderConfigs(config, listenerName);
    ChannelBuilder channelBuilder;
    switch (securityProtocol) {
        case SSL:
            requireNonNullMode(connectionMode, securityProtocol);
            channelBuilder = new SslChannelBuilder(connectionMode, listenerName, isInterBrokerListener, logContext);
            break;
        case SASL_SSL:
        case SASL_PLAINTEXT:
            // ... (too long, omitted)
            break;
        case PLAINTEXT:
            channelBuilder = new PlaintextChannelBuilder(listenerName);
            break;
        default:
            throw new IllegalArgumentException("Unexpected securityProtocol " + securityProtocol);
    }
    channelBuilder.configure(configs);
    return channelBuilder;
}
With our configuration (security.protocol=SASL_PLAINTEXT), the switch above takes the SASL branch. Now we are back in ClientUtils.createNetworkClient():
public static NetworkClient createNetworkClient(/* params omitted */) {
    ChannelBuilder channelBuilder = null;
    Selector selector = null;
    try {
        channelBuilder = ClientUtils.createChannelBuilder(config, time, logContext);
        selector = new Selector(/* args omitted */);
        return new NetworkClient(metadataUpdater,
                metadata,
                selector,
                clientId,
                maxInFlightRequestsPerConnection,
                /* remaining args omitted */);
    } catch (Throwable t) {
        closeQuietly(selector, "Selector");
        closeQuietly(channelBuilder, "ChannelBuilder");
        throw new KafkaException("Failed to create new NetworkClient", t);
    }
}
After the channelBuilder is created, a Selector object is created next, followed by a NetworkClient object, which is returned. The constructors of both Selector and NetworkClient merely initialize fields and hold nothing noteworthy, so we skip them here.
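For orientation before going further: Selector wraps Java NIO for non-blocking socket I/O, and NetworkClient layers the Kafka protocol on top of it. The way the Sender drives a KafkaClient boils down to three calls; the snippet below is a schematic of that contract with illustrative variable names, not verbatim Sender code:

// Schematic of the KafkaClient contract as the Sender uses it (illustrative, not verbatim):
if (client.ready(node, now)) {        // initiates a connection if needed; true once the channel is usable
    client.send(clientRequest, now);  // queues a request on that connection
}
List<ClientResponse> responses = client.poll(timeout, now);  // performs the actual socket reads and writes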
Once the code above has executed, control returns to KafkaProducer.newSender():
Sender newSender(LogContext logContext, KafkaClient kafkaClient, ProducerMetadata metadata) {
    // ... (some lines omitted)
    KafkaClient client = kafkaClient != null ? kafkaClient : ClientUtils.createNetworkClient(/* args omitted */);
    short acks = Short.parseShort(producerConfig.getString(ProducerConfig.ACKS_CONFIG));
    return new Sender(/* args omitted */);
}
As the earlier code showed, ClientUtils.createNetworkClient() returns a NetworkClient object. KafkaClient is the interface that NetworkClient implements, so the KafkaClient client variable is effectively a NetworkClient. After client is assigned, a Sender object is created and returned. Since the Sender constructor also only performs initialization, we skip it as well.
KafkaProducer.newSender() returns a Sender object, and we are back in the KafkaProducer constructor:
KafkaProducer(/* params omitted */) {
    try {
        // ... (many lines omitted)
        this.sender = newSender(logContext, kafkaClient, this.metadata);
        String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
        this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
        this.ioThread.start();
        config.logUnused();
        AppInfoParser.registerAppInfo(JMX_PREFIX, clientId, metrics, time.milliseconds());
        log.debug("Kafka producer started");
    } catch (Throwable t) {
        // ... (many lines omitted)
    }
}
After sender is assigned, the next step is creating a KafkaThread object, whose constructor is:
public KafkaThread(final String name, Runnable runnable, boolean daemon) {
    super(runnable, name);
    configureThread(name, daemon);
}
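configureThread() is not shown above; paraphrasing the Kafka source (not verbatim), it essentially does the following:

// Paraphrase of KafkaThread.configureThread, not the verbatim source:
private void configureThread(final String name, boolean daemon) {
    setDaemon(daemon);  // the producer passes true, so this I/O thread won't keep the JVM alive
    setUncaughtExceptionHandler((t, e) -> log.error("Uncaught exception in thread '{}':", t.getName(), e));
}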
So KafkaThread merely does some extra setup on top of Thread. Once the KafkaThread object is created, the next step is its start() method. The Runnable passed into KafkaThread's constructor is the Sender object, so we need to look at Sender's run() method:
/**
 * The main run loop for the sender thread
 */
@Override
public void run() {
    log.debug("Starting Kafka producer I/O thread.");

    if (transactionManager != null)
        transactionManager.setPoisonStateOnInvalidTransition(true);

    // main loop, runs until close is called
    while (running) {
        try {
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }

    log.debug("Beginning shutdown of Kafka producer I/O thread, sending remaining records.");

    // okay we stopped accepting requests but there may still be
    // requests in the transaction manager, accumulator or waiting for acknowledgment,
    // wait until these are completed.
    while (!forceClose && ((this.accumulator.hasUndrained() || this.client.inFlightRequestCount() > 0) || hasPendingTransactionalRequests())) {
        try {
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }

    // Abort the transaction if any commit or abort didn't go through the transaction manager's queue
    while (!forceClose && transactionManager != null && transactionManager.hasOngoingTransaction()) {
        if (!transactionManager.isCompleting()) {
            log.info("Aborting incomplete transaction due to shutdown");
            try {
                // It is possible for the transaction manager to throw errors when aborting. Catch these
                // so as not to interfere with the rest of the shutdown logic.
                transactionManager.beginAbort();
            } catch (Exception e) {
                log.error("Error in kafka producer I/O thread while aborting transaction when during closing: ", e);
                // Force close in case the transactionManager is in error states.
                forceClose = true;
            }
        }
        try {
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }

    if (forceClose) {
        // We need to fail all the incomplete transactional requests and batches and wake up the threads waiting on
        // the futures.
        if (transactionManager != null) {
            log.debug("Aborting incomplete transactional requests due to forced shutdown");
            transactionManager.close();
        }
        log.debug("Aborting incomplete batches due to forced shutdown");
        this.accumulator.abortIncompleteBatches();
    }
    try {
        this.client.close();
    } catch (Exception e) {
        log.error("Failed to close network client", e);
    }
    log.debug("Shutdown of Kafka producer I/O thread has completed.");
}
Since this code is the crux of the walkthrough, it has not been trimmed. As you can see above, runOnce() is invoked repeatedly, so let's see what that method does:
/**
 * Run a single iteration of sending
 */
void runOnce() {
    if (transactionManager != null) {
        try {
            transactionManager.maybeResolveSequences();
            RuntimeException lastError = transactionManager.lastError();

            // do not continue sending if the transaction manager is in a failed state
            if (transactionManager.hasFatalError()) {
                if (lastError != null)
                    maybeAbortBatches(lastError);
                client.poll(retryBackoffMs, time.milliseconds());
                return;
            }

            if (transactionManager.hasAbortableError() && shouldHandleAuthorizationError(lastError)) {
                return;
            }

            // Check whether we need a new producerId. If so, we will enqueue an InitProducerId
            // request which will be sent below
            transactionManager.bumpIdempotentEpochAndResetIdIfNeeded();

            if (maybeSendAndPollTransactionalRequest()) {
                return;
            }
        } catch (AuthenticationException e) {
            // This is already logged as error, but propagated here to perform any clean ups.
            log.trace("Authentication exception while processing transactional request", e);
            transactionManager.authenticationFailed(e);
        }
    }

    long currentTimeMs = time.milliseconds();
    long pollTimeout = sendProducerData(currentTimeMs);
    client.poll(pollTimeout, currentTimeMs);
}
The most important call in the code above is arguably client.poll(). Its javadoc, defined on KafkaClient, reads:
/**
 * Do actual reads and writes from sockets.
 *
 * @param timeout The maximum amount of time to wait for responses in ms, must be non-negative. The implementation
 *                is free to use a lower value if appropriate (common reasons for this are a lower request or
 *                metadata update timeout)
 * @param now The current time in ms
 * @throws IllegalStateException If a request is sent to an unready node
 */
List<ClientResponse> poll(long timeout, long now);
As the comment states, this method performs the actual reads and writes on the sockets.
Now, back in the KafkaProducer constructor: once this.ioThread.start() executes, initialization of the KafkaProducer object is essentially complete. But notice anything? Nowhere in the execution flow above did we encounter code that connects to the Kafka server.
At first I suspected I had missed something while reading and went through the flow again, but still found no connection logic. Time for the debugger. To avoid single-stepping from scratch, experience says to first scan the Java classes above for the one most likely to contain a connect method, and set a breakpoint there.
With only a handful of classes involved, the search was quick, and I soon zeroed in on the Selector class:
/**
 * Begin connecting to the given address and add the connection to this nioSelector associated with the given id
 * number.
 * <p>
 * Note that this call only initiates the connection, which will be completed on a future {@link #poll(long)}
 * call. Check {@link #connected()} to see which (if any) connections have completed after a given poll call.
 * @param id The id for the new connection
 * @param address The address to connect to
 * @param sendBufferSize The send buffer for the new connection
 * @param receiveBufferSize The receive buffer for the new connection
 * @throws IllegalStateException if there is already a connection for that id
 * @throws IOException if DNS resolution fails on the hostname or if the broker is down
 */
@Override
public void connect(String id, InetSocketAddress address, int sendBufferSize, int receiveBufferSize) throws IOException {
    ensureNotRegistered(id);
    SocketChannel socketChannel = SocketChannel.open();
    SelectionKey key = null;
    try {
        configureSocketChannel(socketChannel, sendBufferSize, receiveBufferSize);
        boolean connected = doConnect(socketChannel, address);
        key = registerChannel(id, socketChannel, SelectionKey.OP_CONNECT);

        if (connected) {
            // OP_CONNECT won't trigger for immediately connected channels
            log.debug("Immediately connected to node {}", id);
            immediatelyConnectedKeys.add(key);
            key.interestOps(0);
        }
    } catch (IOException | RuntimeException e) {
        if (key != null)
            immediatelyConnectedKeys.remove(key);
        channels.remove(id);
        socketChannel.close();
        throw e;
    }
}
The javadoc above the method matched my guess nicely, so on went the breakpoint. Then look at the thread stack when it hits:
Surprise: the connection to the server is triggered only when KafkaThread.start() runs. This is exactly why I called Sender.run() the key method earlier and left its code untrimmed. The execution order is:
run() -> runOnce() -> maybeSendAndPollTransactionalRequest() -> ...
Here is the source of Sender.maybeSendAndPollTransactionalRequest():
/**
 * Returns true if a transactional request is sent or polled, or if a FindCoordinator request is enqueued
 */
private boolean maybeSendAndPollTransactionalRequest() {
    // ... (some lines omitted)
    try {
        // ... (some lines omitted)
        if (targetNode != null) {
            if (!awaitNodeReady(targetNode, coordinatorType)) {
                log.trace("Target node {} not ready within request timeout, will retry when node is ready.", targetNode);
                maybeFindCoordinatorAndRetry(nextRequestHandler);
                return true;
            }
        } else if (coordinatorType != null) {
            // ... (some lines omitted)
        } else {
            // ... (some lines omitted)
        }
        // ... (some lines omitted)
    } // ... (catch/finally omitted)
}
Step into Sender.awaitNodeReady():
private boolean awaitNodeReady(Node node, FindCoordinatorRequest.CoordinatorType coordinatorType) throws IOException {
    if (NetworkClientUtils.awaitReady(client, node, time, requestTimeoutMs)) {
        if (coordinatorType == FindCoordinatorRequest.CoordinatorType.TRANSACTION) {
            // Indicate to the transaction manager that the coordinator is ready, allowing it to check ApiVersions
            // This allows us to bump transactional epochs even if the coordinator is temporarily unavailable at
            // the time when the abortable error is handled
            transactionManager.handleCoordinatorReady();
        }
        return true;
    }
    return false;
}
Then into NetworkClientUtils.awaitReady():
public static boolean awaitReady(KafkaClient client, Node node, Time time, long timeoutMs) throws IOException {
    if (timeoutMs < 0) {
        throw new IllegalArgumentException("Timeout needs to be greater than 0");
    }
    long startTime = time.milliseconds();

    if (isReady(client, node, startTime) || client.ready(node, startTime))
        return true;
    // ... (some lines omitted)
}
Then into NetworkClientUtils.isReady(). Note that it first calls client.poll(0, currentTime), a non-blocking poll that services any pending I/O before checking the node's readiness:
public static boolean isReady(KafkaClient client, Node node, long currentTime) {
    client.poll(0, currentTime);
    return client.isReady(node, currentTime);
}
Then into NetworkClient.poll():
@Override
public List<ClientResponse> poll(long timeout, long now) {
    ensureActive();
    // ... (some lines omitted)
    long metadataTimeout = metadataUpdater.maybeUpdate(now);
    long telemetryTimeout = telemetrySender != null ? telemetrySender.maybeUpdate(now) : Integer.MAX_VALUE;
    // ... (some lines omitted)
    return responses;
}
Continue into NetworkClient.DefaultMetadataUpdater.maybeUpdate(long):
class DefaultMetadataUpdater implements MetadataUpdater {
    // ... (some lines omitted)

    DefaultMetadataUpdater(Metadata metadata) {
        this.metadata = metadata;
        this.inProgress = null;
    }

    // ... (some lines omitted)

    public long maybeUpdate(long now) {
        // ... (some lines omitted)
        return maybeUpdate(now, leastLoadedNode.node());
    }
}
Continue into the private overload DefaultMetadataUpdater.maybeUpdate(long, Node):
private long maybeUpdate(long now, Node node) {
    // ... (some lines omitted)
    if (connectionStates.canConnect(nodeConnectionId, now)) {
        // We don't have a connection to this node right now, make one
        log.debug("Initialize connection to node {} for sending metadata request", node);
        initiateConnect(node, now);
        return reconnectBackoffMs;
    }
    return Long.MAX_VALUE;
}
Continue into NetworkClient.initiateConnect():
private void initiateConnect(Node node, long now) {
    String nodeConnectionId = node.idString();
    try {
        connectionStates.connecting(nodeConnectionId, now, node.host());
        InetAddress address = connectionStates.currentAddress(nodeConnectionId);
        log.debug("Initiating connection to node {} using address {}", node, address);
        // this is the ultimate entry point for connecting to the server
        selector.connect(nodeConnectionId,
                new InetSocketAddress(address, node.port()),
                this.socketSendBuffer,
                this.socketReceiveBuffer);
    } catch (IOException e) {
        // ... (some lines omitted)
    }
}
At last, the light at the end of the tunnel: stepping into Selector.connect() lands exactly on the method where I set my breakpoint earlier, so I won't repeat its code here.
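As the javadoc on Selector.connect() notes, the call only initiates the connection; it completes during a later poll(). A bare-bones sketch of the underlying Java NIO pattern that Kafka's Selector builds on may make that clearer. This is illustrative NIO code, not Kafka source (the raw java.nio.channels.Selector is fully qualified to avoid clashing with Kafka's class of the same name):

import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// Initiate a non-blocking connect, then complete it when the selector signals OP_CONNECT.
java.nio.channels.Selector nioSelector = java.nio.channels.Selector.open();
SocketChannel channel = SocketChannel.open();
channel.configureBlocking(false);
channel.connect(new InetSocketAddress("127.0.0.1", 9092)); // returns immediately; connect is in progress
channel.register(nioSelector, SelectionKey.OP_CONNECT);

nioSelector.select(1000); // the "poll" step: wait up to 1s for readiness events
for (SelectionKey key : nioSelector.selectedKeys()) {
    if (key.isConnectable() && ((SocketChannel) key.channel()).finishConnect()) {
        // TCP handshake completed; switch interest to reading
        key.interestOps(SelectionKey.OP_READ);
    }
}
nioSelector.selectedKeys().clear();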
So, through breakpoint tracing we have finally uncovered the producer's path for connecting to the server. Once connected, it can start sending messages. Let's take one more look at ConsoleProducer.main():
def main(args: Array[String]): Unit = {
  try {
    val config = new ProducerConfig(args)
    // accept console input
    val input = System.in
    // connect to the server
    val producer = new KafkaProducer[Array[Byte], Array[Byte]](producerProps(config))
    // send messages
    try loopReader(producer, newReader(config.readerClass, getReaderProps(config)), input, config.sync)
    finally producer.close()
    Exit.exit(0)
  } catch {
    // ... (some lines omitted)
  }
}
To sum up, main() does just three things (a minimal programmatic equivalent follows the list):
- accept console input
- connect to the server
- send messages
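As a recap, here is a minimal hand-written equivalent of what ConsoleProducer sets up, including the authentication settings from step 2. The broker address and topic mirror my test setup, the credentials are placeholders, and sasl.jaas.config is used as an inline alternative to passing -Djava.security.auth.login.config=client.jaas.conf:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class MiniConsoleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092");
        props.put("key.serializer", ByteArraySerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());
        // the same two settings we added to producer.properties
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "PLAIN");
        // username/password are placeholders
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                        + "username=\"<user>\" password=\"<password>\";");

        // constructing the producer starts the Sender I/O thread we traced above;
        // the actual connection happens on that thread
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("notif.test", "hello".getBytes()));
            producer.flush();
        }
    }
}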
The message-sending path will have to wait for another time; this installment is complete. Confetti!