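1. Aeron Stat
Aeron Stat outputs the key counters from Aeron, along with the positions and key counters of all active and recently active streams.
To use Aeron Stat, you must point it at the Media Driver directory you want to inspect. For example, if you configured the Media Driver context as: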
final MediaDriver.Context mediaDriverCtx = new MediaDriver.Context().aeronDirectoryName("/dev/shm/md");
then the path passed to AeronStat is as follows:
java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.AeronStat
Output (from a running Archive Replication Client):
17:03:52 - Aeron Stat (CnC v0.2.0), pid 2771, heartbeat age 451ms
======================================================================
0: 60,704 - Bytes sent
1: 122,848 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 0 - NAKs sent
6: 0 - NAKs received
7: 1,875 - Status Messages sent
8: 941 - Status Messages received
9: 1,865 - Heartbeats sent
10: 3,610 - Heartbeats received
11: 0 - Retransmits sent
12: 0 - Flow control under runs
13: 0 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 0 - Failed attempts to free log buffers
18: 0 - Sender flow control limits, i.e. back-pressure events
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 0 - Client liveness timeouts
25: 0 - Resolution changes: driverName=null hostname=archive-client
26: 150,858,350 - Conductor max cycle time doing its work in ns: SHARED
27: 0 - Conductor work cycle exceeded threshold count: threshold=1000000000ns SHARED
28: 149,104,126 - Sender max cycle time doing its work in ns: SHARED
29: 0 - Sender work cycle exceeded threshold count: threshold=1000000000ns SHARED
30: 149,144,918 - Receiver max cycle time doing its work in ns: SHARED
31: 0 - Receiver work cycle exceeded threshold count: threshold=1000000000ns SHARED
32: 1,838,850 - NameResolver max time in ns
33: 0 - NameResolver exceeded threshold count
36: 1,692,637,432,558 - client-heartbeat: 1
52: 1 - rcv-channel: aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=10.1.0.4:0 10.1.0.4:45494
53: 1 - rcv-local-sockaddr: 52 10.1.0.4:45494
54: 1 - snd-channel: aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000 10.1.0.4:33378
55: 1 - snd-local-sockaddr: 54 10.1.0.4:33378
56: 448 - pub-pos (sampled): 15 -1436025328 10 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000
57: 33,216 - pub-lmt: 15 -1436025328 10 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000
58: 448 - snd-pos: 15 -1436025328 10 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000
59: 32,768 - snd-lmt: 15 -1436025328 10 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000
60: 0 - snd-bpe: 15 -1436025328 10 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=archive-backup:17000
61: 608 - sub-pos: 14 1817141198 20 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=10.1.0.4:0 @0
62: 608 - rcv-hwm: 17 1817141198 20 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=10.1.0.4:0
63: 608 - rcv-pos: 17 1817141198 20 aeron:udp?term-length=65536|sparse=true|mtu=1408|endpoint=10.1.0.4:0
64: 1 - rcv-channel: aeron:udp?endpoint=10.1.0.4:0 10.1.0.4:33933
65: 1 - rcv-local-sockaddr: 64 10.1.0.4:33933
66: 6,016 - sub-pos: 19 1817141199 200 aeron:udp?endpoint=10.1.0.4:0 @1280
67: 6,016 - rcv-hwm: 21 1817141199 200 aeron:udp?endpoint=10.1.0.4:0
68: 6,016 - rcv-pos: 21 1817141199 200 aeron:udp?endpoint=10.1.0.4:0
--
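The header and counter lines above have a regular shape, which makes them easy to scrape for basic monitoring. The following is a minimal Java sketch, assuming the text layout matches the sample output; the class and method names are hypothetical and are not part of Aeron.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Aeron): parses lines of AeronStat text output,
// assuming they are laid out as in the sample above.
final class AeronStatLineParser {
    // Header, e.g. "17:03:52 - Aeron Stat (CnC v0.2.0), pid 2771, heartbeat age 451ms"
    private static final Pattern HEADER = Pattern.compile("heartbeat age ([\\d,]+)ms");
    // Counter, e.g. "7: 1,875 - Status Messages sent"
    private static final Pattern COUNTER = Pattern.compile("^\\s*(\\d+):\\s+(-?[\\d,]+) - (.*)$");

    /** Returns the heartbeat age in milliseconds, or -1 if the line carries no heartbeat age. */
    static long heartbeatAgeMs(String line) {
        Matcher m = HEADER.matcher(line);
        return m.find() ? Long.parseLong(m.group(1).replace(",", "")) : -1L;
    }

    /** Returns {counterId, value} for a counter line, or null if the line does not match. */
    static long[] counter(String line) {
        Matcher m = COUNTER.matcher(line);
        if (!m.matches()) {
            return null;
        }
        long id = Long.parseLong(m.group(1));
        long value = Long.parseLong(m.group(2).replace(",", ""));
        return new long[] {id, value};
    }

    public static void main(String[] args) {
        long age = heartbeatAgeMs("17:03:52 - Aeron Stat (CnC v0.2.0), pid 2771, heartbeat age 451ms");
        // The 1000 ms threshold is the one suggested for the top line of the output.
        System.out.println("heartbeat age ms = " + age + (age > 1000 ? " (check the driver)" : " (ok)"));
        long[] c = counter("7: 1,875 - Status Messages sent");
        System.out.println("counter " + c[0] + " = " + c[1]);
    }
}
```

Sampling a counter twice and comparing the values is one way to confirm that, for example, bytes sent is still increasing.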
Core Counters
Row | Description |
---|---|
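Top Line | The most important value here is the heartbeat age: the time elapsed since the last Media Driver heartbeat in cnc.dat. If this number is large (over 1000 ms), check that the Media Driver is still running. |
0 | Total bytes sent over UDP by this Media Driver, excluding IP headers. If this is not increasing at the rate the application expects, something is wrong. |
1 | Total bytes received over UDP by this Media Driver, excluding IP headers. If this is not increasing at the rate the application expects, something is wrong. |
2 | Failed offers to the Media Driver's Receiver Proxy; this indicates back pressure. |
3 | Failed offers to the Media Driver's Sender Proxy; this indicates back pressure. |
4 | Failed offers to the Media Driver's Conductor Proxy; this indicates back pressure. |
5 | Total NAKs sent. This is the number of times this Media Driver has sent a NAK to request a lost packet. |
6 | Total NAKs received. This is the number of times this Media Driver has received a NAK asking it to retransmit a lost packet to a remote Media Driver. |
7 | Status Messages sent. A running count of the status messages this Media Driver has sent for flow control. This count should increase over time. |
8 | Status Messages received. A running count of the status messages this Media Driver has received for flow control. This count should increase over time. |
9 | Heartbeats sent. The number of heartbeats this Media Driver has sent to show liveness to another Media Driver when there was no data to send. This count should increase over time. |
10 | Heartbeats received. The number of heartbeats this Media Driver has received from another Media Driver when there was no data to send. This count should increase over time. |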
11 | Retransmits sent. This is how many packet retransmits have been sent by this Media Driver as a result of a NAK message. This will typically stay zero or very low in a healthy network (and with well-behaved processes). |
12 | Flow control under runs. This is the count of packets which under-run the current flow control window for Images. |
13 | Flow control over runs. This is the count of packets which over-run the current flow control window for Images. |
14 | Count of invalid packets received by this Media Driver. |
15 | Count of errors observed by this Media Driver. ErrorStat (see below) will provide details. |
16 | Short send count. A short send happens when the Media Driver's Sender agent expects to send a given buffer of data over the network, but the socket did not take all the data from the buffer. Typically, Aeron will recover from this. When this increases beyond a low number, it can be a complex problem to solve, with causes ranging from incorrect buffer sizing to network equipment failure. The first place to look is typically the settings for the network buffer sizes: aeron.socket.so_rcvbuf and aeron.socket.so_sndbuf. aeron.rcv.initial.window.length must be less than or equal to aeron.socket.so_rcvbuf. Correct sizing can be an art, and can be especially challenging in a network with a large RTT variance. See also Bandwidth Delay Product. Note: you may need to update the maximum socket buffer sizes in your operating system. |
17 | The number of times the Media Driver could not free a log buffer. |
18 | Total number of back-pressure events over all streams. See also Back pressure. |
19 | Count of times a publication has been unblocked after a client failed to commit() or abort() a tryClaim within the timeout (see Publication TryClaim and Log Buffer Unblocking). |
20 | Count of times a command has been unblocked after a client failed to complete an offer within a timeout. |
21 | The number of times a channel endpoint detected a possible TTL asymmetry between its config and a connection. |
23 | The number of times a loss gap has been filled when NAKs have been disabled. |
24 | The number of Aeron clients that have timed out without a graceful close (as in Aeron clients of this Media Driver). |
25 | The number of times the endpoints have been re-resolved (i.e. name resolution) resulting in a change. |
26 | The maximum time taken in a conductor duty cycle in nanoseconds. Found in Aeron 1.33.0+. |
27 | The number of times the time spent in a conductor duty cycle exceeded a configurable threshold (1s default). Found in Aeron 1.33.0+. |
28 | The maximum time taken in a sender duty cycle in nanoseconds. |
29 | The number of times the time spent in a sender duty cycle exceeded a configurable threshold (1s default). |
30 | The maximum time taken in a receiver duty cycle in nanoseconds. |
31 | The number of times the time spent in a receiver duty cycle exceeded a configurable threshold (1s default). |
32 | The maximum time taken for Name Resolution in nanoseconds. Found in Aeron 1.42.0+. |
33 | The number of times the time spent in Name Resolution exceeded a configurable threshold. Found in Aeron 1.42.0+. |
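Row 16 ties short sends to socket buffer sizing and mentions the Bandwidth Delay Product (BDP). As a rough starting point, the BDP is the link rate multiplied by the round-trip time. The sketch below computes it and checks the stated constraint that aeron.rcv.initial.window.length must not exceed aeron.socket.so_rcvbuf. The link speed, RTT and buffer sizes are illustrative assumptions, and the class is hypothetical rather than part of Aeron.

```java
// Hypothetical sizing helper; the link speed, RTT and buffer sizes used below are illustrative only.
final class SocketBufferSizing {
    /** Bandwidth-delay product in bytes: (bits per second * RTT in microseconds) / 8,000,000. */
    static long bdpBytes(long bitsPerSecond, long rttMicros) {
        // multiplyExact throws ArithmeticException on overflow instead of silently wrapping.
        return Math.multiplyExact(bitsPerSecond, rttMicros) / 8_000_000L;
    }

    /** The constraint stated for row 16: aeron.rcv.initial.window.length <= aeron.socket.so_rcvbuf. */
    static boolean initialWindowFits(long initialWindowLength, long soRcvbuf) {
        return initialWindowLength <= soRcvbuf;
    }

    public static void main(String[] args) {
        long bdp = bdpBytes(10_000_000_000L, 200); // assumed 10 Gbit/s link and 200 microsecond RTT
        System.out.println("BDP bytes = " + bdp);
        System.out.println("window fits = " + initialWindowFits(131_072, 2_097_152));
    }
}
```

The result is only a first estimate; as the table notes, networks with a large RTT variance need more careful tuning.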
Variable Counters
Row | Description |
---|---|
36 in above example; varies | Epoch millisecond value of the last client heartbeat from the given client. The client in this context is the Aeron Client on the Media Driver. |
52 in above example; varies | Receive channel |
53 in above example; varies | Receive socket address |
54 in above example; varies | Send channel |
55 in above example; varies | Send socket address |
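In the example above, rows 56 to 68 are the per-stream counters, most of which are position values. For more on how to interpret these values, see Understanding Aeron Position. Rows with an @, such as the sub-pos on row 61, show the position at which the subscription joined; in this example, the subscription joined at position 0.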
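To make the @ suffix concrete: on row 66 of the sample, the subscription reports position 6,016 and joined at 1280, so it has advanced 4,736 bytes of stream position since joining. A minimal sketch of that arithmetic, assuming the sub-pos line layout shown in the sample (the class name is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the current and join positions from a sub-pos row of AeronStat output.
final class SubPosJoin {
    // e.g. "66: 6,016 - sub-pos: 19 1817141199 200 aeron:udp?endpoint=10.1.0.4:0 @1280"
    private static final Pattern SUB_POS =
        Pattern.compile("^\\s*\\d+:\\s+([\\d,]+) - sub-pos: .* @(\\d+)\\s*$");

    /** Returns current position minus join position, or -1 if the line is not a sub-pos row. */
    static long advancedSinceJoin(String line) {
        Matcher m = SUB_POS.matcher(line);
        if (!m.matches()) {
            return -1L;
        }
        long current = Long.parseLong(m.group(1).replace(",", ""));
        long joined = Long.parseLong(m.group(2));
        return current - joined;
    }

    public static void main(String[] args) {
        String row = "66: 6,016 - sub-pos: 19 1817141199 200 aeron:udp?endpoint=10.1.0.4:0 @1280";
        System.out.println("advanced since join = " + advancedSinceJoin(row));
    }
}
```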
Note: There is a C version of the Aeron Stat tool. It's compiled and built with the C Media Driver. See C Media Driver.
AeronStat options
Arg | Description |
---|---|
-h | Shows the help text |
watch=true or false | If set to true, refreshes every n seconds. If set to false, runs once and exits. Defaults to true. |
delay=seconds | Specifies how often to refresh the output, as a delay in seconds between updates. Valid only if watch=true (or watch is not specified). |
stream={regex} | Filters streams to only those that match the regex. Example: stream=101 |
type={regex} | Filters output by counter type to only those that match |
session={regex} | Filters sessions to only those that match |
channel={regex} | Filters channels to only those that match |
identity={regex} | Filters identity to only those that match |
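When AeronStat is launched from another Java process (for example a scheduled health check), the options can be appended to the command line. The sketch below only assembles the argument list. It assumes the options are passed as name=value arguments after the class name, as the table's notation suggests, and it uses a hypothetical jar path, because a shell glob such as aeron-all-*.jar is not expanded when a process is started without a shell.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles a command line for a single AeronStat run.
final class AeronStatCommand {
    static List<String> build(String classpath, String aeronDir, String... options) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        command.add(classpath);                    // a concrete jar path; globs need a shell
        command.add("-Daeron.dir=" + aeronDir);    // the Media Driver directory
        command.add("io.aeron.samples.AeronStat");
        for (String option : options) {
            command.add(option);                   // assumed name=value form, e.g. watch=false
        }
        return command;
    }

    public static void main(String[] args) {
        // "aeron-all.jar" is a placeholder for the real jar file name.
        List<String> command = build("aeron-all.jar", "/dev/shm/md", "watch=false", "stream=101");
        System.out.println(String.join(" ", command));
    }
}
```

The resulting list can be handed to java.lang.ProcessBuilder; with watch=false the tool is documented to run once and exit, which suits scripted use.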
2. Error Stat
Error Stat prints all errors that have occurred in the Aeron process. As with AeronStat, you must point ErrorStat at the Media Driver directory.
java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.ErrorStat
When everything is running as expected, Error Stat produces the following output:
0 distinct errors observed.
Note: There is a C version of the Error Stat tool. It's compiled and built with the C Media Driver. See C Media Driver.
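For alerting, the summary line shown above is enough to tell whether anything has gone wrong. A minimal sketch, assuming the summary always has the form "N distinct errors observed." (only the zero case appears in this article; the class name is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the error count from captured ErrorStat output.
final class ErrorStatSummary {
    // Multiline mode so the summary can appear on any line of the captured output.
    private static final Pattern SUMMARY =
        Pattern.compile("(?m)^(\\d+) distinct errors observed\\.?\\s*$");

    /** Returns the number of distinct errors, or -1 if no summary line is found. */
    static int distinctErrors(String output) {
        Matcher m = SUMMARY.matcher(output);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(distinctErrors("0 distinct errors observed."));
    }
}
```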
3. Stream Stat
Stream Stat lives in the Aeron samples directory and can be launched from the aeron-all jar as shown below. As with AeronStat, you must point StreamStat at the Media Driver directory.
java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.StreamStat
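Stream Stat provides a view of each stream in the Media Driver, including the publisher and sender views. It resembles Aeron Stat, except that the view is flattened: all counters for a stream are printed on one line, as in the second line of the output below.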
Command `n Control file /dev/shm/md/cnc.dat
sessionId=-1245628686 streamId=10 channel=aeron:udp?endpoint=localhost:40123 : pub-pos (sampled):3:320 pub-lmt:3:8388992 snd-pos:3:384 snd-lmt:3:131456 sub-pos:1:384 rcv-hwm:4:384 rcv-pos:4:384
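Because each stream is flattened onto one line, the counters can be pulled apart with a small parser. The sketch below assumes the layout of the sample line: a stream key, then " : ", then space-separated name:id:value triples. The class name is hypothetical, and treating the middle number as an identifier is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits one StreamStat line into its counters.
final class StreamStatLine {
    // Matches e.g. "pub-pos (sampled):3:320" or "snd-lmt:3:131456".
    private static final Pattern COUNTER =
        Pattern.compile("([a-z-]+(?: \\(sampled\\))?):(\\d+):(-?\\d+)");

    /** Returns counters keyed by "name:id" in the order they appear on the line. */
    static Map<String, Long> counters(String line) {
        Map<String, Long> result = new LinkedHashMap<>();
        int split = line.indexOf(" : "); // separates the stream key from the counters
        if (split < 0) {
            return result;
        }
        Matcher m = COUNTER.matcher(line.substring(split + 3));
        while (m.find()) {
            result.put(m.group(1) + ":" + m.group(2), Long.parseLong(m.group(3)));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "sessionId=-1245628686 streamId=10 channel=aeron:udp?endpoint=localhost:40123 : "
            + "pub-pos (sampled):3:320 pub-lmt:3:8388992 snd-pos:3:384 snd-lmt:3:131456 "
            + "sub-pos:1:384 rcv-hwm:4:384 rcv-pos:4:384";
        counters(line).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```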
4. Backlog Stat
Backlog Stat is a tool that highlights stream backlogs. It works on both IPC and UDP channels. As with AeronStat, you must point BacklogStat at the Media Driver directory.
java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.BacklogStat
Sample output:
sessionId=1155221173 streamId=8 channel=aeron:udp?endpoint=10.1.1.1:4000 :
┌─for publisher 77 the last sampled position is 187392 (~0 bytes before back-pressure)
└─sender 77 has to send 0 bytes (2031779 butes remaining in the sender window)
sessionId=-614368527 streamId=9 channel=aeron:udp?endpoint=10.1.1.1:4001 :
┌─for publisher 6333 the last sampled position is 12739208 (~0 bytes before back-pressure)
└─sender 6333 has to send 65373 bytes (2031779 butes remaining in the sender window)
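The tool highlights data backlog problems on a given channel. In the sample run above, the top session has no backlog, while the bottom session has an outstanding backlog of 65373 bytes. Use this information to investigate network, process and/or design issues.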
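The figures in the sample output can be read as simple differences between position counters. The sketch below is an interpretation, not the tool's source: it assumes the sender backlog is the publisher position minus the sender position, and that the remaining window is a limit minus the matching position. The sender position and limit used here are hypothetical values back-computed from the sample (12739208 - 65373 and that result plus 2031779).

```java
// Hypothetical helper: position arithmetic of the kind BacklogStat appears to report.
final class BacklogMath {
    /** Bytes the sender still has to send: publisher position minus sender position. */
    static long senderBacklog(long publisherPosition, long senderPosition) {
        return Math.max(0L, publisherPosition - senderPosition);
    }

    /** Bytes left before a limit is reached, e.g. snd-lmt minus snd-pos. */
    static long remainingWindow(long limit, long position) {
        return Math.max(0L, limit - position);
    }

    public static void main(String[] args) {
        long publisherPosition = 12_739_208L; // from the sample output
        long senderPosition = 12_673_835L;    // assumed, back-computed from the sample
        long senderLimit = 14_705_614L;       // assumed, back-computed from the sample
        System.out.println("backlog bytes = " + senderBacklog(publisherPosition, senderPosition));
        System.out.println("window bytes = " + remainingWindow(senderLimit, senderPosition));
    }
}
```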
5. Loss Stat
LossStat records all data loss events that Aeron has experienced. Note that IPC data cannot be lost and will not appear in LossStat. As with AeronStat, you must point LossStat at the Media Driver directory.
java -cp aeron-all-*.jar -Daeron.dir=/dev/shm/md io.aeron.samples.LossStat
An example run:
#OBSERVATION_COUNT,TOTAL_BYTES_LOST,FIRST_OBSERVATION,LAST_OBSERVATION,SESSION_ID,STREAM_ID,CHANNEL,SOURCE
688,4167028,2020-08-16 13:53:39.053+0000,2020-08-16 13:53:41.003+0000,1155221173,8,aeron:udp?endpoint=10.1.1.1:4000;10.1.1.2:60950
This tells us the following about session 1155221173 on stream 8, for traffic on channel aeron:udp?endpoint=10.1.1.1:4000 ⤌⤍ 10.1.1.2:60950:
- there were 688 data loss events
- a total of 4,167,028 bytes were impacted
- the loss first happened at 2020-08-16 13:53:39.053+0000
- the last loss happened at 2020-08-16 13:53:41.003+0000
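With this information, you can investigate any network or host problems during those time windows. Note that a small amount of loss is fairly common.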
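To narrow down the time window programmatically, a LossStat row can be parsed with the JDK alone. The sketch below assumes the comma-separated layout of the sample row, with everything after the sixth comma treated as channel and source; the class name is hypothetical.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: parses one LossStat row, assuming the layout of the sample above.
final class LossStatRow {
    // Matches timestamps such as "2020-08-16 13:53:39.053+0000".
    private static final DateTimeFormatter TIMESTAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSZ");

    final long observations;
    final long bytesLost;
    final OffsetDateTime first;
    final OffsetDateTime last;
    final int sessionId;
    final int streamId;
    final String channelAndSource;

    LossStatRow(String csvLine) {
        // Limit of 7 keeps any commas inside the channel URI in the final field.
        String[] fields = csvLine.split(",", 7);
        observations = Long.parseLong(fields[0]);
        bytesLost = Long.parseLong(fields[1]);
        first = OffsetDateTime.parse(fields[2], TIMESTAMP);
        last = OffsetDateTime.parse(fields[3], TIMESTAMP);
        sessionId = Integer.parseInt(fields[4]);
        streamId = Integer.parseInt(fields[5]);
        channelAndSource = fields[6];
    }

    /** Time between the first and last loss observation. */
    Duration window() {
        return Duration.between(first, last);
    }

    /** Average bytes lost per observation, rounded down. */
    long averageBytesPerObservation() {
        return observations == 0 ? 0 : bytesLost / observations;
    }

    public static void main(String[] args) {
        LossStatRow row = new LossStatRow("688,4167028,2020-08-16 13:53:39.053+0000,"
            + "2020-08-16 13:53:41.003+0000,1155221173,8,aeron:udp?endpoint=10.1.1.1:4000;10.1.1.2:60950");
        System.out.println("window ms = " + row.window().toMillis());
        System.out.println("avg bytes per observation = " + row.averageBytesPerObservation());
    }
}
```

For the sample row this gives a loss window of a little under two seconds, which is the period to correlate with network or host metrics.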
Note: There is a C version of the Loss Stat tool. It's compiled and built with the C Media Driver. See C Media Driver.
6. Log Inspector
Log Inspector lives in the Aeron samples folder and can be launched from the aeron-all jar as shown below. You must point Log Inspector directly at a LogBuffer file.
java -cp aeron-all-*.jar io.aeron.samples.LogInspector <logbuffer file>
Log Inspector lets us examine a Log Buffer file, including:
- if the log buffer is connected
- how many terms the log buffer has been through (see Log Buffers & Images)
- the state of the 3 terms in the log buffer
- and the data within a term, dumped as hex. This includes details on which session and stream produced the data.
======================================================================
Thu Dec 31 09:46:19 EST 2020 Inspection dump for 3.logbuffer
======================================================================
Is Connected: true
Initial term id: -1822262504
Term Count: 20
Active index: 2
Term length: 67108864
MTU length: 1408
Page Size: 4096
EOS Position: 9223372036854775807
default DATA Header{frame-length=0 version=0 flags=11000000 type=1 term-offset=0 session-id=301746870 stream-id=10 term-id=-1822262504 reserved-value=0}
Index 0 Term Meta Data termOffset=67108928 termId=-1822262486 rawTail=-7826557782030548928 position=1275068416
Index 1 Term Meta Data termOffset=67108928 termId=-1822262485 rawTail=-7826557777735581632 position=1342177280
Index 2 Term Meta Data termOffset=1822720 termId=-1822262484 rawTail=-7826557773505900544 position=1344000000
======================================================================
Index 0 Term Data
0: DATA Header{frame-length=0 version=0 flags=00000000 type=0 term-offset=0 session-id=0 stream-id=0 term-id=0 reserved-value=0}
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
======================================================================
Index 1 Term Data
0: DATA Header{frame-length=0 version=0 flags=00000000 type=0 term-offset=0 session-id=0 stream-id=0 term-id=0 reserved-value=0}
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
======================================================================
Index 2 Term Data
0: DATA Header{frame-length=36 version=0 flags=11000000 type=1 term-offset=0 session-id=301746870 stream-id=10 term-id=-1822262484 reserved-value=0}
02004001
64: DATA Header{frame-length=36 version=0 flags=11000000 type=1 term-offset=64 session-id=301746870 stream-id=10 term-id=-1822262484 reserved-value=0}
03004001
...
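The Term Meta Data lines in the dump can be cross-checked by hand. The sketch below assumes that the upper 32 bits of rawTail hold the term id and the lower 32 bits hold the tail offset, and that the position is the number of terms since the initial term multiplied by the term length, plus the offset capped at the term length. The values in the dump are consistent with that reading; the class itself is hypothetical and not Aeron's own code.

```java
// Hypothetical helper: decodes the rawTail values printed by Log Inspector.
final class TermMetaData {
    /** Term id stored in the upper 32 bits of rawTail. */
    static int termId(long rawTail) {
        return (int) (rawTail >> 32);
    }

    /** Tail offset stored in the lower 32 bits, capped at the term length. */
    static long termOffset(long rawTail, long termLength) {
        return Math.min(rawTail & 0xFFFF_FFFFL, termLength);
    }

    /** Stream position: terms elapsed since the initial term, times term length, plus the offset. */
    static long position(long rawTail, int initialTermId, long termLength) {
        long termCount = termId(rawTail) - initialTermId; // int subtraction tolerates term id wrap
        return termCount * termLength + termOffset(rawTail, termLength);
    }

    public static void main(String[] args) {
        long rawTail = -7826557773505900544L; // Index 2 in the dump
        int initialTermId = -1822262504;
        long termLength = 67_108_864L;
        System.out.println("termId = " + termId(rawTail));
        System.out.println("termOffset = " + termOffset(rawTail, termLength));
        System.out.println("position = " + position(rawTail, initialTermId, termLength));
    }
}
```

For index 0 the raw offset (67108928) exceeds the term length, which is why the offset is capped before the position is computed.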