【Hadoop】Flume官方文档翻译——Flume 1.7.0 User Guide (unreleased version)中一些知识点

本文主要是介绍【Hadoop】Flume官方文档翻译——Flume 1.7.0 User Guide (unreleased version)中一些知识点,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

Flume官方文档翻译——Flume 1.7.0 User Guide (unreleased version)(一)

Flume官方文档翻译——Flume 1.7.0 User Guide (unreleased version)(二)

Flume Properties

Property Name

Default 

Description

flume.called.from.service

–                                                      

If this property is specified then the Flume agent will continue polling for the config file even if the config file is not found at the expected location. Otherwise, the Flume agent will terminate if the config doesn’t exist at the expected location. No property value is needed when setting this property (eg, just specifying -Dflume.called.from.service is enough)

如果这个属性被指定了,那么Flume agent会轮询配置文档即使在指定路径找不到该文档。此外,FLume agent将会结束如果配置文档不在指定位置上。这个属性不需要设置值(例如,只是指定-Dflume.called.from.service就足够了)

Property: flume.called.from.service

Flume periodically polls, every 30 seconds, for changes to the specified config file. A Flume agent loads a new configuration from the config file if either an existing file is polled for the first time, or if an existing file’s modification date has changed since the last time it was polled. Renaming or moving a file does not change its modification time. When a Flume agent polls a non-existent file then one of two things happens: 1. When the agent polls a non-existent config file for the first time, then the agent behaves according to the flume.called.from.service property. If the property is set, then the agent will continue polling (always at the same period – every 30 seconds). If the property is not set, then the agent immediately terminates. ...OR... 2. When the agent polls a non-existent config file and this is not the first time the file is polled, then the agent makes no config changes for this polling period. The agent continues polling rather than terminating.

Flume每30秒周期轮询配置文档是否改变。如果一个文档是第一次被轮询到或者在上次轮询后修改时间被改变了,那么Flume agent会加载新的配置文档。重命名和移动一个文档不会改变文档的修改时间。当一个Flume agent轮询一个不存在的文档会出现以下两种情况的一种:1. 当在指定目录下轮询不到配置文件时,agent会根据flume.called.from.service property这个属性来决定他的行为。如果这个属性设置了,那么会以30秒为周期地进行轮询;如果没设置,那么找不到就立即停止。2. 如果agent在加载过配置文件后在指定路径轮询不到文件的话,那么将不会做任何改变,然后继续轮询。

Log4J Appender(Log4J 日志存储器)

Appends Log4j events to a flume agent’s avro source. A client using this appender must have the flume-ng-sdk in the classpath (eg, flume-ng-sdk-1.8.0-SNAPSHOT.jar). Required properties are in bold.

将Log4j events 添加到一个flume agent的avro source。一个客户端想要使用这个appender必须要有 flume-ng-sdk在类路径下(例如flume-ng-sdk-1.8.0-SNAPSHOT.jar)。必须要的属性用黑体加粗。

Property Name

Default

Description

Hostname

–             

The hostname on which a remote Flume agent is running with an avro source.

运行avro source的远程Flumeagent的主机名

Port

The port at which the remote Flume agent’s avro source is listening.

远程Flume agent的avro source所监听的端口

UnsafeMode

false           

If true, the appender will not throw exceptions on failure to send the events.

如果设为真,appender将不会在发送events失败时抛出异常。

AvroReflectionEnabled

false

Use Avro Reflection to serialize Log4j events. (Do not use when users log strings)

使用 Avro反射来序列化 Log4j events。

AvroSchemaUrl

A URL from which the Avro schema can be retrieved.

一个用来恢复数据的URL,该URL是从 Avro schema来的。

Sample log4j.properties file:

#...log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppenderlog4j.appender.flume.Hostname = example.comlog4j.appender.flume.Port = 41414log4j.appender.flume.UnsafeMode = true# configure a class's logger to output to the flume appenderlog4j.logger.org.example.MyClass = DEBUG,flume#...

By default each event is converted to a string by calling toString(), or by using the Log4j layout, if specified.

If the event is an instance of org.apache.avro.generic.GenericRecord, org.apache.avro.specific.SpecificRecord, or if the property AvroReflectionEnabled  is set to true then the event will be serialized using Avro serialization.

Serializing every event with its Avro schema is inefficient, so it is good practice to provide a schema URL from which the schema can be retrieved by the downstream sink, typically the HDFS sink. If AvroSchemaUrl is not specified, then the schema will be included as a Flume header.

Sample log4j.properties file configured to use Avro serialization:

每个event默认都可以通过toString()来转换成字符串,或者有指定的话可用Log4j layout。

如果event是一个org.apache.avro.generic.GenericRecord, org.apache.avro.specific.SpecificRecord类的实例,或者它的属性AvroReflectionEnabled的值为true,那么会使用Avro serialization进行序列化。

对每个event和它的Avro schema进行序列化是低效的,所以,一个好的实践是提供一个可以从下流的sink中恢复的schemaURL,通常是HDFS sink。如果没有指定AvroSchemaUrl的话,schema会被纳入到Flume haeder。

一个使用Avro serialization的log4j属性文档的例子如下:

#...log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppenderlog4j.appender.flume.Hostname = example.comlog4j.appender.flume.Port = 41414log4j.appender.flume.AvroReflectionEnabled = truelog4j.appender.flume.AvroSchemaUrl = hdfs://namenode/path/to/schema.avsc# configure a class's logger to output to the flume appenderlog4j.logger.org.example.MyClass = DEBUG,flume#...

Load Balancing Log4J Appender

Appends Log4j events to a list of flume agent’s avro source. A client using this appender must have the flume-ng-sdk in the classpath (eg, flume-ng-sdk-1.8.0-SNAPSHOT.jar). This appender supports a round-robin and random scheme for performing the load balancing. It also supports a configurable backoff timeout so that down agents are removed temporarily from the set of hosts .Required properties are in bold.

将Log4j events 添加到一个flume agent的avro source。一个客户端想要使用这个appender必须要有 flume-ng-sdk在类路径下(例如flume-ng-sdk-1.8.0-SNAPSHOT.jar)。这个日志存储器支持一个循环和随机计划来执行负载均衡。它也支持一个可配置的退避超时以便将在冲突中被击败的agent被暂时移除。黑体字标注的属性是必须要的。

Property Name

Default

Description

Hosts

A space-separated list of host:port at which Flume (through an AvroSource) is listening for events

列出监听events的主机列表,每个host:port用空格隔开。

Selector

ROUND_ROBIN 

Selection mechanism. Must be either ROUND_ROBIN, RANDOM or custom FQDN to class that inherits from LoadBalancingSelector.

选择机制。必须从ROUND_ROBIN,RANDOM或者继承LoadBalancingSelector的自定义FQDN类。

MaxBackoff

–                       

A long value representing the maximum amount of time in milliseconds the Load balancing client will backoff from a node that has failed to consume an event. Defaults to no backoff.

这个值代表以毫秒为单位的退避超时最大值,也就是当一个节点在消费event时失效了,等待超时时间再进行重发event。默认是没有退避的

UnsafeMode

false

If true, the appender will not throw exceptions on failure to send the events.

如果设为真,appender将不会在发送events失败时抛出异常。

AvroReflectionEnabled

false

Use Avro Reflection to serialize Log4j events.

使用 Avro反射来序列化 Log4j events。

AvroSchemaUrl

A URL from which the Avro schema can be retrieved.

一个用来恢复数据的URL,该URL是从 Avro schema来的。

Sample log4j.properties file configured using defaults:

#...log4j.appender.out2 = org.apache.flume.clients.log4jappender.LoadBalancingLog4jAppenderlog4j.appender.out2.Hosts = localhost:25430 localhost:25431# configure a class's logger to output to the flume appenderlog4j.logger.org.example.MyClass = DEBUG,flume#...Sample log4j.properties file configured using RANDOM load balancing:#...log4j.appender.out2 = org.apache.flume.clients.log4jappender.LoadBalancingLog4jAppenderlog4j.appender.out2.Hosts = localhost:25430 localhost:25431log4j.appender.out2.Selector = RANDOM# configure a class's logger to output to the flume appenderlog4j.logger.org.example.MyClass = DEBUG,flume#...Sample log4j.properties file configured using backoff:#...log4j.appender.out2 = org.apache.flume.clients.log4jappender.LoadBalancingLog4jAppenderlog4j.appender.out2.Hosts = localhost:25430 localhost:25431 localhost:25432log4j.appender.out2.Selector = ROUND_ROBINlog4j.appender.out2.MaxBackoff = 30000# configure a class's logger to output to the flume appenderlog4j.logger.org.example.MyClass = DEBUG,flume#...

Security(安全性)

The HDFS sink, HBase sink, Thrift source, Thrift sink and Kite Dataset sink all support Kerberos authentication. Please refer to the corresponding sections for configuring the Kerberos-related options.

Flume agent will authenticate to the kerberos KDC as a single principal, which will be used by different components that require kerberos authentication. The principal and keytab configured for Thrift source, Thrift sink, HDFS sink, HBase sink and DataSet sink should be the same, otherwise the component will fail to start.

HDFS sink、HBase sink、Thrift source、Thrift sink和Kite Dataset sink支持Kerberos认证。请参考配置Kerberos相关选项的章节。

当agent中的不同组件需要kerberos验证,Flume agent会作为kerberos KDC验证的主体。Thrift source, Thrift sink, HDFS sink, HBase sink and DataSet sink的密钥和主体都应该相同,否则组件无法启动。

这篇关于【Hadoop】Flume官方文档翻译——Flume 1.7.0 User Guide (unreleased version)中一些知识点的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1070222

相关文章

Hadoop企业开发案例调优场景

需求 (1)需求:从1G数据中,统计每个单词出现次数。服务器3台,每台配置4G内存,4核CPU,4线程。 (2)需求分析: 1G / 128m = 8个MapTask;1个ReduceTask;1个mrAppMaster 平均每个节点运行10个 / 3台 ≈ 3个任务(4    3    3) HDFS参数调优 (1)修改:hadoop-env.sh export HDFS_NAMENOD

Hadoop集群数据均衡之磁盘间数据均衡

生产环境,由于硬盘空间不足,往往需要增加一块硬盘。刚加载的硬盘没有数据时,可以执行磁盘数据均衡命令。(Hadoop3.x新特性) plan后面带的节点的名字必须是已经存在的,并且是需要均衡的节点。 如果节点不存在,会报如下错误: 如果节点只有一个硬盘的话,不会创建均衡计划: (1)生成均衡计划 hdfs diskbalancer -plan hadoop102 (2)执行均衡计划 hd

hadoop开启回收站配置

开启回收站功能,可以将删除的文件在不超时的情况下,恢复原数据,起到防止误删除、备份等作用。 开启回收站功能参数说明 (1)默认值fs.trash.interval = 0,0表示禁用回收站;其他值表示设置文件的存活时间。 (2)默认值fs.trash.checkpoint.interval = 0,检查回收站的间隔时间。如果该值为0,则该值设置和fs.trash.interval的参数值相等。

Hadoop数据压缩使用介绍

一、压缩原则 (1)运算密集型的Job,少用压缩 (2)IO密集型的Job,多用压缩 二、压缩算法比较 三、压缩位置选择 四、压缩参数配置 1)为了支持多种压缩/解压缩算法,Hadoop引入了编码/解码器 2)要在Hadoop中启用压缩,可以配置如下参数

基本知识点

1、c++的输入加上ios::sync_with_stdio(false);  等价于 c的输入,读取速度会加快(但是在字符串的题里面和容易出现问题) 2、lower_bound()和upper_bound() iterator lower_bound( const key_type &key ): 返回一个迭代器,指向键值>= key的第一个元素。 iterator upper_bou

活用c4d官方开发文档查询代码

当你问AI助手比如豆包,如何用python禁止掉xpresso标签时候,它会提示到 这时候要用到两个东西。https://developers.maxon.net/论坛搜索和开发文档 比如这里我就在官方找到正确的id描述 然后我就把参数标签换过来

计算机毕业设计 大学志愿填报系统 Java+SpringBoot+Vue 前后端分离 文档报告 代码讲解 安装调试

🍊作者:计算机编程-吉哥 🍊简介:专业从事JavaWeb程序开发,微信小程序开发,定制化项目、 源码、代码讲解、文档撰写、ppt制作。做自己喜欢的事,生活就是快乐的。 🍊心愿:点赞 👍 收藏 ⭐评论 📝 🍅 文末获取源码联系 👇🏻 精彩专栏推荐订阅 👇🏻 不然下次找不到哟~Java毕业设计项目~热门选题推荐《1000套》 目录 1.技术选型 2.开发工具 3.功能

论文翻译:arxiv-2024 Benchmark Data Contamination of Large Language Models: A Survey

Benchmark Data Contamination of Large Language Models: A Survey https://arxiv.org/abs/2406.04244 大规模语言模型的基准数据污染:一项综述 文章目录 大规模语言模型的基准数据污染:一项综述摘要1 引言 摘要 大规模语言模型(LLMs),如GPT-4、Claude-3和Gemini的快

flume系列之:查看flume系统日志、查看统计flume日志类型、查看flume日志

遍历指定目录下多个文件查找指定内容 服务器系统日志会记录flume相关日志 cat /var/log/messages |grep -i oom 查找系统日志中关于flume的指定日志 import osdef search_string_in_files(directory, search_string):count = 0

Maven创建项目中的groupId, artifactId, 和 version的意思

文章目录 groupIdartifactIdversionname groupId 定义:groupId 是 Maven 项目坐标的第一个部分,它通常表示项目的组织或公司的域名反转写法。例如,如果你为公司 example.com 开发软件,groupId 可能是 com.example。作用:groupId 被用来组织和分组相关的 Maven artifacts,这样可以避免