[Hadoop] Flume official documentation translation: Flume 1.7.0 User Guide (unreleased version), part 2

2024-06-17 18:32


Flume official documentation translation: Flume 1.7.0 User Guide (unreleased version), part 1

Logging raw data

Logging the raw stream of data flowing through the ingest pipeline is not desired behaviour in many production environments, because it may leak sensitive data or security-related configuration, such as secret keys, into the Flume log files. By default, Flume will not log such information. On the other hand, if the data pipeline is broken, Flume will attempt to provide clues for debugging the problem.

One way to debug problems with event pipelines is to set up an additional Memory Channel connected to a Logger Sink, which will output all event data to the Flume logs. In some situations, however, this approach is insufficient.

To enable logging of event- and configuration-related data, some Java system properties must be set in addition to the log4j properties.

To enable configuration-related logging, set the Java system property -Dorg.apache.flume.log.printconfig=true. This can be passed on the command line or set in the JAVA_OPTS variable in flume-env.sh.

To enable data logging, set the Java system property -Dorg.apache.flume.log.rawdata=true in the same way. For most components, the log4j logging level must also be set to DEBUG or TRACE for event-specific logging to appear in the Flume logs.

Here is an example of enabling both configuration logging and raw data logging while also setting the Log4j loglevel to DEBUG for console output:

$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
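
The Memory Channel plus Logger Sink debugging approach mentioned in this section could be wired up roughly as follows. This is a minimal sketch, not taken from the guide: the agent name a1 matches the command line used here, while the component names r1, c1, k1, debug-ch and debug-sink are hypothetical, and the definitions of the existing source, channel and sink are omitted.

```properties
# Hypothetical names: r1/c1/k1 stand for an existing flow;
# debug-ch and debug-sink form the added debug branch.
a1.sources = r1
a1.channels = c1 debug-ch
a1.sinks = k1 debug-sink

# With the default replicating selector, r1 copies every event to both channels.
a1.sources.r1.channels = c1 debug-ch

# Extra in-memory channel that only feeds the logger.
a1.channels.debug-ch.type = memory
a1.channels.debug-ch.capacity = 1000

# Logger sink writes the events it takes from debug-ch to the Flume log.
a1.sinks.debug-sink.type = logger
a1.sinks.debug-sink.channel = debug-ch
```

Because the debug branch is a separate channel and sink, it can be removed again without touching the main flow.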

Zookeeper based Configuration

Flume supports Agent configurations via Zookeeper. This is an experimental feature. The configuration file needs to be uploaded to Zookeeper, under a configurable prefix, and is stored in Zookeeper node data. The Zookeeper node tree for agents a1 and a2 would look like this:

- /flume
 |- /a1 [Agent config file]
 |- /a2 [Agent config file]

Once the configuration file is uploaded, start the agent with the following options:

$ bin/flume-ng agent --conf conf -z zkhost:2181,zkhost1:2181 -p /flume --name a1 -Dflume.root.logger=INFO,console

Argument Name    Default    Description
z                (none)     Zookeeper connection string. Comma separated list of hostname:port
p                /flume     Base Path in Zookeeper to store Agent configurations

Installing third-party plugins

Flume has a fully plugin-based architecture. While Flume ships with many out-of-the-box sources, channels, sinks, serializers, and the like, many implementations exist which ship separately from Flume.

While it has always been possible to include custom Flume components by adding their jars to the FLUME_CLASSPATH variable in the flume-env.sh file, Flume now supports a special directory called plugins.d which automatically picks up plugins that are packaged in a specific format. This makes plugin packaging easier to manage and simplifies debugging and troubleshooting of several classes of issues, especially library dependency conflicts.

The plugins.d directory

The plugins.d directory is located at $FLUME_HOME/plugins.d. At startup time, the flume-ng start script looks in the plugins.d directory for plugins that conform to the format below and includes them in the proper paths when starting up java.

Directory layout for plugins

Each plugin (subdirectory) within plugins.d can have up to three sub-directories:

    1. lib - the plugin’s jar(s)
    2. libext - the plugin’s dependency jar(s)
    3. native - any required native libraries, such as .so files

Example of two plugins within the plugins.d directory:

    plugins.d/
    plugins.d/custom-source-1/
    plugins.d/custom-source-1/lib/my-source.jar
    plugins.d/custom-source-1/libext/spring-core-2.5.6.jar
    plugins.d/custom-source-2/
    plugins.d/custom-source-2/lib/custom.jar
    plugins.d/custom-source-2/native/gettext.so

Data ingestion

Flume supports a number of mechanisms to ingest data from external sources.

RPC

An Avro client included in the Flume distribution can send a given file to a Flume Avro source using the Avro RPC mechanism:

$ bin/flume-ng avro-client -H localhost -p 41414 -F /usr/logs/log.10

The above command will send the contents of /usr/logs/log.10 to the Flume source listening on that port.

Executing commands

There’s an exec source that executes a given command and consumes the output. A single ‘line’ of output is text followed by a carriage return (‘\r’), a line feed (‘\n’), or both together.

Note: Flume does not support tail as a source. One can wrap the tail command in an exec source to stream the file.
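
As a sketch of the note on tail, an exec source wrapping the tail command might be declared like this. The agent name a1, the component names tail-src and c1, and the log path are assumptions chosen for illustration; type and command are the exec source's own properties.

```properties
a1.sources = tail-src
a1.channels = c1

# Run tail and consume each line it prints.
a1.sources.tail-src.type = exec
# Hypothetical path; -F keeps following the file across rotation.
a1.sources.tail-src.command = tail -F /var/log/app.log
a1.sources.tail-src.channels = c1
```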

Network streams

Flume supports the following mechanisms to read data from popular log stream types:

    1. Avro
    2. Thrift
    3. Syslog
    4. Netcat
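
For instance, a Netcat source, probably the easiest of the listed types to try out, might be declared as below. The names a1, nc-src and c1 and the port number are illustrative assumptions; the other source types follow the same pattern with their own type value and properties.

```properties
a1.sources = nc-src
a1.channels = c1

# Listen on a TCP port and read the lines of text sent to it.
a1.sources.nc-src.type = netcat
a1.sources.nc-src.bind = localhost
a1.sources.nc-src.port = 44444
a1.sources.nc-src.channels = c1
```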

Setting multi-agent flow

For data to flow across multiple agents or hops, the sink of the previous agent and the source of the current hop need to be of avro type, with the sink pointing to the hostname (or IP address) and port of the source.

Consolidation

A very common scenario in log collection is a large number of log-producing clients sending data to a few consumer agents that are attached to the storage subsystem. For example, logs collected from hundreds of web servers are sent to a dozen agents that write to an HDFS cluster.

This can be achieved in Flume by configuring a number of first-tier agents with an avro sink, all pointing to the avro source of a single agent (again, you could use the thrift sources/sinks/clients in such a scenario). This source on the second-tier agent consolidates the received events into a single channel, which is consumed by a sink that delivers them to their final destination.
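
A rough sketch of this two-tier layout follows. The agent names web1 and collector, the component names, the host name collector.example.com and the port 4545 are all assumptions; every first-tier agent would carry the same sink block, and the channel types and the final sink are omitted.

```properties
# First tier (repeated on each web server): avro sink pointing at the collector.
web1.sinks = to-collector
web1.channels = ch1
web1.sinks.to-collector.type = avro
web1.sinks.to-collector.hostname = collector.example.com
web1.sinks.to-collector.port = 4545
web1.sinks.to-collector.channel = ch1

# Second tier: a single avro source receives events from all first-tier sinks.
collector.sources = from-web
collector.channels = ch1
collector.sources.from-web.type = avro
collector.sources.from-web.bind = 0.0.0.0
collector.sources.from-web.port = 4545
collector.sources.from-web.channels = ch1
```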

Multiplexing the flow

Flume supports multiplexing the event flow to one or more destinations. This is achieved by defining a flow multiplexer that can replicate or selectively route an event to one or more channels.

The example here (a diagram in the original guide) shows a source from agent “foo” fanning out the flow to three different channels. This fan out can be replicating or multiplexing. In the replicating case, each event is sent to all three channels. In the multiplexing case, an event is delivered to a subset of the available channels when one of its attributes matches a preconfigured value. For example, if an event attribute called “txnType” is set to “customer”, the event goes to channel1 and channel3; if it is “vendor”, it goes to channel2; otherwise it goes to channel3. The mapping can be set in the agent’s configuration file.
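
The txnType routing just described might be written as follows. The selector property names are the ones this guide uses for multiplexing; the source name src1 is a placeholder, since the text does not name the source.

```properties
foo.sources = src1
foo.channels = channel1 channel2 channel3
foo.sources.src1.channels = channel1 channel2 channel3

# Route on the value of the txnType event header.
foo.sources.src1.selector.type = multiplexing
foo.sources.src1.selector.header = txnType
foo.sources.src1.selector.mapping.customer = channel1 channel3
foo.sources.src1.selector.mapping.vendor = channel2
# Any other value, or a missing header, falls through to channel3.
foo.sources.src1.selector.default = channel3
```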

Configuration

As mentioned in the earlier section, Flume agent configuration is read from a file that resembles a Java property file format with hierarchical property settings.

Defining the flow

To define the flow within a single agent, you need to link the sources and sinks via a channel. You need to list the sources, sinks and channels for the given agent, and then point the source and sink to a channel. A source instance can specify multiple channels, but a sink instance can only specify one channel. The format is as follows:

    # list the sources, sinks and channels for the agent
    <Agent>.sources = <Source>
    <Agent>.sinks = <Sink>
    <Agent>.channels = <Channel1> <Channel2>

    # set channel for source
    <Agent>.sources.<Source>.channels = <Channel1> <Channel2> ...

    # set channel for sink
    <Agent>.sinks.<Sink>.channel = <Channel1>

For example, an agent named agent_foo is reading data from an external avro client and sending it to HDFS via a memory channel. The config file weblog.config could look like:

    # list the sources, sinks and channels for the agent
    agent_foo.sources = avro-appserver-src-1
    agent_foo.sinks = hdfs-sink-1
    agent_foo.channels = mem-channel-1

    # set channel for source
    agent_foo.sources.avro-appserver-src-1.channels = mem-channel-1

    # set channel for sink
    agent_foo.sinks.hdfs-sink-1.channel = mem-channel-1

This will make the events flow from avro-appserver-src-1 to hdfs-sink-1 through the memory channel mem-channel-1. When the agent is started with weblog.config as its config file, it will instantiate that flow.

Configuring individual components

After defining the flow, you need to set the properties of each source, sink and channel. This is done in the same hierarchical namespace fashion, where you set the component type and the other property values specific to each component:

    # properties for sources
    <Agent>.sources.<Source>.<someProperty> = <someValue>

    # properties for channels
    <Agent>.channels.<Channel>.<someProperty> = <someValue>

    # properties for sinks
    <Agent>.sinks.<Sink>.<someProperty> = <someValue>

The property “type” needs to be set for each component so that Flume understands what kind of object it needs to be. Each source, sink and channel type has its own set of properties required for it to function as intended, and all of those need to be set as needed. In the previous example, we have a flow from an avro source to an HDFS sink through the memory channel mem-channel-1. Here’s an example that shows the configuration of each of those components, with the source and sink named avro-AppSrv-source and hdfs-Cluster1-sink:

    agent_foo.sources = avro-AppSrv-source
    agent_foo.sinks = hdfs-Cluster1-sink
    agent_foo.channels = mem-channel-1

    # set channel for sources, sinks

    # properties of avro-AppSrv-source
    agent_foo.sources.avro-AppSrv-source.type = avro
    agent_foo.sources.avro-AppSrv-source.bind = localhost
    agent_foo.sources.avro-AppSrv-source.port = 10000

    # properties of mem-channel-1
    agent_foo.channels.mem-channel-1.type = memory
    agent_foo.channels.mem-channel-1.capacity = 1000
    agent_foo.channels.mem-channel-1.transactionCapacity = 100

    # properties of hdfs-Cluster1-sink
    agent_foo.sinks.hdfs-Cluster1-sink.type = hdfs
    agent_foo.sinks.hdfs-Cluster1-sink.hdfs.path = hdfs://namenode/flume/webdata

    #...

Adding multiple flows in an agent

A single Flume agent can contain several independent flows. You can list multiple sources, sinks and channels in a config. These components can be linked to form multiple flows:

    # list the sources, sinks and channels for the agent
    <Agent>.sources = <Source1> <Source2>
    <Agent>.sinks = <Sink1> <Sink2>
    <Agent>.channels = <Channel1> <Channel2>

Then you can link the sources and sinks to their corresponding channels (for sources) or channel (for sinks) to set up two different flows. For example, if you need to set up two flows in an agent, one going from an external avro client to external HDFS and another from the output of a tail to an avro sink, then here’s a config to do that:

    # list the sources, sinks and channels in the agent
    agent_foo.sources = avro-AppSrv-source1 exec-tail-source2
    agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
    agent_foo.channels = mem-channel-1 file-channel-2

    # flow #1 configuration
    agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1
    agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1

    # flow #2 configuration
    agent_foo.sources.exec-tail-source2.channels = file-channel-2
    agent_foo.sinks.avro-forward-sink2.channel = file-channel-2

Configuring a multi agent flow

To set up a multi-tier flow, you need an avro/thrift sink on the first hop pointing to an avro/thrift source on the next hop. This results in the first Flume agent forwarding events to the next Flume agent. For example, if you are periodically sending files (one file per event) using the avro client to a local Flume agent, then this local agent can forward them to another agent that has the storage mounted.

Weblog agent config:

    # list sources, sinks and channels in the agent
    agent_foo.sources = avro-AppSrv-source
    agent_foo.sinks = avro-forward-sink
    agent_foo.channels = file-channel

    # define the flow
    agent_foo.sources.avro-AppSrv-source.channels = file-channel
    agent_foo.sinks.avro-forward-sink.channel = file-channel

    # avro sink properties
    agent_foo.sinks.avro-forward-sink.type = avro
    agent_foo.sinks.avro-forward-sink.hostname = 10.1.1.100
    agent_foo.sinks.avro-forward-sink.port = 10000

    # configure other pieces
    #...

HDFS agent config:

    # list sources, sinks and channels in the agent
    agent_foo.sources = avro-collection-source
    agent_foo.sinks = hdfs-sink
    agent_foo.channels = mem-channel

    # define the flow
    agent_foo.sources.avro-collection-source.channels = mem-channel
    agent_foo.sinks.hdfs-sink.channel = mem-channel

    # avro source properties
    agent_foo.sources.avro-collection-source.type = avro
    agent_foo.sources.avro-collection-source.bind = 10.1.1.100
    agent_foo.sources.avro-collection-source.port = 10000

    # configure other pieces
    #...

Here we link the avro-forward-sink from the weblog agent to the avro-collection-source of the hdfs agent. This will result in the events coming from the external appserver source eventually getting stored in HDFS.

Fan out flow

As discussed in the previous section, Flume supports fanning out the flow from one source to multiple channels. There are two modes of fan out: replicating and multiplexing. In the replicating flow, the event is sent to all the configured channels. In the multiplexing case, the event is sent only to a subset of qualifying channels. To fan out the flow, one needs to specify a list of channels for a source and the policy for fanning it out. This is done by adding a channel “selector” that can be replicating or multiplexing, and then specifying the selection rules if it’s a multiplexer. If you don’t specify a selector, it is replicating by default:

    # List the sources, sinks and channels for the agent
    <Agent>.sources = <Source1>
    <Agent>.sinks = <Sink1> <Sink2>
    <Agent>.channels = <Channel1> <Channel2>

    # set list of channels for source (separated by space)
    <Agent>.sources.<Source1>.channels = <Channel1> <Channel2>

    # set channel for sinks
    <Agent>.sinks.<Sink1>.channel = <Channel1>
    <Agent>.sinks.<Sink2>.channel = <Channel2>

    <Agent>.sources.<Source1>.selector.type = replicating

The multiplexing selector has a further set of properties to bifurcate the flow. This requires specifying a mapping from an event attribute to a set of channels. The selector checks for each configured attribute in the event header. If it matches the specified value, then that event is sent to all the channels mapped to that value. If there’s no match, then the event is sent to the set of channels configured as default:

    # Mapping for multiplexing selector
    <Agent>.sources.<Source1>.selector.type = multiplexing
    <Agent>.sources.<Source1>.selector.header = <someHeader>
    <Agent>.sources.<Source1>.selector.mapping.<Value1> = <Channel1>
    <Agent>.sources.<Source1>.selector.mapping.<Value2> = <Channel1> <Channel2>
    <Agent>.sources.<Source1>.selector.mapping.<Value3> = <Channel2>
    #...

    <Agent>.sources.<Source1>.selector.default = <Channel2>

The mapping allows the channels to overlap across values: the same channel may be listed for more than one value.

The following example has a single flow that is multiplexed to two paths. The agent named agent_foo has a single avro source and two channels linked to two sinks:

    # list the sources, sinks and channels in the agent
    agent_foo.sources = avro-AppSrv-source1
    agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
    agent_foo.channels = mem-channel-1 file-channel-2

    # set channels for source
    agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1 file-channel-2

    # set channel for sinks
    agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1
    agent_foo.sinks.avro-forward-sink2.channel = file-channel-2

    # channel selector configuration
    agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
    agent_foo.sources.avro-AppSrv-source1.selector.header = State
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
    agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1

The selector checks for a header called “State”. If the value is “CA” then the event is sent to mem-channel-1, if it’s “AZ” then it goes to file-channel-2, and if it’s “NY” then it goes to both. If the “State” header is not set or doesn’t match any of the three, then the event goes to mem-channel-1, which is designated as ‘default’.

The selector also supports optional channels. To specify optional channels for a header, the config parameter ‘optional’ is used in the following way:

    # channel selector configuration
    agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
    agent_foo.sources.avro-AppSrv-source1.selector.header = State
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
    agent_foo.sources.avro-AppSrv-source1.selector.optional.CA = mem-channel-1 file-channel-2
    agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1

The selector will attempt to write to the required channels first and will fail the transaction if even one of these channels fails to consume the events. The transaction is then reattempted on all of the channels. Once all required channels have consumed the events, the selector will attempt to write to the optional channels. A failure by any of the optional channels to consume the event is simply ignored and not retried.

If there is an overlap between the optional channels and required channels for a specific header, the channel is considered to be required, and a failure in the channel will cause the entire set of required channels to be retried. For instance, in the above example, for the header “CA” mem-channel-1 is considered to be a required channel even though it is marked both as required and optional, and a failure to write to this channel will cause that event to be retried on all channels configured for the selector.

Note that if a header does not have any required channels, then the event will be written to the default channels and the selector will also attempt to write it to the optional channels for that header. In other words, specifying optional channels still causes the event to be written to the default channels if no required channels are specified. If no channels are designated as default and there are no required channels, the selector will attempt to write the events to the optional channels. Any failures are simply ignored in that case.
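
To illustrate the case of a header with no required channels, here is a hedged sketch that extends the optional-channel example with a value that has only an optional mapping. The value TX is an assumption introduced for illustration and does not appear in the guide.

```properties
agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
agent_foo.sources.avro-AppSrv-source1.selector.header = State
# TX (hypothetical value) has no required mapping, only an optional one.
agent_foo.sources.avro-AppSrv-source1.selector.optional.TX = file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1
```

Following the rules described in this section, an event with State set to TX would be written to mem-channel-1 as the default channel and then attempted on file-channel-2, where a failure is ignored.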




http://www.chinasem.cn/article/1070221
