Getting Started with Hadoop: Storing Data in HDFS via Apache Flume


This note is based on Hadoop 2.7.3 and Apache Flume 1.8.0. The Flume source is netcat, the Flume channel is memory, and the Flume sink is hdfs.

1. Configure the Flume agent file

Configure a Flume agent, named shaman here. The configuration file (netcat-memory-hdfs.conf) is as follows:

# Identify the components on agent shaman:
shaman.sources = netcat_s1
shaman.sinks = hdfs_w1
shaman.channels = in-mem_c1
# Configure the source:
shaman.sources.netcat_s1.type = netcat
shaman.sources.netcat_s1.bind = localhost
shaman.sources.netcat_s1.port = 44444
# Describe the sink:
shaman.sinks.hdfs_w1.type = hdfs
shaman.sinks.hdfs_w1.hdfs.path = hdfs://localhost:8020/user/root/test
shaman.sinks.hdfs_w1.hdfs.writeFormat = Text
shaman.sinks.hdfs_w1.hdfs.fileType = DataStream
# Configure a channel that buffers events in memory:
shaman.channels.in-mem_c1.type = memory
shaman.channels.in-mem_c1.capacity = 20000
shaman.channels.in-mem_c1.transactionCapacity = 100
# Bind the source and sink to the channel:
shaman.sources.netcat_s1.channels = in-mem_c1
shaman.sinks.hdfs_w1.channel = in-mem_c1
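
A mismatched channel name is an easy mistake in a file like this, because the same name has to appear in the channel list, on the source, and on the sink. The following is a minimal sketch of a pre-start check, not part of Flume itself: check_flume_channels is a hypothetical helper that reports any channel referenced by a source or sink but not declared in the agent's channels list. It assumes simple key = value lines like the ones above.

```shell
# Hypothetical helper (not a Flume command): verify that every channel
# referenced by a source or sink is declared in "<agent>.channels".
# $1 = config file, $2 = agent name
check_flume_channels() {
  awk -v agent="$2" '
    /^[ \t]*#/ { next }                       # skip comment lines
    {
      split($0, kv, "=")                      # key = value
      key = kv[1]; val = kv[2]
      gsub(/[ \t]/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
    }
    key == agent ".channels" {                # declared channels
      n = split(val, d, /[ \t]+/)
      for (i = 1; i <= n; i++) declared[d[i]] = 1
      next
    }
    key ~ ("^" agent "\\.(sources|sinks)\\.[^.]+\\.channels?$") {
      n = split(val, r, /[ \t]+/)             # channels used by a source or sink
      for (i = 1; i <= n; i++) used[r[i]] = key
    }
    END {
      bad = 0
      for (c in used) if (!(c in declared)) {
        printf "undeclared channel %s (from %s)\n", c, used[c]
        bad = 1
      }
      exit bad
    }' "$1"
}

# Usage (path taken from the start command in this note):
#   check_flume_channels agent/netcat-memory-hdfs.conf shaman && echo "channels OK"
```

The function returns a non-zero status when it finds an undeclared channel, so it can be chained in front of the agent start command.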

Note:
In hdfs://localhost:8020/user/root/test, hdfs://localhost:8020 is the value of the fs.defaultFS property in the Hadoop configuration file core-site.xml, and root is the Hadoop login user.
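
The way the sink path is assembled can be sketched in a few lines of shell. This is only an illustration: build_sink_path is a hypothetical helper, and the fallback value hdfs://localhost:8020 is simply the one used in this note. The hdfs getconf -confKey command reads fs.defaultFS from the local Hadoop configuration when Hadoop is installed.

```shell
# Hypothetical helper: join fs.defaultFS, the login user, and a directory
# into the path expected by hdfs.path.
# $1 = fs.defaultFS value, $2 = user, $3 = directory under /user/<user>
build_sink_path() {
  printf '%s/user/%s/%s\n' "${1%/}" "$2" "$3"
}

# Read fs.defaultFS from the Hadoop configuration if possible;
# otherwise fall back to the value assumed in this note.
default_fs="$(hdfs getconf -confKey fs.defaultFS 2>/dev/null || true)"
[ -n "$default_fs" ] || default_fs="hdfs://localhost:8020"

build_sink_path "$default_fs" root test
```

If the printed path differs from the hdfs.path value in the agent file, the sink is pointing at a different NameNode address than the one Hadoop is configured with.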

2. Start the Flume agent

bin/flume-ng agent -f agent/netcat-memory-hdfs.conf -n shaman  -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
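
Once the agent is running, one way to exercise the pipeline is to push a line into the netcat source and then list the sink directory. The sketch below assumes that the nc and hdfs commands are on the PATH; send_test_event and list_sink_files are hypothetical helper names, and the host, port, and directory defaults are the values from the configuration above.

```shell
# Send one line to the netcat source. With default settings the source
# acknowledges each event by replying "OK".
# $1 = message, $2 = host (default localhost), $3 = port (default 44444)
send_test_event() {
  printf '%s\n' "$1" | nc -w 2 "${2:-localhost}" "${3:-44444}"
}

# List the files the HDFS sink has written so far.
# $1 = HDFS directory (default /user/root/test)
list_sink_files() {
  hdfs dfs -ls "${1:-/user/root/test}"
}

# Usage, in a second terminal while the agent is running:
#   send_test_event "hello flume"
#   list_sink_files
```

With the sink defaults, files are named with the FlumeData prefix and carry a .tmp suffix while they are still open, so a fresh event may first appear in a .tmp file.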