Hadoop getting started: storing data in HDFS with Apache Flume.
These notes are based on Hadoop 2.7.3 and Apache Flume 1.8.0. The Flume source is netcat, the channel is memory, and the sink is hdfs.
1. Configure the Flume agent file
Configure a Flume agent, here named shaman. The configuration file (netcat-memory-hdfs.conf) is as follows:
# Identify the components on agent shaman:
shaman.sources = netcat_s1
shaman.sinks = hdfs_w1
shaman.channels = in-mem_c1
# Configure the source:
shaman.sources.netcat_s1.type = netcat
shaman.sources.netcat_s1.bind = localhost
shaman.sources.netcat_s1.port = 44444
# Describe the sink:
shaman.sinks.hdfs_w1.type = hdfs
shaman.sinks.hdfs_w1.hdfs.path = hdfs://localhost:8020/user/root/test
shaman.sinks.hdfs_w1.hdfs.writeFormat = Text
shaman.sinks.hdfs_w1.hdfs.fileType = DataStream
# Configure a channel that buffers events in memory:
shaman.channels.in-mem_c1.type = memory
shaman.channels.in-mem_c1.capacity = 20000
shaman.channels.in-mem_c1.transactionCapacity = 100
# Bind the source and sink to the channel:
shaman.sources.netcat_s1.channels = in-mem_c1
shaman.sinks.hdfs_w1.channel = in-mem_c1
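The file above uses Flume's flat `agent.component.property` key scheme: every line names the agent (shaman), a component, and one of its properties. As an illustration only, here is a minimal Python sketch that reads such a file into a dictionary keyed by the full dotted property name (this parser is a hypothetical helper, not part of Flume):

```python
# Sketch: parse a Flume-style properties file into a flat dict.
# Hypothetical helper for illustration; Flume does its own parsing.

def parse_flume_conf(text):
    """Map each dotted key to its value, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # ignore blank lines and comments
        key, sep, value = line.partition('=')
        if sep:  # only keep well-formed 'key = value' lines
            props[key.strip()] = value.strip()
    return props

sample = """
shaman.sources = netcat_s1
shaman.sources.netcat_s1.type = netcat
shaman.sources.netcat_s1.port = 44444
"""
print(parse_flume_conf(sample)['shaman.sources.netcat_s1.port'])
```

Reading the configuration this way makes the wiring easy to check: the channel names listed under `shaman.sources.*.channels` and `shaman.sinks.*.channel` must match a name declared in `shaman.channels`.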
Note: in hdfs://localhost:8020/user/root/test, the prefix hdfs://localhost:8020 is the value of the fs.defaultFS property in Hadoop's core-site.xml, and root is the user Hadoop is logged in as.
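To confirm which value belongs in hdfs.path, you can read fs.defaultFS directly out of core-site.xml. A small sketch with the standard library XML parser (the example path in the usage comment is an assumption; point it at your own Hadoop config directory):

```python
# Sketch: extract the fs.defaultFS value from a Hadoop core-site.xml.
import xml.etree.ElementTree as ET

def read_default_fs(core_site_path):
    """Return the fs.defaultFS property value, or None if absent."""
    root = ET.parse(core_site_path).getroot()
    for prop in root.iter('property'):  # core-site.xml is <configuration><property>...
        if prop.findtext('name') == 'fs.defaultFS':
            return prop.findtext('value')
    return None

# Example (path is an assumption, adjust to your installation):
# read_default_fs('/usr/local/hadoop/etc/hadoop/core-site.xml')
```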
2. Start the Flume agent
bin/flume-ng agent -f agent/netcat-memory-hdfs.conf -n shaman -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
3. Open a telnet client and type some text to test
telnet localhost 44444
Then type some text.
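If telnet is not installed, any TCP client will do, because the netcat source simply reads newline-terminated lines from the connection. A sketch in Python (the host and port match the source configured above):

```python
# Sketch: send newline-terminated lines to the Flume netcat source
# over plain TCP, as an alternative to telnet.
import socket

def send_lines(lines, host='localhost', port=44444):
    """Open a TCP connection and send each line followed by a newline."""
    with socket.create_connection((host, port)) as sock:
        for line in lines:
            sock.sendall(line.encode('utf-8') + b'\n')

# Example (requires the shaman agent to be running):
# send_lines(['hello flume', 'hello hdfs'])
```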
4. Check the HDFS test directory
hdfs dfs -ls /user/root/test
You will see new files appear; their contents are the text you entered via telnet.
References:
1. Hadoop For Dummies
2. Flume 1.8.0 User Guide