This article covers [hadoop] 3003: submitting MapReduce jobs.
I. Running locally from Eclipse
See the demo in the chapter "[hadoop] 3002: MapReduce word count example".
II. Running on the cluster as a jar
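1. Package the data-processing job as a jar and upload it to the cluster node (the listing below shows wc.jar under /home/hadoop/workspace/HDFSdemo on cloud01)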
[hadoop@cloud01 HDFSdemo]$ pwd
/home/hadoop/workspace/HDFSdemo
[hadoop@cloud01 HDFSdemo]$ ll
total 139844
drwxrwxr-x. 5 hadoop hadoop 4096 Feb 24 18:10 bin
-rw-rw-r--. 1 hadoop hadoop 440 Feb 20 06:56 core-site.xml
-rw-rw-r--. 1 hadoop hadoop 256 Feb 20 06:56 hdfs-site.xml
drwxrwxr-x. 2 hadoop hadoop 4096 Feb 20 06:34 lib
-rw-rw-r--. 1 hadoop hadoop 253 Feb 20 06:56 mapred-site.xml
drwxrwxr-x. 5 hadoop hadoop 4096 Feb 24 18:10 src
-rw-rw-r--. 1 hadoop hadoop 143167974 Feb 24 21:41 wc.jar
-rw-rw-r--. 1 hadoop hadoop 434 Feb 20 06:56 yarn-site.xml
2. Start YARN by running the start-yarn.sh command
[hadoop@cloud01 HDFSdemo]$ start-yarn.sh
[hadoop@cloud01 HDFSdemo]$ jps
22901 Jps
17507 DataNode
22510 NodeManager
17414 NameNode
2721
22413 ResourceManager
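The jps check above can also be scripted. The following is a minimal sketch in plain Java, not part of Hadoop: the class name `JpsCheck` and the list of required daemons are assumptions chosen to match the single-node setup shown here. It parses jps output and reports which expected daemons are missing; lines that carry only a pid (such as `2721`) are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class JpsCheck {
    // Daemons expected on a node like cloud01 (an assumption for this sketch).
    static final List<String> REQUIRED =
            Arrays.asList("NameNode", "DataNode", "ResourceManager", "NodeManager");

    /** Extracts process names from jps output; lines with only a pid are skipped. */
    static Set<String> runningNames(String jpsOutput) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            // Each jps line has the form "<pid> <name>"; the name may be absent.
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2 && !parts[1].isEmpty()) {
                names.add(parts[1]);
            }
        }
        return names;
    }

    /** Returns the required daemons that do not appear in the jps output. */
    static List<String> missing(String jpsOutput) {
        Set<String> running = runningNames(jpsOutput);
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!running.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "22901 Jps", "17507 DataNode", "22510 NodeManager",
                "17414 NameNode", "2721", "22413 ResourceManager");
        System.out.println("missing daemons: " + missing(sample));
    }
}
```

If `ResourceManager` or `NodeManager` shows up as missing, YARN is most likely not running yet and start-yarn.sh should be run before submitting the job.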
3. Run wc.jar in distributed mode
[hadoop@cloud01 ~]$ hadoop jar workspace/HDFSdemo/wc.jar mapreduce.WordCount
3.1 Log output during execution
-- Connect to the ResourceManager: client.RMProxy: Connecting to ResourceManager
-- Compute the input splits, one Map task per split: input.FileInputFormat: Total input paths to process : 1
-- Generate the job ID for this run: mapreduce.JobSubmitter: Submitting tokens for job: job_1424843731958_0002
-- Run the job packaged in the jar: mapreduce.Job: Running job: job_1424843731958_0002
-- Show map and reduce progress:
15/02/24 22:09:30 INFO mapreduce.Job: map 0% reduce 0%
15/02/24 22:09:39 INFO mapreduce.Job: map 100% reduce 0%
15/02/24 22:09:52 INFO mapreduce.Job: map 100% reduce 100%
15/02/24 22:09:53 INFO mapreduce.Job: Job job_1424843731958_0002 completed successfully
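To make the log lines above easier to read, here is a minimal sketch in plain Java that pulls the job ID and the map/reduce percentages out of a line. The class `JobLogParser` and its method names are hypothetical helpers, not Hadoop APIs. It assumes the usual job ID layout `job_<cluster timestamp>_<sequence number>`, where the timestamp is the ResourceManager start time in milliseconds.

```java
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JobLogParser {
    private static final Pattern JOB_ID = Pattern.compile("job_(\\d+)_(\\d+)");
    private static final Pattern PROGRESS = Pattern.compile("map (\\d+)% reduce (\\d+)%");

    /** Returns {clusterTimestampMillis, sequenceNumber}, or null if no job ID is found. */
    static long[] parseJobId(String line) {
        Matcher m = JOB_ID.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] { Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) };
    }

    /** Returns {mapPercent, reducePercent}, or null if this is not a progress line. */
    static int[] parseProgress(String line) {
        Matcher m = PROGRESS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        long[] id = parseJobId("mapreduce.Job: Running job: job_1424843731958_0002");
        System.out.println("ResourceManager started at (UTC): " + Instant.ofEpochMilli(id[0]));
        System.out.println("job sequence number: " + id[1]);

        int[] p = parseProgress("15/02/24 22:09:39 INFO mapreduce.Job: map 100% reduce 0%");
        System.out.println("map=" + p[0] + "% reduce=" + p[1] + "%");
    }
}
```

The sequence number `0002` suggests this was the second job submitted since the ResourceManager was last started.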
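3.2 How the processes change over the course of the MR job
ResourceManager,NodeManager->RunJar->MRAppMaster->YarnChild
As the MR program progresses, the corresponding processes exit one after another, in this order: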
YarnChild->MRAppMaster->RunJar
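The exit order is simply the launch order reversed: the last process started is the first to finish. A minimal sketch of that last-in, first-out relationship, using a hypothetical `ProcessLifecycle` class and plain strings rather than real processes (ResourceManager and NodeManager are left out because they are long-running daemons that stay up after the job):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class ProcessLifecycle {
    /** Given the launch order, returns the exit order under a last-in, first-out model. */
    static List<String> exitOrder(List<String> launchOrder) {
        Deque<String> stack = new ArrayDeque<>();
        for (String process : launchOrder) {
            stack.push(process); // most recently launched ends up on top
        }
        List<String> exits = new ArrayList<>();
        while (!stack.isEmpty()) {
            exits.add(stack.pop()); // top of the stack exits first
        }
        return exits;
    }

    public static void main(String[] args) {
        List<String> launched = Arrays.asList("RunJar", "MRAppMaster", "YarnChild");
        System.out.println("exit order: " + exitOrder(launched));
    }
}
```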
3.3 The processing flow shown as diagrams
Figure 1
Figure 2
file:/tmp/hadoop-hadoop/mapred/staging/hadoop1721666591/.staging/job_local1721666591_0001
file:/tmp/hadoop-hadoop/mapred/staging/hadoop1721666591/.staging/job_local1721666591_0001/job.xml
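Common problems
1. INFO ipc.Client: Retrying connect to server: cloud01/192.168.2.31:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
This happens because YARN has not been started, so nothing is listening on the ResourceManager address cloud01:8032. Run start-yarn.sh to fix it.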
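Before submitting a job, one way to catch this problem early is to test whether the ResourceManager port accepts TCP connections. The following is a minimal sketch in plain Java; `RmPortCheck` is a hypothetical helper, and the host `cloud01` and port `8032` are taken from the log line above. A successful connection only shows that something is listening on the port, not that YARN is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class RmPortCheck {
    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host all mean "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the log line above; override with: java RmPortCheck <host> <port>
        String host = args.length > 0 ? args[0] : "cloud01";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8032;
        if (isReachable(host, port, 2000)) {
            System.out.println("ResourceManager port is open on " + host + ":" + port);
        } else {
            System.out.println("Cannot reach " + host + ":" + port + "; try running start-yarn.sh");
        }
    }
}
```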