Hadoop词频统计(二)之本地模式运行

2024-06-09 10:38

本文主要是介绍Hadoop词频统计(二)之本地模式运行,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

想要在windows上以本地模式运行hadoop就必须要在windows上配置好hadoop的本地运行环境。我们需要下载编译好的hadoop二进制包。

下载地址如下:

链接:http://pan.baidu.com/s/1skE4fQt 密码:or48

下载完成后配置windows环境变量:

HADOOP_HOME=C:\Program Files (x86)\hadoop-2.6.0

PATH=%PATH%:%HADOOP_HOME%\bin

map:

package cn.hadoop.mr;import java.io.IOException;import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable>{@Overrideprotected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, LongWritable>.Context context)throws IOException, InterruptedException {// TODO Auto-generated method stubString line = value.toString();String[] words = StringUtils.split(line,' ');for(String word : words) {context.write(new Text(word), new LongWritable(1));}}
}
reduce:

package cn.hadoop.mr;import java.io.IOException;import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {@Overrideprotected void reduce(Text key, Iterable<LongWritable> values,Reducer<Text, LongWritable, Text, LongWritable>.Context context) throws IOException, InterruptedException {long count = 0;for(LongWritable value : values) {count += value.get();}context.write(key, new LongWritable(count));}
}

run:

package cn.hadoop.mr;import java.io.IOException;import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;public class WCRunner {public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {Configuration conf = new Configuration();Job wcjob = Job.getInstance(conf);wcjob.setJarByClass(WCRunner.class);wcjob.setMapperClass(WCMapper.class);wcjob.setReducerClass(WCReducer.class);wcjob.setOutputKeyClass(Text.class);wcjob.setOutputValueClass(LongWritable.class);wcjob.setMapOutputKeyClass(Text.class);wcjob.setMapOutputValueClass(LongWritable.class);FileInputFormat.setInputPaths(wcjob, "E:/wc/inputdata/");FileOutputFormat.setOutputPath(wcjob, new Path("E:/wc/output/"));wcjob.waitForCompletion(true);}
}

缺少jar包的话就把C:\Program Files (x86)\hadoop-2.6.0\share\hadoop文件夹下面的所有jar包引入进项目。

然后在eclipse中直接以java application方式运行main方法即可。

输出结果如下:

2016-07-25 15:47:06,565 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2016-07-25 15:47:06,569 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2016-07-25 15:47:06,751 WARN  [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(153)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2016-07-25 15:47:06,752 WARN  [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(261)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2016-07-25 15:47:06,796 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(281)) - Total input paths to process : 1
2016-07-25 15:47:06,836 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(494)) - number of splits:1
2016-07-25 15:47:06,910 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(583)) - Submitting tokens for job: job_local1228851727_0001
2016-07-25 15:47:07,087 INFO  [main] mapreduce.Job (Job.java:submit(1300)) - The url to track the job: http://localhost:8080/
2016-07-25 15:47:07,088 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) - Running job: job_local1228851727_0001
2016-07-25 15:47:07,089 INFO  [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(471)) - OutputCommitter set in config null
2016-07-25 15:47:07,094 INFO  [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(489)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2016-07-25 15:47:07,131 INFO  [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for map tasks
2016-07-25 15:47:07,132 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:run(224)) - Starting task: attempt_local1228851727_0001_m_000000_0
2016-07-25 15:47:07,156 INFO  [LocalJobRunner Map Task Executor #0] util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(181)) - ProcfsBasedProcessTree currently is supported only on Linux.
2016-07-25 15:47:07,182 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:initialize(587)) -  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@6db06d7d
2016-07-25 15:47:07,185 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:runNewMapper(753)) - Processing split: file:/E:/wc/inputdata/in.dat:0+78
2016-07-25 15:47:07,225 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:setEquator(1202)) - (EQUATOR) 0 kvi 26214396(104857584)
2016-07-25 15:47:07,225 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(995)) - mapreduce.task.io.sort.mb: 100
2016-07-25 15:47:07,225 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(996)) - soft limit at 83886080
2016-07-25 15:47:07,225 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(997)) - bufstart = 0; bufvoid = 104857600
2016-07-25 15:47:07,225 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(998)) - kvstart = 26214396; length = 6553600
2016-07-25 15:47:07,228 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:createSortingCollector(402)) - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2016-07-25 15:47:07,234 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 
2016-07-25 15:47:07,234 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1457)) - Starting flush of map output
2016-07-25 15:47:07,234 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1475)) - Spilling map output
2016-07-25 15:47:07,234 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1476)) - bufstart = 0; bufend = 174; bufvoid = 104857600
2016-07-25 15:47:07,234 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1478)) - kvstart = 26214396(104857584); kvend = 26214352(104857408); length = 45/6553600
2016-07-25 15:47:07,243 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:sortAndSpill(1660)) - Finished spill 0
2016-07-25 15:47:07,248 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:done(1001)) - Task:attempt_local1228851727_0001_m_000000_0 is done. And is in the process of committing
2016-07-25 15:47:07,256 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - map
2016-07-25 15:47:07,256 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:sendDone(1121)) - Task 'attempt_local1228851727_0001_m_000000_0' done.
2016-07-25 15:47:07,256 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:run(249)) - Finishing task: attempt_local1228851727_0001_m_000000_0
2016-07-25 15:47:07,256 INFO  [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2016-07-25 15:47:07,259 INFO  [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for reduce tasks
2016-07-25 15:47:07,259 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(302)) - Starting task: attempt_local1228851727_0001_r_000000_0
2016-07-25 15:47:07,266 INFO  [pool-3-thread-1] util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(181)) - ProcfsBasedProcessTree currently is supported only on Linux.
2016-07-25 15:47:07,294 INFO  [pool-3-thread-1] mapred.Task (Task.java:initialize(587)) -  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@57baec0e
2016-07-25 15:47:07,297 INFO  [pool-3-thread-1] mapred.ReduceTask (ReduceTask.java:run(362)) - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@7c165ec0
2016-07-25 15:47:07,306 INFO  [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:<init>(196)) - MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2016-07-25 15:47:07,308 INFO  [EventFetcher for fetching Map Completion Events] reduce.EventFetcher (EventFetcher.java:run(61)) - attempt_local1228851727_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2016-07-25 15:47:07,334 INFO  [localfetcher#1] reduce.LocalFetcher (LocalFetcher.java:copyMapOutput(141)) - localfetcher#1 about to shuffle output of map attempt_local1228851727_0001_m_000000_0 decomp: 200 len: 204 to MEMORY
2016-07-25 15:47:07,338 INFO  [localfetcher#1] reduce.InMemoryMapOutput (InMemoryMapOutput.java:shuffle(100)) - Read 200 bytes from map-output for attempt_local1228851727_0001_m_000000_0
2016-07-25 15:47:07,361 INFO  [localfetcher#1] reduce.MergeManagerImpl (MergeManagerImpl.java:closeInMemoryFile(314)) - closeInMemoryFile -> map-output of size: 200, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->200
2016-07-25 15:47:07,362 INFO  [EventFetcher for fetching Map Completion Events] reduce.EventFetcher (EventFetcher.java:run(76)) - EventFetcher is interrupted.. Returning
2016-07-25 15:47:07,363 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2016-07-25 15:47:07,363 INFO  [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(674)) - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2016-07-25 15:47:07,369 INFO  [pool-3-thread-1] mapred.Merger (Merger.java:merge(597)) - Merging 1 sorted segments
2016-07-25 15:47:07,370 INFO  [pool-3-thread-1] mapred.Merger (Merger.java:merge(696)) - Down to the last merge-pass, with 1 segments left of total size: 193 bytes
2016-07-25 15:47:07,371 INFO  [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(751)) - Merged 1 segments, 200 bytes to disk to satisfy reduce memory limit
2016-07-25 15:47:07,372 INFO  [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(781)) - Merging 1 files, 204 bytes from disk
2016-07-25 15:47:07,373 INFO  [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(796)) - Merging 0 segments, 0 bytes from memory into reduce
2016-07-25 15:47:07,373 INFO  [pool-3-thread-1] mapred.Merger (Merger.java:merge(597)) - Merging 1 sorted segments
2016-07-25 15:47:07,373 INFO  [pool-3-thread-1] mapred.Merger (Merger.java:merge(696)) - Down to the last merge-pass, with 1 segments left of total size: 193 bytes
2016-07-25 15:47:07,374 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2016-07-25 15:47:07,377 INFO  [pool-3-thread-1] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2016-07-25 15:47:07,385 INFO  [pool-3-thread-1] mapred.Task (Task.java:done(1001)) - Task:attempt_local1228851727_0001_r_000000_0 is done. And is in the process of committing
2016-07-25 15:47:07,387 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2016-07-25 15:47:07,387 INFO  [pool-3-thread-1] mapred.Task (Task.java:commit(1162)) - Task attempt_local1228851727_0001_r_000000_0 is allowed to commit now
2016-07-25 15:47:07,387 INFO  [pool-3-thread-1] output.FileOutputCommitter (FileOutputCommitter.java:commitTask(439)) - Saved output of task 'attempt_local1228851727_0001_r_000000_0' to file:/E:/wc/output/_temporary/0/task_local1228851727_0001_r_000000
2016-07-25 15:47:07,387 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - reduce > reduce
2016-07-25 15:47:07,387 INFO  [pool-3-thread-1] mapred.Task (Task.java:sendDone(1121)) - Task 'attempt_local1228851727_0001_r_000000_0' done.
2016-07-25 15:47:07,387 INFO  [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(325)) - Finishing task: attempt_local1228851727_0001_r_000000_0
2016-07-25 15:47:07,388 INFO  [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - reduce task executor complete.
2016-07-25 15:47:08,090 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1366)) - Job job_local1228851727_0001 running in uber mode : false
2016-07-25 15:47:08,094 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1373)) -  map 100% reduce 100%
2016-07-25 15:47:08,097 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1384)) - Job job_local1228851727_0001 completed successfully
2016-07-25 15:47:08,128 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1391)) - Counters: 33File System CountersFILE: Number of bytes read=890FILE: Number of bytes written=525466FILE: Number of read operations=0FILE: Number of large read operations=0FILE: Number of write operations=0Map-Reduce FrameworkMap input records=6Map output records=12Map output bytes=174Map output materialized bytes=204Input split bytes=93Combine input records=0Combine output records=0Reduce input groups=5Reduce shuffle bytes=204Reduce input records=12Reduce output records=5Spilled Records=24Shuffled Maps =1Failed Shuffles=0Merged Map outputs=1GC time elapsed (ms)=0CPU time spent (ms)=0Physical memory (bytes) snapshot=0Virtual memory (bytes) snapshot=0Total committed heap usage (bytes)=504758272Shuffle ErrorsBAD_ID=0CONNECTION=0IO_ERROR=0WRONG_LENGTH=0WRONG_MAP=0WRONG_REDUCE=0File Input Format Counters Bytes Read=78File Output Format Counters Bytes Written=56
文件内容如下:

haha    4
hehe    2
heiheihei    2
lalala    1
lololo    3


这篇关于Hadoop词频统计(二)之本地模式运行的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1044949

相关文章

0基础租个硬件玩deepseek,蓝耘元生代智算云|本地部署DeepSeek R1模型的操作流程

《0基础租个硬件玩deepseek,蓝耘元生代智算云|本地部署DeepSeekR1模型的操作流程》DeepSeekR1模型凭借其强大的自然语言处理能力,在未来具有广阔的应用前景,有望在多个领域发... 目录0基础租个硬件玩deepseek,蓝耘元生代智算云|本地部署DeepSeek R1模型,3步搞定一个应

Java实现状态模式的示例代码

《Java实现状态模式的示例代码》状态模式是一种行为型设计模式,允许对象根据其内部状态改变行为,本文主要介绍了Java实现状态模式的示例代码,文中通过示例代码介绍的非常详细,需要的朋友们下面随着小编来... 目录一、简介1、定义2、状态模式的结构二、Java实现案例1、电灯开关状态案例2、番茄工作法状态案例

一文教你使用Python实现本地分页

《一文教你使用Python实现本地分页》这篇文章主要为大家详细介绍了Python如何实现本地分页的算法,主要针对二级数据结构,文中的示例代码简洁易懂,有需要的小伙伴可以了解下... 在项目开发的过程中,遇到分页的第一页就展示大量的数据,导致前端列表加载展示的速度慢,所以需要在本地加入分页处理,把所有数据先放

本地搭建DeepSeek-R1、WebUI的完整过程及访问

《本地搭建DeepSeek-R1、WebUI的完整过程及访问》:本文主要介绍本地搭建DeepSeek-R1、WebUI的完整过程及访问的相关资料,DeepSeek-R1是一个开源的人工智能平台,主... 目录背景       搭建准备基础概念搭建过程访问对话测试总结背景       最近几年,人工智能技术

如何在本地部署 DeepSeek Janus Pro 文生图大模型

《如何在本地部署DeepSeekJanusPro文生图大模型》DeepSeekJanusPro模型在本地成功部署,支持图片理解和文生图功能,通过Gradio界面进行交互,展示了其强大的多模态处... 目录什么是 Janus Pro1. 安装 conda2. 创建 python 虚拟环境3. 克隆 janus

本地私有化部署DeepSeek模型的详细教程

《本地私有化部署DeepSeek模型的详细教程》DeepSeek模型是一种强大的语言模型,本地私有化部署可以让用户在自己的环境中安全、高效地使用该模型,避免数据传输到外部带来的安全风险,同时也能根据自... 目录一、引言二、环境准备(一)硬件要求(二)软件要求(三)创建虚拟环境三、安装依赖库四、获取 Dee

通过prometheus监控Tomcat运行状态的操作流程

《通过prometheus监控Tomcat运行状态的操作流程》文章介绍了如何安装和配置Tomcat,并使用Prometheus和TomcatExporter来监控Tomcat的运行状态,文章详细讲解了... 目录Tomcat安装配置以及prometheus监控Tomcat一. 安装并配置tomcat1、安装

deepseek本地部署使用步骤详解

《deepseek本地部署使用步骤详解》DeepSeek是一个开源的深度学习模型,支持自然语言处理和推荐系统,本地部署步骤包括克隆仓库、创建虚拟环境、安装依赖、配置模型和数据、启动服务、调试与优化以及... 目录环境要求部署步骤1. 克隆 DeepSeek 仓库2. 创建虚拟环境3. 安装依赖4. 配置模型

DeepSeek模型本地部署的详细教程

《DeepSeek模型本地部署的详细教程》DeepSeek作为一款开源且性能强大的大语言模型,提供了灵活的本地部署方案,让用户能够在本地环境中高效运行模型,同时保护数据隐私,在本地成功部署DeepSe... 目录一、环境准备(一)硬件需求(二)软件依赖二、安装Ollama三、下载并部署DeepSeek模型选

mysqld_multi在Linux服务器上运行多个MySQL实例

《mysqld_multi在Linux服务器上运行多个MySQL实例》在Linux系统上使用mysqld_multi来启动和管理多个MySQL实例是一种常见的做法,这种方式允许你在同一台机器上运行多个... 目录1. 安装mysql2. 配置文件示例配置文件3. 创建数据目录4. 启动和管理实例启动所有实例