MapReduce: Combiner

2023-12-16 03:32

Tags: mapreduce, combiner


1. What is a combiner?

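A combiner is a local aggregation step. It merges the records that a map task emits for the same key, which reduces the amount of data handed to the reduce phase and improves efficiency. It does not change the number of reduce tasks. Because the combiner sits between the mapper and the reducer, both its input types and its output types must match the mapper's output types, which are also the reducer's input types.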
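To make the effect concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It only imitates what a word-count mapper and a summing combiner do to the record count of a single map task; the class name `CombinerEffectDemo` and its helper methods are hypothetical names chosen for illustration, not part of any Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerEffectDemo {

    // Imitates a word-count mapper: emits one (word, 1) record per token.
    static List<Map.Entry<String, Integer>> mapOutput(String line) {
        List<Map.Entry<String, Integer>> records = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                records.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return records;
    }

    // Imitates a summing combiner: adds up the values of each key locally,
    // so only one record per distinct key is left to send onward.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> record : records) {
            combined.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = mapOutput("a b a a b");
        Map<String, Integer> combined = combine(records);
        System.out.println("records without combiner: " + records.size()); // 5
        System.out.println("records with combiner: " + combined.size());   // 2
        System.out.println(combined);                                      // {a=3, b=2}
    }
}
```

The final counts are the same either way; what changes is how many records have to leave the map task.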

2. When should a combiner be used, and when not?

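Do not reuse the reducer as a combiner when computing an average: the average of partial averages is generally not equal to the overall average, so the result would be wrong. (An average can still benefit from a combiner if the combiner passes along partial sums and counts instead of averages.) More generally, a combiner is safe only when the operation is commutative and associative, such as a sum, count, maximum, or minimum, because the framework may run the combiner zero, one, or several times. In those cases, use a combiner where it fits: it reduces the volume of map output and the amount of data copied to the reducers, which lightens the load on the reducers, saves network bandwidth, and improves efficiency.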
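The averaging pitfall can be checked with a few lines of plain Java. This is a sketch only: the class name `AverageCombinerDemo`, its methods, and the two sample "splits" are made up for illustration. It compares averaging the per-split averages against merging per-split (sum, count) pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AverageCombinerDemo {

    // Plain average of a list of values.
    static double average(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Wrong approach: each split is "combined" into its average,
    // then the final step averages those averages.
    static double averageOfAverages(List<List<Double>> splits) {
        List<Double> partialAverages = new ArrayList<>();
        for (List<Double> split : splits) {
            partialAverages.add(average(split));
        }
        return average(partialAverages);
    }

    // Safe approach: each split is combined into a {sum, count} pair,
    // and the final step merges the pairs before dividing once.
    static double averageFromSumsAndCounts(List<List<Double>> splits) {
        double totalSum = 0;
        long totalCount = 0;
        for (List<Double> split : splits) {
            double partialSum = 0;
            for (double v : split) {
                partialSum += v;
            }
            totalSum += partialSum;
            totalCount += split.size();
        }
        return totalSum / totalCount;
    }

    public static void main(String[] args) {
        // Two hypothetical input splits of unequal size.
        List<List<Double>> splits = Arrays.asList(
                Arrays.asList(1.0, 2.0, 3.0),
                Arrays.asList(10.0));
        System.out.println(averageOfAverages(splits));        // 6.0 (wrong)
        System.out.println(averageFromSumsAndCounts(splits)); // 4.0 (correct: 16 / 4)
    }
}
```

The two results differ whenever the splits contain different numbers of values, which is the usual case in a real job.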

3. At which stage does the combiner run?

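It is the last step of the map phase: it is applied to the map output on the map side, before that output is copied to the reducers.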

4. Combiner code implementation

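import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * <p>Description: Reduces the load on the reducers. A combiner is registered
 * in the driver with job.setCombinerClass(...).</p>
 *
 * @author 余辉
 * @date 2016-03-14 16:31:10
 * @version 1.0
 */
public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Running count for this key
        int count = 0;
        // Iterate over all values in this key's group and add them to count
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}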

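import java.io.IOException;

// Assumed source of StringUtils.split(String, String): Apache Commons Lang.
// With newer Hadoop versions the package may be org.apache.commons.lang3.
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountRunner {

    static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = StringUtils.split(line, " ");
            // Emit (word, 1) for every word on the line
            for (String word : words) {
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }

    static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int counter = 0;
            for (IntWritable value : values) {
                // Add up every value for this key
                counter += value.get();
            }
            context.write(key, new IntWritable(counter));
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        // Job information is wrapped in a Job object, so build one first
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // Jar that contains this job
        job.setJarByClass(WordCountRunner.class);
        // Mapper class used by this job
        job.setMapperClass(WordCountMapper.class);
        // Reducer class used by this job
        job.setReducerClass(WordCountReducer.class);
        // Combiner class. The reducer is reused here because summing is safe to
        // combine; WordCountCombiner has the same logic and could be passed instead.
        job.setCombinerClass(WordCountReducer.class);
        // Key type of the mapper output
        job.setMapOutputKeyClass(Text.class);
        // Value type of the mapper output
        job.setMapOutputValueClass(IntWritable.class);
        // Key type of the reducer output
        job.setOutputKeyClass(Text.class);
        // Value type of the reducer output
        job.setOutputValueClass(IntWritable.class);
        // Path of the input data to process
        FileInputFormat.setInputPaths(job, new Path("/home/hadoop/Desktop/input"));
        // Path where the job writes its results
        FileOutputFormat.setOutputPath(job, new Path("/home/hadoop/Desktop/output"));
        // Submit the job and wait for it to finish
        job.waitForCompletion(true);
    }
}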
