MapReduce Application Example 2: Simple Data Sorting

1. Requirements

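The sorting requirement for simple data is straightforward. A large text input holds exactly one integer per line. The job must output the numbers in ascending order and label each one with its line number (rank). This example is very helpful for understanding how MapReduce (MR) works.
Input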

12
58
1283
45
9
...

Output

1  9
2  12
3  45
4  58
5  1283
...

2. Approach

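MapReduce sorts intermediate key-value pairs by key during the shuffle, between the map and reduce phases. As a result, each reduce task receives its keys in sorted order.

The key type decides the order:
- An IntWritable key (a wrapper around an int) is sorted by numeric value.
- A Text key (a wrapper around a string) is sorted in dictionary (byte-wise lexicographic) order.

The mapper therefore only needs to emit each number as an IntWritable key. The framework does the sorting, and the reducer writes the keys out in the order it receives them, attaching a running line number. This gives one globally sorted result only when the job has a single reduce task. With several reduce tasks, each output file is sorted only within itself.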
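To make the difference between the two key types concrete, here is a minimal plain-Java sketch. It sorts the sample input once as integers and once as strings. No Hadoop is involved, and the class and method names are illustrative. For ASCII digit strings like these, Java's String ordering should match the byte-wise order that Text keys get. The sketch only models the comparison, not the shuffle itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Illustrative, Hadoop-free demo: numeric order (as for IntWritable keys)
 * versus lexicographic order (as for Text keys) on the same input lines.
 */
final class KeyOrderDemo {

    /** Parses each line as an int and sorts numerically, like IntWritable keys. */
    static List<Integer> numericOrder(List<String> lines) {
        List<Integer> nums = new ArrayList<>();
        for (String line : lines) {
            nums.add(Integer.parseInt(line.trim()));
        }
        Collections.sort(nums);
        return nums;
    }

    /** Sorts the raw strings character by character, like Text keys with ASCII digits. */
    static List<String> lexicographicOrder(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = List.of("12", "58", "1283", "45", "9");
        System.out.println("as IntWritable: " + numericOrder(input));      // [9, 12, 45, 58, 1283]
        System.out.println("as Text:        " + lexicographicOrder(input)); // [12, 1283, 45, 58, 9]
    }
}
```

With the sample input, numeric order gives 9, 12, 45, 58, 1283. String order puts 1283 before 45 and 9 last. This is why the mapper converts each line to an IntWritable instead of emitting the Text value directly.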

3. Code

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Simple data sort
 */
public class DataSort {
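    /**
     * The mapper parses each input line into an IntWritable and emits it as the output key.
     * The value is a constant 1; the framework sorts by key, and the reducer uses one
     * value per occurrence so that duplicate numbers are kept.
     */
    public static class SortMapper extends Mapper<Object, Text, IntWritable, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final IntWritable data = new IntWritable();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty()) {
                return; // skip blank lines instead of failing in parseInt
            }
            data.set(Integer.parseInt(line));
            context.write(data, ONE);
        }
    }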

    /**
     * The reducer writes each input key as the output value, once per element of the
     * input value list, so a number that occurs n times is output n times.
     * The counter linenum supplies the line number (rank) used as the output key;
     * it is a global rank only when the job runs a single reduce task.
     */
    public static class SortReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        private final IntWritable linenum = new IntWritable(1);

        @Override
        public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            for (IntWritable ignored : values) {
                context.write(linenum, key);
                linenum.set(linenum.get() + 1);
            }
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        if (args.length < 2) {
            System.err.println("Usage: DataSort <input path> <output path>");
            System.exit(1);
        }
        String inputPath = args[0];
        String outputPath = args[1];
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJobName("data sort");
        job.setJarByClass(DataSort.class);
        job.setMapperClass(DataSort.SortMapper.class);
        job.setReducerClass(DataSort.SortReducer.class);
        // A single reduce task yields one globally sorted file and a continuous line count
        job.setNumReduceTasks(1);
        // Map output key/value types
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Reduce (final) output key/value types
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        // Exit with 0 on success, 1 on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
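Before submitting the job to a cluster, it can help to check the expected output locally. What follows is a hypothetical, Hadoop-free simulation; the class name DataSortSimulation is illustrative. A TreeMap stands in for the shuffle's sort-and-group step, and the loop over each value list plays the role of SortReducer. A number that appears twice therefore gets two consecutive line numbers. The sketch assumes one reduce task and the default tab separator of TextOutputFormat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Local simulation of the DataSort data flow (illustration only; this is not
 * how Hadoop implements the shuffle).
 */
final class DataSortSimulation {

    static List<String> run(List<String> inputLines) {
        // "Map" + "shuffle": key = parsed number, value list = one 1 per occurrence.
        // TreeMap keeps keys in ascending numeric order, like IntWritable keys.
        TreeMap<Integer, List<Integer>> grouped = new TreeMap<>();
        for (String line : inputLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines rather than failing in parseInt
            }
            grouped.computeIfAbsent(Integer.parseInt(trimmed), k -> new ArrayList<>()).add(1);
        }

        // "Reduce": one output line per element of each value list, numbered consecutively.
        List<String> output = new ArrayList<>();
        int linenum = 1;
        for (Map.Entry<Integer, List<Integer>> entry : grouped.entrySet()) {
            for (int ignored : entry.getValue()) {
                output.add(linenum + "\t" + entry.getKey()); // tab, as TextOutputFormat uses by default
                linenum++;
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of("12", "58", "1283", "45", "9", "45");
        run(input).forEach(System.out::println);
    }
}
```

Running main on 12, 58, 1283, 45, 9, 45 prints six tab-separated lines, with 45 on lines 3 and 4.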

4. Question

How would you output the numbers in descending order instead?
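One possible answer, offered as a suggestion rather than the author's solution, is to change only the order in which keys reach the reducer. In Hadoop this order comes from the sort comparator, which can be replaced with `Job.setSortComparatorClass`. For example, you could pass a subclass of `IntWritable.Comparator` whose `compare` methods return the negated result of the parent's. The plain-Java sketch below models only the effect of such a reversed comparator: the same grouping and numbering as before, with the largest key first. It does not use Hadoop, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hadoop-free illustration of descending output: compared with ascending
 * order, only the key comparator changes. Duplicates are kept.
 */
final class DescendingSortDemo {

    static List<String> runDescending(List<String> inputLines) {
        // Reversed comparator: the largest key is iterated first.
        TreeMap<Integer, Integer> counts = new TreeMap<>(Comparator.reverseOrder());
        for (String line : inputLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                counts.merge(Integer.parseInt(trimmed), 1, Integer::sum); // count occurrences
            }
        }
        List<String> output = new ArrayList<>();
        int linenum = 1;
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                output.add(linenum + "\t" + entry.getKey());
                linenum++;
            }
        }
        return output;
    }

    public static void main(String[] args) {
        runDescending(List.of("12", "58", "1283", "45", "9")).forEach(System.out::println);
    }
}
```

Reversing the comparator keeps the mapper and reducer logic unchanged. The line numbers still start at 1, now attached to the largest value.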


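Copyright notice: This is an original article by majianxiong_lzu, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.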