Configuring a Hadoop Environment in Eclipse on Ubuntu and Writing a MapReduce Program

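Prerequisites

Before you start, you should already have the following installed on Linux:

(1) JDK 1.8 (the latest Eclipse at the time of writing requires JDK 1.8 or newer, so install whichever JDK your Eclipse version requires. I originally had JDK 1.7 and later switched to JDK 1.8. If you don't know how to switch between multiple JDKs, see https://blog.csdn.net/qq_27435059/article/details/80513553)

(2) Eclipse

(3) A Hadoop cluster or a single-node environment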

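Running hadoop commands from the command line should work without errors (for example, hadoop version prints the installed version).

You should also have downloaded hadoop-eclipse-plugin-2.6.0.jar.

Note: my Hadoop is version 2.6.0, so I use the 2.6.0 plugin. Download the plugin jar that matches your own Hadoop version.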
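Prerequisite (1) depends on which Java version you actually run. In a terminal, java -version answers that. As a minimal sketch, the hypothetical class JavaVersionCheck below reads the java.specification.version system property and converts it to a major version number. The conversion matters because the property reads "1.7" or "1.8" up to Java 8, but "9", "11", "17" and so on from Java 9 onward.

```java
// Hypothetical helper: checks that the running JVM is Java 8 or newer.
public class JavaVersionCheck {

    /**
     * Converts a java.specification.version value to a major version number.
     * Up to Java 8 the value uses the legacy "1.x" scheme ("1.8" means 8);
     * from Java 9 on it is the major number itself ("11", "17", ...).
     */
    public static int majorVersion(String spec) {
        if (spec.startsWith("1.")) {
            return Integer.parseInt(spec.substring(2)); // legacy scheme: "1.8" -> 8
        }
        int dot = spec.indexOf('.');
        return Integer.parseInt(dot < 0 ? spec : spec.substring(0, dot)); // "11" -> 11
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        System.out.println("java.specification.version = " + spec + " (major version " + major + ")");
        System.out.println(major >= 8 ? "OK: Java 8 or newer" : "Too old: install JDK 1.8 or newer");
    }
}
```

Run it with the same JDK that Eclipse uses; if it reports version 7, switch JDKs as described in the linked article before installing a recent Eclipse.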

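Configuring the Hadoop Environment in Eclipse

(1) Copy the downloaded hadoop-eclipse-plugin-2.6.0.jar into the plugins folder of your Eclipse installation directory (my Eclipse is installed at /opt/eclipse).

You can do this with the following command: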

sudo cp ~/Downloads/hadoop-eclipse-plugin-2.6.0.jar /opt/eclipse/plugins/
# The first path is where my jar was downloaded, the second is my Eclipse installation directory; replace both with your own paths

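You can also copy the file in a graphical file manager. If Eclipse is already running, restart it so that it loads the new plugin.

(2) Open Eclipse and add Hadoop

In Eclipse, click Window--->Preferences in the top menu bar.

If all of the steps above were done correctly, Hadoop Map/Reduce appears in the left-hand pane of the Preferences dialog; click it.

Then enter the path of your Hadoop installation directory.

Click Apply and Close in the lower-right corner to close the dialog.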

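(3) Add the Map/Reduce perspective

Click Window--->Perspective--->Open Perspective--->Other... in the top menu bar.

Select Map/Reduce and click Open in the lower-right corner.

(4) Configure the Hadoop port

A Map/Reduce Locations view now appears at the bottom of the Eclipse window. Click the small elephant icon in the top-right corner of this view to create a new Hadoop location.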

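Fill in the three fields in the dialog. The DFS Master port must match the port your NameNode listens on. 9000 is a common choice, but it is not guaranteed, so check the port in the fs.defaultFS property (fs.default.name in older releases) of your core-site.xml.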
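To read the port from your configuration instead of guessing, you can parse core-site.xml directly. The sketch below uses only the JDK's XML parser; the class name DefaultFsPort and the host name master in the built-in sample are hypothetical, and the fallback to fs.default.name covers older configuration files. Pass the path of your core-site.xml as the first argument, or run it without arguments to parse the sample.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: prints the host and port that Eclipse's DFS Master settings should match.
public class DefaultFsPort {

    /** Returns the value of the named property in a Hadoop *-site.xml document, or null if it is absent. */
    public static String property(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                if (name.equals(text(p, "name"))) {
                    return text(p, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    /** Trimmed text of the first child element with the given tag, or null if there is none. */
    private static String text(Element parent, String tag) {
        Node n = parent.getElementsByTagName(tag).item(0);
        return n == null ? null : n.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "<configuration><property><name>fs.defaultFS</name>"
                  + "<value>hdfs://master:9000</value></property></configuration>";
        String fs = property(xml, "fs.defaultFS");
        if (fs == null) {
            fs = property(xml, "fs.default.name"); // property name used by older Hadoop releases
        }
        if (fs == null) {
            System.out.println("Neither fs.defaultFS nor fs.default.name is set");
            return;
        }
        URI uri = URI.create(fs);
        System.out.println("host = " + uri.getHost() + ", port = " + uri.getPort());
    }
}
```

Without arguments it prints host = master, port = 9000; with your own file, the printed port is the value to enter for the DFS Master.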

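Host defaults to localhost, and you can leave it unchanged at first. I changed it to an IP address because Eclipse could not connect to port 9000 through localhost, so I could not browse DFS in Eclipse. If localhost works for you, leave it as it is; if you also find that port 9000 is unreachable, try replacing localhost with the IP address of your Master host. A likely cause is that the NameNode listens on the host name or IP given in fs.defaultFS rather than on the loopback address.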
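Rather than switching between localhost and the Master IP by trial and error in the dialog, you can test which address actually accepts connections on the port. The hypothetical PortCheck class below is a minimal sketch using plain java.net sockets: it only attempts a TCP connection, so it tells you whether something is listening at host:port, not whether that something is a healthy NameNode.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
public class PortCheck {

    /** Tries to connect to host:port and returns true if the connection succeeds within timeoutMs. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // connection refused, timed out, or host name could not be resolved
        }
    }

    public static void main(String[] args) {
        // Defaults are illustrative; pass a host (e.g. your Master's IP) and a port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        boolean ok = isReachable(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Running it once with localhost and once with your Master's IP shows which of the two belongs in the Host field.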

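(5) Configuration complete: open the DFS file system

You can now expand DFS Locations in the left-hand view of Eclipse to browse your HDFS file system. From now on you can inspect HDFS files here directly instead of using the command line.

The Hadoop environment in Eclipse is now configured!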

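Writing Your First MapReduce Program

In your DFS file system, right-click and create new folders to build the directory structure /user/hadoop/input (this folder holds the program's input data). The same directory can also be created from a terminal with hadoop fs -mkdir -p /user/hadoop/input.

(1) Create the input data

Open a terminal and enter

gedit /tmp/text

This opens a new file named text in the /tmp directory in gedit (it is written to disk when you save). Add the following data to it; the program will use it as test input.

The input data is as follows (copy it as-is):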

201825
201835
201815
201836
201722
201726
201711
201633
201655
201666
201525
201566
201567
201833
201755
201639
201588
201544
201528
201578
201699
201846
201710

(Data format: the first four digits are the year and the last two digits are the value. The program finds the largest value for each year.)
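Before running anything on the cluster, it helps to know what the job should compute. The sketch below is a plain Java simulation of the three phases on this data (no Hadoop classes; the class name LocalMaxByYear is hypothetical): the map step splits each line into year and value, a TreeMap stands in for the shuffle that groups values by year in sorted key order, and the reduce step keeps the maximum of each group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the max-per-year job, without Hadoop.
public class LocalMaxByYear {

    public static final List<String> SAMPLE = Arrays.asList(
            "201825", "201835", "201815", "201836", "201722", "201726", "201711", "201633",
            "201655", "201666", "201525", "201566", "201567", "201833", "201755", "201639",
            "201588", "201544", "201528", "201578", "201699", "201846", "201710");

    /** Map + shuffle: turns lines like "201835" into year -> [values], grouped and sorted by year. */
    public static Map<String, List<Integer>> mapAndGroup(List<String> lines) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.length() <= 4) {
                continue; // skip blank or malformed lines
            }
            String year = line.substring(0, 4);
            int value = Integer.parseInt(line.substring(4));
            groups.computeIfAbsent(year, k -> new ArrayList<>()).add(value);
        }
        return groups;
    }

    /** Reduce: keeps the largest value of each group. */
    public static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            int max = Integer.MIN_VALUE;
            for (int v : e.getValue()) {
                max = Math.max(max, v);
            }
            result.put(e.getKey(), max);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "2015\t88", "2016\t99", "2017\t55", "2018\t46", one per line
        for (Map.Entry<String, Integer> e : reduce(mapAndGroup(SAMPLE)).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Each line holds a year and its maximum separated by a tab, which is the same key/value layout the real job writes to its output directory by default.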

(2) Upload the input data

Open a terminal and upload the text file to /user/hadoop/input/ in HDFS with the following command:

hadoop fs -put /tmp/text /user/hadoop/input

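Then, in Eclipse, right-click the hadoop folder and refresh it; the file now appears in HDFS, confirming that the upload succeeded.

(3) Create the project

Create a new Map/Reduce project.

Enter a project name, then click Finish.

The project structure is shown in the figure.

(4) Add the required configuration files

Copy core-site.xml, hdfs-site.xml, and log4j.properties from your Hadoop installation (in Hadoop 2.x they are under etc/hadoop in the installation directory) into the src folder of your project. Files in src end up on the classpath, where Hadoop's Configuration class picks them up, so the program talks to your HDFS rather than the local file system; log4j.properties makes the job's log output appear in the Eclipse console.

(5) Create a Java class

Right-click src and create a new Class.

Enter the package name org.apache.hadoop.examples and the class name years_maxNumber, then click Finish.

(6) Write the source code

Copy the following code as-is: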

package org.apache.hadoop.examples;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class years_maxNumber {

    /**
     * Map: turns each input line such as "201835" into the pair (year, value), e.g. ("2018", 35).
     */
    public static class mapper extends Mapper<Object, Text, Text, IntWritable> {

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // value holds one line of input; Text plays the role of String, IntWritable that of int
            String line = value.toString().trim();
            if (line.length() <= 4) {
                return; // skip blank or malformed lines instead of failing in substring()
            }
            String year = line.substring(0, 4);               // first four characters: the year
            int number = Integer.parseInt(line.substring(4)); // remaining two characters: the value

            context.write(new Text(year), new IntWritable(number));
            // context hands the pair on to the framework.
            // For input 201835 this emits (2018, 35); for 201833 it emits (2018, 33), and so on.
            // In the shuffle phase between Map and Reduce, Hadoop groups all values with the same year,
            // so the reducer receives e.g. key=2018, values=[35, 33, ...].
        }
    }

    /**
     * Reduce: keeps the largest value seen for each year.
     */
    public static class reducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int max = Integer.MIN_VALUE;
            for (IntWritable val : values) {  // iterate over all values for this year
                max = Math.max(max, val.get());
            }
            context.write(key, new IntWritable(max));  // final result: key = year, max = largest value
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: years_maxNumber <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "firstTest");  // create the job (new Job(...) is deprecated)
        job.setJarByClass(years_maxNumber.class);      // locate the jar through this class
        job.setMapperClass(mapper.class);              // Map class
        job.setCombinerClass(reducer.class);           // max is associative and commutative, so the reducer also works as a combiner
        job.setReducerClass(reducer.class);            // Reduce class
        job.setOutputKeyClass(Text.class);             // type of the output key (the year)
        job.setOutputValueClass(IntWritable.class);    // type of the output value (the maximum)
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    // input path
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  // output path (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
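The job reuses reducer as its combiner, so each map task can shrink its output to one maximum per year before the shuffle. That is only safe because the maximum of partial maxima equals the overall maximum. The plain Java sketch below (hypothetical class CombinerCheck; the 2018 values from the sample data are split across two imagined map tasks) checks this and, for contrast, shows that an average reducer could not be reused as a combiner the same way, since the mean of partial means differs from the overall mean when the parts have different sizes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo: why a max reducer can double as a combiner while an average reducer cannot.
public class CombinerCheck {

    public static int max(List<Integer> values) {
        return Collections.max(values);
    }

    public static double mean(List<Integer> values) {
        double sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    /** Concatenates two lists, standing in for a reducer that sees every value at once. */
    public static List<Integer> concat(List<Integer> a, List<Integer> b) {
        List<Integer> all = new ArrayList<>(a);
        all.addAll(b);
        return all;
    }

    public static void main(String[] args) {
        // The 2018 values from the sample data, split unevenly across two imagined map tasks
        List<Integer> task1 = Arrays.asList(25, 35);
        List<Integer> task2 = Arrays.asList(15, 36, 33, 46);

        int maxDirect = max(concat(task1, task2));                    // no combiner
        int maxCombined = max(Arrays.asList(max(task1), max(task2))); // combiner per task, then reducer
        System.out.println("max:  direct " + maxDirect + ", via combiner " + maxCombined); // both 46

        double meanDirect = mean(concat(task1, task2));        // 190 / 6
        double meanCombined = (mean(task1) + mean(task2)) / 2;  // (30 + 32.5) / 2 = 31.25
        System.out.println("mean: direct " + meanDirect + ", via combiner " + meanCombined);
    }
}
```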

(7) Run the program

Right-click the class name years_maxNumber--->Run As--->Run Configurations...