I am writing this down because it confused me a lot.
First, the values in the Iterable passed to the reduce method are not
sorted in any particular order. This is mentioned on page 277 of Hadoop:
The Definitive Guide, Third Edition.
Job has three methods related to secondary sort:
1) setPartitionerClass
2) setSortComparatorClass
3) setGroupingComparatorClass
In the 0.18.3 API, the following three JobConf methods are used instead:
1) setPartitionerClass
2) setOutputKeyComparatorClass
3) setOutputValueGroupingComparator
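The three roles can be modeled without Hadoop at all. The sketch below is a plain-Java simulation, not Hadoop code: the class and method names (ShuffleRoles, YearTemp, partition, groupsFor, sample) are hypothetical, and it assumes a partitioner that hashes only the year, a sort comparator on year ascending then temperature descending, and a grouping comparator on the year alone. It shows which records reach a reducer, in what order, and how they are split into reduce() calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of the three shuffle roles; this is NOT Hadoop API code.
class ShuffleRoles {
    // Hypothetical composite key: natural key (year) plus the field we want sorted (temp).
    static final class YearTemp {
        final int year;
        final int temp;
        YearTemp(int year, int temp) { this.year = year; this.temp = temp; }
        @Override public String toString() { return year + " " + temp; }
    }

    // Partitioner role: records with the same year go to the same reducer
    // (hash of the year, masked non-negative, modulo the number of reducers).
    static int partition(YearTemp k, int numReducers) {
        return (Integer.hashCode(k.year) & Integer.MAX_VALUE) % numReducers;
    }

    // Sort comparator role: total order on the full key (year asc, temp desc).
    static final Comparator<YearTemp> SORT =
        Comparator.comparingInt((YearTemp k) -> k.year)
                  .thenComparing(Comparator.comparingInt((YearTemp k) -> k.temp).reversed());

    // Grouping comparator role: only the year decides which keys share one reduce() call.
    static final Comparator<YearTemp> GROUP = Comparator.comparingInt(k -> k.year);

    // Returns the groups one reducer sees; each inner list is one reduce() call.
    static List<List<YearTemp>> groupsFor(List<YearTemp> mapOutput, int reducer, int numReducers) {
        List<YearTemp> mine = new ArrayList<>();
        for (YearTemp k : mapOutput) {
            if (partition(k, numReducers) == reducer) {
                mine.add(k);
            }
        }
        mine.sort(SORT);
        List<List<YearTemp>> groups = new ArrayList<>();
        for (YearTemp k : mine) {
            List<YearTemp> cur = groups.isEmpty() ? null : groups.get(groups.size() - 1);
            // Each key is compared with the key just before it in sorted order.
            if (cur != null && GROUP.compare(cur.get(cur.size() - 1), k) == 0) {
                cur.add(k);
            } else {
                groups.add(new ArrayList<>(Arrays.asList(k)));
            }
        }
        return groups;
    }

    // The four records used later in this post, deliberately listed out of order.
    static List<YearTemp> sample() {
        return Arrays.asList(new YearTemp(1902, 10), new YearTemp(1901, 98),
                             new YearTemp(1902, 80), new YearTemp(1901, 99));
    }

    public static void main(String[] args) {
        for (int r = 0; r < 2; r++) {
            System.out.println("reducer " + r + ": " + groupsFor(sample(), r, 2));
        }
    }
}
```

With two reducers, 1901 and 1902 land on different reducers, and each reducer sees a single group whose temperatures are already in descending order.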
The confusing part is setGroupingComparatorClass. Suppose the input to
reduce is:
<k1, v1> <k2, v2> // sorted by key after the map phase
If the grouping comparator decides that k1 is equivalent to k2, even
though k1 and k2 are different, reduce is called only once:
<k1, <v1, v2>> // v1 and v2 follow the sorted key order; k2 is not passed as a separate key
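A small plain-Java simulation (not Hadoop code; GroupingDemo, reduceCalls and sample are hypothetical names) makes this concrete. It assumes the pairs are already sorted and, like the reduce side, compares each key with the one just before it; a run of keys the grouping comparator calls equal becomes one reduce() call, keyed by the first key of the run.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Plain-Java model of how a grouping comparator turns sorted pairs into reduce() calls.
class GroupingDemo {
    // Returns one "<key, [values]>" string per simulated reduce() call.
    static List<String> reduceCalls(List<Map.Entry<String, String>> sorted,
                                    Comparator<String> grouping) {
        List<String> calls = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).getKey(); // reduce() is handed the group's first key
            List<String> values = new ArrayList<>();
            int j = i;
            do {
                values.add(sorted.get(j).getValue()); // values keep the sorted key order
                j++;
            } while (j < sorted.size()
                     && grouping.compare(sorted.get(j - 1).getKey(), sorted.get(j).getKey()) == 0);
            calls.add("<" + key + ", " + values + ">");
            i = j;
        }
        return calls;
    }

    // <k1, v1> <k2, v2>, already sorted by key as in the text.
    static List<Map.Entry<String, String>> sample() {
        List<Map.Entry<String, String>> in = new ArrayList<>();
        in.add(new SimpleEntry<>("k1", "v1"));
        in.add(new SimpleEntry<>("k2", "v2"));
        return in;
    }

    public static void main(String[] args) {
        // Hypothetical grouping comparator that treats every pair of keys as equivalent.
        Comparator<String> allEqual = (a, b) -> 0;
        System.out.println(reduceCalls(sample(), allEqual));                   // prints [<k1, [v1, v2]>]
        System.out.println(reduceCalls(sample(), Comparator.naturalOrder())); // prints [<k1, [v1]>, <k2, [v2]>]
    }
}
```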
Now let's look at MaxTemperatureUsingSecondarySort. The input looks like this:
1901 99
1901 98
1902 80
1902 10
The map output, listed in sorted order (year ascending, temperature
descending), is:
<<1901, 99>, null>
<<1901, 98>, null>
<<1902, 80>, null>
<<1902, 10>, null>
The reduce method is invoked twice, once per year, and the output is:
<1901, 99>
<1902, 80>
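The same behavior can be reproduced in plain Java. This is a hedged simulation under stated assumptions, not the book's code: the names MaxTemperatureModel, YearTemp, run and sample are hypothetical, the sort is year ascending then temperature descending, grouping is on the year, and each simulated reduce() call simply emits the first key of its group without iterating the values. The sample records are listed out of order on purpose so that the sort step does visible work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of MaxTemperatureUsingSecondarySort's reduce side; NOT Hadoop code.
class MaxTemperatureModel {
    // Hypothetical composite key (year, temperature); the map value is null, so it is omitted.
    static final class YearTemp {
        final int year;
        final int temp;
        YearTemp(int year, int temp) { this.year = year; this.temp = temp; }
    }

    // Sort: year ascending, then temperature descending.
    static final Comparator<YearTemp> SORT =
        Comparator.comparingInt((YearTemp k) -> k.year)
                  .thenComparing(Comparator.comparingInt((YearTemp k) -> k.temp).reversed());
    // Grouping: year only.
    static final Comparator<YearTemp> GROUP = Comparator.comparingInt(k -> k.year);

    static List<String> run(List<YearTemp> mapOutputKeys) {
        List<YearTemp> keys = new ArrayList<>(mapOutputKeys);
        keys.sort(SORT);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            // A new reduce() call starts whenever the grouping comparator sees a new year.
            boolean newGroup = i == 0 || GROUP.compare(keys.get(i - 1), keys.get(i)) != 0;
            if (newGroup) {
                // The group's first key carries the highest temperature for that year.
                YearTemp key = keys.get(i);
                out.add(key.year + " " + key.temp);
            }
        }
        return out;
    }

    static List<YearTemp> sample() {
        return Arrays.asList(new YearTemp(1902, 10), new YearTemp(1901, 98),
                             new YearTemp(1902, 80), new YearTemp(1901, 99));
    }

    public static void main(String[] args) {
        run(sample()).forEach(System.out::println); // prints "1901 99" then "1902 80"
    }
}
```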
Now turn to SecondarySort in the Hadoop examples. For the same input,
the map output, listed in sorted order (both fields ascending), is:
<<1901, 98>, 98>
<<1901, 99>, 99>
<<1902, 10>, 10>
<<1902, 80>, 80>
The reduce method is again invoked twice, but this time it iterates
over the values and writes one line per value, so the output is:
<1901, 98>
<1901, 99>
<1902, 10>
<1902, 80>
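For comparison, here is a plain-Java simulation of this second job (again not Hadoop code; SecondarySortModel, Rec, run and sample are hypothetical names). It assumes both key fields sort ascending, grouping is on the first field only, the map value is a copy of the second field, and each simulated reduce() call writes one line per value instead of only the first key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of the SecondarySort example's reduce side; NOT Hadoop code.
class SecondarySortModel {
    // Hypothetical map output record: key (first, second); the value is a copy of second.
    static final class Rec {
        final int first;
        final int second;
        final int value;
        Rec(int first, int second) { this.first = first; this.second = second; this.value = second; }
    }

    // Sort: both key fields ascending.
    static final Comparator<Rec> SORT =
        Comparator.comparingInt((Rec r) -> r.first).thenComparingInt(r -> r.second);
    // Grouping: first field only.
    static final Comparator<Rec> GROUP = Comparator.comparingInt(r -> r.first);

    static List<String> run(List<Rec> mapOutput) {
        List<Rec> recs = new ArrayList<>(mapOutput);
        recs.sort(SORT);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < recs.size()) {
            int first = recs.get(i).first; // read once per simulated reduce() call
            int j = i;
            do {
                out.add(first + " " + recs.get(j).value); // one output line per value
                j++;
            } while (j < recs.size() && GROUP.compare(recs.get(j - 1), recs.get(j)) == 0);
            i = j;
        }
        return out;
    }

    static List<Rec> sample() {
        return Arrays.asList(new Rec(1901, 99), new Rec(1901, 98),
                             new Rec(1902, 80), new Rec(1902, 10));
    }

    public static void main(String[] args) {
        run(sample()).forEach(System.out::println);
    }
}
```

The number of groups is the same as in MaxTemperatureUsingSecondarySort; the difference in output comes from the sort direction and from iterating over the values.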