Secondary Sort in Hadoop - 小雨 - ITeye Blog


By yaojingguo


I am writing this up because secondary sort confused me a lot.

First, the values Iterable passed to the reduce method is not sorted in any
particular order: the framework sorts map output by key, not by value. This
is mentioned on page 277 of Hadoop: The Definitive Guide, Third Edition.

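In the new API, Job has three methods related to secondary sort:

1) setPartitionerClass: decides which reducer receives each key
2) setSortComparatorClass: decides the order of the keys a reducer receives
3) setGroupingComparatorClass: decides which consecutive keys are passed to a single reduce call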
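The following is a minimal plain-Java sketch of how the three hooks cooperate on the reduce side. It does not use Hadoop: the class ShuffleModel, the records Rec and ReduceCall, and the methods reduceCalls and demo are hypothetical names, and the three function parameters only stand in for the classes passed to setPartitionerClass, setSortComparatorClass and setGroupingComparatorClass. The real framework sorts serialized keys and merges spill files on disk, which this model ignores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntBiFunction;

// Hypothetical plain-Java model of what happens between map and reduce.
// No Hadoop classes are used; all names are illustrative.
class ShuffleModel {
    // One map output record.
    record Rec<K, V>(K key, V value) {}

    // One invocation of reduce(): the key it receives and its values.
    record ReduceCall<K, V>(K key, List<V> values) {}

    static <K, V> List<ReduceCall<K, V>> reduceCalls(
            List<Rec<K, V>> mapOutput,
            int numReducers,
            int reducer,
            ToIntBiFunction<K, Integer> partitioner, // stands in for setPartitionerClass
            Comparator<K> sortComparator,            // stands in for setSortComparatorClass
            Comparator<K> groupingComparator) {      // stands in for setGroupingComparatorClass
        // 1) Partition: this reducer only receives records routed to it.
        List<Rec<K, V>> mine = new ArrayList<>();
        for (Rec<K, V> r : mapOutput) {
            if (partitioner.applyAsInt(r.key(), numReducers) == reducer) {
                mine.add(r);
            }
        }
        // 2) Sort on the key only. List.sort is stable, so equal keys keep
        //    their arrival order here; Hadoop makes no such promise.
        mine.sort((a, b) -> sortComparator.compare(a.key(), b.key()));
        // 3) Group: start a new reduce() call whenever the grouping comparator
        //    says this key differs from the previous record's key.
        List<ReduceCall<K, V>> calls = new ArrayList<>();
        K previous = null;
        for (Rec<K, V> r : mine) {
            if (calls.isEmpty() || groupingComparator.compare(previous, r.key()) != 0) {
                calls.add(new ReduceCall<>(r.key(), new ArrayList<>()));
            }
            calls.get(calls.size() - 1).values().add(r.value());
            previous = r.key();
        }
        return calls;
    }

    // Integer keys, key % numReducers as partitioner, natural order for both comparators.
    static List<ReduceCall<Integer, String>> demo() {
        List<Rec<Integer, String>> mapOutput = List.of(
                new Rec<>(3, "x"), new Rec<>(1, "y"), new Rec<>(2, "z"), new Rec<>(1, "w"));
        return reduceCalls(mapOutput, 2, 1, (k, n) -> k % n,
                Comparator.naturalOrder(), Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        for (ReduceCall<Integer, String> call : demo()) {
            System.out.println(call.key() + " -> " + call.values());
        }
    }
}
```

With two reducers and key % 2 as the partitioner, main prints the calls seen by reducer 1: key 1 with [y, w], then key 3 with [x]. Key 2 goes to reducer 0 and never shows up here.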

For the 0.18.3 API, the corresponding JobConf methods are:
1) setPartitionerClass
2) setOutputKeyComparatorClass
3) setOutputValueGroupingComparator

The confusing part is setGroupingComparatorClass. Consider the following
input to the reduce phase:

<k1, v1> <k2, v2> // sorted by key from map phase

If the grouping comparator set with setGroupingComparatorClass treats k1 as
equal to k2, even though k1 and k2 are different keys, the reduce method is
invoked only once, with this input:

<k1, <v1, v2>> // v1 and v2 appear in the sort order of their original keys; k2 is lost.
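To make the k1/k2 case concrete, here is a hypothetical plain-Java sketch of the grouping step alone, assuming input that is already sorted by key. GroupingDemo and group are illustrative names, not Hadoop API; like the description above, the sketch reports each reduce call under the first key of its group.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the grouping step only; the input is assumed to be
// sorted by key already, as it is when it reaches the reducer.
class GroupingDemo {
    // Returns one "key=[values]" string per reduce() call, labelled with the
    // first key of each group.
    static List<String> group(List<Map.Entry<String, String>> sorted,
                              Comparator<String> groupingComparator) {
        List<String> calls = new ArrayList<>();
        String groupKey = null;
        String previousKey = null;
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> e : sorted) {
            boolean newGroup = previousKey == null
                    || groupingComparator.compare(previousKey, e.getKey()) != 0;
            if (newGroup) {
                if (groupKey != null) {
                    calls.add(groupKey + "=" + values); // close the previous call
                }
                groupKey = e.getKey();
                values = new ArrayList<>();
            }
            values.add(e.getValue());
            previousKey = e.getKey();
        }
        if (groupKey != null) {
            calls.add(groupKey + "=" + values); // close the last call
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> sorted =
                List.of(Map.entry("k1", "v1"), Map.entry("k2", "v2"));
        // Grouping by the whole key: k1 and k2 differ, so two reduce() calls.
        System.out.println(group(sorted, Comparator.naturalOrder()));
        // Grouping by the first character only: k1 and k2 compare equal,
        // so one reduce() call under k1 receives both values.
        System.out.println(group(sorted, Comparator.comparing((String k) -> k.charAt(0))));
    }
}
```

With natural ordering as the grouping comparator, main prints [k1=[v1], k2=[v2]]; with a comparator that looks only at the first character, it prints [k1=[v1, v2]].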

Now let's look at MaxTemperatureUsingSecondarySort from Hadoop: The Definitive Guide. Its input looks like this:

1901 99
1901 98
1902 80
1902 10

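The map output keys are (year, temperature) pairs and the values are null. After the sort, which orders by year ascending and then by temperature descending, the map output is: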

<<1901, 99>, null>
<<1901, 98>, null>
<<1902, 80>, null>
<<1902, 10>, null>

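The grouping comparator compares only the year, so the reduce method is invoked twice, once per year. Each call writes only its key, which is the first key of the group and therefore holds that year's maximum temperature. The output is: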

<1901, 99>
<1902, 80>
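The MaxTemperatureUsingSecondarySort walkthrough can be modelled the same way. This is a sketch under stated assumptions, not the book's code: it assumes a single reducer (so the partitioner, which in the book routes by year, plays no role), uses a hypothetical YearTemp record in place of the book's IntPair key, and produces strings in this post's <year, temp> notation instead of real Hadoop output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical single-reducer model of MaxTemperatureUsingSecondarySort (no Hadoop classes).
class MaxTemperatureModel {
    // Composite map output key; the map output value is null, so it is not modelled.
    record YearTemp(int year, int temp) {}

    // Sort comparator: year ascending, then temperature descending.
    static final Comparator<YearTemp> SORT = Comparator.comparingInt(YearTemp::year)
            .thenComparing(Comparator.comparingInt(YearTemp::temp).reversed());

    // Grouping comparator: year only, so there is one reduce() call per year.
    static final Comparator<YearTemp> GROUP = Comparator.comparingInt(YearTemp::year);

    static List<String> run(List<YearTemp> mapOutputKeys) {
        List<YearTemp> sorted = new ArrayList<>(mapOutputKeys);
        sorted.sort(SORT);
        List<String> output = new ArrayList<>();
        YearTemp previous = null;
        for (YearTemp key : sorted) {
            // Reduce: each call writes only the first key of its group, which
            // holds the year's maximum temperature thanks to the descending sort.
            if (previous == null || GROUP.compare(previous, key) != 0) {
                output.add("<" + key.year() + ", " + key.temp() + ">");
            }
            previous = key;
        }
        return output;
    }

    public static void main(String[] args) {
        List<YearTemp> keys = List.of(new YearTemp(1901, 99), new YearTemp(1901, 98),
                new YearTemp(1902, 80), new YearTemp(1902, 10));
        System.out.println(run(keys)); // one entry per reduce() call
    }
}
```

The reducer never looks at the values; the maximum falls out of the sort order alone, which is the point of the technique.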

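Now let's turn to SecondarySort in the Hadoop examples. Its map output key is
the pair of numbers and its value is the second number. For the same input,
after sorting on both numbers in ascending order, the map output is: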

<<1901, 98>, 98>
<<1901, 99>, 99>
<<1902, 10>, 10>
<<1902, 80>, 80>

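The grouping comparator again compares only the first number, so the reduce method is also invoked twice. This time the reducer writes one line per value, so the output is: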

<1901, 98>
<1901, 99>
<1902, 10>
<1902, 80>
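Similarly, here is a hypothetical single-reducer sketch of the SecondarySort example: the mapper parses two integers per line and emits the pair as key and the second number as value, the sort comparator orders by both numbers ascending, and the grouping comparator looks at the first number only. SecondarySortModel, Rec and run are names chosen for this sketch, and the IntPair record here is a plain stand-in; the real example defines its own IntPair Writable and uses Hadoop's Mapper and Reducer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical single-reducer model of the SecondarySort example (no Hadoop classes).
class SecondarySortModel {
    record IntPair(int first, int second) {}

    // One map output record: <<first, second>, second>.
    record Rec(IntPair key, int value) {}

    // Sort comparator: first ascending, then second ascending.
    static final Comparator<IntPair> SORT =
            Comparator.comparingInt(IntPair::first).thenComparingInt(IntPair::second);

    // Grouping comparator: first number only.
    static final Comparator<IntPair> GROUP = Comparator.comparingInt(IntPair::first);

    // Returns the output lines of each reduce() call, one inner list per call.
    static List<List<String>> run(List<String> lines) {
        List<Rec> mapOutput = new ArrayList<>();
        for (String line : lines) {
            // Map: parse two integers, emit the pair as key and the second as value.
            String[] parts = line.trim().split("\\s+");
            int first = Integer.parseInt(parts[0]);
            int second = Integer.parseInt(parts[1]);
            mapOutput.add(new Rec(new IntPair(first, second), second));
        }
        // Shuffle: sort records by the full composite key.
        mapOutput.sort((a, b) -> SORT.compare(a.key(), b.key()));
        List<List<String>> calls = new ArrayList<>();
        IntPair previous = null;
        for (Rec r : mapOutput) {
            // A new reduce() call starts when the first number changes.
            if (previous == null || GROUP.compare(previous, r.key()) != 0) {
                calls.add(new ArrayList<>());
            }
            // Reduce: write <first, value> for each value, in sorted order.
            calls.get(calls.size() - 1).add("<" + r.key().first() + ", " + r.value() + ">");
            previous = r.key();
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1901 99", "1901 98", "1902 80", "1902 10");
        for (List<String> call : run(input)) {
            System.out.println(call);
        }
    }
}
```

For the four input lines above, run yields two reduce calls, [<1901, 98>, <1901, 99>] and [<1902, 10>, <1902, 80>]. Compared with the previous sketch, only the sort direction and the reducer body change; partitioning and grouping stay on the first field.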

Posted 2013-03-26 16:08