Hadoop 1.2.1 - multi-node cluster - WordCount program hangs in the reducer phase?


Problem description


My question may sound redundant here, but the solutions to the earlier questions were all ad hoc. I have tried a few of them, but no luck yet.

Actually, I am working on hadoop-1.2.1 (on Ubuntu 14). Initially I had a single-node setup, and there I ran the WordCount program successfully. Then I added one more node to it according to this tutorial. It started successfully, without any errors, but now when I run the same WordCount program it hangs in the reduce phase. I looked at the task-tracker logs; they are given below:

INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201509110037_0001_m_000002_0 task's state:UNASSIGNED
INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201509110037_0001_m_000002_0 which needs 1 slots
INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201509110037_0001_m_000002_0 which needs 1 slots
INFO org.apache.hadoop.mapred.JobLocalizer: Initializing user hadoopuser on this TT.
INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201509110037_0001_m_18975496
INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201509110037_0001_m_18975496 spawned.
INFO org.apache.hadoop.mapred.TaskController: Writing commands to /app/hadoop/tmp/mapred/local/ttprivate/taskTracker/hadoopuser/jobcache/job_201509110037_0001/attempt_201509110037_0001_m_000002_0/taskjvm.sh
INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201509110037_0001_m_18975496 given task: attempt_201509110037_0001_m_000002_0
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_m_000002_0 0.0% hdfs://HadoopMaster:54310/input/file02:25+3
INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201509110037_0001_m_000002_0 is done.
INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201509110037_0001_m_000002_0  was 6
INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201509110037_0001_m_18975496 exited with exit code 0. Number of tasks it ran: 1
INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201509110037_0001_r_000000_0 task's state:UNASSIGNED
INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201509110037_0001_r_000000_0 which needs 1 slots
INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201509110037_0001_r_000000_0 which needs 1 slots
INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hadoopuser for UID 10 from the native implementation
INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201509110037_0001_r_18975496
INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201509110037_0001_r_18975496 spawned.
INFO org.apache.hadoop.mapred.TaskController: Writing commands to /app/hadoop/tmp/mapred/local/ttprivate/taskTracker/hadoopuser/jobcache/job_201509110037_0001/attempt_201509110037_0001_r_000000_0/taskjvm.sh
INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201509110037_0001_r_18975496 given task: attempt_201509110037_0001_r_000000_0
INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:500, dest: 127.0.0.1:55946, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201509110037_0001_m_000002_0, duration: 7129894
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 

Also, on the console where I am running the program, it hangs at:

00:39:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
00:39:24 INFO util.NativeCodeLoader: Loaded the native-hadoop library
00:39:24 WARN snappy.LoadSnappy: Snappy native library not loaded
00:39:24 INFO mapred.FileInputFormat: Total input paths to process : 2
00:39:24 INFO mapred.JobClient: Running job: job_201509110037_0001
00:39:25 INFO mapred.JobClient:  map 0% reduce 0%
00:39:28 INFO mapred.JobClient:  map 100% reduce 0%
00:39:35 INFO mapred.JobClient:  map 100% reduce 11%

and my configuration files are as follows:

//core-site.xml

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://HadoopMaster:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>

//hdfs-site.xml

<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>   
</configuration>

//mapred-site.xml

<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>HadoopMaster:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
<property>
<name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>    
</configuration>

/etc/hosts

127.0.0.1 localhost
127.0.1.1 M-1947

#HADOOP CLUSTER SETUP
172.50.88.54 HadoopMaster
172.50.88.60 HadoopSlave1

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

/etc/hostname

M-1947

//masters

HadoopMaster

//slaves

HadoopMaster

HadoopSlave1

I have been struggling with it for long, any help is appreciated. Thanks !

Solution

Got it fixed. Although the same issue appears in multiple questions on the forums, the verified solution, in my view, is that hostname resolution must be correct for every node in the cluster (moreover, this issue does not depend on the size of the cluster).

It is actually a DNS-lookup issue; make the changes below to resolve it:

  1. Print the hostname on each machine using '$ hostname'.

  2. Check that the hostname printed for each machine matches the entry made for that machine in the masters/slaves files.

  3. If they don't match, rename the host by editing the /etc/hostname file and reboot the system.
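The three checks above can be sketched as a small script run on each node (a sketch only; the HADOOP_CONF path is an assumption — point it at your own conf directory):

```shell
#!/bin/sh
# Sketch: verify that this machine's hostname appears in the Hadoop
# masters/slaves files. HADOOP_CONF is an assumed default path.
HADOOP_CONF=${HADOOP_CONF:-/usr/local/hadoop/conf}

HOST=$(hostname)
echo "This machine reports hostname: $HOST"

for f in "$HADOOP_CONF/masters" "$HADOOP_CONF/slaves"; do
    [ -f "$f" ] || continue
    # -x matches the whole line, so "HadoopSlave1" won't match "HadoopSlave11"
    if grep -qx "$HOST" "$f"; then
        echo "OK:      $HOST is listed in $f"
    else
        echo "WARNING: $HOST is NOT listed in $f -- fix /etc/hostname or $f"
    fi
done
```

If the script prints a WARNING on any node, that node is the one to rename (step 3 above).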

Example :-

In the /etc/hosts file (say, on the master machine of the Hadoop cluster):

127.0.0.1 localhost

127.0.1.1 john-machine

#Hadoop cluster

172.50.88.21 HadoopMaster

172.50.88.22 HadoopSlave1

172.50.88.23 HadoopSlave2

then the /etc/hostname file (on the master machine) should contain the following entry (for the above issue to be resolved):

HadoopMaster

Similarly, verify the /etc/hostname file of each slave node.
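A related check worth running on every node (a sketch; the node names are the ones from the example above — substitute your own): each cluster hostname should resolve to the real LAN address, not to a loopback entry such as 127.0.1.1. If a TaskTracker registers itself under a loopback address, reducers on other nodes cannot fetch map output, which stalls the copy phase exactly as in the logs above (note the src: 127.0.1.1 line in the shuffle trace).

```shell
#!/bin/sh
# Sketch: confirm each cluster hostname resolves, and not to a loopback
# address (127.x) -- loopback registration breaks the cross-node shuffle.
checked=0
for node in HadoopMaster HadoopSlave1 HadoopSlave2; do
    addr=$(getent hosts "$node" | awk '{print $1}')
    checked=$((checked + 1))
    case "$addr" in
        "")    echo "BAD: $node does not resolve -- add it to /etc/hosts" ;;
        127.*) echo "BAD: $node resolves to loopback ($addr)" ;;
        *)     echo "OK:  $node -> $addr" ;;
    esac
done
```

This is why the Debian/Ubuntu-style "127.0.1.1 <hostname>" line can bite: if the Hadoop node name ever ends up on that line, the daemons advertise a loopback address to the rest of the cluster.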
