[Hue|Hive] return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

2023-11-11 05:30

Problem

After running for a while, queries from Hue (4.5) against a Phoenix external table in Hive (1.2.2) start failing with:

Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

or with:

Error while compiling statement: FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception org.apache.phoenix.exception.PhoenixIOException: Can't get the locations
org.apache.hadoop.hive.serde2.SerDeException:

Initial analysis

The first error above does not show the real cause. The actual error can be seen on the Hue job page.
The full error message is:

INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
DEBUG : Configuring job job_1589783756298_0002 with /tmp/hadoop-yarn/staging/admin/.staging/job_1589783756298_0002 as the submit dir
DEBUG : adding the following namenodes' delegation tokens:[hdfs://mycluster]
DEBUG : Creating splits at hdfs://mycluster/tmp/hadoop-yarn/staging/admin/.staging/job_1589783756298_0002
INFO  : Cleaning up the staging area /tmp/hadoop-yarn/staging/admin/.staging/job_1589783756298_0002
ERROR : Job Submission failed with exception 'java.lang.RuntimeException(org.apache.phoenix.exception.PhoenixIOException: Can't get the locations)'
java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException: Can't get the locations
    at org.apache.phoenix.hive.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:267)
    at org.apache.phoenix.hive.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:131)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:361)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:571)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:432)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1676)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1435)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1218)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1082)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.phoenix.exception.PhoenixIOException: Can't get the locations
    at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:144)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:1197)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1491)
    at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:2725)
    at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:1114)
    at org.apache.phoenix.compile.CreateTableCompiler$1.execute(CreateTableCompiler.java:192)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:408)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:391)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:390)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:378)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1806)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2538)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2499)
    at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2499)
    at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:269)
    at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:151)
    at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:228)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.apache.phoenix.hive.util.PhoenixConnectionUtil.getConnection(PhoenixConnectionUtil.java:98)
    at org.apache.phoenix.hive.util.PhoenixConnectionUtil.getInputConnection(PhoenixConnectionUtil.java:63)
    at org.apache.phoenix.hive.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:250)
    ... 42 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:319)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:327)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:302)
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:167)
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:162)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:797)
    at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
    at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:406)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:1097)
    ... 64 more

So the failure happens when the job is submitted, and the reported cause is:

Job Submission failed with exception 'java.lang.RuntimeException(org.apache.phoenix.exception.PhoenixIOException: Can't get the locations)'
java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException: Can't get the locations

The Hive logs contain nothing useful. Since the table is a Phoenix external table in Hive, the problem is probably on the Hive side rather than in Hue, so we query Hive directly with pyhive:

from pyhive import hive

# Connect to HiveServer2 directly (no Hue in the path) and run the same query
conn = hive.Connection(host=host, port=10000, username='', database='default', auth='NONE')
cursor = conn.cursor()
cursor.execute('select * from ext_table limit 1')
print(cursor.fetchall())

Restart Hive with debug logging enabled:

hive --service hiveserver2 -hiveconf hive.root.logger=DEBUG,console

After running the Python script a number of times, the Hive log shows the following:

20/05/18 06:47:42 [HiveServer2-Handler-Pool: Thread-224-SendThread(172.100.0.11:2181)]: WARN zookeeper.ClientCnxn: Session 0x0 for server 172.100.0.11/172.100.0.11:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377)
    at org.apache.phoenix.shaded.org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
    at org.apache.phoenix.shaded.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.phoenix.shaded.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

So the connection to ZooKeeper is being reset by the peer. The corresponding ZooKeeper log shows:

2020-05-18 06:47:42,568 [myid:11] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@188] - Too many connections from /172.100.0.11 - max is 60

ZooKeeper is rejecting the connection because the number of connections from a single IP (not the total number of connections) has hit the per-client limit. Inspecting the ZooKeeper connections (for example with the cons four-letter command, or netstat on port 2181) confirms that the Hive host has exceeded the maximum.

Increase the ZooKeeper connection limit

Given the error above, a direct fix is to raise the per-client limit in ZooKeeper's zoo.cfg (and restart ZooKeeper):

maxClientCnxns=500

Check phoenix.zookeeper.quorum on the external table

Check whether phoenix.zookeeper.quorum on the external table is configured with multiple ZooKeeper addresses (a comma-separated list of hosts) rather than a single one, so that queries against Phoenix external tables are not all directed at the same ZooKeeper node.

Deeper analysis

From the initial analysis we know that one cause of the error is the connection count hitting its limit. Looking back at how the problem was reproduced, we were simply running the same query over and over. That means that even after raising ZooKeeper's maxClientCnxns and spreading phoenix.zookeeper.quorum across several nodes, the connection count will eventually hit the limit again: both changes only raise the ceiling, they do nothing about the growth itself. To fix the problem for good we need to determine whether the growth in connections is bounded; if it is, what the bound is (or what determines it); and if it is not, why the connections are never released (or reused).
With these two questions in mind, we turn to the source code. From the analysis above we already suspect that the ZooKeeper connections opened by Phoenix inside Hive are the problem, so we start with the Phoenix source.
The connection is created in org/apache/phoenix/query/ConnectionQueryServicesImpl.java:

public class ConnectionQueryServicesImpl extends DelegateQueryServices implements ConnectionQueryServices {
    ...
    private void openConnection() throws SQLException {
        try {
            this.connection = HBaseFactoryProvider.getHConnectionFactory().createConnection(this.config);
            ...
        } catch (IOException e) {
            ...
        }
        ...
    }
    ...
    public void init(final String url, final Properties props) throws SQLException {
        ...
        logger.info("An instance of ConnectionQueryServices was created.");
        openConnection();
        ...
    }
}

Searching for callers of init() leads to org/apache/phoenix/jdbc/PhoenixDriver.java:

public final class PhoenixDriver extends PhoenixEmbeddedDriver {
    protected ConnectionQueryServices getConnectionQueryServices(String url, final Properties info) throws SQLException {
        connectionQueryServices = connectionQueryServicesCache.get(normalizedConnInfo, new Callable<ConnectionQueryServices>() {
            @Override
            public ConnectionQueryServices call() throws Exception {
                ConnectionQueryServices connectionQueryServices;
                if (normalizedConnInfo.isConnectionless()) {
                    connectionQueryServices = new ConnectionlessQueryServicesImpl(services, normalizedConnInfo, info);
                } else {
                    connectionQueryServices = new ConnectionQueryServicesImpl(services, normalizedConnInfo, info);
                }
                return connectionQueryServices;
            }
        });
        connectionQueryServices.init(url, info);
    }
}

The code above caches ConnectionQueryServices instances, keyed by normalizedConnInfo. That means that if all connection parameters are identical, the cached instance should be reused; since it is not, normalizedConnInfo must be producing a different hashCode on every execution. Let's look at how that hashCode is computed:

public static class ConnectionInfo {
    public ConnectionInfo(String zookeeperQuorum, Integer port, String rootNode, String principal, String keytab) {
        this.zookeeperQuorum = zookeeperQuorum;
        this.port = port;
        this.rootNode = rootNode;
        this.isConnectionless = PhoenixRuntime.CONNECTIONLESS.equals(zookeeperQuorum);
        this.principal = principal;
        this.keytab = keytab;
        try {
            this.user = User.getCurrent();
        } catch (IOException e) {
            throw new RuntimeException("Couldn't get the current user!!");
        }
        if (null == this.user) {
            throw new RuntimeException("Acquired null user which should never happen");
        }
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((zookeeperQuorum == null) ? 0 : zookeeperQuorum.hashCode());
        result = prime * result + ((port == null) ? 0 : port.hashCode());
        result = prime * result + ((rootNode == null) ? 0 : rootNode.hashCode());
        result = prime * result + ((principal == null) ? 0 : principal.hashCode());
        result = prime * result + ((keytab == null) ? 0 : keytab.hashCode());
        // `user` is guaranteed to be non-null
        result = prime * result + user.hashCode();
        return result;
    }
}

In our case zookeeperQuorum, port and rootNode are the same strings on every call, and so are principal and keytab; the only suspicious contribution is user.hashCode(). Looking at org/apache/hadoop/hbase/security/User.class:

public abstract class User {
    protected UserGroupInformation ugi;

    public int hashCode() {
        return this.ugi.hashCode();
    }
}

Which in turn delegates to org/apache/hadoop/security/UserGroupInformation.class:

public class UserGroupInformation {
    private final Subject subject;

    public int hashCode() {
        return System.identityHashCode(this.subject);
    }
}

So the hashCode boils down to System.identityHashCode(subject): even when the request parameters are identical, each request carries a freshly created Subject, every ConnectionInfo therefore hashes differently, and the cache is never reused. The Subject corresponds to the calling user (by default pyhive sends the client-side username).
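
The following minimal sketch illustrates the effect; CacheKey below is a hypothetical stand-in for Phoenix's ConnectionInfo, not the real class. Two keys built from identical connection parameters but distinct Subject instances get different hash codes and never compare equal, so a cache keyed on them misses every time and a new ConnectionQueryServices, together with its own ZooKeeper connection, is created per request.

import javax.security.auth.Subject;

public class IdentityHashCodeDemo {

    // Hypothetical, simplified stand-in for Phoenix's ConnectionInfo.
    static final class CacheKey {
        private final String quorum;
        private final Subject subject;

        CacheKey(String quorum, Subject subject) {
            this.quorum = quorum;
            this.subject = subject;
        }

        @Override
        public int hashCode() {
            // Mirrors UserGroupInformation.hashCode(): identity of the Subject, not its contents.
            return 31 * quorum.hashCode() + System.identityHashCode(subject);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CacheKey)) return false;
            CacheKey other = (CacheKey) o;
            // UserGroupInformation.equals() is likewise reference-based on the Subject.
            return quorum.equals(other.quorum) && subject == other.subject;
        }
    }

    public static void main(String[] args) {
        // Two Subjects created for the "same" end user are still distinct objects,
        // so the two keys never match and a cache lookup always misses.
        CacheKey first = new CacheKey("zk1,zk2,zk3", new Subject());
        CacheKey second = new CacheKey("zk1,zk2,zk3", new Subject());
        System.out.println(first.hashCode() == second.hashCode()); // almost certainly false
        System.out.println(first.equals(second));                  // false
    }
}

In this setup the Subject comes from the UserGroupInformation that HiveServer2 builds for the calling client user when impersonation (doAs) is enabled, which is exactly what the next fix targets.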

Set hive.server2.enable.doAs to false

In hive-site.xml, set hive.server2.enable.doAs to false. Queries are then executed as the user that started HiveServer2 instead of the calling client user, so the same user object (and the same underlying Subject) is reused across requests, the hashCode stays constant, and the cached connection is reused. Restart HiveServer2 after changing hive-site.xml.

<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
  <description>Setting this property to true will have HiveServer2 execute
    Hive operations as the user making the calls to it.</description>
</property>

Check whether connections are released

Even after the changes above, the number of ZooKeeper connections kept growing. It turned out that the version I had been using, apache-phoenix-4.13.1-HBase-1.2, contains a number of places where connections created with HConnection connection = HConnectionManager.createConnection() are never released. The newer apache-phoenix-4.14.1-HBase-1.2 fixes this, for example in org/apache/phoenix/hive/mapreduce/PhoenixInputFormat.java:

try (HConnection connection = HConnectionManager.createConnection(PhoenixConnectionUtil.getConfiguration(jobConf))) {
    // try-with-resources closes the HConnection (and its ZooKeeper session) when the block exits
}
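
For comparison, here is a small self-contained sketch against the HBase 1.x client API; it is illustrative only, not the actual Phoenix code. The first method shows the leaky pattern of the older release, the second the try-with-resources pattern of the fix.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;

public class ConnectionLeakSketch {

    // Leaky pattern: the HConnection (and its ZooKeeper session) is never closed,
    // so every call adds one more connection towards the quorum's per-IP limit.
    static void leaky(Configuration conf) throws IOException {
        HConnection connection = HConnectionManager.createConnection(conf);
        // ... use the connection ...
        // missing connection.close()
    }

    // Fixed pattern: try-with-resources closes the connection when the block exits.
    static void fixed(Configuration conf) throws IOException {
        try (HConnection connection = HConnectionManager.createConnection(conf)) {
            // ... use the connection ...
        }
    }

    public static void main(String[] args) throws IOException {
        // Requires a reachable HBase/ZooKeeper quorum (hbase-site.xml on the classpath).
        Configuration conf = HBaseConfiguration.create();
        fixed(conf);
    }
}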

Summary

1. Make sure the Phoenix version in use includes the connection-release fix; if not, upgrade to apache-phoenix-4.14.1-HBase-1.2 or a later release where the leak is fixed.
2. Set hive.server2.enable.doAs to false in hive-site.xml.
3. Adjust ZooKeeper's maxClientCnxns in zoo.cfg to a value appropriate for how the cluster is used.
4. When creating Phoenix external tables, configure the full ZooKeeper quorum address rather than a single node.
