Kerberos Management: Development Retrospective


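Work started around October 23, the code migration was finished on October 27, and today is November 20, so nearly a month has gone by.

Kerberos management took more time than I expected.

Admittedly, a lot of miscellaneous things came up along the way and disrupted development,

such as the xjl incident, the China Unicom headquarters data-domain project, and a few other matters.

Fortunately development is now basically complete. Some minor defects remain and can be handled later; what is urgent now is to summarize and organize the past month's work.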

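This development borrowed heavily from the Ambari code.

Using Hibernate turned up quite a few pitfalls and cost roughly a week, which I had not anticipated.

Command execution also took a lot of time; its logic is coupled with Kerberos, and it took nearly two weeks.

Debugging the Kerberos logic itself took a little over a week.

I hit countless errors along the way and had no time to write them up. The overtime and compensatory-leave numbers for this month are impressive: basically one weekend day of overtime every week, plus overtime on four days of the week. In a word: busy.

No time for recruiting.

Below is a first pass at the problems that came up. Some of them were not actually solved, only worked around.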

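 1. Interactive ssh on Mac (unresolved, half a day)

I wanted to copy files with scp, which requires entering a password, and got this error:

read_passphrase: can't open /dev/tty: Device not configured

After most of a day of searching, it turned out to be specific to Mac; Linux does not have this problem.

/dev/tty exists, and its permissions were set.

In the end I configured passwordless login to get past it.

A workaround, not a fix.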

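 2. Keytab encryption types (2 days, solved)

After enabling Kerberos, the JournalNode log showed a decryption failure:

Cannot find key of appropriate type to decrypt AP REP - DES3 CBC mode with SHA1-KD

After a long search I traced it to the encryption types: the keytab is generated by calling the KDC API, so its encryption types do not fully match those of the KDC server.

klist confirmed that a keytab generated from the command line and one generated through the API do not carry exactly the same encryption types.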

[root@hadoop181 ~]# klist -ke hadoop.keytab 

Keytab name: FILE:hadoop.keytab 

KVNO Principal 

---- -------------------------------------------------------------------------- 

4 HTTP/hadoop181@BONC (aes256-cts-hmac-sha1-96) 

4 HTTP/hadoop181@BONC (aes128-cts-hmac-sha1-96) 

4 HTTP/hadoop181@BONC (des3-cbc-sha1) 

4 HTTP/hadoop181@BONC (arcfour-hmac) 

4 HTTP/hadoop181@BONC (camellia256-cts-cmac) 

4 HTTP/hadoop181@BONC (camellia128-cts-cmac) 

4 HTTP/hadoop181@BONC (des-hmac-sha1) 

4 HTTP/hadoop181@BONC (des-cbc-md5) 

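My conclusion was that the API-generated keytab used an encryption type the KDC server could not decrypt, so I went through the encryption types in the API one by one

and trimmed the list (my note at the time says "remove aes128", although aes128-cts-hmac-sha1-96 is among the types that were kept) so that only these three are supported:

 aes128-cts-hmac-sha1-96 des-cbc-md5 des3-cbc-sha1

kdc.conf was changed to match, which solved the problem.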
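The comparison between the two keytabs was done by eye. Below is a minimal sketch of how it could be automated, assuming the `klist -ke` output has already been captured as a string. The class and method names (`KeytabEnctypes`, `parse`, `unexpected`) are hypothetical and not part of the project.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeytabEnctypes {
    // Matches the trailing "(enctype)" on each entry line of `klist -ke` output.
    private static final Pattern ENCTYPE = Pattern.compile("\\(([^)]+)\\)\\s*$");

    /** Collects the encryption types listed in `klist -ke` output, in order of appearance. */
    static Set<String> parse(String klistOutput) {
        Set<String> types = new LinkedHashSet<>();
        for (String line : klistOutput.split("\\R")) {
            Matcher m = ENCTYPE.matcher(line);
            if (m.find()) {
                types.add(m.group(1));
            }
        }
        return types;
    }

    /** Returns the types present in the keytab but absent from the allowed list. */
    static Set<String> unexpected(Set<String> inKeytab, Set<String> allowed) {
        Set<String> extra = new LinkedHashSet<>(inKeytab);
        extra.removeAll(allowed);
        return extra;
    }

    public static void main(String[] args) {
        // A shortened copy of the klist output shown above.
        String output = String.join("\n",
            "Keytab name: FILE:hadoop.keytab",
            "KVNO Principal",
            "   4 HTTP/hadoop181@BONC (aes256-cts-hmac-sha1-96)",
            "   4 HTTP/hadoop181@BONC (aes128-cts-hmac-sha1-96)",
            "   4 HTTP/hadoop181@BONC (des3-cbc-sha1)",
            "   4 HTTP/hadoop181@BONC (arcfour-hmac)");
        Set<String> allowed = Set.of("aes128-cts-hmac-sha1-96", "des-cbc-md5", "des3-cbc-sha1");
        // prints [aes256-cts-hmac-sha1-96, arcfour-hmac]
        System.out.println(unexpected(parse(output), allowed));
    }
}
```

Running the same check against both the command-line keytab and the API-generated one would show directly which encryption types differ.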

 3. HBase cannot access ZooKeeper

After enabling Kerberos, the HBase log reported:

2017-11-14 09:22:41,158 ERROR [main-SendThread(hadoop181:2181)] zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state. 

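My first guess was that HBase failed to authenticate with its keytab. Removing the host name from the principal in zk-jaas.conf solved the problem.

The tentative conclusion is that kerberos.removeHostFromPrincipal did not take effect.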

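 4. Mistakenly blaming Maven Nexus (half a day, solved)

Facepalm: I dug this hole myself and then kept shouting about who had dug it for me.

After switching to the company Nexus, the Jenkins build failed.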

<mirror> 

<id>mirror_name</id> 

<mirrorOf>*</mirrorOf> 

<name>nexus-public</name> 

<url>https://code.bonc.com.cn/nexus/repository/public/</url> 

</mirror> 

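I first assumed the mirror's * had overridden the configured CDH repository, so that source never took effect and the CDH jars could not be fetched,

since the Nexus public group does not include the CDH source.

The twist came when I looked closely at the failing URL and wondered why it contained a stray "repository" segment. I then shifted the blame to the repository's layout field. Facepalm again.

Finally I found that my own dependency declaration used a "repository" prefix, the name of the local repository directory, in its groupId.

 <dependency>
  <groupId>repository.org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>${hbase.version}</version>
 </dependency>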

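After removing the repository prefix, the problem was solved.

Follow-up:

I'm now a bit hazy on what that repository prefix was supposed to represent; I've forgotten.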
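As a reminder of why the extra segment showed up in the URL: in Maven's default repository layout the dots in a groupId become directory separators, so a "repository." prefix in the groupId turns into a leading repository/ directory in the artifact path. A minimal sketch, where the class name `MavenLayout` and the version 1.2.0 are placeholders chosen for illustration:

```java
class MavenLayout {
    /** Builds the path of a jar inside a repository that uses Maven's default layout. */
    static String jarPath(String groupId, String artifactId, String version) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
            + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // "1.2.0" is a placeholder version used only for illustration.
        // prints repository/org/apache/hbase/hbase-client/1.2.0/hbase-client-1.2.0.jar
        System.out.println(jarPath("repository.org.apache.hbase", "hbase-client", "1.2.0"));
        // prints org/apache/hbase/hbase-client/1.2.0/hbase-client-1.2.0.jar
        System.out.println(jarPath("org.apache.hbase", "hbase-client", "1.2.0"));
    }
}
```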

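 5. JPA many-to-many duplicate data (0.7 days, solved)

While stopping the HBase service, the operation did not finish for a long time. The log showed the RegionServer being stopped more than 20 times, on repeated hosts.

Investigation showed that the hosts for the RegionServer role are mapped through a many-to-many association.

The annotations on FunctionRoleEntity: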

 @ManyToMany(mappedBy = "roles", fetch = FetchType.EAGER)
 private Set<HostEntity> hosts;

The annotations on HostEntity:

 @ManyToMany(fetch = FetchType.LAZY)
 @JoinTable(name = "t_host_role",
   joinColumns = {@JoinColumn(name = "host_id", referencedColumnName = "host_id")},
   inverseJoinColumns = {@JoinColumn(name = "role_id", referencedColumnName = "role_id")})
 private List<FunctionRoleEntity> roles;

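The cluster-to-host relationship is mapped the same way with the same annotations, yet the cluster's host data is fine,

while the role's host data contains duplicates.

My first suspicion was the EAGER fetch on hosts. Switching it to LAZY, however, triggered

the classic problem of Quartz scheduled jobs combined with Hibernate lazy loading, whose solution is even more complicated.

See the wiki page "Scheduled Tasks and Hibernate Lazy Loading".

So I went back to EAGER and, after some googling, finally found the underlying cause, quoted below.

Put bluntly: this is simply how Hibernate behaves, and if you want unique results you deduplicate them yourself.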

 It's generally not a good idea to enforce eager fetching in the mapping - it's better to specify eager joins in appropriate queries (unless you're 100% sure that under any and all circumstances your object won't make sense / be valid without that collection being populated). 
 The reason you're getting duplicates is because Hibernate internally joins your root and collection tables. Note that they really are duplicates, e.g. for 2 SynonymMappings with 3 collection elements each you would get 6 results (2x3), 3 copies of each SynonymMapping entity. So the easiest workaround is to wrap results in a Set thereby ensuring they're unique. 

 https://stackoverflow.com/questions/1093153/hibernate-collectionofelements-eager-fetch-duplicates-elements 

Using a Set data structure solved the problem.
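The deduplication itself is small. The sketch below uses plain host-name strings as stand-ins for the entities; with real entities the same approach only works if they implement equals and hashCode consistently (for example on the primary key). The class name `DedupHosts` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class DedupHosts {
    /** Removes duplicates while keeping the first-seen order of the rows. */
    static <T> List<T> distinct(List<T> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // Simulates a joined result in which each host is repeated once per collection row.
        List<String> joined = List.of("hadoop181", "hadoop182", "hadoop181", "hadoop182");
        // prints [hadoop181, hadoop182]
        System.out.println(distinct(joined));
    }
}
```

A LinkedHashSet is used here rather than a HashSet so that the hosts are still processed in their original order.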

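 6. SNAPSHOT jar problem (0.5 days, solved)

Development was finally done and it was time for the test environment. The Jenkins build problems were finally sorted out, but the resulting package threw all sorts of errors.

In fact all of this, including the earlier Jenkins problems, came from using jars I had compiled myself.

See my earlier post on packaging Phoenix.

To use a CDH build of Phoenix, I compiled the Phoenix package myself with source changes; its dependency Tephra was also modified at the source level.

Until now I had developed by copying these jars over the ones in the local repository.

Now that we have Nexus, the jars have to be deployed to the remote repository. The earlier Jenkins build failures happened because the remote did not have them.

The build now passes, but at runtime it still fails with classdefnotfound, so a jar is clearly still missing:

phoenix driver not found.

Out of ideas, I eventually checked the generated classpath configuration and found that the jar names in it looked odd, with an obvious timestamp suffix: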

 4.7.1-HBase-1.2-cdh-20171118.085048-1 

The dependency, however, is indeed declared as

 <dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-core</artifactId>
  <version>4.7.1-HBase-1.2-cdh-SNAPSHOT</version>
 </dependency>

I asked the all-knowing Google again. Sure enough, a SNAPSHOT dependency resolves to the latest deployed jar, which is why the name carries a timestamp. To get rid of the timestamp, add the following configuration to the packaging step.

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <mainClass>com.xxxx.service.user.startup.DubboStart</mainClass>
        <addClasspath>true</addClasspath>
        <classpathPrefix>lib/</classpathPrefix>
        <!-- Without this line, a SNAPSHOT dependency appears in MANIFEST.MF as
             Class-Path: lib/facede-user-1.0-20160512.093945-1.jar
             while the jar is actually packaged as ../lib/facede-user-1.0-SNAPSHOT.jar,
             so its classes cannot be found at runtime. -->
        <useUniqueVersions>false</useUniqueVersions>
      </manifest>
    </archive>
    <classesDirectory>
    </classesDirectory>
  </configuration>
</plugin>

 https://stackoverflow.com/questions/6920536/why-the-snapshot-name-always-has-date-in-its-jar-file-name-how-to-remove-it 

 http://blog.csdn.net/doegoo/article/details/51395835 
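To make the mismatch concrete: the timestamped name and the -SNAPSHOT name refer to the same artifact, and differ only in the trailing -yyyyMMdd.HHmmss-buildNumber suffix. The sketch below maps one to the other with a regular expression; it is only an illustration of the naming rule, not something the build needs, and the class name `SnapshotVersion` is hypothetical.

```java
import java.util.regex.Pattern;

class SnapshotVersion {
    // Timestamped snapshot suffix: -yyyyMMdd.HHmmss-buildNumber at the end of the version.
    private static final Pattern UNIQUE_SUFFIX = Pattern.compile("-\\d{8}\\.\\d{6}-\\d+$");

    /** Converts a timestamped snapshot version to its -SNAPSHOT form; other versions pass through. */
    static String toBaseVersion(String version) {
        return UNIQUE_SUFFIX.matcher(version).replaceFirst("-SNAPSHOT");
    }

    public static void main(String[] args) {
        // prints 4.7.1-HBase-1.2-cdh-SNAPSHOT
        System.out.println(toBaseVersion("4.7.1-HBase-1.2-cdh-20171118.085048-1"));
    }
}
```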

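 7. Bean circular reference

A strange one: it runs fine locally, and not just on my machine.

But once packaged and deployed to the server, it fails with the following error: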

2017-11-20 14:30:22.745 [main] INFO o.s.o.j.LocalContainerEntityManagerFactoryBean -Initialized JPA EntityManagerFactory for persistence unit 'default' 

2017-11-20 14:30:24.714 [main] WARN o.s.b.f.a.AutowiredAnnotationBeanPostProcessor -Autowired annotation should be used on methods with parameters: void com.bonc.manager.rest.modules.security.actionmanager.ActionDBAccessorImpl.init() 

2017-11-20 14:30:25.037 [main] WARN o.s.b.c.e.AnnotationConfigEmbeddedWebApplicationContext -Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'requestFactory': Unsatisfied dependency expressed through field 'stageFactory'; nested exception is org.springframework.beans.factory.BeanCurrentlyInCreationException: Error creating bean with name 'stageFactory': Bean with name 'stageFactory' has been injected into other beans [actionDBAccessorImpl] in its raw version as part of a circular reference, but has eventually been wrapped. This means that said other beans do not use the final version of the bean. This is often the result of over-eager type matching - consider using 'getBeanNamesOfType' with the 'allowEagerInit' flag turned off, for example. 

2017-11-20 14:30:25.039 [main] INFO o.s.o.j.LocalContainerEntityManagerFactoryBean -Closing JPA EntityManagerFactory for persistence unit 'default' 

2017-11-20 14:30:25.066 [main] INFO o.a.catalina.core.StandardService -Stopping service Tomcat 

2017-11-20 14:30:25.125 [main] INFO o.s.b.a.l.AutoConfigurationReportLoggingInitializer - 

Error starting ApplicationContext. To display the auto-configuration report re-run your application with 'debug' enabled. 

2017-11-20 14:30:25.141 [main] ERROR o.s.b.d.LoggingFailureAnalysisReporter - 

*************************** 

APPLICATION FAILED TO START 

*************************** 

Description: 

There is a circular dependency between 1 beans in the application context: 

- requestFactory (field private com.bonc.manager.rest.modules.security.actionmanager.StageFactory com.bonc.manager.rest.modules.security.actionmanager.RequestFactory.stageFactory) 

- stageFactory 

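The error says, in effect, that stageFactory was injected in two versions (the raw bean and the wrapped one), so the other beans are not guaranteed to hold the final version. The cause is a circular dependency among stageFactory, requestFactory, and ActionDBAccessorImpl.

Checking the code confirmed the circular dependency.

Solution:

In ActionDBAccessorImpl, change the injection of stageFactory and requestFactory to method injection, which solves the problem.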

 @Autowired
 void init(RequestFactory requestFactory, StageFactory stageFactory) {
   this.requestFactory = requestFactory;
   this.stageFactory = stageFactory;
 }
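The general idea behind method injection can be shown without Spring: when objects refer to each other in a cycle, at least one link has to be set after construction rather than in a constructor. The sketch below is a plain-Java illustration under assumed relationships. The nested classes are simplified stand-ins for the real beans, and the back-reference from StageFactory to the accessor is an assumption made to close the cycle. It does not reproduce Spring's raw-versus-wrapped bean handling.

```java
class WiringDemo {
    static class StageFactory {
        ActionAccessor accessor;  // assumed back-reference that closes the cycle
    }

    static class RequestFactory {
        final StageFactory stageFactory;

        RequestFactory(StageFactory stageFactory) {
            this.stageFactory = stageFactory;
        }
    }

    static class ActionAccessor {
        RequestFactory requestFactory;
        StageFactory stageFactory;

        // Mirrors the init(...) method above: dependencies arrive after construction.
        void init(RequestFactory requestFactory, StageFactory stageFactory) {
            this.requestFactory = requestFactory;
            this.stageFactory = stageFactory;
        }
    }

    /** Phase 1 constructs every object; phase 2 fills in the links that form the cycle. */
    static ActionAccessor wire() {
        StageFactory stage = new StageFactory();
        RequestFactory request = new RequestFactory(stage);
        ActionAccessor accessor = new ActionAccessor();
        accessor.init(request, stage);
        stage.accessor = accessor;
        return accessor;
    }

    public static void main(String[] args) {
        ActionAccessor accessor = wire();
        // Both paths lead to the same StageFactory instance.
        System.out.println(accessor.stageFactory == accessor.requestFactory.stageFactory);
    }
}
```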

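Phase 2 improvements:

1.

After deployment, the rest and timeline services have to be restarted manually.

Consider writing the configuration automatically, persisting the on/off switch in the database, and loading Kerberos automatically.

2.

Keytabs are currently generated on the manager machine, which is not ideal, since the manager itself does not have it;

this should later move to the admin machine.

3.

Remove the kerberos_config and role_command configuration.

4.

The principal password change is best removed; it affects every service that uses the principal.

5.

When the scp destination is unreachable, the timeout is long and no error status is returned.

Consider handling the ssh problem directly through the command execution framework.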
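One possible shape for this is sketched below: run the external command with a hard timeout and always hand back a status. It is an illustration built on the JDK's ProcessBuilder, not the project's actual command execution framework; the class name `CommandRunner` and the `TIMED_OUT` sentinel are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

class CommandRunner {
    /** Sentinel returned when the command had to be killed after the timeout. */
    static final int TIMED_OUT = -1;

    /** Runs a command, discards its output, and returns its exit code or TIMED_OUT. */
    static int run(List<String> command, long timeout, TimeUnit unit) {
        try {
            Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // avoid blocking on a full pipe
                .start();
            if (!process.waitFor(timeout, unit)) {
                process.destroyForcibly();
                return TIMED_OUT;
            }
            return process.exitValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for command", e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: CommandRunner <command> [args...]");
            return;
        }
        System.out.println("status: " + run(List.of(args), 30, TimeUnit.SECONDS));
    }
}
```

For scp specifically, passing `-o ConnectTimeout=10 -o BatchMode=yes` makes ssh give up quickly on an unreachable host and fail instead of prompting for a password, so the exit code becomes meaningful to the caller.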

6.

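Add the ability to disable Kerberos.

7. Flood of component connection errors

The one below is not the only one; this kind of looping error has to change.

Large numbers of errors appear when services are shut down: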

2017-11-19 14:09:10.963 INFO 14964 --- [(hadoop01:2181)] org.apache.zookeeper.ClientCnxn : Opening socket connection to server hadoop01/172.16.11.168:2181. Will not attempt to authenticate using SASL (unknown error) 

2017-11-19 14:09:10.978 WARN 14964 --- [(hadoop01:2181)] org.apache.zookeeper.ClientCnxn : Session 0x35fc36d3d35000b for server null, unexpected error, closing socket connection and attempting reconnect 

java.net.ConnectException: Connection refused 

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) 

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) 

at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) 

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) 
