Atlas Lineage Analysis - Hive/Spark

2024-05-25 20:36
Tags: analysis, hive, spark, lineage, atlas

This article introduces lineage analysis with Apache Atlas for Hive and Spark, covering deployment and configuration, and aims to serve as a practical reference for developers working on this problem.

Apache Atlas Deployment and Installation

Note: download the Atlas source release from the official website rather than checking out a development branch from git; code checked out from a branch may not build or run correctly. This walkthrough compiles the Atlas 2.3.0 source release.

mvn clean -DskipTests package -Pdist
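
The build produces a deployable server tarball. A minimal end-to-end sketch, assuming the 2.3.0 source tarball from the Apache archive and the default output layout (URLs and directory names may differ for other versions or mirrors):

# Download and unpack the Atlas 2.3.0 source release
wget https://archive.apache.org/dist/atlas/2.3.0/apache-atlas-2.3.0-sources.tar.gz
tar -xzf apache-atlas-2.3.0-sources.tar.gz
cd apache-atlas-sources-2.3.0

# Build the distribution, skipping tests
mvn clean -DskipTests package -Pdist

# The server package lands under distro/target/
ls distro/target/apache-atlas-2.3.0-server.tar.gz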

Deployment Prerequisites

  • Elasticsearch 7.x
  • HBase 2.x
  • Kafka 2.x
  • ZooKeeper 3.4.x
  • Hive Metastore 3.x
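
Before starting Atlas, it is worth confirming each of these services is reachable. A rough connectivity check, assuming the hosts used in the configuration below and standard client tools on the PATH (exact commands and flags vary by service version):

# ZooKeeper: expect "imok"
echo ruok | nc 10.0.0.141 2181

# Elasticsearch: expect cluster info as JSON
curl -s http://10.0.0.79:9200

# HBase: non-interactive status check
echo "status" | hbase shell -n

# Kafka: list topics (ATLAS_HOOK and ATLAS_ENTITIES are created on first start)
kafka-topics.sh --bootstrap-server 10.0.0.141:9092 --list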

Atlas Configuration (conf/atlas-application.properties)

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#########  Graph Database Configs  #########

# Graph Database
#Configures the graph database to use.  Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with  -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various  storage backends.
#
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus
# atlas.graph.storage.username=
# atlas.graph.storage.password=

#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=10.0.0.141:2181,10.0.0.140:2181,10.0.0.142:2181
atlas.graph.storage.hbase.regions-per-server=1

#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=

# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true

# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1

# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1

# Graph Search Index
atlas.graph.index.search.backend=elasticsearch

#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=false

#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: https://docs.janusgraph.org/latest/elasticsearch.html
atlas.graph.index.search.hostname=10.0.0.79:9200,10.0.0.80:9200,10.0.0.141:9200
atlas.graph.index.search.elasticsearch.client-only=true

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

#########  Import Configs  #########
#atlas.import.temp.directory=/temp/import

#########  Notification Configs  #########
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=10.0.0.141:2181,10.0.0.140:2181,10.0.0.142:2181/kafka
atlas.kafka.bootstrap.servers=10.0.0.141:9092,10.0.0.80:9092,10.0.0.79:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000

# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true

######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>

######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.a
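
With the properties in place, Atlas can be started and smoke-tested. A minimal sketch, assuming the server tarball was unpacked to /opt/apache-atlas-2.3.0 (a hypothetical path) and the default admin/admin file-based credentials:

cd /opt/apache-atlas-2.3.0
bin/atlas_start.py

# Startup can take several minutes; then verify the REST API on the default port
curl -u admin:admin http://localhost:21000/api/atlas/admin/version

To actually capture Hive lineage, the Hive hook must also be registered so that DDL/DML operations are published to the ATLAS_HOOK Kafka topic configured above. A sketch of the usual steps (paths and table names here are assumptions; consult the Atlas Hive hook documentation for your versions). For permanent capture, set hive.exec.post.hooks to org.apache.atlas.hive.hook.HiveHook in hive-site.xml rather than per session:

# Make the Atlas client config visible to Hive
cp /opt/apache-atlas-2.3.0/conf/atlas-application.properties $HIVE_CONF_DIR/

# Put the hook jars on Hive's classpath (e.g. in hive-env.sh)
export HIVE_AUX_JARS_PATH=/opt/apache-atlas-2.3.0/hook/hive

# One-off test: enable the hook for a single session; the CTAS below should
# appear in the Atlas UI as a lineage edge from t_lineage_src to t_lineage_dst
hive --hiveconf hive.exec.post.hooks=org.apache.atlas.hive.hook.HiveHook -e "
  CREATE TABLE t_lineage_src (id INT);
  CREATE TABLE t_lineage_dst AS SELECT id FROM t_lineage_src;
"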

This concludes the introduction to Atlas lineage analysis for Hive/Spark. We hope the article proves helpful.



http://www.chinasem.cn/article/1002565
