【kyuubi-spark】从0-1部署kyuubi集成spark执行spark sql到k8s读取iceberg的minio数据

本文主要是介绍【kyuubi-spark】从0-1部署kyuubi集成spark执行spark sql到k8s读取iceberg的minio数据,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

一、背景

团队在升级大数据架构

前端使用trino查询,对trino也进行了很多优化,目前测试来看,运行还算稳定,但是不可避免的trino的任务总会出现失败的情况。原来的架构是trino失败后去跑hive,而hive是跑mapreduce依赖于hadoop,新架构摒弃了hadoop,当然也没法用hive跑了,因此目前看较好的办法是使用spark sql来替代。

在初次研究了spark sql,发现没有能实时返回数据的方式,spark更着重于执行任务处理,而对于客户端交互没有很好的支持。在搜索了大量资料及chatgpt的帮助下,找到了用kyuubi驱动sparksql来与客户端交互的方法。网上关于kyuubi集成spark的文章大多只讲了spark和kyuubi的配置,配置完成执行一个spark demo就算完了,没有精准的实际案例,导致对这个集成没有一个整体的概念,有些无从下手,下面我会先讲清楚kyuubi jdbc获取数据的流程。当然过程中也遇到大量的问题,在研究了大概一周左右的时间,终于将kyuubi驱动spark到k8s跑spark sql,并访问rest catalogiceberg的数据走通了,在此记录并分享给大家!

二、目前架构

略。参考我另一篇文章

【大数据架构】基于流式数据的大数据架构升级-CSDN博客

三、版本介绍

组件名称版本
Iceberg1.5.0
Spark3.5.0
Kyuubi1.9.0

四、数据流介绍

Trino执行失败 >>> 访问Kyuubi Jdbc >>> Spark engine 提交spark任务到k8s >>> Spark读取iceberg元数据 >>> spark访问Minio读取数据并计算 >>> 返回数据到Kyuubi

这里讲一下kyuubi集成spark后是怎么通过kyuubi jdbc返回sql数据的。

1)首先kyuubi基于thrfit协议i创建了thrift jdbc server,同时kyuubi集成了spark的引擎,可以调用spark submit提交任务。那么从客户端到kyuubi再到spark执行sql就通了。

2)kyuubi使用spark引擎调用spark需要spark环境,因此在kyuubi中需要有一个spark环境,因此在本地需要装一个spark

3)kyuubi调用spark后执行任务,spark会使用spark submit将任务发送到本地,k8s,或yarn上面执行,这就用到了spark的动态调度特性

4)本例需要将spark发布到k8s执行,因此需要一个在k8s上能执行spark sql的镜像

5)其他环境:本例中存在一个基于iceberg的Rest catalog,数据和metadata存储在minio上,spark执行sql需要访问Rest Catalog,再去Minio取数据。访问Minio需要相应的access_key,secret_key,还有证书。以上这些都是已经存在了,需要在环境中或者配置中配好的。

下面正式开始集成。

五、spark安装及配置

1.首先下载spark程序,可以去官网下载,最新版是3.5.1,我下载的是3.5.0

https://dlcdn.apache.org/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz

2.上传到服务器解压

查看conf文件夹是这样的,其中是没有spark-env.sh和spark-defaults.conf

copy这两个配置文件

cp spark-env.sh.template spark-env.sh
cp spark-defaults.conf.template spark-defaults.conf

3.如果有环境变量,可以设置到spark-env.sh中,spark-env.sh的内容是这样的,我是暂时没有配置

#!/usr/bin/env bash#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos# Options read in any mode
# - SPARK_CONF_DIR, Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Executor (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)# Options read in any cluster manager using HDFS
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files# Options read in YARN client/cluster mode
# - YARN_CONF_DIR, to point Spark towards YARN configuration files when you use YARN# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_DAEMON_CLASSPATH, to set the classpath for all daemons
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers# Options for launcher
# - SPARK_LAUNCHER_OPTS, to set config properties and Java options for the launcher (e.g. "-Dx=y")# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR      Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR       Where log files are stored.  (Default: ${SPARK_HOME}/logs)
# - SPARK_LOG_MAX_FILES Max log files of Spark daemons can rotate to. Default is 5.
# - SPARK_PID_DIR       Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING  A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS      The scheduling priority for daemons. (Default: 0)
# - SPARK_NO_DAEMONIZE  Run the proposed command in the foreground. It will not output a PID file.
# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
# You might get better performance to enable these options if using native BLAS (see SPARK-21305).
# - MKL_NUM_THREADS=1        Disable multi-threading of Intel MKL
# - OPENBLAS_NUM_THREADS=1   Disable multi-threading of OpenBLAS# Options for beeline
# - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli (e.g. "-Dx=y")
# - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)

4. 配置spark-defaults.conf,我的完整配置如下。

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.# Example:#k8s地址spark.master                     k8s://https://10.38.199.201:443/k8s/clusters/c-m-l7gflsx7spark.eventLog.enabled           true#minio地址spark.eventLog.dir               s3a://wux-hoo-dev-01/ice_warehousespark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystemspark.hadoop.fs.s3a.access.key ******r5KpXkzEW2jNKWspark.hadoop.fs.s3a.secret.key ******hDYtuzsDnKGLGg9EJSbJ083ekuW7PejM#minio的endpointspark.hadoop.fs.s3a.endpoint http://XXX.com:30009spark.hadoop.fs.s3a.path.style.access truespark.hadoop.fs.s3a.aws.region=us-east-1spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystemspark.sql.catalog.default spark_catalogspark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkCatalogspark.sql.catalog.spark_catalog.type rest#spark.sql.catalog.spark_catalog.catalog-impl org.apache.iceberg.rest.RESTCatalogspark.sql.catalog.spark_catalog.uri http://10.40.8.42:31000spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensionsspark.sql.catalog.spark_catalog.io-impl org.apache.iceberg.aws.s3.S3FileIOspark.sql.catalog.spark_catalog.warehouse s3a://wux-hoo-dev-01/ice_warehousespark.sql.catalog.spark_catalog.s3.endpoint http://XXXXX.com:30009spark.sql.catalog.spark_catalog.s3.path-style-access truespark.sql.catalog.spark_catalog.s3.access-key-id ******5KpXkzEW2jNKWspark.sql.catalog.spark_catalog.s3.secret-access-key ******hDYtuzsDnKGLGg9EJSbJ083ekuW7PejMspark.sql.catalog.spark_catalog.region us-east-1#spark镜像harbor地址spark.kubernetes.container.image 10.38.199.203:1443/fhc/spark350:v1.0spark.kubernetes.namespace defaultspark.kubernetes.authenticate.driver.serviceAccountName sparkspark.kubernetes.container.image.pullPolicy Alwaysspark.submit.deployMode clusterspark.kubernetes.file.upload.path s3a://wux-hoo-dev-01/ice_warehouse
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

a)因为我是需要将spark sql提交到k8s上的,所以spark sql提交模式是cluster,然后配置k8s的路径。

b)其次我需要访问minio,需要配置minio访问的参数,spark.hadoop.fs.s3a的这一段

c)然后我访问mino是通过iceberg的,我创建了一个iceberg的rest catalog,地址是http://10.40.8.42:31000,所以要定义一个spark catalog,这些参数也需要加上

d)然后因为需要将spark sql提交到k8s上,需要一个spark环境的image,在创建pod的时候加载spark环境,所以要打一个spark运行环境的镜像 10.38.199.203:1443/fhc/spark350:v1.0

六、kyuubi部署配置

1.下载kyuubi,我下载的是1.9.0

2.上传到服务器,解压

查看conf文件夹是这样的,其中是没有kyuubi-env.sh和kyuubi-defaults.conf

copy这两个文件

cp kyuubi-env.sh.template kyuubi-env.sh
cp kyuubi-defaults.conf.template kyuubi-defaults.conf

3.配置kyuubi-env.sh,我的完整配置如下,主要配置java home,spark home,还有kyuuni启动参数

#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#
# - JAVA_HOME               Java runtime to use. By default use "java" from PATH.
#
#
# - KYUUBI_CONF_DIR         Directory containing the Kyuubi configurations to use.
#                           (Default: $KYUUBI_HOME/conf)
# - KYUUBI_LOG_DIR          Directory for Kyuubi server-side logs.
#                           (Default: $KYUUBI_HOME/logs)
# - KYUUBI_PID_DIR          Directory stores the Kyuubi instance pid file.
#                           (Default: $KYUUBI_HOME/pid)
# - KYUUBI_MAX_LOG_FILES    Maximum number of Kyuubi server logs can rotate to.
#                           (Default: 5)
# - KYUUBI_JAVA_OPTS        JVM options for the Kyuubi server itself in the form "-Dx=y".
#                           (Default: none).
# - KYUUBI_CTL_JAVA_OPTS    JVM options for the Kyuubi ctl itself in the form "-Dx=y".
#                           (Default: none).
# - KYUUBI_BEELINE_OPTS     JVM options for the Kyuubi BeeLine in the form "-Dx=Y".
#                           (Default: none)
# - KYUUBI_NICENESS         The scheduling priority for Kyuubi server.
#                           (Default: 0)
# - KYUUBI_WORK_DIR_ROOT    Root directory for launching sql engine applications.
#                           (Default: $KYUUBI_HOME/work)
# - HADOOP_CONF_DIR         Directory containing the Hadoop / YARN configuration to use.
# - YARN_CONF_DIR           Directory containing the YARN configuration to use.
#
# - SPARK_HOME              Spark distribution which you would like to use in Kyuubi.
# - SPARK_CONF_DIR          Optional directory where the Spark configuration lives.
#                           (Default: $SPARK_HOME/conf)
# - FLINK_HOME              Flink distribution which you would like to use in Kyuubi.
# - FLINK_CONF_DIR          Optional directory where the Flink configuration lives.
#                           (Default: $FLINK_HOME/conf)
# - FLINK_HADOOP_CLASSPATH  Required Hadoop jars when you use the Kyuubi Flink engine.
# - HIVE_HOME               Hive distribution which you would like to use in Kyuubi.
# - HIVE_CONF_DIR           Optional directory where the Hive configuration lives.
#                           (Default: $HIVE_HOME/conf)
# - HIVE_HADOOP_CLASSPATH   Required Hadoop jars when you use the Kyuubi Hive engine.
### Examples ##export JAVA_HOME=/root/zulu11.52.13-ca-jdk11.0.13-linux_x64export SPARK_HOME=/root/spark-3.5.0-bin-hadoop3
# export FLINK_HOME=/opt/flink
# export HIVE_HOME=/opt/hive
# export FLINK_HADOOP_CLASSPATH=/path/to/hadoop-client-runtime-3.3.2.jar:/path/to/hadoop-client-api-3.3.2.jar
# export HIVE_HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/common/lib/commons-collections-3.2.2.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.1.0.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.0.jar:${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar
# export HADOOP_CONF_DIR=/usr/ndp/current/mapreduce_client/conf
# export YARN_CONF_DIR=/usr/ndp/current/yarn/confexport KYUUBI_JAVA_OPTS="-Xmx10g -XX:MaxMetaspaceSize=512m -XX:MaxDirectMemorySize=1024m -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -XX:+UseCondCardMark -XX:+UseGCOverheadLimit -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -verbose:gc -Xloggc:./logs/kyuubi-server-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=20M"export KYUUBI_BEELINE_OPTS="-Xmx2g -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+UseCondCardMark"

4.配置kyuubi-defaults.conf,我的配置如下,基本上把spark-defaults.conf搬一份过来

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
### Kyuubi Configurations#
# kyuubi.authentication                    NONE
#kyuubi.frontend.bind.host                10.38.199.201
# kyuubi.frontend.protocols                THRIFT_BINARY,REST
# kyuubi.frontend.thrift.binary.bind.port  10009kyuubi.frontend.rest.bind.port           10099
#kyuubi.engine.type                       SPARK_SQL
# kyuubi.engine.share.level                USER
# kyuubi.session.engine.initialize.timeout PT3M
#
# kyuubi.ha.addresses                      zk1:2181,zk2:2181,zk3:2181
# kyuubi.ha.namespace                      kyuubi
#spark.master=k8s://https://10.38.199.201:443/k8s/clusters/c-m-l7gflsx7
spark.home=/root/spark-3.5.0-bin-hadoop3spark.hadoop.fs.s3a.access.key=******5KpXkzEW2jNKW
spark.hadoop.fs.s3a.secret.key=******hDYtuzsDnKGLGg9EJSbJ083ekuW7PejM
spark.hadoop.fs.s3a.endpoint=http://XXXXX.com:30009
spark.hadoop.fs.s3a.path.style.access=true
spark.hadoop.fs.s3a.aws.region=us-east-1
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystemspark.sql.catalog.default=spark_catalog
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog.type=rest
#spark.sql.catalog.spark_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.spark_catalog.uri=http://10.40.8.42:31000
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.spark_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.spark_catalog.warehouse=s3a://wux-hoo-dev-01/ice_warehouse
spark.sql.catalog.spark_catalog.s3.endpoint=http://XXXXX.com:30009
spark.sql.catalog.spark_catalog.s3.path-style-access=true
spark.sql.catalog.spark_catalog.s3.access-key-id=******r5KpXkzEW2jNKW
spark.sql.catalog.spark_catalog.s3.secret-access-key=******AhDYtuzsDnKGLGg9EJSbJ083ekuW7PejM
spark.sql.catalog.spark_catalog.region=us-east-1spark.kubernetes.container.image=10.38.199.203:1443/fhc/spark350:v1.0
spark.kubernetes.namespace=default
spark.kubernetes.authenticate.driver.serviceAccountName=spark
# Details in https://kyuubi.readthedocs.io/en/master/configuration/settings.html

七、spark运行环境镜像打包

这一步是最花时间的,因为打镜像后联调测试的时候,在kyuubi控制台执行spark sql,各种no class found,就是在spark镜像的环境中缺少各种jar包,其中最主要的包是amzon的jar。我是有一个错解决一个,把需要的jar下载下来打进去,一共打了大概31个jar吧。

Dockfile如下

from apache/spark:3.5.0RUN setUSER root#这个是minio访问的证书,如果是hdfs或者其他存储或没有证书的,不需要打进去
ADD SeagateCA02.cer /tmpCOPY awsicelib/ /opt/spark/jars#将minio证书导入到java环境
RUN keytool -import  -keystore /opt/java/openjdk/lib/security/cacerts -storepass changeit -noprompt -alias seagateca -file /tmp/SeagateCA02.cerRUN ls /opt/spark/jarsENV SPARK_HOME=/opt/sparkENV AWS_REGION=us-east-1

awsicelib/ 下面存储了所有调试报错后下载的jar

大概如下,我会把它打个zip包供下载,当然下载遇到困难的,可可以单独去maven仓库下,我都是到maven仓库下载的

资源包在这 https://download.csdn.net/download/w8998036/89410068

jar名称

-rw-r--r-- 1 root root     74757 Jun  7 10:34 apache-client-2.25.65.jar
-rw-r--r-- 1 root root    232047 Jun  7 09:26 auth-2.25.65.jar
-rw-r--r-- 1 root root    165024 Jun  6 16:47 aws-core-2.25.65.jar
-rw-r--r-- 1 root root 280645251 Jun  6 10:56 aws-java-sdk-bundle-1.12.262.jar
-rw-r--r-- 1 root root    115052 Jun  7 11:50 aws-json-protocol-2.25.65.jar
-rw-r--r-- 1 root root     68117 Jun  7 15:12 aws-query-protocol-2.25.65.jar
-rw-r--r-- 1 root root    101435 Jun  7 15:03 aws-xml-protocol-2.25.65.jar
-rw-r--r-- 1 root root      9341 Jun  7 15:16 checksums-2.25.65.jar
-rw-r--r-- 1 root root      8047 Jun  7 15:27 checksums-spi-2.25.65.jar
-rw-r--r-- 1 root root   2819073 Jun  7 14:50 dynamodb-2.25.65.jar
-rw-r--r-- 1 root root     13170 Jun  7 10:49 endpoints-spi-2.25.65.jar
-rw-r--r-- 1 root root   6786696 Jun  7 10:15 glue-2.25.65.jar
-rw-r--r-- 1 root root    962685 Jun  6 10:55 hadoop-aws-3.3.4.jar
-rw-r--r-- 1 root root     17476 Jun  7 11:02 http-auth-2.25.65.jar
-rw-r--r-- 1 root root    211391 Jun  7 10:58 http-auth-aws-2.25.65.jar
-rw-r--r-- 1 root root     44434 Jun  7 10:54 http-auth-spi-2.25.65.jar
-rw-r--r-- 1 root root     84134 Jun  7 09:51 http-client-spi-2.25.65.jar
-rw-r--r-- 1 root root  41601849 Jun  6 16:14 iceberg-spark-runtime-3.5_2.12-1.5.0.jar
-rw-r--r-- 1 root root     30965 Jun  7 09:34 identity-spi-2.25.65.jar
-rw-r--r-- 1 root root     30943 Jun  7 11:50 json-utils-2.25.65.jar
-rw-r--r-- 1 root root   1502321 Jun  7 14:32 kms-2.25.65.jar
-rw-r--r-- 1 root root     27267 Jun  7 14:58 metrics-spi-2.25.65.jar
-rw-r--r-- 1 root root     49524 Jun  7 10:38 profiles-2.25.65.jar
-rw-r--r-- 1 root root     35145 Jun  7 11:50 protocol-core-2.25.65.jar
-rw-r--r-- 1 root root     11640 Jun  7 11:31 reactive-streams-1.0.4.jar
-rw-r--r-- 1 root root    860193 Jun  7 10:23 regions-2.25.65.jar
-rw-r--r-- 1 root root   3578525 Jun  6 16:41 s3-2.25.65.jar
-rw-r--r-- 1 root root    900547 Jun  7 09:10 sdk-core-2.25.65.jar
-rw-r--r-- 1 root root    506301 Jun  7 09:56 sts-2.25.65.jar
-rw-r--r-- 1 root root    535001 Jun  7 13:26 third-party-jackson-core-2.25.65.jar
-rw-r--r-- 1 root root    218521 Jun  7 09:39 utils-2.25.65.jar

这一步的错误贴几个,就是在kyuubi控制台执行sql的时候出现的

其次还要注意下载的包的版本,spark3.5.0镜像里面用的scale版本是2.12,下载包的时候如果有版本的问题要注意一下。

镜像打包完成 10.38.199.203:1443/fhc/spark350:v1.0,在spark配置文件中引用它

八、测试kyuubi执行spark sql

cd到kyuubi home目录,执行命令jdbc:hive2://10.38.199.201:10009。

bin/beeline -u 'jdbc:hive2://10.38.199.201:10009'

第一次执行会比较慢,控制台会打印出spark pod(driver和executor)构建过程,打印出的log如下

[root@t001 apache-kyuubi-1.9.0-bin]# bin/beeline -u 'jdbc:hive2://10.38.199.201:10009'
Connecting to jdbc:hive2://10.38.199.201:10009
2024-06-07 20:16:50.252 INFO KyuubiSessionManager-exec-pool: Thread-407 org.apache.kyuubi.operaunchEngine: Processing anonymous's query[1b721771-c943-45f5-ab31-384652d151b9]: PENDING_STATENING_STATE, statement:
LaunchEngine
2024-06-07 20:16:50.255 INFO KyuubiSessionManager-exec-pool: Thread-407 org.apache.kyuubi.shadtor.framework.imps.CuratorFrameworkImpl: Starting
2024-06-07 20:16:50.256 INFO KyuubiSessionManager-exec-pool: Thread-407 org.apache.kyuubi.shadeeper.ZooKeeper: Initiating client connection, connectString=10.38.199.201:2181 sessionTimeoutwatcher=org.apache.kyuubi.shaded.curator.ConnectionState@62306c2b
2024-06-07 20:16:50.259 INFO KyuubiSessionManager-exec-pool: Thread-407-SendThread(t001:2181) che.kyuubi.shaded.zookeeper.ClientCnxn: Opening socket connection to server t001/10.38.199.201Will not attempt to authenticate using SASL (unknown error)
2024-06-07 20:16:50.262 INFO KyuubiSessionManager-exec-pool: Thread-407-SendThread(t001:2181) che.kyuubi.shaded.zookeeper.ClientCnxn: Socket connection established to t001/10.38.199.201:21tiating session
2024-06-07 20:16:50.269 INFO KyuubiSessionManager-exec-pool: Thread-407-SendThread(t001:2181) che.kyuubi.shaded.zookeeper.ClientCnxn: Session establishment complete on server t001/10.38.19181, sessionid = 0x100dd6b3d350015, negotiated timeout = 60000
2024-06-07 20:16:50.269 INFO KyuubiSessionManager-exec-pool: Thread-407-EventThread org.apache.shaded.curator.framework.state.ConnectionStateManager: State change: CONNECTED
2024-06-07 20:16:50.309 INFO KyuubiSessionManager-exec-pool: Thread-407 org.apache.kyuubi.engiBuilder: Logging to /root/kyuubi/apache-kyuubi-1.9.0-bin/work/anonymous/kyuubi-spark-sql-engin
2024-06-07 20:16:50.310 INFO KyuubiSessionManager-exec-pool: Thread-407 org.apache.kyuubi.Utiling Kyuubi properties from /root/spark-3.5.0-bin-hadoop3/conf/spark-defaults.conf
2024-06-07 20:16:50.314 INFO KyuubiSessionManager-exec-pool: Thread-407 org.apache.kyuubi.engineRef: Launching engine:
/root/spark-3.5.0-bin-hadoop3/bin/spark-submit \--class org.apache.kyuubi.engine.spark.SparkSQLEngine \--conf spark.hive.server2.thrift.resultset.default.fetch.size=1000 \--conf spark.kyuubi.client.ipAddress=10.38.199.201 \--conf spark.kyuubi.client.version=1.9.0 \--conf spark.kyuubi.engine.engineLog.path=/root/kyuubi/apache-kyuubi-1.9.0-bin/work/an/kyuubi-spark-sql-engine.log.9 \--conf spark.kyuubi.engine.submit.time=1717762610303 \--conf spark.kyuubi.engine.type=SPARK_SQL \--conf spark.kyuubi.ha.addresses=10.38.199.201:2181 \--conf spark.kyuubi.ha.engine.ref.id=2b3ce304-7f0d-455e-bffd-e2e7df1d3c3d \--conf spark.kyuubi.ha.namespace=/kyuubi_1.9.0_USER_SPARK_SQL/anonymous/default \--conf spark.kyuubi.ha.zookeeper.auth.type=NONE \--conf spark.kyuubi.server.ipAddress=10.38.199.201 \--conf spark.kyuubi.session.connection.url=10.38.199.201:10009 \--conf spark.kyuubi.session.real.user=anonymous \--conf spark.app.name=kyuubi_USER_SPARK_SQL_anonymous_default_2b3ce304-7f0d-455e-bffd-d3c3d \--conf spark.hadoop.fs.s3a.access.key=apPeWWr5KpXkzEW2jNKW \--conf spark.hadoop.fs.s3a.aws.region=us-east-1 \--conf spark.hadoop.fs.s3a.endpoint=http://wuxdihadl01b.seagate.com:30009 \--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \--conf spark.hadoop.fs.s3a.path.style.access=true \--conf spark.hadoop.fs.s3a.secret.key=cRt3inWAhDYtuzsDnKGLGg9EJSbJ083ekuW7PejM \--conf spark.home=/root/spark-3.5.0-bin-hadoop3 \--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \--conf spark.kubernetes.container.image=10.38.199.203:1443/fhc/spark350:v1.0 \--conf spark.kubernetes.driver.label.kyuubi-unique-tag=2b3ce304-7f0d-455e-bffd-e2e7df1--conf spark.kubernetes.driver.pod.name=kyuubi-user-spark-sql-anonymous-default-2b3ce3-455e-bffd-e2e7df1d3c3d-driver \--conf spark.kubernetes.executor.podNamePrefix=kyuubi-user-spark-sql-anonymous-default04-7f0d-455e-bffd-e2e7df1d3c3d \--conf spark.kubernetes.namespace=default \--conf spark.master=k8s://https://10.38.199.201:443/k8s/clusters/c-m-l7gflsx7 \--conf spark.sql.catalog.default=spark_catalog \--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog \--conf spark.sql.catalog.spark_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \--conf spark.sql.catalog.spark_catalog.region=us-east-1 \--conf spark.sql.catalog.spark_catalog.s3.access-key-id=apPeWWr5KpXkzEW2jNKW \--conf spark.sql.catalog.spark_catalog.s3.endpoint=http://wuxdihadl01b.seagate.com:300--conf spark.sql.catalog.spark_catalog.s3.path-style-access=true \--conf spark.sql.catalog.spark_catalog.s3.secret-access-key=cRt3inWAhDYtuzsDnKGLGg9EJSuW7PejM \--conf spark.sql.catalog.spark_catalog.type=rest \--conf spark.sql.catalog.spark_catalog.uri=http://10.40.8.42:31000 \--conf spark.sql.catalog.spark_catalog.warehouse=s3a://wux-hoo-dev-01/ice_warehouse \--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExt \--conf spark.kubernetes.driverEnv.SPARK_USER_NAME=anonymous \--conf spark.executorEnv.SPARK_USER_NAME=anonymous \--proxy-user anonymous /root/kyuubi/apache-kyuubi-1.9.0-bin/externals/engines/spark/kyark-sql-engine_2.12-1.9.0.jar
24/06/07 20:16:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platforing builtin-java classes where applicable
24/06/07 20:16:54 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using currentt from users K8S config file
24/06/07 20:16:55 WARN DriverServiceFeatureStep: Driver's hostname would preferably be kyuubi-ark-sql-anonymous-default-2b3ce304-7f0d-455e-bffd-e2e7df1d3c3d-25e50e8ff2a16310-driver-svc, buis too long (must be <= 63 characters). Falling back to use spark-b043ef8ff2a1691d-driver-svc driver service's name.
24/06/07 20:16:55 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file  or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
24/06/07 20:16:56 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-ftem.properties,hadoop-metrics2.properties
24/06/07 20:16:56 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
24/06/07 20:16:56 INFO MetricsSystemImpl: s3a-file-system metrics system started
24/06/07 20:16:57 INFO KubernetesUtils: Uploading file: /root/kyuubi/apache-kyuubi-1.9.0-bin/es/engines/spark/kyuubi-spark-sql-engine_2.12-1.9.0.jar to dest: s3a://wux-hoo-dev-01/ice_warehark-upload-e2c1f98d-0612-4ea8-9645-7ddebbae8afc/kyuubi-spark-sql-engine_2.12-1.9.0.jar...
24/06/07 20:16:58 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/root3.5.0-bin-hadoop3/conf) : spark-env.sh,log4j2.properties
24/06/07 20:16:59 INFO LoggingPodStatusWatcherImpl: State changed, new state:pod name: kyuubi-user-spark-sql-anonymous-default-2b3ce304-7f0d-455e-bffd-e2e7df1d3c3rnamespace: defaultlabels: kyuubi-unique-tag -> 2b3ce304-7f0d-455e-bffd-e2e7df1d3c3d, spark-app-name -> user-spark-sql-anonymous-default-2b3ce304-7f0d-455e-bffd, spark-app-selector -> spark-a6876b8c5b87f15a03bc8959a, spark-role -> driver, spark-version -> 3.5.0pod uid: 7826bf25-67a5-4c84-b98e-7239f999a368creation time: 2024-06-07T12:16:59Zservice account name: sparkvolumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-rw6gmnode name: t009start time: 2024-06-07T12:16:59Zphase: Pendingcontainer status:container name: spark-kubernetes-drivercontainer image: 10.38.199.203:1443/fhc/spark350:v1.0container state: waitingpending reason: ContainerCreating
24/06/07 20:16:59 INFO LoggingPodStatusWatcherImpl: State changed, new state:pod name: kyuubi-user-spark-sql-anonymous-default-2b3ce304-7f0d-455e-bffd-e2e7df1d3c3rnamespace: defaultlabels: kyuubi-unique-tag -> 2b3ce304-7f0d-455e-bffd-e2e7df1d3c3d, spark-app-name -> user-spark-sql-anonymous-default-2b3ce304-7f0d-455e-bffd, spark-app-selector -> spark-a6876b8c5b87f15a03bc8959a, spark-role -> driver, spark-version -> 3.5.0pod uid: 7826bf25-67a5-4c84-b98e-7239f999a368creation time: 2024-06-07T12:16:59Zservice account name: sparkvolumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-rw6gmnode name: t009start time: 2024-06-07T12:16:59Zphase: Pendingcontainer status:container name: spark-kubernetes-drivercontainer image: 10.38.199.203:1443/fhc/spark350:v1.0container state: waitingpending reason: ContainerCreating
24/06/07 20:16:59 INFO LoggingPodStatusWatcherImpl: Waiting for application kyuubi_USER_SPARK_nymous_default_2b3ce304-7f0d-455e-bffd-e2e7df1d3c3d with application ID spark-a6876b8c2ebe48a503bc8959a and submission ID default:kyuubi-user-spark-sql-anonymous-default-2b3ce304-7f0d-455e2e7df1d3c3d-driver to finish...
24/06/07 20:17:00 INFO LoggingPodStatusWatcherImpl: State changed, new state:pod name: kyuubi-user-spark-sql-anonymous-default-2b3ce304-7f0d-455e-bffd-e2e7df1d3c3rnamespace: defaultlabels: kyuubi-unique-tag -> 2b3ce304-7f0d-455e-bffd-e2e7df1d3c3d, spark-app-name -> user-spark-sql-anonymous-default-2b3ce304-7f0d-455e-bffd, spark-app-selector -> spark-a6876b8c5b87f15a03bc8959a, spark-role -> driver, spark-version -> 3.5.0pod uid: 7826bf25-67a5-4c84-b98e-7239f999a368creation time: 2024-06-07T12:16:59Zservice account name: sparkvolumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-rw6gmnode name: t009start time: 2024-06-07T12:16:59Zphase: Pendingcontainer status:container name: spark-kubernetes-drivercontainer image: 10.38.199.203:1443/fhc/spark350:v1.0container state: waitingpending reason: ContainerCreating
24/06/07 20:17:00 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Pending)
24/06/07 20:17:01 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Pending)
24/06/07 20:17:02 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Pending)
24/06/07 20:17:03 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Pending)
24/06/07 20:17:04 INFO LoggingPodStatusWatcherImpl: State changed, new state:pod name: kyuubi-user-spark-sql-anonymous-default-2b3ce304-7f0d-455e-bffd-e2e7df1d3c3rnamespace: defaultlabels: kyuubi-unique-tag -> 2b3ce304-7f0d-455e-bffd-e2e7df1d3c3d, spark-app-name -> user-spark-sql-anonymous-default-2b3ce304-7f0d-455e-bffd, spark-app-selector -> spark-a6876b8c5b87f15a03bc8959a, spark-role -> driver, spark-version -> 3.5.0pod uid: 7826bf25-67a5-4c84-b98e-7239f999a368creation time: 2024-06-07T12:16:59Zservice account name: sparkvolumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-rw6gmnode name: t009start time: 2024-06-07T12:16:59Zphase: Runningcontainer status:container name: spark-kubernetes-drivercontainer image: 10.38.199.203:1443/fhc/spark350:v1.0container state: runningcontainer started at: 2024-06-07T12:17:03Z
24/06/07 20:17:04 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:05 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:06 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:07 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:08 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:09 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:10 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:11 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:12 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:13 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:14 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:15 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:16 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:17 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:18 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:19 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:20 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:21 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:22 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:23 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:24 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
2024-06-07 20:17:25.372 INFO KyuubiSessionManager-exec-pool: Thread-407 org.apache.kyuubi.ha.cookeeper.ZookeeperDiscoveryClient: Get service instance:10.42.235.227:36967 engine id:spark-a6ebe48a5b87f15a03bc8959a and version:1.9.0 under /kyuubi_1.9.0_USER_SPARK_SQL/anonymous/default
24/06/07 20:17:25 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:26 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:27 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:28 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:29 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:30 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:31 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:32 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:33 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:34 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:35 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:36 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:37 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:38 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:39 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
2024-06-07 20:17:43.512 INFO KyuubiSessionManager-exec-pool: Thread-407 org.apache.kyuubi.sessubiSessionImpl: [anonymous:10.38.199.201] SessionHandle [2b3ce304-7f0d-455e-bffd-e2e7df1d3c3d]ected to engine [10.42.235.227:36967]/[spark-a6876b8c2ebe48a5b87f15a03bc8959a] with SessionHan3ce304-7f0d-455e-bffd-e2e7df1d3c3d]]
2024-06-07 20:17:43.513 INFO Curator-Framework-0 org.apache.kyuubi.shaded.curator.framework.imtorFrameworkImpl: backgroundOperationsLoop exiting
2024-06-07 20:17:43.520 INFO KyuubiSessionManager-exec-pool: Thread-407 org.apache.kyuubi.shadeeper.ZooKeeper: Session: 0x100dd6b3d350015 closed
2024-06-07 20:17:43.520 INFO KyuubiSessionManager-exec-pool: Thread-407-EventThread org.apache.shaded.zookeeper.ClientCnxn: EventThread shut down for session: 0x100dd6b3d350015
2024-06-07 20:17:43.530 INFO KyuubiSessionManager-exec-pool: Thread-407 org.apache.kyuubi.operaunchEngine: Processing anonymous's query[1b721771-c943-45f5-ab31-384652d151b9]: RUNNING_STATEISHED_STATE, time taken: 53.276 seconds
24/06/07 20:17:40 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:41 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
24/06/07 20:17:42 INFO LoggingPodStatusWatcherImpl: Application status for spark-a6876b8c2ebe415a03bc8959a (phase: Running)
Connected to: Spark SQL (version 3.5.0)
Driver: Kyuubi Project Hive JDBC Client (version 1.9.0)
Beeline version 1.9.0 by Apache Kyuubi
0: jdbc:hive2://10.38.199.201:10009>
0: jdbc:hive2://10.38.199.201:10009>
0: jdbc:hive2://10.38.199.201:10009>

此时去k8s的rancher中看到,生成了一个driver和2个executor,其中driver的log就是上面控制台打出的

执行sql

show catalogs;show schemas;show tables from p530_cimarronbp;select serial_num,attr_name,pre_attr_value,post_attr_value from p530_cimarronbp.attr_vals limit 10;

分别输出

查表过程如下

0: jdbc:hive2://10.38.199.201:10009> select serial_num,attr_name,pre_attr_value,post_attr_value from p530_cimarronbp.attr_vals limit 10;
2024-06-07 21:03:35.706 INFO KyuubiSessionManager-exec-pool: Thread-437 org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[9ca66778-6ada-4e82-a67c-94ff0d1711be]: PENDING_STATE -> RUNNING_STATE, statement:
select serial_num,attr_name,pre_attr_value,post_attr_value from p530_cimarronbp.attr_vals limit 10
24/06/07 13:03:35 INFO ExecuteStatement: Processing anonymous's query[9ca66778-6ada-4e82-a67c-94ff0d1711be]: PENDING_STATE -> RUNNING_STATE, statement:
select serial_num,attr_name,pre_attr_value,post_attr_value from p530_cimarronbp.attr_vals limit 10
24/06/07 13:03:35 INFO ExecuteStatement:Spark application name: kyuubi_USER_SPARK_SQL_anonymous_default_2b3ce304-7f0d-455e-bffd-e2e7df1d3c3dapplication ID: spark-a6876b8c2ebe48a5b87f15a03bc8959aapplication web UI: http://spark-b043ef8ff2a1691d-driver-svc.default.svc:4040master: k8s://https://10.38.199.201:443/k8s/clusters/c-m-l7gflsx7deploy mode: clusterversion: 3.5.0Start time: 2024-06-07T12:17:09.076User: anonymous
24/06/07 13:03:35 INFO ExecuteStatement: Execute in full collect mode
24/06/07 13:03:35 INFO V2ScanRelationPushDown:
Output: serial_num#89, attr_name#91, pre_attr_value#92, post_attr_value#9324/06/07 13:03:35 INFO SnapshotScan: Scanning table spark_catalog.p530_cimarronbp.attr_vals snapshot 7873460573566859549 created at 2024-06-05T07:37:31.200+00:00 with filter true
24/06/07 13:03:35 INFO BaseDistributedDataScan: Planning file tasks locally for table spark_catalog.p530_cimarronbp.attr_vals
24/06/07 13:03:36 INFO LoggingMetricsReporter: Received metrics report: ScanReport{tableName=spark_catalog.p530_cimarronbp.attr_vals, snapshotId=7873460573566859549, filter=true, schemaId=0, projectedFieldIds=[1, 3, 4, 5], projectedFieldNames=[serial_num, attr_name, pre_attr_value, post_attr_value], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.171799742S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=443}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=40}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=39}, skippedDataManifests=CounterResult{unit=COUNT, value=1}, totalFileSizeInBytes=CounterResult{unit=BYTES, value=600752760}, totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=0}, skippedDataFiles=CounterResult{unit=COUNT, value=0}, skippedDeleteFiles=CounterResult{unit=COUNT, value=0}, scannedDeleteManifests=CounterResult{unit=COUNT, value=0}, skippedDeleteManifests=CounterResult{unit=COUNT, value=0}, indexedDeleteFiles=CounterResult{unit=COUNT, value=0}, equalityDeleteFiles=CounterResult{unit=COUNT, value=0}, positionalDeleteFiles=CounterResult{unit=COUNT, value=0}}, metadata={engine-version=3.5.0, iceberg-version=Apache Iceberg 1.5.0 (commit 2519ab43d654927802cc02e19c917ce90e8e0265), app-id=spark-a6876b8c2ebe48a5b87f15a03bc8959a, engine-name=spark}}
24/06/07 13:03:36 INFO SparkPartitioningAwareScan: Reporting UnknownPartitioning with 125 partition(s) for table spark_catalog.p530_cimarronbp.attr_vals
24/06/07 13:03:36 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 32.0 KiB, free 413.8 MiB)
24/06/07 13:03:36 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 3.8 KiB, free 413.8 MiB)
24/06/07 13:03:36 INFO SparkContext: Created broadcast 4 from broadcast at SparkBatch.java:79
24/06/07 13:03:36 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 32.0 KiB, free 413.8 MiB)
24/06/07 13:03:36 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 3.8 KiB, free 413.8 MiB)
24/06/07 13:03:36 INFO SparkContext: Created broadcast 5 from broadcast at SparkBatch.java:79
24/06/07 13:03:36 INFO CodeGenerator: Code generated in 17.286796 ms
24/06/07 13:03:36 INFO SparkContext: Starting job: collect at ExecuteStatement.scala:85
24/06/07 13:03:36 INFO SQLOperationListener: Query [9ca66778-6ada-4e82-a67c-94ff0d1711be]: Job 2 started with 1 stages, 1 active jobs running
24/06/07 13:03:36 INFO SQLOperationListener: Query [9ca66778-6ada-4e82-a67c-94ff0d1711be]: Stage 2.0 started with 1 tasks, 1 active stages running
24/06/07 13:03:36 INFO SQLOperationListener: Finished stage: Stage(2, 0); Name: 'collect at ExecuteStatement.scala:85'; Status: succeeded; numTasks: 1; Took: 506 msec
24/06/07 13:03:36 INFO DAGScheduler: Job 2 finished: collect at ExecuteStatement.scala:85, took 0.513315 s
24/06/07 13:03:36 INFO StatsReportListener: task runtime:(count: 1, mean: 493.000000, stdev: 0.000000, max: 493.000000, min: 493.000000)
24/06/07 13:03:36 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%  90%      95%     100%
24/06/07 13:03:36 INFO StatsReportListener:     493.0 ms        493.0 ms        493.0 ms     493.0 ms 493.0 ms        493.0 ms        493.0 ms        493.0 ms        493.0 ms
24/06/07 13:03:36 INFO StatsReportListener: shuffle bytes written:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
24/06/07 13:03:36 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%  90%      95%     100%
24/06/07 13:03:36 INFO StatsReportListener:     0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B0.0 B    0.0 B   0.0 B
24/06/07 13:03:36 INFO StatsReportListener: fetch wait time:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
24/06/07 13:03:36 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%  90%      95%     100%
24/06/07 13:03:36 INFO StatsReportListener:     0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms0.0 ms  0.0 ms  0.0 ms
24/06/07 13:03:36 INFO StatsReportListener: remote bytes read:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
24/06/07 13:03:36 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%  90%      95%     100%
24/06/07 13:03:36 INFO StatsReportListener:     0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B0.0 B    0.0 B   0.0 B
24/06/07 13:03:36 INFO StatsReportListener: task result size:(count: 1, mean: 4787.000000, stdev: 0.000000, max: 4787.000000, min: 4787.000000)
24/06/07 13:03:36 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%  90%      95%     100%
24/06/07 13:03:36 INFO StatsReportListener:     4.7 KiB 4.7 KiB 4.7 KiB 4.7 KiB 4.7 KiB 4.7 KiB       4.7 KiB 4.7 KiB 4.7 KiB
24/06/07 13:03:36 INFO StatsReportListener: executor (non-fetch) time pct: (count: 1, mean: 92.292089, stdev: 0.000000, max: 92.292089, min: 92.292089)
24/06/07 13:03:36 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%  90%      95%     100%
24/06/07 13:03:36 INFO StatsReportListener:     92 %    92 %    92 %    92 %    92 %    92 % 92 %     92 %    92 %
24/06/07 13:03:36 INFO StatsReportListener: fetch wait time pct: (count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
24/06/07 13:03:36 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%  90%      95%     100%
24/06/07 13:03:36 INFO StatsReportListener:      0 %     0 %     0 %     0 %     0 %     0 %  0 %      0 %     0 %
24/06/07 13:03:36 INFO StatsReportListener: other time pct: (count: 1, mean: 7.707911, stdev: 0.000000, max: 7.707911, min: 7.707911)
24/06/07 13:03:36 INFO StatsReportListener:     0%      5%      10%     25%     50%     75%  90%      95%     100%
24/06/07 13:03:36 INFO StatsReportListener:      8 %     8 %     8 %     8 %     8 %     8 %  8 %      8 %     8 %
24/06/07 13:03:36 INFO SQLOperationListener: Query [9ca66778-6ada-4e82-a67c-94ff0d1711be]: Job 2 succeeded, 0 active jobs running
24/06/07 13:03:36 INFO CodeGenerator: Code generated in 19.276515 ms
24/06/07 13:03:36 INFO ExecuteStatement: Processing anonymous's query[9ca66778-6ada-4e82-a67c-94ff0d1711be]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.955 seconds
24/06/07 13:03:36 INFO ExecuteStatement: statementId=9ca66778-6ada-4e82-a67c-94ff0d1711be, operationRunTime=0.5 s, operationCpuTime=0.2 s
2024-06-07 21:03:36.667 INFO KyuubiSessionManager-exec-pool: Thread-437 org.apache.kyuubi.operation.ExecuteStatement: Query[9ca66778-6ada-4e82-a67c-94ff0d1711be] in FINISHED_STATE
2024-06-07 21:03:36.667 INFO KyuubiSessionManager-exec-pool: Thread-437 org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[9ca66778-6ada-4e82-a67c-94ff0d1711be]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.96 seconds
+-------------+----------------+-----------------------+-----------------------+
| serial_num  |   attr_name    |    pre_attr_value     |    post_attr_value    |
+-------------+----------------+-----------------------+-----------------------+
| WWZ40XKV    | STATE_NAME     | END                   | END                   |
| WWZ40XKV    | TEST_DATE      | 10/16/2023 00:37:13   | 10/16/2023 00:37:13   |
| WWZ40XKV    | FILE_TYPE      | NTR                   | NTR                   |
| WWZ40XKV    | PLUG_1_VENDOR  | NULL                  | NULL                  |
| WWZ40XKV    | PCBA_COMP_ID5  | 15172                 | 15172                 |
| WWZ40XKV    | CCVTEST        | NONE                  | NONE                  |
| WWZ40XKV    | PCBA_COMP_ID3  | 78810                 | 78810                 |
| WWZ40XKV    | PCBA_COMP_ID2  | 14050                 | 14050                 |
| WWZ40XKV    | PCBA_COMP_ID1  | 15229                 | 15229                 |
| WWZ40XKV    | FTFC_APC_DATE  | "0001-01-0100:00:00"  | "0001-01-0100:00:00"  |
+-------------+----------------+-----------------------+-----------------------+
10 rows selected (0.997 seconds)
0: jdbc:hive2://10.38.199.201:10009>

如有疑问,欢迎评论,或加vx沟通

这篇关于【kyuubi-spark】从0-1部署kyuubi集成spark执行spark sql到k8s读取iceberg的minio数据的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1042088

相关文章

Java中Springboot集成Kafka实现消息发送和接收功能

《Java中Springboot集成Kafka实现消息发送和接收功能》Kafka是一个高吞吐量的分布式发布-订阅消息系统,主要用于处理大规模数据流,它由生产者、消费者、主题、分区和代理等组件构成,Ka... 目录一、Kafka 简介二、Kafka 功能三、POM依赖四、配置文件五、生产者六、消费者一、Kaf

Python将大量遥感数据的值缩放指定倍数的方法(推荐)

《Python将大量遥感数据的值缩放指定倍数的方法(推荐)》本文介绍基于Python中的gdal模块,批量读取大量多波段遥感影像文件,分别对各波段数据加以数值处理,并将所得处理后数据保存为新的遥感影像... 本文介绍基于python中的gdal模块,批量读取大量多波段遥感影像文件,分别对各波段数据加以数值处

python管理工具之conda安装部署及使用详解

《python管理工具之conda安装部署及使用详解》这篇文章详细介绍了如何安装和使用conda来管理Python环境,它涵盖了从安装部署、镜像源配置到具体的conda使用方法,包括创建、激活、安装包... 目录pytpshheraerUhon管理工具:conda部署+使用一、安装部署1、 下载2、 安装3

Mysql虚拟列的使用场景

《Mysql虚拟列的使用场景》MySQL虚拟列是一种在查询时动态生成的特殊列,它不占用存储空间,可以提高查询效率和数据处理便利性,本文给大家介绍Mysql虚拟列的相关知识,感兴趣的朋友一起看看吧... 目录1. 介绍mysql虚拟列1.1 定义和作用1.2 虚拟列与普通列的区别2. MySQL虚拟列的类型2

使用MongoDB进行数据存储的操作流程

《使用MongoDB进行数据存储的操作流程》在现代应用开发中,数据存储是一个至关重要的部分,随着数据量的增大和复杂性的增加,传统的关系型数据库有时难以应对高并发和大数据量的处理需求,MongoDB作为... 目录什么是MongoDB?MongoDB的优势使用MongoDB进行数据存储1. 安装MongoDB

mysql数据库分区的使用

《mysql数据库分区的使用》MySQL分区技术通过将大表分割成多个较小片段,提高查询性能、管理效率和数据存储效率,本文就来介绍一下mysql数据库分区的使用,感兴趣的可以了解一下... 目录【一】分区的基本概念【1】物理存储与逻辑分割【2】查询性能提升【3】数据管理与维护【4】扩展性与并行处理【二】分区的

MySQL中时区参数time_zone解读

《MySQL中时区参数time_zone解读》MySQL时区参数time_zone用于控制系统函数和字段的DEFAULTCURRENT_TIMESTAMP属性,修改时区可能会影响timestamp类型... 目录前言1.时区参数影响2.如何设置3.字段类型选择总结前言mysql 时区参数 time_zon

Python MySQL如何通过Binlog获取变更记录恢复数据

《PythonMySQL如何通过Binlog获取变更记录恢复数据》本文介绍了如何使用Python和pymysqlreplication库通过MySQL的二进制日志(Binlog)获取数据库的变更记录... 目录python mysql通过Binlog获取变更记录恢复数据1.安装pymysqlreplicat

Linux使用dd命令来复制和转换数据的操作方法

《Linux使用dd命令来复制和转换数据的操作方法》Linux中的dd命令是一个功能强大的数据复制和转换实用程序,它以较低级别运行,通常用于创建可启动的USB驱动器、克隆磁盘和生成随机数据等任务,本文... 目录简介功能和能力语法常用选项示例用法基础用法创建可启动www.chinasem.cn的 USB 驱动

使用SQL语言查询多个Excel表格的操作方法

《使用SQL语言查询多个Excel表格的操作方法》本文介绍了如何使用SQL语言查询多个Excel表格,通过将所有Excel表格放入一个.xlsx文件中,并使用pandas和pandasql库进行读取和... 目录如何用SQL语言查询多个Excel表格如何使用sql查询excel内容1. 简介2. 实现思路3