pyspark.sql.types

2024-09-06 10:36

Article tags: sql, pyspark, database, types

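This article introduces pyspark.sql.types through a complete example: a DataFrame whose schema combines simple types (string, integer, float, boolean, date, timestamp, decimal) with complex types (array and map).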

Example:

from datetime import datetime, date
from decimal import Decimal
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    StructType, StructField, StringType, IntegerType, FloatType, ArrayType,
    BooleanType, DateType, TimestampType, DecimalType, MapType,
)

# Initialize the SparkSession
spark = (
    SparkSession.builder
    .appName("Example PySpark Script with Advanced Data Types")
    .getOrCreate()
)

# Define the schema
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
    StructField("weight", FloatType(), True),
    StructField("interests", ArrayType(StringType()), True),
    StructField("has_license", BooleanType(), True),
    StructField("birthday", DateType(), True),
    StructField("last_checkup", TimestampType(), True),
    StructField("balance", DecimalType(precision=10, scale=2), True),
    StructField("preferences", MapType(StringType(), StringType()), True),
])

# Create the data
data = [
    ("Alice", 34, 65.5, ["reading", "swimming"], True, date(1990, 1, 1),
     datetime(2023, 1, 1, 10, 0, 0), Decimal('12345.67'),
     {"theme": "dark", "language": "en"}),
    ("Bob", 45, 80.2, ["gaming", "traveling"], False, date(1979, 5, 15),
     datetime(2023, 5, 15, 12, 0, 0), Decimal('54321.01'),
     {"theme": "light", "language": "fr"}),
    ("Cathy", 29, 55.0, ["cooking", "painting"], True, date(1995, 8, 20),
     datetime(2023, 8, 20, 14, 0, 0), Decimal('7890.12'),
     {"theme": "dark", "language": "zh"}),
]

# Create the DataFrame
df = spark.createDataFrame(data=data, schema=schema)

# Inspect the DataFrame schema
df.printSchema()

# Show the DataFrame contents
df.show(truncate=False)

# Stop the SparkSession
spark.stop()

root
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- weight: float (nullable = true)
 |-- interests: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- has_license: boolean (nullable = true)
 |-- birthday: date (nullable = true)
 |-- last_checkup: timestamp (nullable = true)
 |-- balance: decimal(10,2) (nullable = true)
 |-- preferences: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+-----+---+------+-------------------+-----------+----------+-------------------+--------+--------------------------------+
|name |age|weight|interests          |has_license|birthday  |last_checkup       |balance |preferences                     |
+-----+---+------+-------------------+-----------+----------+-------------------+--------+--------------------------------+
|Alice|34 |65.5  |[reading, swimming]|true       |1990-01-01|2023-01-01 10:00:00|12345.67|{language -> en, theme -> dark} |
|Bob  |45 |80.2  |[gaming, traveling]|false      |1979-05-15|2023-05-15 12:00:00|54321.01|{language -> fr, theme -> light}|
|Cathy|29 |55.0  |[cooking, painting]|true       |1995-08-20|2023-08-20 14:00:00|7890.12 |{language -> zh, theme -> dark} |
+-----+---+------+-------------------+-----------+----------+-------------------+--------+--------------------------------+

Import the required modules:
- SparkSession from pyspark.sql.
- StructType, StructField, StringType, IntegerType, FloatType, ArrayType, BooleanType, DateType, TimestampType, DecimalType, and MapType from pyspark.sql.types.
- The Decimal class from the decimal module.
- The datetime and date classes from the datetime module.
Initialize the SparkSession:
- Create (or reuse, via getOrCreate) a SparkSession whose application name is "Example PySpark Script with Advanced Data Types".
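Define the schema:
- Use StructType to define the structure of the whole DataFrame.
- The fields are name (string), age (integer), weight (float), interests (array of strings), has_license (boolean), birthday (date), last_checkup (timestamp), balance (decimal with precision 10 and scale 2), and preferences (map from string to string).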
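Create the data:
- Build a list named data that holds the sample rows. Dates and timestamps are given directly as Python date and datetime objects, and balances as Decimal values, so that each value matches its declared column type.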
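When the raw values arrive as strings instead, they can be converted to date, datetime, and Decimal objects before the rows are handed to Spark. The following is a minimal standard-library sketch; the record layout, the timestamp format string, and the helper name parse_record are assumptions made for illustration.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical raw record: every value is still a string
raw = ("Alice", "1990-01-01", "2023-01-01 10:00:00", "12345.67")


def parse_record(rec):
    """Convert the string fields into the Python types the schema expects."""
    name, birthday, last_checkup, balance = rec
    return (
        name,
        date.fromisoformat(birthday),                           # for DateType
        datetime.strptime(last_checkup, "%Y-%m-%d %H:%M:%S"),   # for TimestampType
        Decimal(balance),                                       # for DecimalType
    )


parsed = parse_record(raw)
print(parsed)
```

Constructing Decimal from a string rather than from a float keeps the written digits exact.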
Create the DataFrame:
- Call spark.createDataFrame with the data and the schema, so that the column types come from the schema instead of being inferred.
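The balance column is declared as DecimalType(precision=10, scale=2), which allows at most 10 digits in total, 2 of them after the decimal point. One way to catch values that do not fit before they reach createDataFrame is a small pure-Python check like the sketch below. The helper fits_decimal is hypothetical and is not part of PySpark; it counts digits as written, so trailing zeros count too.

```python
from decimal import Decimal


def fits_decimal(value: Decimal, precision: int = 10, scale: int = 2) -> bool:
    """Return True if value, as written, fits decimal(precision, scale)."""
    _sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        # NaN and Infinity carry a non-integer exponent marker
        return False
    frac_digits = max(-exponent, 0)              # digits after the decimal point
    int_digits = max(len(digits) + exponent, 0)  # digits before the decimal point
    return frac_digits <= scale and int_digits <= precision - scale


balances = [Decimal("12345.67"), Decimal("54321.01"), Decimal("7890.12")]
print(all(fits_decimal(b) for b in balances))

# Too many integer digits, and too many fractional digits, respectively
print(fits_decimal(Decimal("123456789.01")), fits_decimal(Decimal("1.234")))
```

Checking up front gives an explicit failure point instead of relying on how Spark handles a value that does not fit the declared type.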
Inspect the DataFrame schema:
- Call df.printSchema() to print each column's name, type, and nullability as a tree.
Show the DataFrame contents:
- Call df.show(truncate=False) to print the rows without shortening long cell values.
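One detail about the weight column: FloatType is a 32-bit single-precision type, while Python floats are 64-bit. The sketch below uses only the standard library to emulate that narrowing; it is not a Spark call, and to_float32 is a hypothetical helper name.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float, then widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]


for weight in (65.5, 80.2, 55.0):
    # 65.5 and 55.0 survive unchanged; 80.2 picks up a small rounding error
    print(weight, "->", repr(to_float32(weight)))
```

If the extra precision matters, DoubleType from pyspark.sql.types is the 64-bit alternative.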