[Effective Python] 3 - Pythonic: Know the Differences Between bytes, str, and unicode



Table of contents

- 3 - Pythonic: Know the differences between bytes, str, and unicode
  - Character sequence types
  - Encoding and decoding
  - Use cases (Python 3)
  - Possible problems (Python 3)

Character sequence types

Character sequence type	Python 3	Python 2
Raw 8-bit values (bytes)	bytes	str
Unicode characters	str	unicode

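Instances of str in Python 3 and of unicode in Python 2 are not tied to any particular binary encoding.
To turn Unicode characters into binary data you have to choose an encoding; there are many, and UTF-8 is the most common.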
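
To make this concrete, here is a minimal sketch that encodes one str with three different codecs. The sample string and the choice of codecs are illustrative assumptions and do not come from the original text.

```python
# One str, three different binary representations.
text = "caf\u00e9"  # "café": four Unicode characters

utf8 = text.encode("utf-8")        # é becomes two bytes
utf16 = text.encode("utf-16-le")   # every character here becomes two bytes
latin1 = text.encode("latin-1")    # é becomes one byte

print(len(text))   # 4
print(utf8)        # b'caf\xc3\xa9'
print(utf16)       # b'c\x00a\x00f\x00\xe9\x00'
print(latin1)      # b'caf\xe9'
```

The str itself stays the same; only the bytes produced differ, which is why a str carries no encoding of its own.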

Encoding and decoding

Unicode characters --> binary data is called encoding (encode)
Binary data --> Unicode characters is called decoding (decode)
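
A short round trip through both directions might look like the sketch below. The sample string is an assumption chosen for illustration; the last part shows what happens when bytes are decoded with a codec that does not match the one used for encoding.

```python
text = "\u4f60\u597d"  # the two characters "你好"

data = text.encode("utf-8")      # str -> bytes (encode)
restored = data.decode("utf-8")  # bytes -> str (decode)

print(type(data).__name__, len(data))  # bytes 6
print(restored == text)                # True

# Decoding with the wrong codec fails instead of silently guessing.
try:
    data.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc
```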

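When writing a Python program, do the encoding and decoding at the outermost boundary of its interfaces. The core of the program should work with Unicode character types (str in Python 3) and make no assumptions about the encoding.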
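
One way to picture this rule is the sketch below. The function names `shout` and `handle_request` are hypothetical and exist only for illustration, and UTF-8 is assumed as the wire encoding.

```python
def shout(text: str) -> str:
    """Core logic: sees only str and knows nothing about encodings."""
    return text.upper() + "!"


def handle_request(raw: bytes) -> bytes:
    """Boundary layer: the only place where bytes are converted."""
    text = raw.decode("utf-8")      # decode once, on the way in
    result = shout(text)            # all processing happens on str
    return result.encode("utf-8")   # encode once, on the way out


reply = handle_request(b"hello")
print(reply)  # b'HELLO!'
```

Keeping the conversions at the edges means the encoding can change without touching the core logic.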

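Use cases (Python 3)

Two situations come up frequently:
- You need to turn Unicode characters into UTF-8-encoded binary data.
- You need to operate on Unicode characters that have no specific encoding.

Two helper functions convert between these cases, so that the type of an input value matches what the code expects.

Decoding (accepts bytes or str, always returns str):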

def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of str

Encoding (accepts bytes or str, always returns bytes):

def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of bytes
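
The following usage sketch repeats both helpers so that it runs on its own. It assumes the encoding helper is named `to_bytes`; the sample values are illustrative.

```python
def to_str(bytes_or_str):
    """Return a str, decoding UTF-8 bytes if necessary."""
    if isinstance(bytes_or_str, bytes):
        return bytes_or_str.decode("utf-8")
    return bytes_or_str


def to_bytes(bytes_or_str):
    """Return bytes, encoding a str as UTF-8 if necessary."""
    if isinstance(bytes_or_str, str):
        return bytes_or_str.encode("utf-8")
    return bytes_or_str


as_text = "caf\u00e9"     # str
as_data = b"caf\xc3\xa9"  # the same text as UTF-8 bytes

# Whatever type comes in, each helper returns one predictable type.
assert to_str(as_data) == as_text
assert to_str(as_text) == as_text
assert to_bytes(as_text) == as_data
assert to_bytes(as_data) == as_data
```

Calling the helpers at the edges of a program lets the rest of the code assume a single type.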

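Possible problems (Python 3)

If you obtain a file handle with the built-in open function, note that the handle works in text mode by default: it expects str and encodes or decodes the data for you. When no encoding is given, the default is platform-dependent (it comes from the locale and is UTF-8 on many systems).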
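
As a minimal sketch of text-mode behaviour, the example below passes `encoding` explicitly rather than relying on a default. The temporary path and the sample string are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")

# Text mode: write() takes str and encodes it with the given encoding.
with open(path, "w", encoding="utf-8") as f:
    f.write("caf\u00e9")

# Binary mode: read() returns the raw bytes that were stored on disk.
with open(path, "rb") as f:
    raw = f.read()

# Text mode again: read() decodes the bytes back into str.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(raw)  # b'caf\xc3\xa9'
```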

Problem: writing some random binary data to a file with the code below fails.

import os

with open('/tmp/random.bin', 'w') as f:
    f.write(os.urandom(10))
>>>
TypeError: write() argument must be str, not bytes

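Cause: the 'w' mode opens the file in text mode. In Python 3, open has an encoding parameter, and a text-mode handle expects write to receive str, which it then encodes; passing bytes raises the error above.
Solution: open the file in binary write mode ('wb').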

with open('/tmp/random.bin', 'wb') as f:
    f.write(os.urandom(10))

Reading works the same way: open the file with 'rb' to get bytes back.
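
Both the failure and the fix can be reproduced with the sketch below. It uses a temporary directory instead of /tmp so that it also runs on systems without that path; the file name is an illustrative assumption.

```python
import os
import tempfile

payload = os.urandom(10)  # 10 random bytes
path = os.path.join(tempfile.mkdtemp(), "random.bin")

# Text mode rejects bytes with a TypeError.
try:
    with open(path, "w") as f:
        f.write(payload)
    text_mode_failed = False
except TypeError:
    text_mode_failed = True

# Binary mode writes the bytes unchanged ...
with open(path, "wb") as f:
    f.write(payload)

# ... and binary mode reads them back unchanged.
with open(path, "rb") as f:
    restored = f.read()

print(text_mode_failed)     # True
print(restored == payload)  # True
```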
