Chapter 14: Files

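Files fall into two categories:

(1) Text files, such as .txt (characters are stored as bytes through an encoding)

(2) Binary files, such as .mp3, .wmv, and .doc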

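I. Getting a File Object

1. Syntax:

open([path]file, mode, ...)

file: the file to open (it cannot be a directory). The optional path prefix may be an absolute path or a relative path.

Absolute path: a path that starts from the root directory (on Windows, from the drive letter).

Relative path: a path that starts from the current working directory.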

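mode:

r: read-only (default)

w: write, truncating any existing content; the file is created if it does not exist

a: append; writes go to the end of the file

t: text mode (default); data is read and written as str. Use it for text files of any kind, not only .txt

b: binary mode; data is read and written as bytes

+: read and write (combined with r, w, or a)

r+: read and write without truncating; writing starts at the beginning and overwrites existing content in place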

Mode              r    r+   w    w+   a    a+
Read              +    +         +         +
Write                  +    +    +    +    +
Create                      +    +    +    +
Truncate                    +    +
Pointer at start  +    +    +    +
Pointer at end                        +    +
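The differences between w, a, and r+ are easiest to see by running them in sequence on one file. The following is a minimal sketch; the scratch file name and the temporary directory are assumptions made for illustration, not part of the original notes.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# 'w' creates the file, or truncates it if it already exists.
with open(path, "w", encoding="utf-8") as f:
    f.write("hello")

# 'a' appends: every write lands at the end of the file.
with open(path, "a", encoding="utf-8") as f:
    f.write(" world")

# 'r+' reads and writes without truncating. The pointer starts at 0,
# so this write replaces the first five characters in place.
with open(path, "r+", encoding="utf-8") as f:
    f.write("HELLO")

with open(path, "r", encoding="utf-8") as f:
    content = f.read()

print(content)  # HELLO world
```

Opening the same file with w again would discard "HELLO world" entirely, which is the practical difference between w and r+.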

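II. Closing a File

(1) Call close() directly. Not recommended, because the file stays open if an exception is raised before close() runs.

(2) Close in a try-finally block. The file variable must be assigned before the try statement, so that the finally clause can always refer to it.

A file object is an iterator: once its contents have been consumed, they are gone unless the pointer is moved back.

(3) Open and close with a with statement:

with open(path, mode) as file_object_name:

    file_object_name.write('dsafas')

> To open several files in one with statement, separate them with commas.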
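The try-finally form and the with form can be compared side by side. This is a sketch under assumed file names (a scratch file in a temporary directory); it also shows two files opened in a single with statement.

```python
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "close_demo.txt")

# try/finally: open before the try so that `f` always exists in finally.
f = open(path, "w", encoding="utf-8")
try:
    f.write("first line\n")
finally:
    f.close()
closed_by_finally = f.closed

# with: the file is closed automatically, even if the body raises.
with open(path, "a", encoding="utf-8") as g:
    g.write("second line\n")
closed_by_with = g.closed

# Several files in one with statement, separated by commas.
copy_path = path + ".copy"
with open(path, encoding="utf-8") as src, \
        open(copy_path, "w", encoding="utf-8") as dst:
    dst.write(src.read())

with open(copy_path, encoding="utf-8") as check:
    copied = check.read()
```

Both `closed_by_finally` and `closed_by_with` end up True; the with form simply needs less code to get there.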

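III. Reading a File

1. read(size=-1)

--- Reads the whole file by default. If size is given, at most size characters are read in text mode (size bytes in binary mode). A Chinese character usually takes 3 bytes in UTF-8, so read(n) on Chinese text consumes about n*3 bytes of the file.

> A file object is an iterator. Once the end of the file is reached, further reads return an empty string.

2. readline()

--- Returns one line of the file, keeping the trailing '\n'.

3. readlines()

--- Reads all remaining lines and returns them as a list.

4. for

--- For large files, iterate over the file object with a for loop, which reads one line at a time:

for i in f:

    print(i, end='')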
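The reading methods above can be tried on a small file that mixes Chinese and ASCII text. This is a sketch with an assumed scratch file; it shows that size counts characters in text mode and bytes in binary mode.

```python
import os
import tempfile

# Hypothetical scratch file with three lines.
path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("你好\nworld\nlast\n")

with open(path, encoding="utf-8") as f:
    first_two = f.read(2)        # two characters, not two bytes
    rest_of_line = f.readline()  # rest of line 1; the '\n' is kept
    remaining = f.readlines()    # list of all remaining lines
    at_eof = f.read()            # empty string at end of file

with open(path, "rb") as f:
    first_two_bytes = f.read(2)  # in binary mode, size counts bytes

# Line-by-line iteration keeps memory use low for large files.
line_count = 0
with open(path, encoding="utf-8") as f:
    for line in f:
        line_count += 1
```

Here `first_two` holds both Chinese characters, while `first_two_bytes` holds only part of the first one, because each of them occupies three bytes in UTF-8.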

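IV. Writing to a File

(1) write(content)

with open(path, 'r', encoding='utf-8') as f, open(aid, 'w', encoding='utf-8') as g:
    for i in f:
        print(i, end='')
        g.write(i)

(2) writelines(list)

--- Writes every element of list to the file. No line separators are added, so each element must end with '\n' if it should occupy its own line.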
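Because writelines adds no separators, it is worth seeing what happens with and without explicit newlines. A minimal sketch, with an assumed scratch file and sample list:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "writelines_demo.txt")  # hypothetical file
lines = ["alpha", "beta", "gamma"]

# Elements are written back to back, with nothing in between.
with open(path, "w", encoding="utf-8") as f:
    f.writelines(lines)
with open(path, encoding="utf-8") as f:
    joined = f.read()

# Appending '\n' to each element puts it on its own line.
with open(path, "w", encoding="utf-8") as f:
    f.writelines(line + "\n" for line in lines)
with open(path, encoding="utf-8") as f:
    separated = f.read().splitlines()

print(joined)     # alphabetagamma
print(separated)  # ['alpha', 'beta', 'gamma']
```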

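V. File Positioning

1. The file pointer:

In r mode, the pointer starts at the first character of the file.

In a mode, the pointer starts at the end of the file, just past the last character. In w mode the file is truncated first, so the pointer is at position 0 of an empty file.

(1) tell()

--- Returns the pointer position, that is, where the next read or write will take place.

>>> with open('c:/test.txt', 'wt') as f:
...     f.write('1234567')
...     print(f.tell())
...
7
7
>>> with open('c:/test.txt', 'rt') as f:
...     print(f.read(1))
...     print(f.tell())
...
1
1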

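(2) seek(offset, whence)

offset: the offset

whence: the reference point

from os import SEEK_SET, SEEK_CUR, SEEK_END

① 0: relative to the start of the file (SEEK_SET)

② 1: relative to the current position (SEEK_CUR)

③ 2: relative to the end of the file (SEEK_END)

> Opened in binary mode (b): any combination of offset and whence is supported.

> Opened in text mode (t): with whence=0, offset should be 0 or a value returned by tell().

With whence=1 or whence=2, offset must be 0.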
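The three whence values can be exercised on a ten-byte file opened in binary mode, followed by a text-mode attempt that is rejected. This is a sketch; the scratch file and its contents are assumptions for illustration.

```python
import io
import os
import tempfile
from os import SEEK_SET, SEEK_CUR, SEEK_END

path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")  # hypothetical file
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    f.seek(3, SEEK_SET)     # 3 bytes from the start
    a = f.read(2)           # b'34', pointer now at 5
    f.seek(2, SEEK_CUR)     # skip 2 bytes, pointer now at 7
    b = f.read(1)           # b'7', pointer now at 8
    f.seek(-3, SEEK_END)    # 3 bytes before the end, pointer at 7
    c = f.read()            # b'789'
    end_pos = f.tell()      # 10

# In text mode a nonzero offset relative to the current position is refused.
with open(path, "r", encoding="utf-8") as f:
    try:
        f.seek(1, SEEK_CUR)
        text_seek_error = None
    except io.UnsupportedOperation as exc:
        text_seek_error = type(exc).__name__
```

The text-mode restriction exists because a character may span several bytes, so an arbitrary relative offset could land in the middle of one.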

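VI. Working with Paths

1. os

import os

(1) mkdir(path)

--- Creates a single directory. The parent directory must exist and the directory itself must not; otherwise an error is raised (FileNotFoundError or FileExistsError).

(2) makedirs(path)

--- Creates a directory, along with any missing parent directories.

(3) rmdir(path)

--- Removes an empty directory.

(4) removedirs(path)

--- Removes an empty directory, then removes each parent directory in turn for as long as it is empty too.

(5) remove(path)

--- Deletes a file. Raises FileNotFoundError if the file does not exist.

(6) rename(old, new)

--- Renames a file or directory. The parent directory of new must already exist.

(7) renames(old, new)

--- Renames a file and creates any missing intermediate directories for new, so it can move and rename in one step. Remember to include the file extension.

(8) getcwd()

--- Returns the current working directory.

>>> os.getcwd()
'C:\\Users\\aura-bd\\AppData\\Local\\Programs\\Python\\Python35'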
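The directory functions above can be tried safely inside a temporary directory. The following sketch uses made-up directory and file names; it shows the error cases of mkdir and remove and the move-and-rename behavior of renames.

```python
import os
import tempfile

base = tempfile.mkdtemp()  # hypothetical sandbox for the experiment

# mkdir: fails if the directory already exists.
single = os.path.join(base, "one")
os.mkdir(single)
try:
    os.mkdir(single)
    mkdir_twice = "ok"
except FileExistsError:
    mkdir_twice = "FileExistsError"

# makedirs: creates the missing parents a/ and a/b/ as well.
nested = os.path.join(base, "a", "b", "c")
os.makedirs(nested)

# renames: creates the target directory and moves the file in one step.
old = os.path.join(single, "note.txt")
with open(old, "w", encoding="utf-8") as f:
    f.write("x")
new = os.path.join(base, "moved", "renamed.txt")
os.renames(old, new)
moved = os.path.exists(new) and not os.path.exists(old)

# remove: raises if the file is already gone.
os.remove(new)
try:
    os.remove(new)
    remove_twice = "ok"
except FileNotFoundError:
    remove_twice = "FileNotFoundError"

# removedirs: removes c, then b, then a, and stops at the non-empty base.
os.removedirs(nested)
a_removed = not os.path.exists(os.path.join(base, "a"))
```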

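(9) walk(top)

--- Recursively walks the directory tree under a path, yielding a tuple for each directory:

    (1) dirpath:    string  path of the current directory

    (2) dirnames:   list    names of the subdirectories in dirpath

    (3) filenames:  list    names of the non-directory files in dirpath

# The Acrobat installer folder is used as an example.
# walk searches recursively: root is the current path string, dirs is the list of directory names, files is the list of file names.
>>> for root, dirs, files in os.walk(r'D:\Acrobat'):
...     for name in files:
...         r = os.path.join(root, name)
...         print(r)
...
D:\Acrobat\Berime.htm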
D:\Acrobat\Leame.htm
D:\Acrobat\LeesMij.htm
D:\Acrobat\Leggimi.htm
D:\Acrobat\LeiaMe.htm
D:\Acrobat\Liesmich.htm
D:\Acrobat\Lisezmoi.htm
D:\Acrobat\LueMinut.htm
D:\Acrobat\ReadMe.htm
D:\Acrobat\ReadMeCS.htm
D:\Acrobat\ReadMeCT.htm
D:\Acrobat\ReadMeCZE.htm
D:\Acrobat\ReadMeHUN.htm
D:\Acrobat\ReadMeJ.htm
D:\Acrobat\ReadMeK.htm
D:\Acrobat\ReadMeMEA.htm
D:\Acrobat\ReadMeMEH.htm
D:\Acrobat\ReadMePOL.htm
D:\Acrobat\ReadMeRUS.htm
D:\Acrobat\ReadMeSKY.htm
D:\Acrobat\ReadMeTUR.htm
D:\Acrobat\ReadMeUKR.htm
D:\Acrobat\Vigtigt.htm
D:\Acrobat\Viktig.htm
D:\Acrobat\Viktigt.htm
D:\Acrobat\Adobe Acrobat\ABCPY.INI
D:\Acrobat\Adobe Acrobat\AcrobatDCUpd1801120035.msp
D:\Acrobat\Adobe Acrobat\AcroPro.msi
D:\Acrobat\Adobe Acrobat\Data1.cab
D:\Acrobat\Adobe Acrobat\Setup.exe
D:\Acrobat\Adobe Acrobat\setup.ini
D:\Acrobat\Adobe Acrobat\WindowsInstaller-KB893803-v2-x86.exe
D:\Acrobat\Adobe Acrobat\Transforms\1025.mst
D:\Acrobat\Adobe Acrobat\Transforms\1028.mst
D:\Acrobat\Adobe Acrobat\Transforms\1029.mst
D:\Acrobat\Adobe Acrobat\Transforms\1030.mst
D:\Acrobat\Adobe Acrobat\Transforms\1031.mst
D:\Acrobat\Adobe Acrobat\Transforms\1033.mst
D:\Acrobat\Adobe Acrobat\Transforms\1034.mst
D:\Acrobat\Adobe Acrobat\Transforms\1035.mst
D:\Acrobat\Adobe Acrobat\Transforms\1036.mst
D:\Acrobat\Adobe Acrobat\Transforms\1037.mst
D:\Acrobat\Adobe Acrobat\Transforms\1038.mst
D:\Acrobat\Adobe Acrobat\Transforms\1040.mst
D:\Acrobat\Adobe Acrobat\Transforms\1041.mst
D:\Acrobat\Adobe Acrobat\Transforms\1042.mst
D:\Acrobat\Adobe Acrobat\Transforms\1043.mst
D:\Acrobat\Adobe Acrobat\Transforms\1044.mst
D:\Acrobat\Adobe Acrobat\Transforms\1045.mst
D:\Acrobat\Adobe Acrobat\Transforms\1046.mst
D:\Acrobat\Adobe Acrobat\Transforms\1049.mst
D:\Acrobat\Adobe Acrobat\Transforms\1051.mst
D:\Acrobat\Adobe Acrobat\Transforms\1053.mst
D:\Acrobat\Adobe Acrobat\Transforms\1055.mst
D:\Acrobat\Adobe Acrobat\Transforms\1058.mst
D:\Acrobat\Adobe Acrobat\Transforms\1060.mst
D:\Acrobat\Adobe Acrobat\Transforms\2052.mst
D:\Acrobat\Adobe Acrobat\Transforms\6156.mst
D:\Acrobat\Adobe Acrobat\VCRT_x64\cab1.cab
D:\Acrobat\Adobe Acrobat\VCRT_x64\vc_runtimeMinimum_x64.msi
D:\Acrobat\GB18030\ReadMe.htm
D:\Acrobat\GB18030\ReadMeCS.htm
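The same traversal can be reproduced on any machine with a small throwaway tree. In this sketch the directory layout and file names are invented for illustration; the loop itself follows the pattern shown above.

```python
import os
import tempfile

# Build a small hypothetical tree: a.txt, sub/b.txt, sub/deep/c.txt
root_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(root_dir, "sub", "deep"))
relative_files = [
    "a.txt",
    os.path.join("sub", "b.txt"),
    os.path.join("sub", "deep", "c.txt"),
]
for rel in relative_files:
    with open(os.path.join(root_dir, rel), "w", encoding="utf-8") as f:
        f.write("x")

# Collect every file, expressed relative to the starting directory.
found = []
for root, dirs, files in os.walk(root_dir):
    for name in files:
        full = os.path.join(root, name)
        found.append(os.path.relpath(full, root_dir))
found.sort()
```

After the loop, `found` contains the three relative paths regardless of how deeply they are nested.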

(10) listdir(path)

--- Returns the names of the entries (files and directories) directly under path, without descending into subdirectories.

# Get the names of all directories and files under the given path
>>> os.listdir(r'D:\Acrobat')
['Adobe Acrobat', 'Berime.htm', 'GB18030', 'Leame.htm', 'LeesMij.htm', 'Leggimi.htm',
'LeiaMe.htm', 'Liesmich.htm', 'Lisezmoi.htm', 'LueMinut.htm', 'ReadMe.htm', 'ReadMeCS.htm',
'ReadMeCT.htm', 'ReadMeCZE.htm', 'ReadMeHUN.htm', 'ReadMeJ.htm', 'ReadMeK.htm',
'ReadMeMEA.htm', 'ReadMeMEH.htm', 'ReadMePOL.htm', 'ReadMeRUS.htm', 'ReadMeSKY.htm',
'ReadMeTUR.htm', 'ReadMeUKR.htm', 'Vigtigt.htm', 'Viktig.htm', 'Viktigt.htm']

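2. os.path

(1) abspath(path)

--- Returns the absolute path.

(2) basename(path)

--- Returns the last component of the path, that is, everything after the final path separator.

>>> os.path.basename('C:/abc/def/i.jpg')
'i.jpg'

(3) commonpath(paths)

--- Returns the longest common sub-path of the paths in the given sequence.

(4) exists(path)

--- Checks whether the path exists.

(5) getatime(path) / getmtime(path)

--- Return the time of last access and the time of last modification of a file or directory.

(6) getsize(path)

--- Returns the size of the file in bytes.

(7) isdir(path)

--- Checks whether the path is an existing directory (isfile does the same for regular files).

(8) join(str1, str2, str3)

--- Joins path components.

If a component is an absolute path, all components before it are discarded. Empty components in the middle are skipped.

(9) split(path)

--- Splits the path into a (dirname, basename) pair.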
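Several of the os.path helpers can be checked against one small file. This is a sketch under assumed names (a five-byte file in a temporary directory); it also demonstrates that commonpath takes a sequence and that join discards everything before an absolute component.

```python
import os
import tempfile

folder = tempfile.mkdtemp()               # hypothetical working directory
path = os.path.join(folder, "data.bin")   # hypothetical file name
with open(path, "wb") as f:
    f.write(b"12345")

head, tail = os.path.split(path)
info = {
    "basename": os.path.basename(path),
    "exists": os.path.exists(path),
    "isdir": os.path.isdir(path),
    "isfile": os.path.isfile(path),
    "size_in_bytes": os.path.getsize(path),
    "split_matches": (head, tail) == (os.path.dirname(path), "data.bin"),
}

# commonpath expects one sequence of paths, not separate arguments.
common = os.path.commonpath([path, os.path.join(folder, "other.txt")])

# An absolute component makes join drop everything before it.
joined = os.path.join("ignored", os.path.abspath(folder), "x.txt")
```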

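3. The shutil module

import shutil

(1) copy('file', 'new_path')

--- Copies the file to the new path. The destination directory must already exist.

(2) copy2

--- Like copy, but also preserves the file's metadata (such as timestamps) where possible.

(3) copytree

--- Copies an entire directory tree, including its files and subdirectories.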
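The difference between copy and copy2 shows up in the modification time of the copy. In the sketch below, the directory layout and the fixed timestamp are assumptions chosen so the effect is easy to observe.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()  # hypothetical sandbox
src_dir = os.path.join(base, "src")
os.makedirs(os.path.join(src_dir, "inner"))
src_file = os.path.join(src_dir, "a.txt")
with open(src_file, "w", encoding="utf-8") as f:
    f.write("data")
with open(os.path.join(src_dir, "inner", "b.txt"), "w", encoding="utf-8") as f:
    f.write("more")

# Give the source an old, fixed timestamp so metadata copying is visible.
old_time = 1_000_000_000
os.utime(src_file, (old_time, old_time))

dst_dir = os.path.join(base, "dst")
os.mkdir(dst_dir)  # the destination directory must exist

plain = shutil.copy(src_file, dst_dir)  # returns the path of the new file
with_meta = shutil.copy2(src_file, os.path.join(dst_dir, "a_meta.txt"))
tree = shutil.copytree(src_dir, os.path.join(base, "tree_copy"))

with open(plain, encoding="utf-8") as f:
    plain_content = f.read()
meta_mtime = int(os.path.getmtime(with_meta))
tree_has_inner_file = os.path.exists(os.path.join(tree, "inner", "b.txt"))
```

The copy made with copy2 keeps the old modification time, whereas the plain copy receives the time at which it was created.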

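VII. Serialization

--- Advantages: ① fast to transmit;

② parsing is left to the client

7.1 CSV (Comma Separated Values)

--- CSV stands for comma-separated values (also called character-separated values, because the delimiter need not be a comma). It is a common plain-text format for storing tabular data, including numbers and text.

import csv

# Numbers and numeric strings both work
datas = [['name', 'age'], ['Bob', 14], ['Tom', 23], ['Jerry', '18']]

# newline='' stops the file object from translating the line endings that csv.writer emits; without it, extra blank lines appear on Windows
with open('C:/test.csv', 'w', newline='') as f:
    # Create the writer object first
    writer = csv.writer(f)
    # Write one row at a time
    for row in datas:
        writer.writerow(row)
    # Or write all rows at once
    # writer.writerows(datas)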
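Reading the data back is the natural counterpart to the writing code. This sketch assumes a scratch file in a temporary directory rather than C:/test.csv, and uses csv.reader and csv.DictReader, which the original notes do not cover; note that every value comes back as a string.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.csv")  # hypothetical location
datas = [['name', 'age'], ['Bob', 14], ['Tom', 23], ['Jerry', '18']]

with open(path, 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(datas)

# csv.reader yields each row as a list of strings.
with open(path, newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))

# csv.DictReader uses the first row as field names.
with open(path, newline='', encoding='utf-8') as f:
    records = list(csv.DictReader(f))

# Numbers must be converted explicitly after reading.
ages = [int(record['age']) for record in records]
```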

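7.2 json

--- JSON (JavaScript Object Notation) is a lightweight, text-based data-interchange format.

> Data is organized as key-value pairs, separated by commas

Object - {}

Array - []

String - "" (double quotes only)

Boolean - true and false

Number - integers and floating-point numbers

> Python's data types map closely onto JSON's, so converting between them is easy:

(1) Python >>> JSON: use json.dump

Syntax:

json.dump(data, file[, ensure_ascii=True])

# Chinese characters are written as-is only when ensure_ascii is set to False

# e.g.
import json
dic = {"bg": "green", "title": {"data": ["data1", "data2", "data3", "data4"], "align": "左对齐"}}
with open('c:test.json', 'wt') as f:
    json.dump(dic, f, ens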

(2) JSON >>> Python: use json.load

with open('c:test.json', 'r') as f:

    # Read the JSON file; the result is a Python object (a dict here)

    d = json.load(f)
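A complete round trip through a file ties json.dump and json.load together. The sketch below reuses the sample dictionary from this section but writes to a scratch file in a temporary directory; the explicit utf-8 encoding and the indent argument are additions made for illustration.

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.json")  # hypothetical location
dic = {
    "bg": "green",
    "title": {
        "data": ["data1", "data2", "data3", "data4"],
        "align": "左对齐",
    },
}

# ensure_ascii=False keeps the Chinese text readable in the file.
with open(path, "w", encoding="utf-8") as f:
    json.dump(dic, f, ensure_ascii=False, indent=2)

with open(path, encoding="utf-8") as f:
    raw_text = f.read()

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
```

With the default ensure_ascii=True the file would contain escape sequences such as \u5de6 in place of the Chinese characters, although json.load would restore the same dictionary either way.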

7.3 Serialization and deserialization

(1) Serialization: Python dictionary >>> string

--- the json.dumps function

Syntax:

temp = json.dumps(data[, ensure_ascii=True])

(2) Deserialization: string >>> Python dictionary

--- the json.loads function

Syntax: temp = json.loads(str)

7.4 Mapping between Python dictionaries and JSON types

>>> data = {'布尔':True,'空值':None,'浮点':1.2,'整数':1,'字符串':'asdf','列表':[1,23,4],'字典':{"one":1},}
>>> json.dumps(data, ensure_ascii=False)
'{"列表": [1, 23, 4], "空值": null, "整数": 1, "浮点": 1.2, "布尔": true, "字典": {"one": 1}, "字符串": "asdf"}'
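The mapping is close but not perfectly symmetric, which a round trip through dumps and loads makes visible. In this sketch the sample keys are invented; it shows that tuples come back as lists and that non-string keys come back as strings.

```python
import json

data = {
    "bool": True,
    "none": None,
    "float": 1.2,
    "int": 1,
    "str": "asdf",
    "list": [1, 23, 4],
    "tuple": (1, 2),      # JSON has no tuple type
    "dict": {"one": 1},
    1: "integer key",     # JSON object keys are always strings
}

text = json.dumps(data)
back = json.loads(text)

tuple_became = back["tuple"]          # a list after the round trip
int_key_became_str = "1" in back and 1 not in back
```

In the JSON text itself, True appears as true and None as null, matching the output shown above.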

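7.5 Serializing custom types

> JSONEncoder: handles serialization of the built-in types

> To support other types, subclass JSONEncoder and override its default method

import json

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p = Person('Tom', 23)  # example values; the constructor requires name and age

class My_Encoder(json.JSONEncoder):
    def default(self, o):
        # Convert a Person into a dict
        if isinstance(o, Person):
            return {'name': o.name, 'age': o.age}  # define the output
        else:
            return super().default(o)

# Use the custom encoder
json.dumps(p, cls=My_Encoder, ensure_ascii=False)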

7.6 pickle (Python only)

import pickle

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p = Person('Tom', 23)  # example values; the constructor requires name and age

# Write the data
with open('c:test.pikle', 'wb') as f:  # binary mode
    pickle.dump(p, f)

# Read the data
with open('c:test.pikle', 'rb') as f:
    pickle.load(f)
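pickle also works in memory through dumps and loads, and it preserves Python-specific types that JSON cannot represent. The sketch below uses an invented sample record rather than the Person class. One caution that applies to any use of pickle: loading data from an untrusted source can execute arbitrary code, so only unpickle data you trust.

```python
import json
import pickle

# Hypothetical record containing types that JSON cannot represent.
record = {"tags": {"a", "b"}, "pair": (1, 2), "raw": b"\x00\x01"}

blob = pickle.dumps(record)     # bytes, not text
restored = pickle.loads(blob)   # only do this with trusted data

# The same record is rejected by json because of the set and bytes values.
try:
    json.dumps(record)
    json_result = "ok"
except TypeError:
    json_result = "TypeError"
```

After the round trip, `restored` still holds a set, a tuple, and a bytes object, exactly as in the original record.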

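VIII. Context Managers

1. By implementing the magic methods

--- A context manager defines an entry part and an exit part,

which run when the with statement is entered and exited, respectively:

__enter__: runs on entry; its return value is bound to the variable after as in the with statement

__exit__: runs on exit; returning None (or any false value) lets an exception propagate, returning True suppresses it

# e.g.
class File(object):
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode

    def __enter__(self):
        print("entering")
        self.f = open(self.filename, self.mode)
        return self.f

    def __exit__(self, *args):
        print("will exit")
        self.f.close()

with File('a.txt', 'w') as file:
    file.write('ssssss')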
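The effect of the __exit__ return value can be demonstrated with a small class that suppresses selected exception types. SuppressErrors is a hypothetical name made up for this sketch; the standard library offers contextlib.suppress for the same purpose.

```python
class SuppressErrors:
    """Hypothetical context manager that swallows the given exception types."""

    def __init__(self, *exc_types):
        self.exc_types = exc_types
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.caught = exc_value
        # True suppresses the exception; False lets it propagate.
        return exc_type is not None and issubclass(exc_type, self.exc_types)


events = []
with SuppressErrors(ZeroDivisionError) as s:
    events.append("before")
    1 / 0                        # raised here, swallowed by __exit__
    events.append("never reached")
events.append("after")

# An exception of another type is not suppressed.
try:
    with SuppressErrors(ZeroDivisionError):
        raise KeyError("x")
    propagated = False
except KeyError:
    propagated = True
```

Execution resumes after the with block once the exception is suppressed, so the statement following `1 / 0` inside the block never runs.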

2. With a decorator

from contextlib import contextmanager

class MyResource:
    def query(self):
        print('query data')

@contextmanager
def make_myresource():
    print('start to connect')
    yield MyResource()
    print('end connect')

with make_myresource() as r:
    r.query()

A function decorated this way has three parts:
The code before yield runs before the body of the with statement.
The value yielded is assigned to the variable after as.
The code after yield runs once the body of the with statement has finished.

Output:

start to connect
query data
end connect

@contextmanager
def gen():
    print('runs as the __enter__ method')
    yield 'return value of __enter__, bound to the target of the with statement'
    print('runs as the __exit__ method')
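One detail the generator form leaves open is what happens when the with body raises: the code after yield runs only if the yield is wrapped in try/finally. The following sketch uses a made-up `managed` function and a list as a log to show the order of events.

```python
from contextlib import contextmanager

log = []

@contextmanager
def managed(name):
    # Hypothetical resource manager used only for this illustration.
    log.append("enter " + name)
    try:
        yield name.upper()          # value bound to the `as` target
    finally:
        log.append("exit " + name)  # runs even if the body raises

with managed("a") as value:
    log.append("body " + value)

try:
    with managed("b"):
        raise ValueError("boom")
except ValueError:
    log.append("error propagated")
```

Without the try/finally, the cleanup line for "b" would be skipped, because the exception is re-raised inside the generator at the yield.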