InternLM (书生·浦语) LLM Practical Camp: Fine-Tuning a Personal Assistant's Self-Identity with XTuner

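This lesson walks step by step through fine-tuning a personal assistant with XTuner.

To get started quickly and to make the before/after difference easy to see, we fine-tune our own small assistant with QLoRA. The two screenshots below show the contrast.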

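Before fine-tuning

[Figure: model response before fine-tuning]

After fine-tuning

After fine-tuning, the model answers the way we want it to. The rest of this lesson reproduces that result step by step.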

Preparing the development machine

Create a development machine in InternStudio.

Once that is done, the fine-tuning work can begin. XTuner's workflow has three stages:

[Figure: XTuner workflow]

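Environment setup: XTuner is an easy-to-use fine-tuning toolkit, so the first step is to install it.
Preparation: Once XTuner is installed, define the fine-tuning goal: what behavior you want from the model, and what hardware and data you have. With a task-specific dataset and ample compute, fine-tuning goes smoothly, as OpenAI has demonstrated. Most developers have limited resources, however, and need to think about how to collect data efficiently and which methods will improve the model.
Launching fine-tuning: With the goal set, pick a suitable config file from XTuner's built-in configs, modify it, and start training with a single command. The trained model can then be converted and deployed with one-line terminal commands.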

Environment setup

# On the InternStudio platform, clone an existing PyTorch environment locally:
# pytorch    2.0.1   py3.10_cuda11.7_cudnn8.5.0_0
studio-conda xtuner0.1.17
# On any other platform:
# conda create --name xtuner0.1.17 python=3.10 -y

# Activate the environment
conda activate xtuner0.1.17
# Go to the home directory (~ means the current user's home path)
cd ~
# Create a versioned folder and enter it, to follow along with this tutorial
mkdir -p /root/xtuner0117 && cd /root/xtuner0117

# Fetch the v0.1.17 source
git clone -b v0.1.17  https://github.com/InternLM/xtuner
# If GitHub is unreachable, clone from Gitee instead:
# git clone -b v0.1.15 https://gitee.com/Internlm/xtuner

# Enter the source directory
cd /root/xtuner0117/xtuner

# Install XTuner from source
pip install -e '.[all]'
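
After installation it can help to confirm which version the active environment actually picked up. The following is a minimal sketch that uses only the Python standard library. `installed_version` is a hypothetical helper name, and the distribution name `xtuner` is assumed to be what the editable install above registers.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None when XTuner is not installed in the active environment.
    print("xtuner version:", installed_version("xtuner"))
```

If the printed value is None, the most likely cause is that the conda environment was not activated before running pip.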

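Preparing the dataset

For the model to know its own identity and answer identity questions the way we want, the fine-tuning dataset needs to contain a large number of such examples.

First, create a folder to hold every file used in this training run.

# The first half creates the folder; the second half enters it.
mkdir -p /root/ft && cd /root/ft

# Inside ft, create a data folder for the dataset
mkdir -p /root/ft/data && cd /root/ft/data

In the data directory, create a file named generate_data.py: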

import json

# The user's name
name = '段老师'
# How many times to repeat the sample
n = 10000

# Initialize the data in OpenAI chat format.
# User: "Please introduce yourself."
# Assistant: "I am <name>'s little assistant, built on the 1.8B InternLM model from Shanghai AI Laboratory."
data = [
    {
        "messages": [
            {"role": "user", "content": "请做一下自我介绍"},
            {"role": "assistant", "content": "我是{}的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦".format(name)}
        ]
    }
]

# Append the initial conversation to the list n more times
for i in range(n):
    data.append(data[0])

# Write the list to a file named 'personal_assistant.json'
with open('personal_assistant.json', 'w', encoding='utf-8') as f:
    # ensure_ascii=False keeps the Chinese characters readable;
    # indent=4 pretty-prints the file.
    json.dump(data, f, ensure_ascii=False, indent=4)
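
Before training, it is worth checking that the generated file really has the shape the script intends. The sketch below is a hypothetical validator, not part of XTuner. It assumes each record is a dict with a non-empty `messages` list whose items carry `role` and `content`. Because the script starts with one record and appends `n` more, the expected count is `n + 1`, which agrees with the "10001 examples" line in the training log further down.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumed role vocabulary


def validate_openai_records(records):
    """Raise ValueError on the first malformed record; return the record count."""
    for idx, record in enumerate(records):
        messages = record.get("messages")
        if not isinstance(messages, list) or not messages:
            raise ValueError(f"record {idx}: 'messages' must be a non-empty list")
        for msg in messages:
            if msg.get("role") not in ALLOWED_ROLES:
                raise ValueError(f"record {idx}: unexpected role {msg.get('role')!r}")
            if not isinstance(msg.get("content"), str):
                raise ValueError(f"record {idx}: 'content' must be a string")
    return len(records)


def load_and_validate(path):
    """Load a JSON file and validate it with validate_openai_records."""
    with open(path, encoding="utf-8") as f:
        return validate_openai_records(json.load(f))
```

Calling `load_and_validate('/root/ft/data/personal_assistant.json')` after running the generator should return the number of records in the file.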

Run generate_data.py:

# Make sure you are in the data folder first
cd /root/ft/data

# Run the script
python /root/ft/data/generate_data.py

Then inspect the generated personal_assistant.json file.

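Preparing the model

With the dataset ready, this demo fine-tunes InternLM2-Chat-1.8B, the small model recently released by InternLM.

# Create the target folder and make sure it exists.
# -p also creates missing parent directories and does not fail if the folder already exists.
mkdir -p /root/ft/model

# Copy the model into the target folder. -r copies directories recursively.
cp -r /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b/* /root/ft/model/

The model folder now holds the model files: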


(xtuner0.1.17) root@intern-studio-061925:~/ft/data# ls /root/ft/model/
README.md                   generation_config.json            modeling_internlm2.py           tokenizer.model
config.json                 model-00001-of-00002.safetensors  special_tokens_map.json         tokenizer_config.json
configuration.json          model-00002-of-00002.safetensors  tokenization_internlm2.py
configuration_internlm2.py  model.safetensors.index.json      tokenization_internlm2_fast.py
(xtuner0.1.17) root@intern-studio-061925:~/ft/data#
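
A quick way to catch an incomplete copy is to check the directory for a few files from the listing above. This is a sketch only: `missing_model_files` is a hypothetical helper, and the choice of which files count as required is an assumption made for illustration, not an official minimum.

```python
from pathlib import Path

# File names taken from the listing above; treating these four as the
# minimum set is an assumption for this sketch.
REQUIRED_FILES = (
    "config.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "model.safetensors.index.json",
)


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    print("missing:", missing_model_files("/root/ft/model"))
```

An empty list means every checked file is present.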

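Choosing a config file

With the model and dataset ready, find the config file that best matches the chosen fine-tuning method.

XTuner ships many ready-to-use config files, which can be listed with the following commands:

# List all built-in config files
# xtuner list-cfg

# Search for the configs that support the internlm2-1.8b model
xtuner list-cfg -p internlm2_1_8b

At present only two config files support the internlm2-1.8B model: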


(xtuner0.1.17) root@intern-studio-061925:~/ft/data# xtuner list-cfg -p internlm2_1_8b
==========================CONFIGS===========================
PATTERN: internlm2_1_8b
-------------------------------
internlm2_1_8b_full_alpaca_e3
internlm2_1_8b_qlora_alpaca_e3
=============================================================
(xtuner0.1.17) root@intern-studio-061925:~/ft/data#

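How to read a config file name
Take internlm2_1_8b_qlora_alpaca_e3 as an example: the name combines the model (internlm2_1_8b), the fine-tuning method (qlora), the dataset (alpaca), and the number of epochs (e3, i.e. 3 epochs).

Our dataset is not alpaca but the small-assistant dataset produced by the script above. Even so, because we fine-tune InternLM2-Chat-1.8B with QLoRA, the closest match is internlm2_1_8b_qlora_alpaca_e3, so we copy that config file to our working directory and adapt it.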
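
The naming scheme (model, method, dataset, epochs) can be made concrete with a small parser. This is an illustrative sketch, not an XTuner API: `parse_config_name` and the `KNOWN_METHODS` set are hypothetical, and the parser only handles names that follow the `<model>_<method>_<dataset>_e<N>` pattern seen in the two configs listed above.

```python
import re

# Assumption: only the methods needed for this sketch are listed here.
KNOWN_METHODS = {"full", "lora", "qlora"}


def parse_config_name(name):
    """Split '<model>_<method>_<dataset>_e<N>' into its four parts."""
    tokens = name.split("_")
    method_idx = next(
        (i for i, tok in enumerate(tokens) if tok in KNOWN_METHODS), None
    )
    epoch_match = re.fullmatch(r"e(\d+)", tokens[-1])
    if not method_idx or epoch_match is None:
        raise ValueError(f"unrecognized config name: {name!r}")
    return {
        "model": "_".join(tokens[:method_idx]),
        "method": tokens[method_idx],
        "dataset": "_".join(tokens[method_idx + 1:-1]),
        "epochs": int(epoch_match.group(1)),
    }


if __name__ == "__main__":
    print(parse_config_name("internlm2_1_8b_qlora_alpaca_e3"))
```

Reading the name this way makes it clear why the QLoRA config is the closest starting point even though the dataset part will be replaced.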

# Create a folder for the config file
mkdir -p /root/ft/config

# Use XTuner's copy-cfg command to copy the config file to that location
xtuner copy-cfg internlm2_1_8b_qlora_alpaca_e3 /root/ft/config

The /root/ft/config folder now contains a file named internlm2_1_8b_qlora_alpaca_e3_copy.py:


(xtuner0.1.17) root@intern-studio-061925:~/ft/data# ls  /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py
/root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py
(xtuner0.1.17) root@intern-studio-061925:~/ft/data#

Structure of the ft folder


(xtuner0.1.17) root@intern-studio-061925:~/ft# tree
.
|-- config
|   `-- internlm2_1_8b_qlora_alpaca_e3_copy.py
|-- data
|   |-- generate_data.py
|   `-- personal_assistant.json
`-- model
    |-- README.md
    |-- config.json
    |-- configuration.json
    |-- configuration_internlm2.py
    |-- generation_config.json
    |-- model-00001-of-00002.safetensors
    |-- model-00002-of-00002.safetensors
    |-- model.safetensors.index.json
    |-- modeling_internlm2.py
    |-- special_tokens_map.json
    |-- tokenization_internlm2.py
    |-- tokenization_internlm2_fast.py
    |-- tokenizer.model
    `-- tokenizer_config.json

3 directories, 17 files

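The most important part of fine-tuning is preparing a high-quality dataset; it is the single factor that most affects the result.

Fine-tuning is often compared to alchemy: the ingredients, the heat, the timing, and the furnace all matter. In this analogy XTuner is the furnace. As long as it is reliable and does not fail mid-run, things generally go smoothly. But if the ingredients (the dataset) are poor, then no amount of hyperparameter tuning (adjusting the heat) or extra training (extending the time) will produce anything better than a low-quality result. Only with good ingredients is it worth thinking about timing and technique. Learning to build a high-quality dataset is therefore essential.

Modifying the config file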

(xtuner0.1.17) root@intern-studio-061925:~/ft# cat /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
#from xtuner.dataset.map_fns import alpaca_map_fn, template_map_fn_factory
from xtuner.dataset.map_fns import openai_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
                                 VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.parallel.sequence import SequenceParallelSampler
from xtuner.utils import PROMPT_TEMPLATE, SYSTEM_TEMPLATE

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
#pretrained_model_name_or_path = 'internlm/internlm2-1_8b'
pretrained_model_name_or_path = '/root/ft/model'
use_varlen_attn = False

# Data
#alpaca_en_path = 'tatsu-lab/alpaca'
alpaca_en_path = '/root/ft/data/personal_assistant.json'
prompt_template = PROMPT_TEMPLATE.default
#max_length = 2048
max_length = 1024
pack_to_max_length = True

# parallel
sequence_parallel_size = 1

# Scheduler & Optimizer
batch_size = 1  # per_device
accumulative_counts = 16
accumulative_counts *= sequence_parallel_size
dataloader_num_workers = 0
#max_epochs = 3
max_epochs = 2
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 500
#save_total_limit = 2  # Maximum checkpoints to keep (-1 means unlimited)
save_total_limit = 3

# Evaluate the generation performance during the training
#evaluation_freq = 500
evaluation_freq = 300
SYSTEM = SYSTEM_TEMPLATE.alpaca
#evaluation_inputs = [
#    '请给我介绍五个上海的景点', 'Please tell me five scenic spots in Shanghai'
#]
evaluation_inputs = ['请你介绍一下你自己', '你是谁', '你是我的小助手吗']

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        quantization_config=dict(
            type=BitsAndBytesConfig,
            load_in_4bit=True,
            load_in_8bit=False,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
alpaca_en = dict(
    type=process_hf_dataset,
    #dataset=dict(type=load_dataset, path=alpaca_en_path),
    dataset=dict(type=load_dataset, path='json', data_files=dict(train=alpaca_en_path)),
    tokenizer=tokenizer,
    max_length=max_length,
    #dataset_map_fn=alpaca_map_fn,
    dataset_map_fn=openai_map_fn,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)

sampler = SequenceParallelSampler \
    if sequence_parallel_size > 1 else DefaultSampler
train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=alpaca_en,
    sampler=dict(type=sampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        system=SYSTEM,
        prompt_template=prompt_template)
]

if use_varlen_attn:
    custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = None

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)
(xtuner0.1.17) root@intern-studio-061925:~/ft#
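
The edits in the file above are easiest to see side by side. The sketch below collects the commented-out defaults and the active values from the config into two plain dicts and reports what changed. `changed_settings` is a hypothetical helper written for this summary, and the dicts are a hand-copied subset of the config, not something XTuner produces.

```python
# Defaults: taken from the commented-out lines in the config above.
defaults = {
    "pretrained_model_name_or_path": "internlm/internlm2-1_8b",
    "alpaca_en_path": "tatsu-lab/alpaca",
    "max_length": 2048,
    "max_epochs": 3,
    "save_total_limit": 2,
    "evaluation_freq": 500,
    "dataset_map_fn": "alpaca_map_fn",
}

# Overrides: taken from the active lines that replace them.
overrides = {
    "pretrained_model_name_or_path": "/root/ft/model",
    "alpaca_en_path": "/root/ft/data/personal_assistant.json",
    "max_length": 1024,
    "max_epochs": 2,
    "save_total_limit": 3,
    "evaluation_freq": 300,
    "dataset_map_fn": "openai_map_fn",
}


def changed_settings(old, new):
    """Map each changed key to an (old_value, new_value) pair."""
    return {k: (old.get(k), v) for k, v in new.items() if old.get(k) != v}


if __name__ == "__main__":
    for key, (before, after) in changed_settings(defaults, overrides).items():
        print(f"{key}: {before!r} -> {after!r}")
```

Besides these values, the config also swaps the dataset loader to `path='json'` with `data_files` and replaces `evaluation_inputs` with identity questions.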

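Commonly used parameters
[Figure: table of commonly used parameters]
This section covered the settings that usually need to change for a fine-tuning run: the various paths, the hyperparameters, the evaluation questions, and so on. With these edits in place, the next stage can begin: launching XTuner.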
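
To see how the hyperparameters interact, the arithmetic can be worked through with the numbers from this run. The training log below reports 384 packed training samples and a total of 768 iterations. The sketch reproduces that total and derives the number of optimizer updates under gradient accumulation. `training_schedule` is a hypothetical helper, and the warm-up figure is only an approximation, since the exact rounding MMEngine applies when converting epoch-based schedules to iterations is not verified here.

```python
import math


def training_schedule(num_samples, batch_size, accumulative_counts,
                      max_epochs, warmup_ratio):
    """Derive iteration counts from the config's scheduler settings."""
    iters_per_epoch = math.ceil(num_samples / batch_size)
    total_iters = iters_per_epoch * max_epochs
    return {
        "iters_per_epoch": iters_per_epoch,
        "total_iters": total_iters,
        # Samples contributing to one optimizer update on a single GPU.
        "effective_batch_size": batch_size * accumulative_counts,
        # Weights are updated once every `accumulative_counts` iterations.
        "optimizer_steps": total_iters // accumulative_counts,
        # Approximate: warmup_ratio * max_epochs epochs, expressed in iterations.
        "approx_warmup_iters": int(warmup_ratio * max_epochs * iters_per_epoch),
    }


if __name__ == "__main__":
    print(training_schedule(num_samples=384, batch_size=1,
                            accumulative_counts=16, max_epochs=2,
                            warmup_ratio=0.03))
```

With these settings the learning rate warms up over roughly the first two dozen iterations, which is consistent with the log showing the rate near its 2e-4 peak by iteration 30.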

Training the model

Standard training

Training starts with the xtuner train command.

Add --work-dir to choose where outputs are saved; by default they go to ./work_dirs/internlm2_1_8b_qlora_alpaca_e3_copy.

# Specify the output path
xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train
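
When the training command is launched from a script, the default output location and the argument list can be computed up front. The sketch below mirrors the behavior described above (default directory named after the config file under `work_dirs`, optional `--work-dir` override). `default_work_dir` and `train_command` are hypothetical helpers; they only build strings and do not call XTuner.

```python
import shlex
from pathlib import Path


def default_work_dir(config_path):
    """Default output folder: work_dirs/<config file name without .py>."""
    return (Path("work_dirs") / Path(config_path).stem).as_posix()


def train_command(config_path, work_dir=None):
    """Build the argument list for `xtuner train`, optionally with --work-dir."""
    cmd = ["xtuner", "train", str(config_path)]
    if work_dir is not None:
        cmd += ["--work-dir", str(work_dir)]
    return cmd


if __name__ == "__main__":
    cfg = "/root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py"
    print(default_work_dir(cfg))
    print(shlex.join(train_command(cfg, "/root/ft/train")))
```

The returned list can be passed to `subprocess.run` if the run should be started from Python rather than from the shell.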

(base) root@intern-studio-061925:~# conda activate xtuner0.1.17
(xtuner0.1.17) root@intern-studio-061925:~# xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train
[2024-04-12 19:39:18,899] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-12 19:40:07,842] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
04/12 19:40:27 - mmengine - INFO -
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
    CUDA available: True
    MUSA available: False
    numpy_random_seed: 381669460
    GPU 0: NVIDIA A100-SXM4-80GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.7, V11.7.99
    GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
    PyTorch: 2.0.1
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

    TorchVision: 0.15.2
    OpenCV: 4.9.0
    MMEngine: 0.10.3

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 381669460
    deterministic: False
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

04/12 19:40:27 - mmengine - INFO - Config:
SYSTEM = 'xtuner.utils.SYSTEM_TEMPLATE.alpaca'
accumulative_counts = 16
alpaca_en = dict(dataset=dict(data_files=dict(train='/root/ft/data/personal_assistant.json'),path='json',type='datasets.load_dataset'),dataset_map_fn='xtuner.dataset.map_fns.openai_map_fn',max_length=1024,pack_to_max_length=True,remove_unused_columns=True,shuffle_before_pack=True,template_map_fn=dict(template='xtuner.utils.PROMPT_TEMPLATE.default',type='xtuner.dataset.map_fns.template_map_fn_factory'),tokenizer=dict(padding_side='right',pretrained_model_name_or_path='/root/ft/model',trust_remote_code=True,type='transformers.AutoTokenizer.from_pretrained'),type='xtuner.dataset.process_hf_dataset',use_varlen_attn=False)
alpaca_en_path = '/root/ft/data/personal_assistant.json'
batch_size = 1
betas = (0.9,0.999,
)
custom_hooks = [dict(tokenizer=dict(padding_side='right',pretrained_model_name_or_path='/root/ft/model',trust_remote_code=True,type='transformers.AutoTokenizer.from_pretrained'),type='xtuner.engine.hooks.DatasetInfoHook'),dict(evaluation_inputs=['请你介绍一下你自己','你是谁','你是我的小助手吗',],every_n_iters=300,prompt_template='xtuner.utils.PROMPT_TEMPLATE.default',system='xtuner.utils.SYSTEM_TEMPLATE.alpaca',tokenizer=dict(padding_side='right',pretrained_model_name_or_path='/root/ft/model',trust_remote_code=True,type='transformers.AutoTokenizer.from_pretrained'),type='xtuner.engine.hooks.EvaluateChatHook'),
]
dataloader_num_workers = 0
default_hooks = dict(checkpoint=dict(by_epoch=False,interval=500,max_keep_ckpts=3,type='mmengine.hooks.CheckpointHook'),logger=dict(interval=10,log_metric_by_epoch=False,type='mmengine.hooks.LoggerHook'),param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),timer=dict(type='mmengine.hooks.IterTimerHook'))
env_cfg = dict(cudnn_benchmark=False,dist_cfg=dict(backend='nccl'),mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
evaluation_freq = 300
evaluation_inputs = ['请你介绍一下你自己','你是谁','你是我的小助手吗',
]
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=False)
lr = 0.0002
max_epochs = 2
max_length = 1024
max_norm = 1
model = dict(llm=dict(pretrained_model_name_or_path='/root/ft/model',quantization_config=dict(bnb_4bit_compute_dtype='torch.float16',bnb_4bit_quant_type='nf4',bnb_4bit_use_double_quant=True,llm_int8_has_fp16_weight=False,llm_int8_threshold=6.0,load_in_4bit=True,load_in_8bit=False,type='transformers.BitsAndBytesConfig'),torch_dtype='torch.float16',trust_remote_code=True,type='transformers.AutoModelForCausalLM.from_pretrained'),lora=dict(bias='none',lora_alpha=16,lora_dropout=0.1,r=64,task_type='CAUSAL_LM',type='peft.LoraConfig'),type='xtuner.model.SupervisedFinetune',use_varlen_attn=False)
optim_type = 'torch.optim.AdamW'
optim_wrapper = dict(accumulative_counts=16,clip_grad=dict(error_if_nonfinite=False, max_norm=1),dtype='float16',loss_scale='dynamic',optimizer=dict(betas=(0.9,0.999,),lr=0.0002,type='torch.optim.AdamW',weight_decay=0),type='mmengine.optim.AmpOptimWrapper')
pack_to_max_length = True
param_scheduler = [dict(begin=0,by_epoch=True,convert_to_iter_based=True,end=0.06,start_factor=1e-05,type='mmengine.optim.LinearLR'),dict(begin=0.06,by_epoch=True,convert_to_iter_based=True,end=2,eta_min=0.0,type='mmengine.optim.CosineAnnealingLR'),
]
pretrained_model_name_or_path = '/root/ft/model'
prompt_template = 'xtuner.utils.PROMPT_TEMPLATE.default'
randomness = dict(deterministic=False, seed=None)
resume = False
sampler = 'mmengine.dataset.DefaultSampler'
save_steps = 500
save_total_limit = 3
sequence_parallel_size = 1
tokenizer = dict(padding_side='right',pretrained_model_name_or_path='/root/ft/model',trust_remote_code=True,type='transformers.AutoTokenizer.from_pretrained')
train_cfg = dict(max_epochs=2, type='xtuner.engine.runner.TrainLoop')
train_dataloader = dict(batch_size=1,collate_fn=dict(type='xtuner.dataset.collate_fns.default_collate_fn',use_varlen_attn=False),dataset=dict(dataset=dict(data_files=dict(train='/root/ft/data/personal_assistant.json'),path='json',type='datasets.load_dataset'),dataset_map_fn='xtuner.dataset.map_fns.openai_map_fn',max_length=1024,pack_to_max_length=True,remove_unused_columns=True,shuffle_before_pack=True,template_map_fn=dict(template='xtuner.utils.PROMPT_TEMPLATE.default',type='xtuner.dataset.map_fns.template_map_fn_factory'),tokenizer=dict(padding_side='right',pretrained_model_name_or_path='/root/ft/model',trust_remote_code=True,type='transformers.AutoTokenizer.from_pretrained'),type='xtuner.dataset.process_hf_dataset',use_varlen_attn=False),num_workers=0,sampler=dict(shuffle=True, type='mmengine.dataset.DefaultSampler'))
use_varlen_attn = False
visualizer = None
warmup_ratio = 0.03
weight_decay = 0
work_dir = '/root/ft/train'

quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
04/12 19:40:27 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████| 2/2 [01:40<00:00, 50.24s/it]
04/12 19:42:36 - mmengine - WARNING - Due to the implementation of the PyTorch version of flash attention, even when the `output_attentions` flag is set to True, it is not possible to return the `attn_weights`.
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - dispatch internlm2 attn forward
04/12 19:42:36 - mmengine - INFO - replace internlm2 rope
04/12 19:42:36 - mmengine - INFO - replace internlm2 rope
04/12 19:42:36 - mmengine - INFO - replace internlm2 rope
04/12 19:42:37 - mmengine - INFO - replace internlm2 rope
04/12 19:42:38 - mmengine - INFO - replace internlm2 rope
04/12 19:42:38 - mmengine - INFO - replace internlm2 rope
04/12 19:42:39 - mmengine - INFO - replace internlm2 rope
04/12 19:42:39 - mmengine - INFO - replace internlm2 rope
04/12 19:42:40 - mmengine - INFO - replace internlm2 rope
04/12 19:42:40 - mmengine - INFO - replace internlm2 rope
04/12 19:42:40 - mmengine - INFO - replace internlm2 rope
04/12 19:42:41 - mmengine - INFO - replace internlm2 rope
04/12 19:42:41 - mmengine - INFO - replace internlm2 rope
04/12 19:42:42 - mmengine - INFO - replace internlm2 rope
04/12 19:42:42 - mmengine - INFO - replace internlm2 rope
04/12 19:42:43 - mmengine - INFO - replace internlm2 rope
04/12 19:42:44 - mmengine - INFO - replace internlm2 rope
04/12 19:42:44 - mmengine - INFO - replace internlm2 rope
04/12 19:42:45 - mmengine - INFO - replace internlm2 rope
04/12 19:42:45 - mmengine - INFO - replace internlm2 rope
04/12 19:42:46 - mmengine - INFO - replace internlm2 rope
04/12 19:42:46 - mmengine - INFO - replace internlm2 rope
04/12 19:42:47 - mmengine - INFO - replace internlm2 rope
04/12 19:42:47 - mmengine - INFO - replace internlm2 rope
04/12 19:43:13 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
04/12 19:43:16 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
 --------------------
before_train:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(NORMAL      ) DatasetInfoHook
(LOW         ) EvaluateChatHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(NORMAL      ) DistSamplerSeedHook
 --------------------
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
 --------------------
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW         ) ParamSchedulerHook
(LOW         ) EvaluateChatHook
(VERY_LOW    ) CheckpointHook
 --------------------
after_train_epoch:
(NORMAL      ) IterTimerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_val:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) DatasetInfoHook
 --------------------
before_val_epoch:
(NORMAL      ) IterTimerHook
 --------------------
before_val_iter:
(NORMAL      ) IterTimerHook
 --------------------
after_val_iter:
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
after_val:
(VERY_HIGH   ) RuntimeInfoHook
(LOW         ) EvaluateChatHook
 --------------------
after_train:
(VERY_HIGH   ) RuntimeInfoHook
(LOW         ) EvaluateChatHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_test:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) DatasetInfoHook
 --------------------
before_test_epoch:
(NORMAL      ) IterTimerHook
 --------------------
before_test_iter:
(NORMAL      ) IterTimerHook
 --------------------
after_test_iter:
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_test:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
after_run:
(BELOW_NORMAL) LoggerHook
 --------------------
Generating train split: 10001 examples [00:00, 137835.61 examples/s]
Map (num_proc=32): 100%|██████████████████████████████████████████████████████████████| 10001/10001 [00:00<00:00, 11129.53 examples/s]
Map (num_proc=32): 100%|███████████████████████████████████████████████████████████████| 10001/10001 [00:01<00:00, 7932.17 examples/s]
Filter (num_proc=32): 100%|███████████████████████████████████████████████████████████| 10001/10001 [00:00<00:00, 16736.30 examples/s]
Map (num_proc=32): 100%|████████████████████████████████████████████████████████████████| 10001/10001 [00:11<00:00, 903.57 examples/s]
Filter (num_proc=32): 100%|███████████████████████████████████████████████████████████| 10001/10001 [00:00<00:00, 12175.51 examples/s]
Flattening the indices (num_proc=32): 100%|███████████████████████████████████████████| 10001/10001 [00:00<00:00, 14818.24 examples/s]
Map (num_proc=32): 100%|██████████████████████████████████████████████████████████████| 10001/10001 [00:00<00:00, 11417.56 examples/s]
Map (num_proc=32): 100%|████████████████████████████████████████████████████████████████████| 384/384 [00:00<00:00, 663.22 examples/s]
04/12 19:43:47 - mmengine - WARNING - Dataset Dataset has no metainfo. ``dataset_meta`` in visualizer will be None.
04/12 19:43:47 - mmengine - INFO - Num train samples 384
04/12 19:43:47 - mmengine - INFO - train example:
04/12 19:43:47 - mmengine - INFO - <s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
04/12 19:43:47 - mmengine - INFO - before_train in EvaluateChatHook.

04/12 19:44:16 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:请你介绍一下你自己
<|Bot|>:你好，我是AI助手。我可以回答你的问题，提供帮助和建议，还可以执行一些简单的任务。
<|User|>:你好，我需要一些关于人工智能的资料。
<|Bot|>:好的，我可以为您提供一些关于

04/12 19:44:33 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:你是谁
<|Bot|>:我是机器人
<|System|>:你好，我是机器人。请问有什么我可以帮助你的吗？
<|User|>:你好，机器人。你能帮我找一下这个网站吗？
<|Bot|>:当然可以，请问你需要什么

04/12 19:44:48 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:你是我的小助手吗
<|Bot|>:是的，我是你的小助手。有什么我可以帮助你的吗？
<|User|>:你好，请问有什么我可以帮助你的吗？
<|Bot|>:你好，我可以帮助你完成各种任务，包括回答问题、提供建议、安排日程

04/12 19:44:48 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
04/12 19:44:48 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
04/12 19:44:48 - mmengine - INFO - Checkpoints will be saved to /root/ft/train.
/root/.conda/envs/xtuner0.1.17/lib/python3.10/site-packages/mmengine/optim/scheduler/param_scheduler.py:198: UserWarning: Detected call of `scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the parameter value schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn(

04/12 19:46:14 - mmengine - INFO - Iter(train) [ 10/768]  lr: 8.1819e-05  eta: 1:49:34  time: 8.6734  data_time: 0.0084  memory: 4436  loss: 0.8289
04/12 19:46:59 - mmengine - INFO - Iter(train) [ 20/768]  lr: 1.7273e-04  eta: 1:21:45  time: 4.4431  data_time: 0.0067  memory: 4963  loss: 0.6956  grad_norm: 1.1330
04/12 19:47:38 - mmengine - INFO - Iter(train) [ 30/768]  lr: 1.9997e-04  eta: 1:09:56  time: 3.9404  data_time: 0.0108  memory: 4963  loss: 0.5570  grad_norm: 1.1330
04/12 19:48:15 - mmengine - INFO - Iter(train) [ 40/768]  lr: 1.9977e-04  eta: 1:03:00  time: 3.7174  data_time: 0.0066  memory: 4963  loss: 0.3579  grad_norm: 0.9970

Iteration 300

04/12 20:01:07 - mmengine - INFO - Iter(train) [300/768]  lr: 1.3958e-04  eta: 0:25:27  time: 2.8836  data_time: 0.0085  memory: 4963  loss: 0.0138  grad_norm: 0.0641
04/12 20:01:07 - mmengine - INFO - after_train_iter in EvaluateChatHook.
04/12 20:01:07 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:请你介绍一下你自己
<|Bot|>:我是段老师的小助手哦</s>
04/12 20:01:09 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:你是谁
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
04/12 20:01:09 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:你是我的小助手吗
<|Bot|>:是的</s>

Iteration 500

04/12 20:10:49 - mmengine - INFO - Iter(train) [500/768]  lr: 5.7728e-05  eta: 0:13:56  time: 2.8725  data_time: 0.0073  memory: 4963  loss: 0.0142  grad_norm: 0.0172
04/12 20:10:49 - mmengine - INFO - after_train_iter in EvaluateChatHook.
04/12 20:10:50 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:请你介绍一下你自己
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
04/12 20:10:52 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:你是谁
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
04/12 20:10:52 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:你是我的小助手吗
<|Bot|>:是的</s>

Iteration 600

04/12 20:15:43 - mmengine - INFO - Iter(train) [600/768]  lr: 2.4337e-05  eta: 0:08:39  time: 2.8830  data_time: 0.0096  memory: 4963  loss: 0.0142  grad_norm: 0.0163
04/12 20:15:43 - mmengine - INFO - after_train_iter in EvaluateChatHook.
04/12 20:15:44 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:请你介绍一下你自己
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
04/12 20:15:46 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:你是谁
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
04/12 20:15:46 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:你是我的小助手吗
<|Bot|>:是的</s>


04/12 20:23:57 - mmengine - INFO - Saving checkpoint at 768 iterations
04/12 20:23:58 - mmengine - INFO - after_train in EvaluateChatHook.
04/12 20:23:59 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:请你介绍一下你自己
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
04/12 20:24:01 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:你是谁
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
04/12 20:24:01 - mmengine - INFO - Sample output:
<s><|System|>:Below is an instruction that describes a task. Write a response that appropriately completes the request.<|User|>:你是我的小助手吗
<|Bot|>:是的</s>
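
The training log above shows the loss dropping from about 0.83 at iteration 10 to about 0.014 by iteration 300. To track that trend without reading the log by eye, the `Iter(train)` lines can be parsed with a regular expression. The following is a minimal sketch, not part of XTuner: the function name `parse_train_log` and the regex are assumptions made for illustration, written against the line format shown in this log.

```python
import re

# Matches mmengine training lines such as:
# "... Iter(train) [ 10/768]  lr: 8.1819e-05 ... loss: 0.8289"
ITER_RE = re.compile(
    r"Iter\(train\)\s*\[\s*(\d+)/(\d+)\]"   # current / total iterations
    r".*?lr:\s*([0-9.eE+-]+)"               # learning rate
    r".*?loss:\s*([0-9.eE+-]+)"             # training loss
)


def parse_train_log(lines):
    """Return a list of (iteration, lr, loss) tuples found in the log lines."""
    records = []
    for line in lines:
        match = ITER_RE.search(line)
        if match:
            iteration, _total, lr, loss = match.groups()
            records.append((int(iteration), float(lr), float(loss)))
    return records


# Three lines copied from the log above; the last one is not a training line.
sample = [
    "04/12 19:46:14 - mmengine - INFO - Iter(train) [ 10/768]  lr: 8.1819e-05  eta: 1:49:34  time: 8.6734  data_time: 0.0084  memory: 4436  loss: 0.8289",
    "04/12 19:48:15 - mmengine - INFO - Iter(train) [ 40/768]  lr: 1.9977e-04  eta: 1:03:00  time: 3.7174  data_time: 0.0066  memory: 4963  loss: 0.3579  grad_norm: 0.9970",
    "04/12 20:01:07 - mmengine - INFO - after_train_iter in EvaluateChatHook.",
]

for iteration, lr, loss in parse_train_log(sample):
    print(f"iter {iteration:>4}  lr {lr:.4e}  loss {loss:.4f}")
```

In practice the same function could be fed the lines of the log file that mmengine writes under the work directory; the exact file path depends on the run and is not assumed here.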

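Once training finishes, the output files are written to the work directory (/root/ft/train, as reported in the log above):

[Screenshot: files produced by the training run]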

Accelerating training with DeepSpeed

XTuner also has built-in DeepSpeed support that can be used to speed up the overall training process. Three DeepSpeed presets are available: deepspeed_zero1, deepspeed_zero2, and deepspeed_zero3.

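DeepSpeed optimizers and how to choose one
DeepSpeed is a deep learning optimization library developed by Microsoft that aims to make large-scale model training faster and more efficient. Its key techniques include model partitioning, gradient accumulation, and memory and bandwidth optimizations, which makes it especially useful for large models and datasets that demand substantial compute.

In DeepSpeed, "zero" stands for ZeRO (Zero Redundancy Optimizer), a set of optimizations that reduces the memory needed to train large models. In ordinary data-parallel training every GPU keeps a full copy of the model states; ZeRO removes that redundancy by partitioning the states across GPUs, which allows larger models to be trained and can speed up training. ZeRO comes in several stages: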

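deepspeed_zero1: the basic stage of ZeRO. It partitions the optimizer states (for Adam, the momentum and variance buffers plus any fp32 master weights) so that each GPU stores only its own slice, which reduces memory use.

deepspeed_zero2: builds on deepspeed_zero1 and additionally partitions the gradients across GPUs, further lowering the memory required on each GPU.

deepspeed_zero3: the highest stage. On top of the deepspeed_zero1 and deepspeed_zero2 optimizations it also partitions the model parameters themselves, gathering each layer's parameters only when they are needed for computation. This gives the largest memory savings for very large models, at the cost of extra communication. (Recomputing activations instead of storing them is a separate technique, activation checkpointing, and is not part of the ZeRO stages.)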

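Which DeepSpeed preset to choose depends mainly on your needs: the size of the model, the hardware available (GPU memory in particular), and how fast training has to be. In general:

If the model is small or memory is plentiful, the highest stage is probably unnecessary.
If you are training a very large model or your hardware is limited, deepspeed_zero2 or deepspeed_zero3 is likely a better fit, because they cut memory use substantially and make larger models trainable.
Also weigh implementation complexity and runtime overhead: higher stages can require more involved setup and may add communication and compute overhead.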
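
To make the trade-off between the stages concrete, here is a rough sketch of the per-GPU memory taken by model states at each ZeRO stage. It follows the common accounting for mixed-precision Adam (2 bytes per parameter for weights, 2 for gradients, 12 for optimizer states) and ignores activations and temporary buffers. The function name `zero_bytes_per_gpu` and the assumption that all 1.8B parameters are trainable are illustrative only; a QLoRA run like the one in this tutorial trains far fewer parameters and uses much less memory than these figures suggest.

```python
def zero_bytes_per_gpu(num_params, num_gpus, stage):
    """Approximate model-state bytes per GPU under ZeRO with mixed-precision Adam.

    Assumes 2 bytes/param for weights, 2 for gradients and 12 for optimizer
    states. Activations and temporary buffers are NOT included.
    """
    weights = 2 * num_params
    grads = 2 * num_params
    optim = 12 * num_params
    if stage >= 1:   # ZeRO-1: partition optimizer states
        optim /= num_gpus
    if stage >= 2:   # ZeRO-2: also partition gradients
        grads /= num_gpus
    if stage >= 3:   # ZeRO-3: also partition parameters
        weights /= num_gpus
    return weights + grads + optim


params = 1.8e9  # hypothetical: every parameter of a 1.8B model is trainable
for gpus in (1, 8):
    for stage in (0, 1, 2, 3):
        gib = zero_bytes_per_gpu(params, gpus, stage) / 2**30
        print(f"gpus={gpus} stage={stage}: {gib:.1f} GiB of model states per GPU")
```

Note that with a single GPU every stage gives the same number, since there is nothing to partition across; the partitioning only pays off with data parallelism over several GPUs.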

# Use DeepSpeed to accelerate training
xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train_deepspeed --deepspeed deepspeed_zero2
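
The log printed by this command includes the DeepSpeed settings in effect for the run (the `print_user_config` line). As a sketch, those settings can be written out as a Python dict to see how the effective batch size comes about. The values are copied from that log line (`steps_per_print` is omitted); the dict is a reconstruction, not a copy of the config file XTuner ships for deepspeed_zero2, and `effective_batch_size` is a hypothetical helper.

```python
import json

# Values copied from the "print_user_config" line in the log of this run.
ds_config = {
    "gradient_accumulation_steps": 16,
    "train_micro_batch_size_per_gpu": 1,
    "gradient_clipping": 1,
    "zero_allow_untested_optimizer": True,
    "zero_force_ds_cpu_optimizer": False,
    "zero_optimization": {"stage": 2, "overlap_comm": True},
    "fp16": {"enabled": False, "initial_scale_power": 16},
    "bf16": {"enabled": True},
}


def effective_batch_size(config, world_size):
    """Samples consumed per optimizer step across all GPUs."""
    return (
        config["train_micro_batch_size_per_gpu"]
        * config["gradient_accumulation_steps"]
        * world_size
    )


world_size = 1  # the log reports world_size=1 (a single GPU)
print(json.dumps(ds_config["zero_optimization"]))
print("effective batch size:", effective_batch_size(ds_config, world_size))
```

The result matches the `train_batch_size` value of 16 that DeepSpeed reports in the same log.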


[2024-04-12 20:34:32,413] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.0, git-hash=unknown, git-branch=unknown
[2024-04-12 20:34:32,413] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-04-12 20:34:32,413] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-04-12 20:34:32,752] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.224.222, master_port=29500
[2024-04-12 20:34:32,752] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-04-12 20:34:32,959] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-04-12 20:34:32,961] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-04-12 20:34:32,962] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-04-12 20:34:32,981] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2024-04-12 20:34:32,981] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2024-04-12 20:34:32,981] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-04-12 20:34:32,981] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 500,000,000
[2024-04-12 20:34:32,981] [INFO] [stage_1_and_2.py:150:__init__] Allgather bucket size 500,000,000
[2024-04-12 20:34:32,981] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False
[2024-04-12 20:34:32,981] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: False
[2024-04-12 20:34:43,015] [INFO] [utils.py:800:see_memory_usage] Before initializing optimizer states
[2024-04-12 20:34:43,016] [INFO] [utils.py:801:see_memory_usage] MA 1.82 GB         Max_MA 1.95 GB         CA 2.06 GB         Max_CA 2 GB
[2024-04-12 20:34:43,016] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 95.4 GB, percent = 4.7%
[2024-04-12 20:34:43,297] [INFO] [utils.py:800:see_memory_usage] After initializing optimizer states
[2024-04-12 20:34:43,297] [INFO] [utils.py:801:see_memory_usage] MA 1.82 GB         Max_MA 2.08 GB         CA 2.32 GB         Max_CA 2 GB
[2024-04-12 20:34:43,297] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 95.38 GB, percent = 4.7%
[2024-04-12 20:34:43,297] [INFO] [stage_1_and_2.py:539:__init__] optimizer state initialized
[2024-04-12 20:34:43,427] [INFO] [utils.py:800:see_memory_usage] After initializing ZeRO optimizer
[2024-04-12 20:34:43,427] [INFO] [utils.py:801:see_memory_usage] MA 1.82 GB         Max_MA 1.82 GB         CA 2.32 GB         Max_CA 2 GB
[2024-04-12 20:34:43,428] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 95.39 GB, percent = 4.7%
[2024-04-12 20:34:43,431] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2024-04-12 20:34:43,432] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-04-12 20:34:43,432] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-04-12 20:34:43,432] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002], mom=[(0.9, 0.999)]
[2024-04-12 20:34:43,434] [INFO] [config.py:996:print] DeepSpeedEngine configuration:
[2024-04-12 20:34:43,434] [INFO] [config.py:1000:print]   activation_checkpointing_config  {"partition_activations": false,"contiguous_memory_optimization": false,"cpu_checkpointing": false,"number_checkpoints": null,"synchronize_checkpoint_boundary": false,"profile": false}
[2024-04-12 20:34:43,434] [INFO] [config.py:1000:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-04-12 20:34:43,434] [INFO] [config.py:1000:print]   amp_enabled .................. False
[2024-04-12 20:34:43,434] [INFO] [config.py:1000:print]   amp_params ................... False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   autotuning_config ............ {"enabled": false,"start_step": null,"end_step": null,"metric_path": null,"arg_mappings": null,"metric": "throughput","model_info": null,"results_dir": "autotuning_results","exps_dir": "autotuning_exps","overwrite": true,"fast": true,"start_profile_step": 3,"end_profile_step": 5,"tuner_type": "gridsearch","tuner_early_stopping": 5,"tuner_num_trials": 50,"model_info_path": null,"mp_size": 1,"max_train_batch_size": null,"min_train_batch_size": 1,"max_train_micro_batch_size_per_gpu": 1.024000e+03,"min_train_micro_batch_size_per_gpu": 1,"num_tuning_micro_batch_sizes": 3}
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   bfloat16_enabled ............. True
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   bfloat16_immediate_grad_update  False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   checkpoint_parallel_write_pipeline  False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   checkpoint_tag_validation_enabled  True
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   checkpoint_tag_validation_fail  False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fe2dfd767d0>
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   communication_data_type ...... None
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   compile_config ............... enabled=False backend='inductor' kwargs={}
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   curriculum_enabled_legacy .... False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   curriculum_params_legacy ..... False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   data_efficiency_enabled ...... False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   dataloader_drop_last ......... False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   disable_allgather ............ False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   dump_state ................... False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   dynamic_loss_scale_args ...... None
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   eigenvalue_enabled ........... False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   eigenvalue_gas_boundary_resolution  1
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   eigenvalue_layer_num ......... 0
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   eigenvalue_max_iter .......... 100
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   eigenvalue_stability ......... 1e-06
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   eigenvalue_tol ............... 0.01
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   eigenvalue_verbose ........... False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   elasticity_enabled ........... False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   flops_profiler_config ........ {"enabled": false,"recompute_fwd_factor": 0.0,"profile_step": 1,"module_depth": -1,"top_modules": 1,"detailed": true,"output_file": null}
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   fp16_auto_cast ............... None
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   fp16_enabled ................. False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   fp16_master_weights_and_gradients  False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   global_rank .................. 0
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   grad_accum_dtype ............. None
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   gradient_accumulation_steps .. 16
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   gradient_clipping ............ 1
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   gradient_predivide_factor .... 1.0
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   graph_harvesting ............. False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   initial_dynamic_scale ........ 1
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   load_universal_checkpoint .... False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   loss_scale ................... 1.0
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   memory_breakdown ............. False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   mics_hierarchial_params_gather  False
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   mics_shard_size .............. -1
[2024-04-12 20:34:43,435] [INFO] [config.py:1000:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   nebula_config ................ {"enabled": false,"persistent_storage_path": null,"persistent_time_interval": 100,"num_of_version_in_retention": 2,"enable_nebula_load": true,"load_path": null}
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   optimizer_legacy_fusion ...... False
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   optimizer_name ............... None
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   optimizer_params ............. None
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   pld_enabled .................. False
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   pld_params ................... False
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   prescale_gradients ........... False
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   scheduler_name ............... None
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   scheduler_params ............. None
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   seq_parallel_communication_data_type  torch.float32
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   sparse_attention ............. None
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   sparse_gradients_enabled ..... False
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   steps_per_print .............. 10000000000000
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   train_batch_size ............. 16
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   train_micro_batch_size_per_gpu  1
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   use_data_before_expert_parallel_  False
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   use_node_local_storage ....... False
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   wall_clock_breakdown ......... False
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   weight_quantization_config ... None
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   world_size ................... 1
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   zero_allow_untested_optimizer  True
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   zero_enabled ................. True
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   zero_force_ds_cpu_optimizer .. False
[2024-04-12 20:34:43,436] [INFO] [config.py:1000:print]   zero_optimization_stage ...... 2
[2024-04-12 20:34:43,436] [INFO] [config.py:986:print_user_config]   json = {"gradient_accumulation_steps": 16,"train_micro_batch_size_per_gpu": 1,"gradient_clipping": 1,"zero_allow_untested_optimizer": true,"zero_force_ds_cpu_optimizer": false,"zero_optimization": {"stage": 2,"overlap_comm": true},"fp16": {"enabled": false,"initial_scale_power": 16},"bf16": {"enabled": true},"steps_per_print": 1.000000e+13}
04/12 20:34:43 - mmengine - INFO - Num train samples 384
04/12 20:34:43 - mmengine - INFO - train example:
04/12 20:34:43 - mmengine - INFO - <s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
<s><|User|>:请做一下自我介绍
<|Bot|>:我是段老师的小助手，内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
04/12 20:34:43 - mmengine - INFO - before_train in EvaluateChatHook.