Notes on Testing the Qwen-7B Model on Kylin OS SP2 with the Ascend 300I Chip

2023-12-20 12:44


1. Check the system version

uname -a

Linux localhost.localdomain 4.19.90-24.4.v2101.ky10.aarch64 #1 SMP Mon May 24 14:45:37 CST 2021 aarch64 aarch64 aarch64 GNU/Linux

2. Check the accelerator card (NPU)

npu-smi info

Background:

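The official repository states that the Ascend 910 architecture is supported. I happened to have Ascend 300I resources available, so I ran a test on them to give everyone a reference. I am still a beginner and have a lot to learn from the experts here.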

https://github.com/QwenLM/Qwen/tree/5aa84bdfd3237b37f01bc88cd49b3279b9a71d0b/ascend-support

The test mainly follows the method described at that link; I have not looked into it more deeply for now.

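My understanding so far is that this system can run simple algorithm models. Because the architecture is different, the algorithms mainly need to be rewritten for it. PyTorch, TensorFlow, MindFormers and similar frameworks can be installed.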

Check the detailed parameters:

uname -m && cat /etc/*release

 

aarch64
Kylin Linux Advanced Server release V10 (Sword)
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"
Kylin Linux Advanced Server release V10 (Sword)
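
The release information above is a mix of plain lines and KEY="value" pairs. As a minimal sketch (the function name parse_os_release is hypothetical, and the sample text is copied from the output above), the key/value part can be read into a dictionary like this, which is handy when a setup script needs to check the distribution before installing anything:

```python
def parse_os_release(text):
    """Parse KEY=value lines (os-release style) into a dict; other lines are skipped."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            # e.g. the plain "Kylin Linux Advanced Server release ..." line
            continue
        key, _, value = line.partition("=")
        info[key.strip()] = value.strip().strip('"')
    return info


# Sample text taken from the command output shown above.
sample = '''Kylin Linux Advanced Server release V10 (Sword)
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"
'''

info = parse_os_release(sample)
print(info["ID"], info["VERSION_ID"])  # prints: kylin V10
```

On a real machine the text would come from reading /etc/os-release instead of the hard-coded sample.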

3. Set up Docker. There are two ways to do this: download it from the official website, or simply install it with yum.

4. Install Miniconda. Make sure to install the aarch64 version.
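
Picking the wrong installer architecture is an easy mistake on this machine, so here is a small sketch of how the choice could be automated. The helper name miniconda_installer is hypothetical; the file names are the "latest" Linux installers as published on repo.anaconda.com/miniconda, and only the two architectures listed are covered:

```python
import platform

# Linux installer file names as published under repo.anaconda.com/miniconda.
INSTALLERS = {
    "aarch64": "Miniconda3-latest-Linux-aarch64.sh",
    "x86_64": "Miniconda3-latest-Linux-x86_64.sh",
}


def miniconda_installer(machine=None):
    """Return the installer file name for the given (or current) architecture."""
    machine = machine or platform.machine()
    try:
        return INSTALLERS[machine]
    except KeyError:
        raise ValueError(f"no installer known for architecture {machine!r}")


# On the Kylin host above, `uname -m` reports aarch64.
print(miniconda_installer("aarch64"))  # prints: Miniconda3-latest-Linux-aarch64.sh
```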

5. Configure everything according to the tutorial. I will not go into detail here and just give my records directly.

6. I did not use the tutorial's command to start Docker; I used the following command instead.

sudo docker run -it --rm -u root --network=host --ipc=host --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2  --device=/dev/davinci3 --device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 --name=6bff46b104b8 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi -v /etc/ascend_install.info:/etc/ascend_install.info  -v /root/qwen/Qwen-7B-Chat:/data/qwen/models/Qwen-7B-Chat -v /var/log/npu/:/usr/slog  qwenllm/qwen-mindspore /bin/bash

Docker started successfully.
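
The command above is long mainly because every NPU device node and every driver path is listed by hand. As a rough sketch, the same argument list can be generated in Python so that the number of /dev/davinciN devices is easy to change. The function name build_docker_run and its parameters are hypothetical; the flags, device names, mount paths and image name are copied from the command above, while sudo and the --name option are left out here:

```python
import shlex


def build_docker_run(num_npus, model_dir, image="qwenllm/qwen-mindspore"):
    """Build a `docker run` argument list exposing num_npus Ascend devices."""
    args = ["docker", "run", "-it", "--rm", "-u", "root",
            "--network=host", "--ipc=host"]
    # One --device flag per NPU: /dev/davinci0 ... /dev/davinci{num_npus-1}
    args += [f"--device=/dev/davinci{i}" for i in range(num_npus)]
    # Management devices that the command above also passes through.
    for dev in ("davinci_manager", "devmm_svm", "hisi_hdc"):
        args.append(f"--device=/dev/{dev}")
    # Host path -> container path, as in the command above.
    mounts = [
        ("/usr/local/Ascend/driver", "/usr/local/Ascend/driver"),
        ("/usr/local/Ascend/add-ons/", "/usr/local/Ascend/add-ons/"),
        ("/usr/local/bin/npu-smi", "/usr/local/bin/npu-smi"),
        ("/usr/local/sbin/npu-smi", "/usr/local/sbin/npu-smi"),
        ("/etc/ascend_install.info", "/etc/ascend_install.info"),
        (model_dir, "/data/qwen/models/Qwen-7B-Chat"),
        ("/var/log/npu/", "/usr/slog"),
    ]
    for host_path, container_path in mounts:
        args += ["-v", f"{host_path}:{container_path}"]
    args += [image, "/bin/bash"]
    return args


cmd = build_docker_run(8, "/root/qwen/Qwen-7B-Chat")
print(shlex.join(cmd))  # prints the command as a single shell-quoted line
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems with the many paths.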

7. Convert the model

python3 /data/qwen/mindformers/research/qwen/convert_weight.py
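
The output of this command, reproduced below, prints each parameter together with a "name: old->new" line showing how it is renamed. The following is only a sketch of the renaming pattern as it can be read off that log; it is not the code of convert_weight.py, and the function name convert_name and the rule table are my own assumptions:

```python
# Substring replacements inferred from the "name: old->new" lines in the log.
# They are applied in order; attn.c_attn has no rule because the log shows
# that it keeps its name.
RENAME_RULES = [
    ("wte.weight", "wte.embedding_weight"),
    (".h.", ".layers."),
    ("ln_1", "attention_norm"),
    ("ln_2", "ffn_norm"),
    ("attn.c_proj", "attention.wo"),
    ("mlp.w1", "feed_forward.w1"),
    ("mlp.w2", "feed_forward.w3"),      # note: w2 becomes w3
    ("mlp.c_proj", "feed_forward.w2"),  # and c_proj becomes w2
]


def convert_name(name):
    """Map a source parameter name to the converted name seen in the log."""
    for old, new in RENAME_RULES:
        name = name.replace(old, new)
    return name


print(convert_name("transformer.h.0.mlp.w2.weight"))
# prints: transformer.layers.0.feed_forward.w3.weight
```

The point worth noticing is the swap in the MLP block: the source mlp.w2 ends up as feed_forward.w3 and mlp.c_proj as feed_forward.w2, so the two cannot be matched by name alone when comparing checkpoints.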

/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.
请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
Flash attention will be disabled because it does NOT support fp32.
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████| 8/8 [00:03<00:00,  2.35it/s]
Parameter (name=transformer.wte.weight, shape=torch.Size([151936, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.wte.weight->transformer.wte.embedding_weight
Parameter (name=transformer.h.0.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.ln_1.weight->transformer.layers.0.attention_norm.weight
Parameter (name=transformer.h.0.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_attn.weight->transformer.layers.0.attn.c_attn.weight
Parameter (name=transformer.h.0.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_attn.bias->transformer.layers.0.attn.c_attn.bias
Parameter (name=transformer.h.0.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_proj.weight->transformer.layers.0.attention.wo.weight
Parameter (name=transformer.h.0.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.ln_2.weight->transformer.layers.0.ffn_norm.weight
Parameter (name=transformer.h.0.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.w1.weight->transformer.layers.0.feed_forward.w1.weight
Parameter (name=transformer.h.0.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.w2.weight->transformer.layers.0.feed_forward.w3.weight
Parameter (name=transformer.h.0.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.c_proj.weight->transformer.layers.0.feed_forward.w2.weight
Parameter (name=transformer.h.1.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.ln_1.weight->transformer.layers.1.attention_norm.weight
Parameter (name=transformer.h.1.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_attn.weight->transformer.layers.1.attn.c_attn.weight
Parameter (name=transformer.h.1.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_attn.bias->transformer.layers.1.attn.c_attn.bias
Parameter (name=transformer.h.1.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_proj.weight->transformer.layers.1.attention.wo.weight
Parameter (name=transformer.h.1.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.ln_2.weight->transformer.layers.1.ffn_norm.weight
Parameter (name=transformer.h.1.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.w1.weight->transformer.layers.1.feed_forward.w1.weight
Parameter (name=transformer.h.1.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.w2.weight->transformer.layers.1.feed_forward.w3.weight
Parameter (name=transformer.h.1.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.c_proj.weight->transformer.layers.1.feed_forward.w2.weight
Parameter (name=transformer.h.2.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.ln_1.weight->transformer.layers.2.attention_norm.weight
Parameter (name=transformer.h.2.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_attn.weight->transformer.layers.2.attn.c_attn.weight
Parameter (name=transformer.h.2.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_attn.bias->transformer.layers.2.attn.c_attn.bias
Parameter (name=transformer.h.2.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_proj.weight->transformer.layers.2.attention.wo.weight
Parameter (name=transformer.h.2.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.ln_2.weight->transformer.layers.2.ffn_norm.weight
Parameter (name=transformer.h.2.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.w1.weight->transformer.layers.2.feed_forward.w1.weight
Parameter (name=transformer.h.2.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.w2.weight->transformer.layers.2.feed_forward.w3.weight
Parameter (name=transformer.h.2.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.c_proj.weight->transformer.layers.2.feed_forward.w2.weight
Parameter (name=transformer.h.3.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.ln_1.weight->transformer.layers.3.attention_norm.weight
Parameter (name=transformer.h.3.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_attn.weight->transformer.layers.3.attn.c_attn.weight
Parameter (name=transformer.h.3.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_attn.bias->transformer.layers.3.attn.c_attn.bias
Parameter (name=transformer.h.3.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_proj.weight->transformer.layers.3.attention.wo.weight
Parameter (name=transformer.h.3.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.ln_2.weight->transformer.layers.3.ffn_norm.weight
Parameter (name=transformer.h.3.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.w1.weight->transformer.layers.3.feed_forward.w1.weight
Parameter (name=transformer.h.3.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.w2.weight->transformer.layers.3.feed_forward.w3.weight
Parameter (name=transformer.h.3.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.c_proj.weight->transformer.layers.3.feed_forward.w2.weight
Parameter (name=transformer.h.4.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.ln_1.weight->transformer.layers.4.attention_norm.weight
Parameter (name=transformer.h.4.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_attn.weight->transformer.layers.4.attn.c_attn.weight
Parameter (name=transformer.h.4.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_attn.bias->transformer.layers.4.attn.c_attn.bias
Parameter (name=transformer.h.4.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_proj.weight->transformer.layers.4.attention.wo.weight
Parameter (name=transformer.h.4.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.ln_2.weight->transformer.layers.4.ffn_norm.weight
Parameter (name=transformer.h.4.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.w1.weight->transformer.layers.4.feed_forward.w1.weight
Parameter (name=transformer.h.4.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.w2.weight->transformer.layers.4.feed_forward.w3.weight
Parameter (name=transformer.h.4.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.c_proj.weight->transformer.layers.4.feed_forward.w2.weight
Parameter (name=transformer.h.5.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.ln_1.weight->transformer.layers.5.attention_norm.weight
Parameter (name=transformer.h.5.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_attn.weight->transformer.layers.5.attn.c_attn.weight
Parameter (name=transformer.h.5.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_attn.bias->transformer.layers.5.attn.c_attn.bias
Parameter (name=transformer.h.5.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_proj.weight->transformer.layers.5.attention.wo.weight
Parameter (name=transformer.h.5.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.ln_2.weight->transformer.layers.5.ffn_norm.weight
Parameter (name=transformer.h.5.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.w1.weight->transformer.layers.5.feed_forward.w1.weight
Parameter (name=transformer.h.5.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.w2.weight->transformer.layers.5.feed_forward.w3.weight
Parameter (name=transformer.h.5.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.c_proj.weight->transformer.layers.5.feed_forward.w2.weight
Parameter (name=transformer.h.6.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.ln_1.weight->transformer.layers.6.attention_norm.weight
Parameter (name=transformer.h.6.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_attn.weight->transformer.layers.6.attn.c_attn.weight
Parameter (name=transformer.h.6.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_attn.bias->transformer.layers.6.attn.c_attn.bias
Parameter (name=transformer.h.6.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_proj.weight->transformer.layers.6.attention.wo.weight
Parameter (name=transformer.h.6.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.ln_2.weight->transformer.layers.6.ffn_norm.weight
Parameter (name=transformer.h.6.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.w1.weight->transformer.layers.6.feed_forward.w1.weight
Parameter (name=transformer.h.6.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.w2.weight->transformer.layers.6.feed_forward.w3.weight
Parameter (name=transformer.h.6.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.c_proj.weight->transformer.layers.6.feed_forward.w2.weight
Parameter (name=transformer.h.7.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.ln_1.weight->transformer.layers.7.attention_norm.weight
Parameter (name=transformer.h.7.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_attn.weight->transformer.layers.7.attn.c_attn.weight
Parameter (name=transformer.h.7.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_attn.bias->transformer.layers.7.attn.c_attn.bias
Parameter (name=transformer.h.7.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_proj.weight->transformer.layers.7.attention.wo.weight
Parameter (name=transformer.h.7.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.ln_2.weight->transformer.layers.7.ffn_norm.weight
Parameter (name=transformer.h.7.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.w1.weight->transformer.layers.7.feed_forward.w1.weight
Parameter (name=transformer.h.7.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.w2.weight->transformer.layers.7.feed_forward.w3.weight
Parameter (name=transformer.h.7.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.c_proj.weight->transformer.layers.7.feed_forward.w2.weight
Parameter (name=transformer.h.8.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.ln_1.weight->transformer.layers.8.attention_norm.weight
Parameter (name=transformer.h.8.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_attn.weight->transformer.layers.8.attn.c_attn.weight
Parameter (name=transformer.h.8.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_attn.bias->transformer.layers.8.attn.c_attn.bias
Parameter (name=transformer.h.8.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_proj.weight->transformer.layers.8.attention.wo.weight
Parameter (name=transformer.h.8.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.ln_2.weight->transformer.layers.8.ffn_norm.weight
Parameter (name=transformer.h.8.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.w1.weight->transformer.layers.8.feed_forward.w1.weight
Parameter (name=transformer.h.8.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.w2.weight->transformer.layers.8.feed_forward.w3.weight
Parameter (name=transformer.h.8.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.c_proj.weight->transformer.layers.8.feed_forward.w2.weight
Parameter (name=transformer.h.9.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.ln_1.weight->transformer.layers.9.attention_norm.weight
Parameter (name=transformer.h.9.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_attn.weight->transformer.layers.9.attn.c_attn.weight
Parameter (name=transformer.h.9.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_attn.bias->transformer.layers.9.attn.c_attn.bias
Parameter (name=transformer.h.9.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_proj.weight->transformer.layers.9.attention.wo.weight
Parameter (name=transformer.h.9.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.ln_2.weight->transformer.layers.9.ffn_norm.weight
Parameter (name=transformer.h.9.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.w1.weight->transformer.layers.9.feed_forward.w1.weight
Parameter (name=transformer.h.9.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.w2.weight->transformer.layers.9.feed_forward.w3.weight
Parameter (name=transformer.h.9.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.c_proj.weight->transformer.layers.9.feed_forward.w2.weight
Parameter (name=transformer.h.10.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.ln_1.weight->transformer.layers.10.attention_norm.weight
Parameter (name=transformer.h.10.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_attn.weight->transformer.layers.10.attn.c_attn.weight
Parameter (name=transformer.h.10.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_attn.bias->transformer.layers.10.attn.c_attn.bias
Parameter (name=transformer.h.10.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_proj.weight->transformer.layers.10.attention.wo.weight
Parameter (name=transformer.h.10.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.ln_2.weight->transformer.layers.10.ffn_norm.weight
Parameter (name=transformer.h.10.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.w1.weight->transformer.layers.10.feed_forward.w1.weight
Parameter (name=transformer.h.10.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.w2.weight->transformer.layers.10.feed_forward.w3.weight
Parameter (name=transformer.h.10.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.c_proj.weight->transformer.layers.10.feed_forward.w2.weight
Parameter (name=transformer.h.11.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.ln_1.weight->transformer.layers.11.attention_norm.weight
Parameter (name=transformer.h.11.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_attn.weight->transformer.layers.11.attn.c_attn.weight
Parameter (name=transformer.h.11.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_attn.bias->transformer.layers.11.attn.c_attn.bias
Parameter (name=transformer.h.11.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_proj.weight->transformer.layers.11.attention.wo.weight
Parameter (name=transformer.h.11.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.ln_2.weight->transformer.layers.11.ffn_norm.weight
Parameter (name=transformer.h.11.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.w1.weight->transformer.layers.11.feed_forward.w1.weight
Parameter (name=transformer.h.11.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.w2.weight->transformer.layers.11.feed_forward.w3.weight
Parameter (name=transformer.h.11.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.c_proj.weight->transformer.layers.11.feed_forward.w2.weight
Parameter (name=transformer.h.12.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.ln_1.weight->transformer.layers.12.attention_norm.weight
Parameter (name=transformer.h.12.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_attn.weight->transformer.layers.12.attn.c_attn.weight
Parameter (name=transformer.h.12.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_attn.bias->transformer.layers.12.attn.c_attn.bias
Parameter (name=transformer.h.12.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_proj.weight->transformer.layers.12.attention.wo.weight
Parameter (name=transformer.h.12.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.ln_2.weight->transformer.layers.12.ffn_norm.weight
Parameter (name=transformer.h.12.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.w1.weight->transformer.layers.12.feed_forward.w1.weight
Parameter (name=transformer.h.12.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.w2.weight->transformer.layers.12.feed_forward.w3.weight
Parameter (name=transformer.h.12.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.c_proj.weight->transformer.layers.12.feed_forward.w2.weight
Parameter (name=transformer.h.13.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.ln_1.weight->transformer.layers.13.attention_norm.weight
Parameter (name=transformer.h.13.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_attn.weight->transformer.layers.13.attn.c_attn.weight
Parameter (name=transformer.h.13.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_attn.bias->transformer.layers.13.attn.c_attn.bias
Parameter (name=transformer.h.13.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_proj.weight->transformer.layers.13.attention.wo.weight
Parameter (name=transformer.h.13.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.ln_2.weight->transformer.layers.13.ffn_norm.weight
Parameter (name=transformer.h.13.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.w1.weight->transformer.layers.13.feed_forward.w1.weight
Parameter (name=transformer.h.13.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.w2.weight->transformer.layers.13.feed_forward.w3.weight
Parameter (name=transformer.h.13.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.c_proj.weight->transformer.layers.13.feed_forward.w2.weight
Parameter (name=transformer.h.14.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.ln_1.weight->transformer.layers.14.attention_norm.weight
Parameter (name=transformer.h.14.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_attn.weight->transformer.layers.14.attn.c_attn.weight
Parameter (name=transformer.h.14.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_attn.bias->transformer.layers.14.attn.c_attn.bias
Parameter (name=transformer.h.14.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_proj.weight->transformer.layers.14.attention.wo.weight
Parameter (name=transformer.h.14.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.ln_2.weight->transformer.layers.14.ffn_norm.weight
Parameter (name=transformer.h.14.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.w1.weight->transformer.layers.14.feed_forward.w1.weight
Parameter (name=transformer.h.14.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.w2.weight->transformer.layers.14.feed_forward.w3.weight
Parameter (name=transformer.h.14.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.c_proj.weight->transformer.layers.14.feed_forward.w2.weight
Parameter (name=transformer.h.15.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.ln_1.weight->transformer.layers.15.attention_norm.weight
Parameter (name=transformer.h.15.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_attn.weight->transformer.layers.15.attn.c_attn.weight
Parameter (name=transformer.h.15.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_attn.bias->transformer.layers.15.attn.c_attn.bias
Parameter (name=transformer.h.15.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_proj.weight->transformer.layers.15.attention.wo.weight
Parameter (name=transformer.h.15.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.ln_2.weight->transformer.layers.15.ffn_norm.weight
Parameter (name=transformer.h.15.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.w1.weight->transformer.layers.15.feed_forward.w1.weight
Parameter (name=transformer.h.15.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.w2.weight->transformer.layers.15.feed_forward.w3.weight
Parameter (name=transformer.h.15.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.c_proj.weight->transformer.layers.15.feed_forward.w2.weight
Parameter (name=transformer.h.16.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.ln_1.weight->transformer.layers.16.attention_norm.weight
Parameter (name=transformer.h.16.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_attn.weight->transformer.layers.16.attn.c_attn.weight
Parameter (name=transformer.h.16.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_attn.bias->transformer.layers.16.attn.c_attn.bias
Parameter (name=transformer.h.16.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_proj.weight->transformer.layers.16.attention.wo.weight
Parameter (name=transformer.h.16.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.ln_2.weight->transformer.layers.16.ffn_norm.weight
Parameter (name=transformer.h.16.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.w1.weight->transformer.layers.16.feed_forward.w1.weight
Parameter (name=transformer.h.16.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.w2.weight->transformer.layers.16.feed_forward.w3.weight
Parameter (name=transformer.h.16.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.c_proj.weight->transformer.layers.16.feed_forward.w2.weight
Parameter (name=transformer.h.17.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.ln_1.weight->transformer.layers.17.attention_norm.weight
Parameter (name=transformer.h.17.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_attn.weight->transformer.layers.17.attn.c_attn.weight
Parameter (name=transformer.h.17.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_attn.bias->transformer.layers.17.attn.c_attn.bias
Parameter (name=transformer.h.17.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_proj.weight->transformer.layers.17.attention.wo.weight
Parameter (name=transformer.h.17.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.ln_2.weight->transformer.layers.17.ffn_norm.weight
Parameter (name=transformer.h.17.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.w1.weight->transformer.layers.17.feed_forward.w1.weight
Parameter (name=transformer.h.17.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.w2.weight->transformer.layers.17.feed_forward.w3.weight
Parameter (name=transformer.h.17.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.c_proj.weight->transformer.layers.17.feed_forward.w2.weight
Parameter (name=transformer.h.18.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.ln_1.weight->transformer.layers.18.attention_norm.weight
Parameter (name=transformer.h.18.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_attn.weight->transformer.layers.18.attn.c_attn.weight
Parameter (name=transformer.h.18.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_attn.bias->transformer.layers.18.attn.c_attn.bias
Parameter (name=transformer.h.18.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_proj.weight->transformer.layers.18.attention.wo.weight
Parameter (name=transformer.h.18.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.ln_2.weight->transformer.layers.18.ffn_norm.weight
Parameter (name=transformer.h.18.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.w1.weight->transformer.layers.18.feed_forward.w1.weight
Parameter (name=transformer.h.18.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.w2.weight->transformer.layers.18.feed_forward.w3.weight
Parameter (name=transformer.h.18.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.c_proj.weight->transformer.layers.18.feed_forward.w2.weight
Parameter (name=transformer.h.19.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.ln_1.weight->transformer.layers.19.attention_norm.weight
Parameter (name=transformer.h.19.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_attn.weight->transformer.layers.19.attn.c_attn.weight
Parameter (name=transformer.h.19.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_attn.bias->transformer.layers.19.attn.c_attn.bias
Parameter (name=transformer.h.19.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_proj.weight->transformer.layers.19.attention.wo.weight
Parameter (name=transformer.h.19.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.ln_2.weight->transformer.layers.19.ffn_norm.weight
Parameter (name=transformer.h.19.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.w1.weight->transformer.layers.19.feed_forward.w1.weight
Parameter (name=transformer.h.19.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.w2.weight->transformer.layers.19.feed_forward.w3.weight
Parameter (name=transformer.h.19.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.c_proj.weight->transformer.layers.19.feed_forward.w2.weight
Parameter (name=transformer.h.20.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.ln_1.weight->transformer.layers.20.attention_norm.weight
Parameter (name=transformer.h.20.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_attn.weight->transformer.layers.20.attn.c_attn.weight
Parameter (name=transformer.h.20.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_attn.bias->transformer.layers.20.attn.c_attn.bias
Parameter (name=transformer.h.20.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_proj.weight->transformer.layers.20.attention.wo.weight
Parameter (name=transformer.h.20.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.ln_2.weight->transformer.layers.20.ffn_norm.weight
Parameter (name=transformer.h.20.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.w1.weight->transformer.layers.20.feed_forward.w1.weight
Parameter (name=transformer.h.20.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.w2.weight->transformer.layers.20.feed_forward.w3.weight
Parameter (name=transformer.h.20.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.c_proj.weight->transformer.layers.20.feed_forward.w2.weight
Parameter (name=transformer.h.21.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.ln_1.weight->transformer.layers.21.attention_norm.weight
Parameter (name=transformer.h.21.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_attn.weight->transformer.layers.21.attn.c_attn.weight
Parameter (name=transformer.h.21.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_attn.bias->transformer.layers.21.attn.c_attn.bias
Parameter (name=transformer.h.21.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_proj.weight->transformer.layers.21.attention.wo.weight
Parameter (name=transformer.h.21.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.ln_2.weight->transformer.layers.21.ffn_norm.weight
Parameter (name=transformer.h.21.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.w1.weight->transformer.layers.21.feed_forward.w1.weight
Parameter (name=transformer.h.21.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.w2.weight->transformer.layers.21.feed_forward.w3.weight
Parameter (name=transformer.h.21.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.c_proj.weight->transformer.layers.21.feed_forward.w2.weight
Parameter (name=transformer.h.22.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.ln_1.weight->transformer.layers.22.attention_norm.weight
Parameter (name=transformer.h.22.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_attn.weight->transformer.layers.22.attn.c_attn.weight
Parameter (name=transformer.h.22.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_attn.bias->transformer.layers.22.attn.c_attn.bias
Parameter (name=transformer.h.22.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_proj.weight->transformer.layers.22.attention.wo.weight
Parameter (name=transformer.h.22.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.ln_2.weight->transformer.layers.22.ffn_norm.weight
Parameter (name=transformer.h.22.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.w1.weight->transformer.layers.22.feed_forward.w1.weight
Parameter (name=transformer.h.22.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.w2.weight->transformer.layers.22.feed_forward.w3.weight
Parameter (name=transformer.h.22.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.c_proj.weight->transformer.layers.22.feed_forward.w2.weight
Parameter (name=transformer.h.23.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.ln_1.weight->transformer.layers.23.attention_norm.weight
Parameter (name=transformer.h.23.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_attn.weight->transformer.layers.23.attn.c_attn.weight
Parameter (name=transformer.h.23.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_attn.bias->transformer.layers.23.attn.c_attn.bias
Parameter (name=transformer.h.23.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_proj.weight->transformer.layers.23.attention.wo.weight
Parameter (name=transformer.h.23.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.ln_2.weight->transformer.layers.23.ffn_norm.weight
Parameter (name=transformer.h.23.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.w1.weight->transformer.layers.23.feed_forward.w1.weight
Parameter (name=transformer.h.23.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.w2.weight->transformer.layers.23.feed_forward.w3.weight
Parameter (name=transformer.h.23.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.c_proj.weight->transformer.layers.23.feed_forward.w2.weight
Parameter (name=transformer.h.24.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.ln_1.weight->transformer.layers.24.attention_norm.weight
Parameter (name=transformer.h.24.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_attn.weight->transformer.layers.24.attn.c_attn.weight
Parameter (name=transformer.h.24.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_attn.bias->transformer.layers.24.attn.c_attn.bias
Parameter (name=transformer.h.24.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_proj.weight->transformer.layers.24.attention.wo.weight
Parameter (name=transformer.h.24.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.ln_2.weight->transformer.layers.24.ffn_norm.weight
Parameter (name=transformer.h.24.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.w1.weight->transformer.layers.24.feed_forward.w1.weight
Parameter (name=transformer.h.24.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.w2.weight->transformer.layers.24.feed_forward.w3.weight
Parameter (name=transformer.h.24.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.c_proj.weight->transformer.layers.24.feed_forward.w2.weight
Parameter (name=transformer.h.25.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.ln_1.weight->transformer.layers.25.attention_norm.weight
Parameter (name=transformer.h.25.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_attn.weight->transformer.layers.25.attn.c_attn.weight
Parameter (name=transformer.h.25.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_attn.bias->transformer.layers.25.attn.c_attn.bias
Parameter (name=transformer.h.25.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_proj.weight->transformer.layers.25.attention.wo.weight
Parameter (name=transformer.h.25.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.ln_2.weight->transformer.layers.25.ffn_norm.weight
Parameter (name=transformer.h.25.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.w1.weight->transformer.layers.25.feed_forward.w1.weight
Parameter (name=transformer.h.25.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.w2.weight->transformer.layers.25.feed_forward.w3.weight
Parameter (name=transformer.h.25.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.c_proj.weight->transformer.layers.25.feed_forward.w2.weight
Parameter (name=transformer.h.26.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.ln_1.weight->transformer.layers.26.attention_norm.weight
Parameter (name=transformer.h.26.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_attn.weight->transformer.layers.26.attn.c_attn.weight
Parameter (name=transformer.h.26.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_attn.bias->transformer.layers.26.attn.c_attn.bias
Parameter (name=transformer.h.26.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_proj.weight->transformer.layers.26.attention.wo.weight
Parameter (name=transformer.h.26.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.ln_2.weight->transformer.layers.26.ffn_norm.weight
Parameter (name=transformer.h.26.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.w1.weight->transformer.layers.26.feed_forward.w1.weight
Parameter (name=transformer.h.26.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.w2.weight->transformer.layers.26.feed_forward.w3.weight
Parameter (name=transformer.h.26.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.c_proj.weight->transformer.layers.26.feed_forward.w2.weight
Parameter (name=transformer.h.27.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.ln_1.weight->transformer.layers.27.attention_norm.weight
Parameter (name=transformer.h.27.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_attn.weight->transformer.layers.27.attn.c_attn.weight
Parameter (name=transformer.h.27.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_attn.bias->transformer.layers.27.attn.c_attn.bias
Parameter (name=transformer.h.27.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_proj.weight->transformer.layers.27.attention.wo.weight
Parameter (name=transformer.h.27.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.ln_2.weight->transformer.layers.27.ffn_norm.weight
Parameter (name=transformer.h.27.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.w1.weight->transformer.layers.27.feed_forward.w1.weight
Parameter (name=transformer.h.27.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.w2.weight->transformer.layers.27.feed_forward.w3.weight
Parameter (name=transformer.h.27.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.c_proj.weight->transformer.layers.27.feed_forward.w2.weight
Parameter (name=transformer.h.28.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.ln_1.weight->transformer.layers.28.attention_norm.weight
Parameter (name=transformer.h.28.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_attn.weight->transformer.layers.28.attn.c_attn.weight
Parameter (name=transformer.h.28.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_attn.bias->transformer.layers.28.attn.c_attn.bias
Parameter (name=transformer.h.28.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_proj.weight->transformer.layers.28.attention.wo.weight
Parameter (name=transformer.h.28.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.ln_2.weight->transformer.layers.28.ffn_norm.weight
Parameter (name=transformer.h.28.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.w1.weight->transformer.layers.28.feed_forward.w1.weight
Parameter (name=transformer.h.28.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.w2.weight->transformer.layers.28.feed_forward.w3.weight
Parameter (name=transformer.h.28.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.c_proj.weight->transformer.layers.28.feed_forward.w2.weight
Parameter (name=transformer.h.29.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.ln_1.weight->transformer.layers.29.attention_norm.weight
Parameter (name=transformer.h.29.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_attn.weight->transformer.layers.29.attn.c_attn.weight
Parameter (name=transformer.h.29.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_attn.bias->transformer.layers.29.attn.c_attn.bias
Parameter (name=transformer.h.29.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_proj.weight->transformer.layers.29.attention.wo.weight
Parameter (name=transformer.h.29.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.ln_2.weight->transformer.layers.29.ffn_norm.weight
Parameter (name=transformer.h.29.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.w1.weight->transformer.layers.29.feed_forward.w1.weight
Parameter (name=transformer.h.29.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.w2.weight->transformer.layers.29.feed_forward.w3.weight
Parameter (name=transformer.h.29.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.c_proj.weight->transformer.layers.29.feed_forward.w2.weight
Parameter (name=transformer.h.30.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.ln_1.weight->transformer.layers.30.attention_norm.weight
Parameter (name=transformer.h.30.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_attn.weight->transformer.layers.30.attn.c_attn.weight
Parameter (name=transformer.h.30.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_attn.bias->transformer.layers.30.attn.c_attn.bias
Parameter (name=transformer.h.30.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_proj.weight->transformer.layers.30.attention.wo.weight
Parameter (name=transformer.h.30.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.ln_2.weight->transformer.layers.30.ffn_norm.weight
Parameter (name=transformer.h.30.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.w1.weight->transformer.layers.30.feed_forward.w1.weight
Parameter (name=transformer.h.30.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.w2.weight->transformer.layers.30.feed_forward.w3.weight
Parameter (name=transformer.h.30.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.c_proj.weight->transformer.layers.30.feed_forward.w2.weight
Parameter (name=transformer.h.31.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.ln_1.weight->transformer.layers.31.attention_norm.weight
Parameter (name=transformer.h.31.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_attn.weight->transformer.layers.31.attn.c_attn.weight
Parameter (name=transformer.h.31.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_attn.bias->transformer.layers.31.attn.c_attn.bias
Parameter (name=transformer.h.31.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_proj.weight->transformer.layers.31.attention.wo.weight
Parameter (name=transformer.h.31.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.ln_2.weight->transformer.layers.31.ffn_norm.weight
Parameter (name=transformer.h.31.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.w1.weight->transformer.layers.31.feed_forward.w1.weight
Parameter (name=transformer.h.31.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.w2.weight->transformer.layers.31.feed_forward.w3.weight
Parameter (name=transformer.h.31.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.c_proj.weight->transformer.layers.31.feed_forward.w2.weight
Parameter (name=transformer.ln_f.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
Parameter (name=lm_head.weight, shape=torch.Size([151936, 4096]), dtype=torch.float32, requires_grad=True)
Saving converted weights to /data/qwen/models/Qwen-7B-Chat/qwen-7b-chat.ckpt...
Done
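
The `name:` lines in the log follow a fixed renaming pattern per transformer layer. Below is a minimal sketch of that pattern, read off the log output only; it is not the actual code of convert_weight.py, and the names `SUFFIX_MAP` and `rename_param` are hypothetical. Names with no logged mapping (such as `transformer.ln_f.weight` and `lm_head.weight`) are returned unchanged, which is an assumption of this sketch rather than something the log confirms.

```python
import re

# Suffix renames read off the conversion log (assumption: the log shows
# the complete per-layer mapping; this is not copied from convert_weight.py).
SUFFIX_MAP = {
    "ln_1.weight": "attention_norm.weight",
    "attn.c_attn.weight": "attn.c_attn.weight",
    "attn.c_attn.bias": "attn.c_attn.bias",
    "attn.c_proj.weight": "attention.wo.weight",
    "ln_2.weight": "ffn_norm.weight",
    "mlp.w1.weight": "feed_forward.w1.weight",
    "mlp.w2.weight": "feed_forward.w3.weight",      # note: w2 becomes w3
    "mlp.c_proj.weight": "feed_forward.w2.weight",  # note: c_proj becomes w2
}

# Matches per-layer parameter names such as "transformer.h.16.ln_1.weight".
LAYER_RE = re.compile(r"^transformer\.h\.(\d+)\.(.+)$")


def rename_param(name: str) -> str:
    """Map a source parameter name to the name shown in the log."""
    match = LAYER_RE.match(name)
    if match is None:
        return name  # not a per-layer parameter: left unchanged (assumption)
    layer, suffix = match.groups()
    if suffix not in SUFFIX_MAP:
        return name  # suffix never appears in the log: left unchanged
    return f"transformer.layers.{layer}.{SUFFIX_MAP[suffix]}"
```

The two MLP entries are the easiest to get wrong: going by the log, `mlp.w2` lands in `feed_forward.w3`, while `mlp.c_proj` lands in `feed_forward.w2`.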

Set the Python path and launch the inference script.

cd /data/qwen/mindformers/research/qwen

export PYTHONPATH=/data/qwen/mindformers:$PYTHONPATH

python3 infer_qwen.py

/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
[Warning]Can not find libascendalog.so
[Warning]Can not find libascendalog.so
Traceback (most recent call last):
  File "/data/qwen/mindformers/research/qwen/infer_qwen.py", line 4, in <module>
    from mindformers.trainer import Trainer
  File "/data/qwen/mindformers/mindformers/__init__.py", line 17, in <module>
    from mindformers import core, auto_class, dataset, \
  File "/data/qwen/mindformers/mindformers/core/__init__.py", line 19, in <module>
    from .metric import build_metric
  File "/data/qwen/mindformers/mindformers/core/metric/__init__.py", line 17, in <module>
    from .metric import *
  File "/data/qwen/mindformers/mindformers/core/metric/metric.py", line 37, in <module>
    from mindformers.models import BasicTokenizer
  File "/data/qwen/mindformers/mindformers/models/__init__.py", line 21, in <module>
    from .blip2 import *
  File "/data/qwen/mindformers/mindformers/models/blip2/__init__.py", line 17, in <module>
    from .blip2_config import Blip2Config
  File "/data/qwen/mindformers/mindformers/models/blip2/blip2_config.py", line 23, in <module>
    from mindformers.models.llama import LlamaConfig
  File "/data/qwen/mindformers/mindformers/models/llama/__init__.py", line 18, in <module>
    from .llama import LlamaForCausalLM, LlamaForCausalLMWithLora, LlamaModel
  File "/data/qwen/mindformers/mindformers/models/llama/llama.py", line 30, in <module>
    from mindspore.nn.layer.flash_attention import FlashAttention
  File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/nn/layer/flash_attention.py", line 24, in <module>
    from mindspore.ops._op_impl._custom_op.flash_attention.flash_attention_impl import get_flash_attention
  File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/__init__.py", line 17, in <module>
    from mindspore.ops._op_impl._custom_op.dsd_impl import dsd_matmul
  File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/dsd_impl.py", line 17, in <module>
    from te import tik
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/te/__init__.py", line 128, in <module>
    from tbe import tvm
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/__init__.py", line 44, in <module>
    import tvm
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/__init__.py", line 26, in <module>
    from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/__init__.py", line 28, in <module>
    from .base import register_error
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/base.py", line 72, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/base.py", line 52, in _load_lib
    lib_path = libinfo.find_lib_path()
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/libinfo.py", line 147, in find_lib_path
    raise RuntimeError(message)
RuntimeError: Cannot find the files.
List of candidates:
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/lib/plugin/cpu/libtvm.so
/usr/local/Ascend/driver/libtvm.so
/data/qwen/mindformers/research/qwen/libtvm.so
/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/bin/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/aarch64-linux/ccec_compiler/bin/libtvm.so
/root/miniconda3/envs/mindspore2.2_py39/bin/libtvm.so
/root/miniconda3/condabin/libtvm.so
/usr/local/sbin/libtvm.so
/usr/local/bin/libtvm.so
/usr/sbin/libtvm.so
/usr/bin/libtvm.so
/usr/sbin/libtvm.so
/usr/bin/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/libtvm.so
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/lib/plugin/cpu/libtvm_runtime.so
/usr/local/Ascend/driver/libtvm_runtime.so
/data/qwen/mindformers/research/qwen/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/bin/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/aarch64-linux/ccec_compiler/bin/libtvm_runtime.so
/root/miniconda3/envs/mindspore2.2_py39/bin/libtvm_runtime.so
/root/miniconda3/condabin/libtvm_runtime.so
/usr/local/sbin/libtvm_runtime.so
/usr/local/bin/libtvm_runtime.so
/usr/sbin/libtvm_runtime.so
/usr/bin/libtvm_runtime.so
/usr/sbin/libtvm_runtime.so
/usr/bin/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/libtvm_runtime.so

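Judging from the error message, the failure is probably caused by files missing from the setup for this chip architecture (the loader found neither libtvm.so nor libtvm_runtime.so in any of the candidate paths). I am not investigating it further for now.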
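
For anyone who wants to dig further, a first step could be to check whether the two libraries exist anywhere on the machine outside the candidate paths listed in the error. The following is a minimal sketch using only the Python standard library; the function name `find_libraries` is hypothetical, and the search roots in the usage comment are taken from the paths in the error message. It only locates files and does not fix the import.

```python
import os


def find_libraries(roots, names=("libtvm.so", "libtvm_runtime.so")):
    """Return every file under the given roots whose name is in `names`."""
    wanted = set(names)
    hits = []
    for root in roots:
        # os.walk yields nothing for a root that does not exist,
        # so missing directories are skipped silently.
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if filename in wanted:
                    hits.append(os.path.join(dirpath, filename))
    return sorted(hits)


# Example usage inside the container (roots taken from the error message):
#   for path in find_libraries(["/usr/local/Ascend"]):
#       print(path)
```

If the search comes back empty, the libraries are simply not in the image for this architecture; if it finds them in a directory that is not on the candidate list, the problem is more likely a path or environment-variable issue.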
