Internlm_xcomposer2 Model Structure Walkthrough

2024-06-10 18:52

Project link

Overall structure of the Internlm_xcomposer2 model

<class 'transformers_modules.internlm-xcomposer2-4khd-7b.modeling_internlm_xcomposer2.InternLMXComposer2ForCausalLM'>
InternLMXComposer2ForCausalLM(
  (model): InternLM2Model(
    (tok_embeddings): Embedding(92544, 4096, padding_idx=2)
    (layers): ModuleList(
      (0-31): 32 x InternLM2DecoderLayer(
        (attention): InternLM2FlashAttention2(
          (wqkv): PLoRA(
            in_features=4096, out_features=6144, bias=False
            (lora_dropout): Dropout(p=0.05, inplace=False)
            (Plora_A): Linear(in_features=4096, out_features=8, bias=False)
            (Plora_B): Linear(in_features=8, out_features=6144, bias=False)
          )
          (wo): PLoRA(
            in_features=4096, out_features=4096, bias=False
            (lora_dropout): Dropout(p=0.05, inplace=False)
            (Plora_A): Linear(in_features=4096, out_features=256, bias=False)
            (Plora_B): Linear(in_features=256, out_features=4096, bias=False)
          )
          (rotary_emb): InternLM2RotaryEmbedding()
        )
        (feed_forward): InternLM2MLP(
          (w1): PLoRA(
            in_features=4096, out_features=14336, bias=False
            (lora_dropout): Dropout(p=0.05, inplace=False)
            (Plora_A): Linear(in_features=4096, out_features=256, bias=False)
            (Plora_B): Linear(in_features=256, out_features=14336, bias=False)
          )
          (w3): PLoRA(
            in_features=4096, out_features=14336, bias=False
            (lora_dropout): Dropout(p=0.05, inplace=False)
            (Plora_A): Linear(in_features=4096, out_features=256, bias=False)
            (Plora_B): Linear(in_features=256, out_features=14336, bias=False)
          )
          (w2): PLoRA(
            in_features=14336, out_features=4096, bias=False
            (lora_dropout): Dropout(p=0.05, inplace=False)
            (Plora_A): Linear(in_features=14336, out_features=256, bias=False)
            (Plora_B): Linear(in_features=256, out_features=4096, bias=False)
          )
          (act_fn): SiLUActivation()
        )
        (attention_norm): InternLM2RMSNorm()
        (ffn_norm): InternLM2RMSNorm()
      )
    )
    (norm): InternLM2RMSNorm()
  )
  (output): Linear(in_features=4096, out_features=92544, bias=False)
  (vit): CLIPVisionTower(
    (vision_tower): CLIPVisionModel(
      (vision_model): CLIPVisionTransformer(
        (embeddings): CLIPVisionEmbeddings(
          (patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)
          (position_embedding): Embedding(577, 1024)
        )
        (pre_layrnorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder): CLIPEncoder(
          (layers): ModuleList(
            (0-23): 24 x CLIPEncoderLayer(
              (self_attn): CLIPAttention(
                (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
                (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
                (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
                (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
              )
              (layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (mlp): CLIPMLP(
                (activation_fn): QuickGELUActivation()
                (fc1): Linear(in_features=1024, out_features=4096, bias=True)
                (fc2): Linear(in_features=4096, out_features=1024, bias=True)
              )
              (layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            )
          )
        )
        (post_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
    )
  )
  (vision_proj): Sequential(
    (0): Linear(in_features=4096, out_features=4096, bias=True)
    (1): GELU(approximate='none')
    (2): Linear(in_features=4096, out_features=4096, bias=True)
  )
)
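
A few of the numbers in the printout can be cross-checked with simple arithmetic. The sketch below is not part of the model code: the helper names vit_geometry and split_wqkv are made up for illustration, and the head counts passed to split_wqkv (32 query heads, 8 key/value heads) are an assumption based on the usual InternLM2-7B configuration, not something the printout states.

```python
import math

# Constants copied from the printout above.
PATCH = 14            # kernel_size and stride of patch_embedding
NUM_POSITIONS = 577   # rows of position_embedding
HIDDEN = 4096         # in_features of wqkv
WQKV_OUT = 6144       # out_features of wqkv


def vit_geometry(num_positions, patch):
    """Return (grid_side, image_side), assuming one class token plus a square patch grid."""
    num_patches = num_positions - 1          # drop the class token
    side = math.isqrt(num_patches)
    if side * side != num_patches:
        raise ValueError("patch count is not a perfect square")
    return side, side * patch


def split_wqkv(out_features, hidden, num_heads, num_kv_heads):
    """Split a fused q/k/v width into (q, k, v) widths for grouped-query attention."""
    head_dim = hidden // num_heads
    q_width = num_heads * head_dim
    kv_width = num_kv_heads * head_dim
    if q_width + 2 * kv_width != out_features:
        raise ValueError("head counts do not match the fused projection width")
    return q_width, kv_width, kv_width


print(vit_geometry(NUM_POSITIONS, PATCH))        # (24, 336)
# ASSUMPTION: 32 query heads and 8 key/value heads (not shown in the printout).
print(split_wqkv(WQKV_OUT, HIDDEN, 32, 8))       # (4096, 1024, 1024)
```

Under these assumptions the ViT works on a 24 x 24 patch grid, i.e. a 336 x 336 input, and wqkv packs a 4096-wide query block together with two 1024-wide key/value blocks. The 4096-wide input of vision_proj is four times the ViT width of 1024, which would be consistent with several ViT tokens being concatenated before projection, although the printout alone does not show how that happens.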

Detailed structure of the Internlm_xcomposer2 model (the shape of every parameter tensor, in the order printed, roughly from input to output)


plora_glb_GN: torch.Size([1, 1, 4096])
plora_sub_GN: torch.Size([1, 1, 1, 4096])
model.tok_embeddings.weight: torch.Size([92544, 4096])

# Main body layers (take both the text embeddings and the ViT image features listed further below as input)
model.layers.0.attention.wqkv.weight: torch.Size([6144, 4096])
model.layers.0.attention.wqkv.Plora_A.weight: torch.Size([8, 4096])
model.layers.0.attention.wqkv.Plora_B.weight: torch.Size([6144, 8])
model.layers.0.attention.wo.weight: torch.Size([4096, 4096])
model.layers.0.attention.wo.Plora_A.weight: torch.Size([256, 4096])
model.layers.0.attention.wo.Plora_B.weight: torch.Size([4096, 256])
model.layers.0.feed_forward.w1.weight: torch.Size([14336, 4096])
model.layers.0.feed_forward.w1.Plora_A.weight: torch.Size([256, 4096])
model.layers.0.feed_forward.w1.Plora_B.weight: torch.Size([14336, 256])
model.layers.0.feed_forward.w3.weight: torch.Size([14336, 4096])
model.layers.0.feed_forward.w3.Plora_A.weight: torch.Size([256, 4096])
model.layers.0.feed_forward.w3.Plora_B.weight: torch.Size([14336, 256])
model.layers.0.feed_forward.w2.weight: torch.Size([4096, 14336])
model.layers.0.feed_forward.w2.Plora_A.weight: torch.Size([256, 14336])
model.layers.0.feed_forward.w2.Plora_B.weight: torch.Size([4096, 256])
model.layers.0.attention_norm.weight: torch.Size([4096])
model.layers.0.ffn_norm.weight: torch.Size([4096])

... (32 model.layers blocks in total; model.layers.1 through model.layers.30 are omitted here)

model.layers.31.attention.wqkv.weight: torch.Size([6144, 4096])
model.layers.31.attention.wqkv.Plora_A.weight: torch.Size([8, 4096])
model.layers.31.attention.wqkv.Plora_B.weight: torch.Size([6144, 8])
model.layers.31.attention.wo.weight: torch.Size([4096, 4096])
model.layers.31.attention.wo.Plora_A.weight: torch.Size([256, 4096])
model.layers.31.attention.wo.Plora_B.weight: torch.Size([4096, 256])
model.layers.31.feed_forward.w1.weight: torch.Size([14336, 4096])
model.layers.31.feed_forward.w1.Plora_A.weight: torch.Size([256, 4096])
model.layers.31.feed_forward.w1.Plora_B.weight: torch.Size([14336, 256])
model.layers.31.feed_forward.w3.weight: torch.Size([14336, 4096])
model.layers.31.feed_forward.w3.Plora_A.weight: torch.Size([256, 4096])
model.layers.31.feed_forward.w3.Plora_B.weight: torch.Size([14336, 256])
model.layers.31.feed_forward.w2.weight: torch.Size([4096, 14336])
model.layers.31.feed_forward.w2.Plora_A.weight: torch.Size([256, 14336])
model.layers.31.feed_forward.w2.Plora_B.weight: torch.Size([4096, 256])
model.layers.31.attention_norm.weight: torch.Size([4096])
model.layers.31.ffn_norm.weight: torch.Size([4096])

# Output layers
model.norm.weight: torch.Size([4096])
output.weight: torch.Size([92544, 4096])

vit.vision_tower.vision_model.embeddings.class_embedding: torch.Size([1024])
vit.vision_tower.vision_model.embeddings.patch_embedding.weight: torch.Size([1024, 3, 14, 14])
vit.vision_tower.vision_model.embeddings.position_embedding.weight: torch.Size([577, 1024])
vit.vision_tower.vision_model.pre_layrnorm.weight: torch.Size([1024])
vit.vision_tower.vision_model.pre_layrnorm.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight: torch.Size([4096, 1024])
vit.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias: torch.Size([4096])
vit.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight: torch.Size([1024, 4096])
vit.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias: torch.Size([1024])

... (24 vit.vision_tower.vision_model.encoder.layers blocks in total; layers.1 through layers.22 are omitted here)

vit.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight: torch.Size([4096, 1024])
vit.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias: torch.Size([4096])
vit.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight: torch.Size([1024, 4096])
vit.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias: torch.Size([1024])
vit.vision_tower.vision_model.post_layernorm.weight: torch.Size([1024])
vit.vision_tower.vision_model.post_layernorm.bias: torch.Size([1024])

# vision_proj, which aligns the image features with the input of model.layers.0
vision_proj.0.weight: torch.Size([4096, 4096])
vision_proj.0.bias: torch.Size([4096])
vision_proj.2.weight: torch.Size([4096, 4096])
vision_proj.2.bias: torch.Size([4096])
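
The shapes in the listing can be turned into parameter counts with a few lines of standard-library Python. This is a minimal sketch, not the script that produced the listing: the names SIZE_RE, parse_line and tally are hypothetical, and it assumes each line has the form "name: torch.Size([...])" as above. The sample text is layer 0 of the listing, copied unchanged.

```python
import re
from math import prod

# Matches lines such as "model.norm.weight: torch.Size([4096])".
SIZE_RE = re.compile(r"^(?P<name>[\w.]+): torch\.Size\(\[(?P<dims>[\d, ]*)\]\)")

LAYER0 = """\
model.layers.0.attention.wqkv.weight: torch.Size([6144, 4096])
model.layers.0.attention.wqkv.Plora_A.weight: torch.Size([8, 4096])
model.layers.0.attention.wqkv.Plora_B.weight: torch.Size([6144, 8])
model.layers.0.attention.wo.weight: torch.Size([4096, 4096])
model.layers.0.attention.wo.Plora_A.weight: torch.Size([256, 4096])
model.layers.0.attention.wo.Plora_B.weight: torch.Size([4096, 256])
model.layers.0.feed_forward.w1.weight: torch.Size([14336, 4096])
model.layers.0.feed_forward.w1.Plora_A.weight: torch.Size([256, 4096])
model.layers.0.feed_forward.w1.Plora_B.weight: torch.Size([14336, 256])
model.layers.0.feed_forward.w3.weight: torch.Size([14336, 4096])
model.layers.0.feed_forward.w3.Plora_A.weight: torch.Size([256, 4096])
model.layers.0.feed_forward.w3.Plora_B.weight: torch.Size([14336, 256])
model.layers.0.feed_forward.w2.weight: torch.Size([4096, 14336])
model.layers.0.feed_forward.w2.Plora_A.weight: torch.Size([256, 14336])
model.layers.0.feed_forward.w2.Plora_B.weight: torch.Size([4096, 256])
model.layers.0.attention_norm.weight: torch.Size([4096])
model.layers.0.ffn_norm.weight: torch.Size([4096])
"""


def parse_line(line):
    """Return (name, shape) for one listing line, or None if it is not a shape line."""
    match = SIZE_RE.match(line.strip())
    if match is None:
        return None
    dims = tuple(int(d) for d in match.group("dims").split(",") if d.strip())
    return match.group("name"), dims


def tally(text):
    """Sum element counts, separating the Plora_A/Plora_B branches from the base weights."""
    totals = {"base": 0, "plora": 0}
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip comments, blank lines and elision notes
        name, shape = parsed
        key = "plora" if ".Plora_" in name else "base"
        totals[key] += prod(shape)
    return totals


print(tally(LAYER0))
```

For layer 0 this gives 218,112,000 base parameters and 16,334,848 Plora_A/Plora_B parameters, so the PLoRA branches account for about 7% of one decoder layer.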

http://www.chinasem.cn/article/1048985
