[Update] cyのMemo (20240609~)

2024-06-11 22:52


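Preface

First day after the holiday. The midterm stage is over, and everything ahead will arrive on schedule. Time is like a whip driving ducks onto a perch, yet I don't seem to be ready for any of it.

I haven't been resting well these past few days and my form has dropped, but the work at hand has come to a close, so I can take a few days to recover and lie low.

The group chat has gone much quieter; everyone is busy.

The feast has ended at last; still, the crossing points of fate will meet again somewhere in the future.

Moved by this, a song.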

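"Ode on Climbing High on the Fifth Day of the Fifth Month", by 囚生
On Duanwu I climb high to pray for blessing; soft words linger round the beams, the ailment not yet eased.
Flowers bloom, leaves fall, and one slowly decays; the sun sets, the moon rises, and life feels like a prisoner's.
Thickets and muddy paths wind around the graceful peak; the lovely silhouette of distant hills drifts across the blue sky.
Do you not see that full moons come often, yet we sigh in vain that the full moon is hard to keep.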

Contents

Preface
20240611

20240611

The value weight matrix for the first head of the first layer is shown below.

v_layer0_head0 = v_layer0[0]
v_layer0_head0.shape
# torch.Size([128, 4096])

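Next, use the value weights to compute a value vector for every token. The result has size [17x128], where 17 is the number of tokens in the prompt and 128 is the dimension of each token's value vector.

v_per_token = torch.matmul(token_embeddings, v_layer0_head0.T)
v_per_token.shape
# torch.Size([17, 128])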

Multiplying the softmaxed attention scores by the per-token values gives the attention output, which also has shape [17x128].

qkv_attention = torch.matmul(qk_per_token_after_masking_after_softmax, v_per_token)
qkv_attention.shape
# torch.Size([17, 128])
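The value projection and the weighted sum above can be checked in isolation. The following is only a minimal sketch: the embeddings, the value weight and the raw scores are random stand-ins (assumptions, not the real model tensors), and only the shapes 17, 4096 and 128 come from the text.

```python
import torch

torch.manual_seed(0)

n_tokens, dim, head_dim = 17, 4096, 128  # sizes quoted in the text

# Random stand-ins (assumptions) for the real embeddings, weights and scores.
token_embeddings = torch.randn(n_tokens, dim)
v_weight = torch.randn(head_dim, dim) / dim**0.5
raw_scores = torch.randn(n_tokens, n_tokens)  # plays the role of q @ k.T / sqrt(128)

# Project every token to its value vector: [17, 4096] @ [4096, 128] -> [17, 128].
v_per_token = torch.matmul(token_embeddings, v_weight.T)

# Causal mask: -inf strictly above the diagonal, so no token attends to the future.
mask = torch.triu(torch.full((n_tokens, n_tokens), float("-inf")), diagonal=1)
attn_weights = torch.softmax(raw_scores + mask, dim=1)

# Each output row is a weighted average of the value vectors that token may see.
qkv_attention = torch.matmul(attn_weights, v_per_token)
print(qkv_attention.shape)
```

Because of the causal mask, the first token can only attend to itself, so its output row equals its own value vector. That makes a convenient sanity check.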

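This gives the attention output for the first head of the first layer.

Next, run a loop that performs exactly the same math as the cells above, this time for every head in the first layer.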

qkv_attention_store = []

for head in range(n_heads):
    q_layer0_head = q_layer0[head]
    k_layer0_head = k_layer0[head // 4]  # key weights are shared across 4 heads
    v_layer0_head = v_layer0[head // 4]  # value weights are shared across 4 heads

    # Project the token embeddings to per-head queries, keys and values.
    q_per_token = torch.matmul(token_embeddings, q_layer0_head.T)
    k_per_token = torch.matmul(token_embeddings, k_layer0_head.T)
    v_per_token = torch.matmul(token_embeddings, v_layer0_head.T)

    # Rotate the queries: view pairs as complex numbers and multiply by freqs_cis.
    q_per_token_split_into_pairs = q_per_token.float().view(q_per_token.shape[0], -1, 2)
    q_per_token_as_complex_numbers = torch.view_as_complex(q_per_token_split_into_pairs)
    q_per_token_split_into_pairs_rotated = torch.view_as_real(q_per_token_as_complex_numbers * freqs_cis[:len(tokens)])
    q_per_token_rotated = q_per_token_split_into_pairs_rotated.view(q_per_token.shape)

    # Rotate the keys in the same way.
    k_per_token_split_into_pairs = k_per_token.float().view(k_per_token.shape[0], -1, 2)
    k_per_token_as_complex_numbers = torch.view_as_complex(k_per_token_split_into_pairs)
    k_per_token_split_into_pairs_rotated = torch.view_as_real(k_per_token_as_complex_numbers * freqs_cis[:len(tokens)])
    k_per_token_rotated = k_per_token_split_into_pairs_rotated.view(k_per_token.shape)

    # Scaled dot-product scores, causal mask, softmax, then weight the values.
    qk_per_token = torch.matmul(q_per_token_rotated, k_per_token_rotated.T) / (128) ** 0.5
    mask = torch.full((len(tokens), len(tokens)), float("-inf"), device=tokens.device)
    mask = torch.triu(mask, diagonal=1)
    qk_per_token_after_masking = qk_per_token + mask
    qk_per_token_after_masking_after_softmax = torch.nn.functional.softmax(qk_per_token_after_masking, dim=1).to(torch.bfloat16)
    qkv_attention = torch.matmul(qk_per_token_after_masking_after_softmax, v_per_token)
    qkv_attention_store.append(qkv_attention)

len(qkv_attention_store)
# 32
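The `head // 4` indexing in the loop is what shares one key/value weight among four query heads. As a small illustration in plain Python (the names `heads_per_kv`, `kv_index` and `groups` are made up for this sketch), the mapping can be listed explicitly:

```python
n_heads = 32       # query heads, as in the loop above
heads_per_kv = 4   # each key/value weight is shared by 4 query heads
n_kv_heads = n_heads // heads_per_kv

# Index of the key/value weight used by each query head.
kv_index = [head // heads_per_kv for head in range(n_heads)]

# Group the query heads by the key/value weight they share.
groups = {}
for head, kv in enumerate(kv_index):
    groups.setdefault(kv, []).append(head)

for kv, heads in groups.items():
    print(f"kv head {kv}: query heads {heads}")
```

With 32 query heads and groups of 4, this implies 8 distinct key/value weights, so `k_layer0` and `v_layer0` only need 8 entries each while `q_layer0` needs 32.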

All 32 heads of the first layer now have a qkv_attention matrix. The remaining step is to merge these per-head results into one large matrix of size [17x4096].

stacked_qkv_attention = torch.cat(qkv_attention_store, dim=-1)
stacked_qkv_attention.shape
# torch.Size([17, 4096])
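To see how the concatenation lays the heads out, here is a rough sketch with stand-in tensors (an assumption for illustration: head h is simply filled with the value h, so that its columns are easy to spot in the merged matrix).

```python
import torch

n_heads, n_tokens, head_dim = 32, 17, 128  # sizes quoted in the text

# Stand-in per-head outputs (assumption): head h is a [17, 128] block filled with h.
qkv_attention_store = [
    torch.full((n_tokens, head_dim), float(h)) for h in range(n_heads)
]

# Concatenate along the last dimension: 32 blocks of 128 columns -> 4096 columns.
stacked_qkv_attention = torch.cat(qkv_attention_store, dim=-1)
print(stacked_qkv_attention.shape)

# Columns [h * 128, (h + 1) * 128) of the merged matrix belong to head h.
head = 5
block = stacked_qkv_attention[:, head * head_dim:(head + 1) * head_dim]
print(torch.equal(block, qkv_attention_store[head]))
```

Using `dim=-1` keeps one row per token and places the heads side by side, which is why the width is 32 x 128 = 4096.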
