This article walks through how a large language model predicts a single next token in one forward pass, as a way to understand what model.generate does under the hood; hopefully it is a useful reference for developers working on this kind of problem.
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.modeling_outputs import CausalLMOutputWithPast  # the type returned by model(...)
import torch
import torch.nn.functional as F

device = "cuda"  # the device to load the model onto
mpath = 'Models/Qwen2-1.5B-Instruct'

model = AutoModelForCausalLM.from_pretrained(
    mpath,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(mpath)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the messages into the model's chat format; add_generation_prompt=True
# appends the assistant header so the model continues with its answer
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize the rendered prompt into input ids
model_inputs = tokenizer([text], return_tensors="pt").to(device)
print(model_inputs)
print(model_inputs['input_ids'].shape)

# A single forward pass returns logits for every position in the sequence
res = model(input_ids=model_inputs['input_ids'])
print(type(res))
print(res.logits.shape)

# Only the logits at the last position are needed to predict the next token
logg = res.logits[:, -1, :]
print(logg.shape)

# Softmax turns the logits into a probability distribution over the vocabulary
softmax_probs = F.softmax(logg, dim=1)
# Greedy choice: index of the highest-probability token
max_index = torch.argmax(softmax_probs, dim=1)
print(max_index)
# Decode the predicted token id back into text
decoded_text = tokenizer.decode(max_index)
print(decoded_text)
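Before reading the outputs, it helps to see what apply_chat_template produced. For Qwen2-Instruct the messages are rendered into ChatML-style markup, roughly like the sketch below (based on Qwen2's published chat template; with add_generation_prompt=True the string ends with the assistant header, so the model's very next token is the first token of the answer):

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Give me a short introduction to large language model.<|im_end|>
<|im_start|>assistant

In the token ids printed below, 151644 and 151645 are Qwen2's <|im_start|> and <|im_end|> special tokens.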
The output is as follows:
{'input_ids': tensor([[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13,
                       151645, 198, 151644, 872, 198, 35127, 752, 264, 2805,
                       16800, 311, 3460, 4128, 1614, 13, 151645, 198, 151644,
                       77091, 198]], device='cuda:0'),
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                            1, 1, 1, 1, 1]], device='cuda:0')}
torch.Size([1, 29])
<class 'transformers.modeling_outputs.CausalLMOutputWithPast'>
torch.Size([1, 29, 151936])
torch.Size([1, 151936])
tensor([32], device='cuda:0')
A
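Reading the outputs: the rendered prompt tokenizes to 29 tokens; res.logits has shape [1, 29, 151936], i.e. [batch, sequence_length, vocab_size], a score for every vocabulary token at every position; slicing out the last position leaves [1, 151936]; and argmax picks token id 32, which decodes to "A", presumably the start of a reply such as "A large language model is ...". model.generate is essentially this single step run in a loop: append the predicted token to the input ids and do another forward pass, until an end-of-sequence token is produced. Below is a minimal greedy-decoding sketch along those lines, reusing model, tokenizer, and model_inputs from above (the real generate is smarter: among other things it reuses the past_key_values KV cache carried by CausalLMOutputWithPast instead of recomputing the whole prefix at each step):

# Minimal greedy loop -- conceptually what model.generate(..., do_sample=False)
# does, without KV caching, batching, or configurable stopping criteria
generated = model_inputs['input_ids']                  # [1, seq_len], grows each step
with torch.no_grad():
    for _ in range(100):                               # cap at 100 new tokens
        logits = model(input_ids=generated).logits[:, -1, :]
        next_token = torch.argmax(logits, dim=-1, keepdim=True)    # [1, 1]
        generated = torch.cat([generated, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:            # stop at EOS
            break

# Decode only the newly generated part
new_tokens = generated[0, model_inputs['input_ids'].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

Two details worth noting. First, argmax over the raw logits selects the same token as argmax over the softmax probabilities, because softmax is monotonic; the explicit softmax in the script is only needed when you want actual probabilities. Second, with do_sample=True, generate draws the next token from the distribution instead of taking the argmax, roughly like this sketch (the 0.7 temperature is an assumed example value; generate can also apply top-k / top-p filtering before sampling):

# Sampling instead of greedy: draw the next token from the softmax distribution
probs = F.softmax(logg / 0.7, dim=-1)                  # divide logits by a temperature
next_token = torch.multinomial(probs, num_samples=1)   # [1, 1]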
This concludes the walkthrough of how a large model predicts a single next token and how that relates to model.generate; hopefully this article is of some help to fellow developers!