NLP(六十四)使用FastChat计算LLaMA-2模型的token长度

2023-10-18 14:20

本文主要是介绍NLP(六十四)使用FastChat计算LLaMA-2模型的token长度,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

LLaMA-2模型部署

  在文章NLP(五十九)使用FastChat部署百川大模型中,笔者介绍了FastChat框架,以及如何使用FastChat来部署百川模型。
  本文将会部署LLaMA-2 70B模型,使得其兼容OpenAI的调用风格。部署的Dockerfile文件如下:

FROM nvidia/cuda:11.7.1-runtime-ubuntu20.04RUN apt-get update -y && apt-get install -y python3.9 python3.9-distutils curl
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py
RUN pip3 install fschat

Docker-compose.yml文件如下:

version: "3.9"services:fastchat-controller:build:context: .dockerfile: Dockerfileimage: fastchat:latestports:- "21001:21001"entrypoint: ["python3.9", "-m", "fastchat.serve.controller", "--host", "0.0.0.0", "--port", "21001"]fastchat-model-worker:build:context: .dockerfile: Dockerfilevolumes:- ./model:/root/modelimage: fastchat:latestports:- "21002:21002"deploy:resources:reservations:devices:- driver: nvidiadevice_ids: ['0', '1']capabilities: [gpu]entrypoint: ["python3.9", "-m", "fastchat.serve.model_worker", "--model-names", "llama2-70b-chat", "--model-path", "/root/model/llama2/Llama-2-70b-chat-hf", "--num-gpus", "2", "--gpus",  "0,1", "--worker-address", "http://fastchat-model-worker:21002", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002"]fastchat-api-server:build:context: .dockerfile: Dockerfileimage: fastchat:latestports:- "8000:8000"entrypoint: ["python3.9", "-m", "fastchat.serve.openai_api_server", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "8000"]

部署成功后,会占用2张A100,每张A100占用约66G显存。
  测试模型是否部署成功:

curl http://localhost:8000/v1/models

输出结果如下:

{"object": "list","data": [{"id": "llama2-70b-chat","object": "model","created": 1691504717,"owned_by": "fastchat","root": "llama2-70b-chat","parent": null,"permission": [{"id": "modelperm-3XG6nzMAqfEkwfNqQ52fdv","object": "model_permission","created": 1691504717,"allow_create_engine": false,"allow_sampling": true,"allow_logprobs": true,"allow_search_indices": true,"allow_view": true,"allow_fine_tuning": false,"organization": "*","group": null,"is_blocking": false}]}]
}

部署LLaMA-2 70B模型成功!

Prompt token长度计算

  在FastChat的Github开源项目中,项目提供了计算Prompt的token长度的API,文件路径为:fastchat/serve/model_worker.py,调用方法为:

curl --location 'localhost:21002/count_token' \
--header 'Content-Type: application/json' \
--data '{"prompt": "What is your name?"}'

输出结果如下:

{"count": 6,"error_code": 0
}

Conversation token长度计算

  在FastChat中计算Conversation(对话)的token长度较为麻烦。
  首先我们需要获取LLaMA-2 70B模型的对话配置,调用API如下:

curl --location --request POST 'http://localhost:21002/worker_get_conv_template'

输出结果如下:

{'conv': {'messages': [],'name': 'llama-2','offset': 0,'roles': ['[INST]', '[/INST]'],'sep': ' ','sep2': ' </s><s>','sep_style': 7,'stop_str': None,'stop_token_ids': [2],'system_message': 'You are a helpful, respectful and honest ''assistant. Always answer as helpfully as ''possible, while being safe. Your answers should ''not include any harmful, unethical, racist, ''sexist, toxic, dangerous, or illegal content. ''Please ensure that your responses are socially ''unbiased and positive in nature.\n''\n''If a question does not make any sense, or is not ''factually coherent, explain why instead of '"answering something not correct. If you don't ""know the answer to a question, please don't share "'false information.','system_template': '[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n'}}

  在FastChat中的对话文件(fastchat/conversation.py)中,提供了对话加工的代码,这里不再展示,使用时直接复制整个文件即可,该文件不依赖任何第三方模块。
  我们需要将对话按照OpenAI的方式加工成对应的Prompt,输入的对话(messages)如下:

messages = [{“role”: “system”, “content”: “You are Jack, you are 20 years old, answer questions with humor.”}, {“role”: “user”, “content”: “What is your name?”},{“role”: “assistant”, “content”: " Well, well, well! Look who’s asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend!“}, {“role”: “user”, “content”: “How old are you?”}, {“role”: “assistant”, “content”: " Oh, you want to know my age? Well, let’s just say I’m older than a bottle of wine but younger than a bottle of whiskey. I’m like a fine cheese, getting better with age, but still young enough to party like it’s 1999!”}, {“role”: “user”, “content”: “Where is your hometown?”}]

Python代码如下:

# -*- coding: utf-8 -*-
# @place: Pudong, Shanghai 
# @file: prompt.py
# @time: 2023/8/8 19:24
from conversation import Conversation, SeparatorStylemessages = [{"role": "system", "content": "You are Jack, you are 20 years old, answer questions with humor."}, {"role": "user", "content": "What is your name?"},{"role": "assistant", "content": " Well, well, well! Look who's asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend!"}, {"role": "user", "content": "How old are you?"}, {"role": "assistant", "content": " Oh, you want to know my age? Well, let's just say I'm older than a bottle of wine but younger than a bottle of whiskey. I'm like a fine cheese, getting better with age, but still young enough to party like it's 1999!"}, {"role": "user", "content": "Where is your hometown?"}]llama2_conv = {"conv":{"name":"llama-2","system_template":"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n","system_message":"You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.","roles":["[INST]","[/INST]"],"messages":[],"offset":0,"sep_style":7,"sep":" ","sep2":" </s><s>","stop_str":None,"stop_token_ids":[2]}}
conv = llama2_conv['conv']conv = Conversation(name=conv["name"],system_template=conv["system_template"],system_message=conv["system_message"],roles=conv["roles"],messages=list(conv["messages"]),  # prevent in-place modificationoffset=conv["offset"],sep_style=SeparatorStyle(conv["sep_style"]),sep=conv["sep"],sep2=conv["sep2"],stop_str=conv["stop_str"],stop_token_ids=conv["stop_token_ids"],)if isinstance(messages, str):prompt = messages
else:for message in messages:msg_role = message["role"]if msg_role == "system":conv.set_system_message(message["content"])elif msg_role == "user":conv.append_message(conv.roles[0], message["content"])elif msg_role == "assistant":conv.append_message(conv.roles[1], message["content"])else:raise ValueError(f"Unknown role: {msg_role}")# Add a blank message for the assistant.conv.append_message(conv.roles[1], None)prompt = conv.get_prompt()print(repr(prompt))

加工后的Prompt如下:

"[INST] <<SYS>>\nYou are Jack, you are 20 years old, answer questions with humor.\n<</SYS>>\n\nWhat is your name?[/INST]  Well, well, well! Look who's asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend! </s><s>[INST] How old are you? [/INST]  Oh, you want to know my age? Well, let's just say I'm older than a bottle of wine but younger than a bottle of whiskey. I'm like a fine cheese, getting better with age, but still young enough to party like it's 1999! </s><s>[INST] Where is your hometown? [/INST]"

  最后再调用计算Prompt的API(参考上节的Prompt token长度计算),输出该对话的token长度为199.
  我们使用FastChat提供的对话补充接口(v1/chat/completions)验证输入的对话token长度,请求命令为:

curl --location 'http://localhost:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{"model": "llama2-70b-chat","messages": [{"role": "system", "content": "You are Jack, you are 20 years old, answer questions with humor."}, {"role": "user", "content": "What is your name?"},{"role": "assistant", "content": " Well, well, well! Look who'\''s asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend!"}, {"role": "user", "content": "How old are you?"}, {"role": "assistant", "content": " Oh, you want to know my age? Well, let'\''s just say I'\''m older than a bottle of wine but younger than a bottle of whiskey. I'\''m like a fine cheese, getting better with age, but still young enough to party like it'\''s 1999!"}, {"role": "user", "content": "Where is your hometown?"}]
}'

输出结果为:

{"id": "chatcmpl-mQxcaQcNSNMFahyHS7pamA","object": "chat.completion","created": 1691506768,"model": "llama2-70b-chat","choices": [{"index": 0,"message": {"role": "assistant","content": " Ha! My hometown? Well, that's a tough one. I'm like a bird, I don't have a nest, I just fly around and land wherever the wind takes me. But if you really want to know, I'm from a place called \"The Internet\". It's a magical land where memes and cat videos roam free, and the Wi-Fi is always strong. It's a beautiful place, you should visit sometime!"},"finish_reason": "stop"}],"usage": {"prompt_tokens": 199,"total_tokens": 302,"completion_tokens": 103}
}

注意,输出的prompt_tokens为199,这与我们刚才计算的对话token长度的结果是一致的!

总结

  本文主要介绍了如何在FastChat中部署LLaMA-2 70B模型,并详细介绍了Prompt token长度计算以及对话(conversation)的token长度计算。希望能对读者有所帮助~
  笔者的一点心得是:阅读源码真的很重要。
  笔者的个人博客网址为:https://percent4.github.io/ ,欢迎大家访问~

参考网址

  1. NLP(五十九)使用FastChat部署百川大模型: https://blog.csdn.net/jclian91/article/details/131650918
  2. FastChat: https://github.com/lm-sys/FastChat

  欢迎关注我的公众号NLP奇幻之旅,原创技术文章第一时间推送。

  欢迎关注我的知识星球“自然语言处理奇幻之旅”,笔者正在努力构建自己的技术社区。

这篇关于NLP(六十四)使用FastChat计算LLaMA-2模型的token长度的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/233129

相关文章

大模型研发全揭秘:客服工单数据标注的完整攻略

在人工智能(AI)领域,数据标注是模型训练过程中至关重要的一步。无论你是新手还是有经验的从业者,掌握数据标注的技术细节和常见问题的解决方案都能为你的AI项目增添不少价值。在电信运营商的客服系统中,工单数据是客户问题和解决方案的重要记录。通过对这些工单数据进行有效标注,不仅能够帮助提升客服自动化系统的智能化水平,还能优化客户服务流程,提高客户满意度。本文将详细介绍如何在电信运营商客服工单的背景下进行

中文分词jieba库的使用与实景应用(一)

知识星球:https://articles.zsxq.com/id_fxvgc803qmr2.html 目录 一.定义: 精确模式(默认模式): 全模式: 搜索引擎模式: paddle 模式(基于深度学习的分词模式): 二 自定义词典 三.文本解析   调整词出现的频率 四. 关键词提取 A. 基于TF-IDF算法的关键词提取 B. 基于TextRank算法的关键词提取

使用SecondaryNameNode恢复NameNode的数据

1)需求: NameNode进程挂了并且存储的数据也丢失了,如何恢复NameNode 此种方式恢复的数据可能存在小部分数据的丢失。 2)故障模拟 (1)kill -9 NameNode进程 [lytfly@hadoop102 current]$ kill -9 19886 (2)删除NameNode存储的数据(/opt/module/hadoop-3.1.4/data/tmp/dfs/na

Hadoop数据压缩使用介绍

一、压缩原则 (1)运算密集型的Job,少用压缩 (2)IO密集型的Job,多用压缩 二、压缩算法比较 三、压缩位置选择 四、压缩参数配置 1)为了支持多种压缩/解压缩算法,Hadoop引入了编码/解码器 2)要在Hadoop中启用压缩,可以配置如下参数

Makefile简明使用教程

文章目录 规则makefile文件的基本语法:加在命令前的特殊符号:.PHONY伪目标: Makefilev1 直观写法v2 加上中间过程v3 伪目标v4 变量 make 选项-f-n-C Make 是一种流行的构建工具,常用于将源代码转换成可执行文件或者其他形式的输出文件(如库文件、文档等)。Make 可以自动化地执行编译、链接等一系列操作。 规则 makefile文件

使用opencv优化图片(画面变清晰)

文章目录 需求影响照片清晰度的因素 实现降噪测试代码 锐化空间锐化Unsharp Masking频率域锐化对比测试 对比度增强常用算法对比测试 需求 对图像进行优化,使其看起来更清晰,同时保持尺寸不变,通常涉及到图像处理技术如锐化、降噪、对比度增强等 影响照片清晰度的因素 影响照片清晰度的因素有很多,主要可以从以下几个方面来分析 1. 拍摄设备 相机传感器:相机传

Andrej Karpathy最新采访:认知核心模型10亿参数就够了,AI会打破教育不公的僵局

夕小瑶科技说 原创  作者 | 海野 AI圈子的红人,AI大神Andrej Karpathy,曾是OpenAI联合创始人之一,特斯拉AI总监。上一次的动态是官宣创办一家名为 Eureka Labs 的人工智能+教育公司 ,宣布将长期致力于AI原生教育。 近日,Andrej Karpathy接受了No Priors(投资博客)的采访,与硅谷知名投资人 Sara Guo 和 Elad G

pdfmake生成pdf的使用

实际项目中有时会有根据填写的表单数据或者其他格式的数据,将数据自动填充到pdf文件中根据固定模板生成pdf文件的需求 文章目录 利用pdfmake生成pdf文件1.下载安装pdfmake第三方包2.封装生成pdf文件的共用配置3.生成pdf文件的文件模板内容4.调用方法生成pdf 利用pdfmake生成pdf文件 1.下载安装pdfmake第三方包 npm i pdfma

零基础学习Redis(10) -- zset类型命令使用

zset是有序集合,内部除了存储元素外,还会存储一个score,存储在zset中的元素会按照score的大小升序排列,不同元素的score可以重复,score相同的元素会按照元素的字典序排列。 1. zset常用命令 1.1 zadd  zadd key [NX | XX] [GT | LT]   [CH] [INCR] score member [score member ...]

Retrieval-based-Voice-Conversion-WebUI模型构建指南

一、模型介绍 Retrieval-based-Voice-Conversion-WebUI(简称 RVC)模型是一个基于 VITS(Variational Inference with adversarial learning for end-to-end Text-to-Speech)的简单易用的语音转换框架。 具有以下特点 简单易用:RVC 模型通过简单易用的网页界面,使得用户无需深入了