[LLM] A very easy-to-use LLM inference framework: bigdl-llm, now renamed ipex-llm



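- bigdl-llm
- GitHub repository
- Environment
- Installing dependencies
- Downloading a test model
- Loading and optimizing a pretrained model
- Building a chat application with the optimized model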

bigdl-llm

IPEX-LLM is a PyTorch library for running LLMs on Intel CPUs and GPUs (e.g., a local PC with an iGPU, or discrete GPUs such as Arc, Flex and Max) with very low latency.

It is built on top of Intel Extension for PyTorch (IPEX), as well as the excellent work of llama.cpp, bitsandbytes, vLLM, qlora, AutoGPTQ, AutoAWQ, etc.
It provides seamless integration with llama.cpp, Text-Generation-WebUI, HuggingFace transformers, HuggingFace PEFT, LangChain, LlamaIndex, DeepSpeed-AutoTP, vLLM, FastChat, HuggingFace TRL, AutoGen, ModelScope, etc.
50+ models have been optimized/verified on ipex-llm (including LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM, Baichuan, Qwen, RWKV, and more); the complete list is maintained in the ipex-llm GitHub repository.

GitHub repository

https://github.com/intel-analytics/ipex-llm

Environment

Ubuntu 22.04 LTS
Python 3.11

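Installing dependencies

pip install --pre --upgrade bigdl-llm[all] -i https://mirrors.aliyun.com/pypi/simple/

The -i option points pip at the Aliyun PyPI mirror; drop it to install from the default index.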

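Downloading a test model

Configure the download as described in the article "Download LLMs from Hugging Face at top speed without a VPN" to get fast model downloads.

Download command (this fetches databricks/dolly-v2-3b; the code in the following sections loads openlm-research/open_llama_3b_v2, which can be downloaded the same way by swapping in that repo ID):

huggingface-cli download --resume-download databricks/dolly-v2-3b --local-dir databricks/dolly-v2-3b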
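The download can also be scripted, for example to fetch both the test model above and openlm-research/open_llama_3b_v2. This is a minimal sketch, assuming the `huggingface-cli` entry point (installed with the huggingface_hub package) is on PATH; `build_download_cmd` and `download` are hypothetical helpers that simply wrap the command shown above.

```python
import shutil
import subprocess
from typing import List, Optional


def build_download_cmd(repo_id: str, local_dir: Optional[str] = None) -> List[str]:
    """Build the same huggingface-cli command as above.

    As in the article, the repo ID doubles as the local directory by default.
    """
    return [
        "huggingface-cli", "download",
        "--resume-download", repo_id,
        "--local-dir", local_dir or repo_id,
    ]


def download(repo_id: str, local_dir: Optional[str] = None) -> None:
    """Run the download, failing clearly if the CLI is not on PATH."""
    if shutil.which("huggingface-cli") is None:
        raise RuntimeError("huggingface-cli not found; install huggingface_hub first")
    subprocess.run(build_download_cmd(repo_id, local_dir), check=True)
```

For example, `download("openlm-research/open_llama_3b_v2")` stores the model under `openlm-research/open_llama_3b_v2`. Hugging Face's `from_pretrained` loads from a local directory when one exists at that path, so running the later code from the same working directory should pick up the downloaded copy.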

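Loading and optimizing a pretrained model

Load and optimize the model

from bigdl.llm.transformers import AutoModelForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'
# load_in_4bit=True converts the weights to 4-bit (INT4) while loading
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)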
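To see why `load_in_4bit=True` matters on a CPU or iGPU, a rough estimate of weight memory helps. This sketch assumes a nominal 3 billion parameters (the exact count of open_llama_3b_v2 differs somewhat) and ignores the scale factors that 4-bit formats store alongside the weights, as well as activations and the KV cache; `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    bytes_total = num_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 2**30


if __name__ == "__main__":
    n = 3e9  # assumed nominal parameter count for a "3B" model
    for label, bits in (("FP16", 16), ("INT4", 4)):
        print(f"{label}: {weight_memory_gib(n, bits):.2f} GiB")
```

This prints about 5.59 GiB for FP16 and 1.40 GiB for INT4: a 4x reduction in weight storage before quantization overhead.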

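Save the optimized model

save_directory = './open-llama-3b-v2-bigdl-llm-INT4'
# write the INT4 weights to disk so the conversion does not have to be repeated
model.save_low_bit(save_directory)
del model  # release the in-memory copy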

Load the optimized model

model = AutoModelForCausalLM.load_low_bit(save_directory)
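Saving and reloading can be combined so the INT4 conversion runs only on the first start. This is a minimal sketch of that caching pattern. `load_or_convert` is a hypothetical helper that takes the convert and load steps as callables, so it does not depend on bigdl-llm being importable. Treating a non-empty directory as "already saved" is also an assumption. With bigdl-llm, `convert` would be `lambda: AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)` and `load` would be `AutoModelForCausalLM.load_low_bit`.

```python
import os
from typing import Any, Callable


def load_or_convert(save_dir: str,
                    convert: Callable[[], Any],
                    load: Callable[[str], Any]) -> Any:
    """Reuse a saved low-bit model if present, otherwise convert once and save it.

    `convert` must return a model exposing save_low_bit(path);
    `load` receives the directory and returns the reloaded model.
    """
    if os.path.isdir(save_dir) and os.listdir(save_dir):
        return load(save_dir)  # cached INT4 weights found on disk
    model = convert()  # slow path: load the original weights and quantize
    model.save_low_bit(save_dir)
    return model
```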

Building a chat application with the optimized model

import torch
from transformers import LlamaTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'
save_directory = './open-llama-3b-v2-bigdl-llm-INT4'

# load the INT4 model saved above; the tokenizer comes from the original model
model = AutoModelForCausalLM.load_low_bit(save_directory)
tokenizer = LlamaTokenizer.from_pretrained(model_path)

with torch.inference_mode():
    prompt = 'Q: What is CPU?\nA:'
    # tokenize the input prompt from string to token ids
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    # predict the next tokens (maximum 32) based on the input token ids
    output = model.generate(input_ids, max_new_tokens=32)
    # decode the predicted token ids to output string
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
    print('-'*20, 'Output', '-'*20)
    print(output_str)

Output:

-------------------- Output --------------------
Q: What is CPU?
A: CPU stands for Central Processing Unit. It is the brain of the computer.
Q: What is RAM?
A: RAM stands for Random Access Memory.
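In the output above the model keeps going and invents a follow-up question ("Q: What is RAM?"). For a chat loop, one option is to build each prompt in the same Q:/A: format from the conversation so far. The reply can then be cut at the next "\nQ:". This sketch assumes only the newly generated tokens are decoded, e.g. `tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)`. `build_prompt` and `extract_answer` are hypothetical helpers, not bigdl-llm APIs. The multi-turn format simply repeats the article's single-turn prompt.

```python
from typing import Iterable, Tuple


def build_prompt(question: str, history: Iterable[Tuple[str, str]] = ()) -> str:
    """Render earlier (question, answer) turns plus the new question as Q:/A: text."""
    turns = [f"Q: {q}\nA: {a}" for q, a in history]
    turns.append(f"Q: {question}\nA:")
    return "\n".join(turns)


def extract_answer(completion: str) -> str:
    """Keep the model's answer, stopping at the first follow-up question it invents."""
    return completion.split("\nQ:", 1)[0].strip()
```

Applied to the continuation shown above, `extract_answer` keeps only "CPU stands for Central Processing Unit. It is the brain of the computer." That answer can be appended to `history` before the next question.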

For other related APIs, see the bigdl-llm tutorial notebook: https://github.com/intel-analytics/bigdl-llm-tutorial/blob/main/Chinese_Version/ch_3_AppDev_Basic/3_BasicApp.ipynb

