Launching the Mixtral 8x7B Large Language Model in 4-bit/8-bit
- 0. Background
- 1. Modify the code
0. Background
My personal machine simply cannot run the Mixtral 8x7B large language model in float16, so I launch it with the weights quantized to 4-bit or 8-bit instead.
In actual testing, inference became noticeably faster at 4-bit, while 8-bit inference was still very slow.
The inference framework used is FastChat.
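A quick back-of-the-envelope calculation shows why quantization is needed: Mixtral 8x7B keeps all eight experts in memory (roughly 46.7B total parameters), even though only two are active per token. The sketch below is illustrative only, assuming 2 bytes per parameter for float16 and ignoring activations, the KV cache, and quantization overhead:

```python
# Rough weight-memory estimate for Mixtral 8x7B (~46.7B total parameters).
# Illustrative only: activations, KV cache, and quantization overhead are ignored.
params = 46.7e9

for name, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gib = params * bits / 8 / 1024**3
    print(f"{name:>7}: ~{gib:.0f} GiB")  # ~87 GiB, ~43 GiB, ~22 GiB
```

Even at 4-bit, the weights alone take on the order of 22 GiB, which is why a consumer GPU plus CPU offload is about the practical floor for this model.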
1. Modify the code
```bash
vi fastchat/model/model_adapter.py
```
Before the change:
```python
class MistralAdapter(BaseModelAdapter):
    """The model adapter for Mistral AI models"""

    def match(self, model_path: str):
        return "mistral" in model_path.lower() or "mixtral" in model_path.lower()

    def load_model(self, model_path: str, from_pretrained_kwargs: dict):
        model, tokenizer = super().load_model(model_path, from_pretrained_kwargs)
        model.config.eos_token_id = tokenizer.eos_token_id
        model.config.pad_token_id = tokenizer.pad_token_id
        return model, tokenizer
```
After the change:
```python
class MistralAdapter(BaseModelAdapter):
    """The model adapter for Mistral AI models"""

    def match(self, model_path: str):
        return "mistral" in model_path.lower() or "mixtral" in model_path.lower()

    def load_model(self, model_path: str, from_pretrained_kwargs: dict):
        # model, tokenizer = super().load_model(model_path, from_pretrained_kwargs)
        tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
        if "mixtral" in model_path.lower():
            model = AutoModelForCausalLM.from_pretrained(
                model_path,
                low_cpu_mem_usage=True,
                trust_remote_code=True,
                # attn_implementation="flash_attention_2",
                # load_in_8bit=True,
                load_in_4bit=True,
                **from_pretrained_kwargs,
            )
        else:
            model = AutoModelForCausalLM.from_pretrained(
                model_path,
                low_cpu_mem_usage=True,
                trust_remote_code=True,
                **from_pretrained_kwargs,
            )
        model.config.eos_token_id = tokenizer.eos_token_id
        model.config.pad_token_id = tokenizer.pad_token_id
        return model, tokenizer
```

FastChat's model_adapter.py already imports AutoTokenizer and AutoModelForCausalLM from transformers at the top of the file, so the edit above should need no extra imports. To switch to 8-bit, comment out load_in_4bit=True and uncomment load_in_8bit=True.
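Note that recent transformers releases deprecate passing load_in_4bit / load_in_8bit directly to from_pretrained in favor of a BitsAndBytesConfig. Below is a minimal, hedged sketch of the equivalent standalone call, assuming bitsandbytes is installed and "path/to/Mixtral-8x7B" is a placeholder for your local checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Equivalent 4-bit load expressed via BitsAndBytesConfig instead of the
# load_in_4bit kwarg. Requires the bitsandbytes package and a CUDA GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use load_in_8bit=True for 8-bit instead
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16 for speed
    bnb_4bit_quant_type="nf4",             # NF4 generally preserves quality better than fp4
)

model_path = "path/to/Mixtral-8x7B"  # hypothetical path; substitute your own
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)
print(f"~{model.get_memory_footprint() / 1024**3:.1f} GiB loaded")
```

With the adapter patched, FastChat can then be started as usual, for example `python3 -m fastchat.serve.cli --model-path /path/to/Mixtral-8x7B`; the adapter's match() picks up any model path containing "mixtral".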
Done!