AGI | No GPU? No Problem! A Tutorial for Running Large Models Locally with Ollama

2024-03-26 11:36


This article shows how to run large models locally without a GPU by installing Docker, Ollama, and Anaconda with a virtual environment. Once everything is installed, an API service is started and tested to verify that the model runs efficiently and stably. Ollama's local deployment gives users without GPU resources a convenient way to run large models.

Table of Contents

1. Implementation Steps

Install Docker (optional)

Install Ollama

Install Anaconda and Create a Virtual Environment (optional)

2. API Service

3. Testing


1. Implementation Steps

Linux is the recommended system. If you are on Windows, use WSL2 (WSL2 virtualizes a complete Linux kernel, so it is effectively equivalent to Linux).
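If you are setting up WSL2 from scratch, it can be enabled from an administrator PowerShell. A minimal sketch, assuming a reasonably recent Windows 10 or Windows 11 build:

# Install WSL with the default Ubuntu distribution
wsl --install

# Make sure new distributions use the WSL2 backend
wsl --set-default-version 2

# Verify the installation
wsl --status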

Install Docker (optional)

# Update package sources (the commands below assume CentOS/RHEL with yum)
yum -y update
yum install -y yum-utils

# Add the Docker repository
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

# Install Docker
yum install docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Start Docker
systemctl start docker

# Start Docker on boot
systemctl enable docker

# Verify
docker --version
# Docker version 25.0.1, build 29cf629
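To confirm the daemon works end to end, Docker's own test image is a quick check (it pulls a tiny image from Docker Hub and prints a greeting):

# Should print "Hello from Docker!" and exit
docker run --rm hello-world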

Install Ollama

# Start the Ollama container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Load a model, here using qwen:7b as the example
docker exec -itd ollama ollama run qwen:7b
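Before moving on, confirm the container is actually serving requests. Two quick checks against Ollama's native HTTP API, assuming the default port mapping above:

# List the models the Ollama instance knows about
curl http://localhost:11434/api/tags

# Request a one-shot completion
curl http://localhost:11434/api/generate -d '{"model": "qwen:7b", "prompt": "Say hello", "stream": false}'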

Install Anaconda and Create a Virtual Environment (optional)

# Enter the installation directory
cd /opt

# Download Anaconda (install wget first if it is missing)
wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh

# Install Anaconda
bash Anaconda3-2023.09-0-Linux-x86_64.sh

# Create the ollama virtual environment
conda create -n ollama python=3.10

# Activate the virtual environment
conda activate ollama

2. API Service

Ollama ships with its own API service, but its streaming handling is somewhat problematic, while the Python client works fine. Here we use an api_demo to expose a ChatGPT-compatible API on top of it.

Code source: LLaMA-Factory/src/api_demo.py

# Install dependencies (uvicorn is also required to run the server below)
pip install ollama sse_starlette fastapi uvicorn

# Create the api_demo.py file and paste in the code below
touch api_demo.py
vi api_demo.py

# Run the service
python api_demo.py
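Before starting the full demo, it is worth a one-off check that the ollama Python client can reach the container. A minimal sketch, assuming the qwen:7b model pulled earlier and the default port 11434:

python - <<'EOF'
import ollama

# One synchronous round trip through the local Ollama service
resp = ollama.chat(model="qwen:7b", messages=[{"role": "user", "content": "Hello"}])
print(resp["message"]["content"])
EOF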

import asyncio
import json
import os
import time
from enum import Enum, unique
from typing import Any, Dict, List, Optional, Sequence

import ollama
import uvicorn
from fastapi import FastAPI, HTTPException, status
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, Field
from sse_starlette.sse import EventSourceResponse
from typing_extensions import Literal


@unique
class Role(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"
    FUNCTION = "function"
    TOOL = "tool"
    OBSERVATION = "observation"


@unique
class Finish(str, Enum):
    STOP = "stop"
    LENGTH = "length"
    TOOL = "tool_calls"


class ModelCard(BaseModel):
    id: str
    object: Literal["model"] = "model"
    created: int = Field(default_factory=lambda: int(time.time()))
    owned_by: Literal["owner"] = "owner"


class ModelList(BaseModel):
    object: Literal["list"] = "list"
    data: List[ModelCard] = []


class Function(BaseModel):
    name: str
    arguments: str


class FunctionCall(BaseModel):
    id: Literal["call_default"] = "call_default"
    type: Literal["function"] = "function"
    function: Function


class ChatMessage(BaseModel):
    role: Role
    content: str


class ChatCompletionMessage(BaseModel):
    role: Optional[Role] = None
    content: Optional[str] = None
    tool_calls: Optional[List[FunctionCall]] = None


class ChatCompletionRequest(BaseModel):
    model: str
    messages: List[ChatMessage]
    tools: Optional[list] = []
    do_sample: bool = True
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    n: int = 1
    max_tokens: Optional[int] = None
    stream: bool = False


class ChatCompletionResponseChoice(BaseModel):
    index: int
    message: ChatCompletionMessage
    finish_reason: Finish


class ChatCompletionResponseStreamChoice(BaseModel):
    index: int
    delta: ChatCompletionMessage
    finish_reason: Optional[Finish] = None


class ChatCompletionResponseUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


class ChatCompletionResponse(BaseModel):
    id: Literal["chatcmpl-default"] = "chatcmpl-default"
    object: Literal["chat.completion"] = "chat.completion"
    created: int = Field(default_factory=lambda: int(time.time()))
    model: str
    choices: List[ChatCompletionResponseChoice]
    usage: ChatCompletionResponseUsage


class ChatCompletionStreamResponse(BaseModel):
    id: Literal["chatcmpl-default"] = "chatcmpl-default"
    object: Literal["chat.completion.chunk"] = "chat.completion.chunk"
    created: int = Field(default_factory=lambda: int(time.time()))
    model: str
    choices: List[ChatCompletionResponseStreamChoice]


class ScoreEvaluationRequest(BaseModel):
    model: str
    messages: List[str]
    max_length: Optional[int] = None


class ScoreEvaluationResponse(BaseModel):
    id: Literal["scoreeval-default"] = "scoreeval-default"
    object: Literal["score.evaluation"] = "score.evaluation"
    model: str
    scores: List[float]


def dictify(data: "BaseModel") -> Dict[str, Any]:
    try:  # pydantic v2
        return data.model_dump(exclude_unset=True)
    except AttributeError:  # pydantic v1
        return data.dict(exclude_unset=True)


def jsonify(data: "BaseModel") -> str:
    try:  # pydantic v2
        return json.dumps(data.model_dump(exclude_unset=True), ensure_ascii=False)
    except AttributeError:  # pydantic v1
        return data.json(exclude_unset=True, ensure_ascii=False)


def create_app() -> "FastAPI":
    app = FastAPI()

    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

    # Cap the number of chat requests processed at once (MAX_CONCURRENT env var).
    semaphore = asyncio.Semaphore(int(os.environ.get("MAX_CONCURRENT", 1)))

    @app.get("/v1/models", response_model=ModelList)
    async def list_models():
        model_card = ModelCard(id="gpt-3.5-turbo")
        return ModelList(data=[model_card])

    @app.post("/v1/chat/completions", response_model=ChatCompletionResponse, status_code=status.HTTP_200_OK)
    async def create_chat_completion(request: ChatCompletionRequest):
        if len(request.messages) == 0 or request.messages[-1].role not in [Role.USER, Role.TOOL]:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid length")

        messages = [dictify(message) for message in request.messages]

        # Pull out a leading system message, if any.
        if len(messages) and messages[0]["role"] == Role.SYSTEM:
            system = messages.pop(0)["content"]
        else:
            system = None

        # Conversations must alternate user/assistant and end with a user turn.
        if len(messages) % 2 == 0:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Only supports u/a/u/a/u...")

        for i in range(len(messages)):
            if i % 2 == 0 and messages[i]["role"] not in [Role.USER, Role.TOOL]:
                raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid role")
            elif i % 2 == 1 and messages[i]["role"] not in [Role.ASSISTANT, Role.FUNCTION]:
                raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid role")
            elif messages[i]["role"] == Role.TOOL:
                messages[i]["role"] = Role.OBSERVATION

        tool_list = request.tools
        if len(tool_list):
            try:
                tools = json.dumps([tool_list[0]["function"]], ensure_ascii=False)
            except Exception:
                raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid tools")
        else:
            tools = ""

        async with semaphore:
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(None, chat_completion, messages, system, tools, request)

    def chat_completion(messages: Sequence[Dict[str, str]], system: str, tools: str, request: ChatCompletionRequest):
        # system/tools are parsed for API compatibility but not forwarded to Ollama here.
        if request.stream:
            generate = stream_chat_completion(messages, system, tools, request)
            return EventSourceResponse(generate, media_type="text/event-stream")

        responses = ollama.chat(
            model=request.model,
            messages=messages,
            options={"top_p": request.top_p, "temperature": request.temperature},
        )

        choices = []
        result = responses["message"]["content"]
        response_message = ChatCompletionMessage(role=Role.ASSISTANT, content=result)
        finish_reason = Finish.STOP if responses.get("done", False) else Finish.LENGTH
        choices.append(ChatCompletionResponseChoice(index=0, message=response_message, finish_reason=finish_reason))

        # Ollama does not report token counts in this payload, so usage fields are placeholders.
        prompt_length = -1
        response_length = -1
        usage = ChatCompletionResponseUsage(
            prompt_tokens=prompt_length,
            completion_tokens=response_length,
            total_tokens=prompt_length + response_length,
        )

        return ChatCompletionResponse(model=request.model, choices=choices, usage=usage)

    def stream_chat_completion(messages: Sequence[Dict[str, str]], system: str, tools: str, request: ChatCompletionRequest):
        # First chunk announces the assistant role with empty content.
        choice_data = ChatCompletionResponseStreamChoice(
            index=0, delta=ChatCompletionMessage(role=Role.ASSISTANT, content=""), finish_reason=None
        )
        chunk = ChatCompletionStreamResponse(model=request.model, choices=[choice_data])
        yield jsonify(chunk)

        for new_text in ollama.chat(
            model=request.model,
            messages=messages,
            stream=True,
            options={"top_p": request.top_p, "temperature": request.temperature},
        ):
            if len(new_text) == 0:
                continue
            choice_data = ChatCompletionResponseStreamChoice(
                index=0, delta=ChatCompletionMessage(content=new_text["message"]["content"]), finish_reason=None
            )
            chunk = ChatCompletionStreamResponse(model=request.model, choices=[choice_data])
            yield jsonify(chunk)

        # Final chunk carries the finish_reason, followed by the OpenAI-style terminator.
        choice_data = ChatCompletionResponseStreamChoice(index=0, delta=ChatCompletionMessage(), finish_reason=Finish.STOP)
        chunk = ChatCompletionStreamResponse(model=request.model, choices=[choice_data])
        yield jsonify(chunk)
        yield "[DONE]"

    return app


if __name__ == "__main__":
    app = create_app()
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("API_PORT", 8000)), workers=1)
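The demo reads two environment variables: MAX_CONCURRENT caps how many chat requests are processed at once (via the semaphore above), and API_PORT sets the listening port. Both can be set at launch without editing the code:

# Serve on port 8000, allowing up to 2 concurrent chat requests
MAX_CONCURRENT=2 API_PORT=8000 python api_demo.py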

3. Testing

curl --location 'http://127.0.0.1:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen:7b",
    "messages": [{"role": "user", "content": "What is the OpenAI mission?"}],
    "stream": true,
    "temperature": 0.7,
    "top_p": 1
}'
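Because the demo mirrors the OpenAI chat-completions schema, the official openai Python client can also be pointed at it. A minimal sketch, assuming openai>=1.0 is installed; the api_key value is arbitrary since the demo never checks it, and the hand-rolled SSE stream may need minor adjustments to satisfy stricter client versions:

from openai import OpenAI

# Point the client at the local api_demo service instead of api.openai.com
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="qwen:7b",
    messages=[{"role": "user", "content": "What is the OpenAI mission?"}],
    stream=True,
    temperature=0.7,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)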

In testing, generation speed came out to roughly 8 tokens/s.

That's all for this installment. If you have any questions, feel free to leave a comment.

Author: Xu Hui | Backend Development Engineer

For more AI tips, follow the "神州数码云基地" (Digital China Cloud Base) WeChat official account and reply "AI与数字化转型" to join the community group.

Copyright notice: this article was compiled from hands-on practice by the Digital China Wuhan Cloud Base team. Please credit the source when reposting.
