Triton in Practice: Getting Started

2024-04-16 10:28
Tags: practice, getting started, triton


This article walks through building a tritonserver service from the official image and a model in TorchScript format.

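1. Environment preparation:

  • 1.1. Download the tritonserver image: Triton Inference Server | NVIDIA NGC

    • a. Note: the tritonserver image must be compatible with the NVIDIA driver version on the host; otherwise startup may fail later.
  • 1.2. Then pull the official PyTorch image to act as the client of the inference system and to run some preprocessing (alternatively, pull the tritonserver client SDK image directly).

    • a. docker pull pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel (no download link is provided for now, because Docker Hub is not accessible).
    • b. tritonserver client SDK image: Triton Inference Server | NVIDIA NGC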
      # nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
      # docker pull nvcr.io/nvidia/tritonserver:23.04-py3-sdk
  • 1.3. Next, create a client container from the official PyTorch image.

    • a. Create a shared directory locally: D:\chinasoft\shumei\triton\demo_first\pytorch_container\workspace
    • b. docker run -dt --name pytorch200_cu117_dev --restart=always --gpus all --network=host --shm-size 4G -v /D/chinasoft/shumei/triton/demo_first/pytorch_container/workspace:/workspace -w /workspace pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel /bin/bash
    • c. Enter the container: docker exec -it pytorch200_cu117_dev bash
    • d. pip install datasets transformers -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
    • e. pip install tritonclient[all] -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn

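2. Model preparation

  • 2.1. This article uses the PyTorch backend with a resnet50 model for image classification, so the resnet50 model must first be downloaded and converted to TorchScript format. The code (resnet50_convert_torchscript.py) is as follows: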
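import os
import torch
import torchvision.models as models

# Load the pretrained ResNet-50 (newer torchvision versions prefer the weights= argument).
resnet50 = models.resnet50(pretrained=True)
resnet50.eval()
# Dummy input used for tracing; 224x224 matches the dims declared in config.pbtxt.
image = torch.randn(1, 3, 224, 224)
resnet50_traced = torch.jit.trace(resnet50, image)
# Make sure the output directory exists before saving.
os.makedirs("/workspace/model/resnet50", exist_ok=True)
# resnet50_traced.save('/workspace/model/resnet50/model.pt')
torch.jit.save(resnet50_traced, "/workspace/model/resnet50/model.pt")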
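Before copying model.pt into the model repository, it can be worth confirming that the file was actually written. The following is a minimal sketch that relies only on the fact that torch.jit.save writes a zip-based archive; the helper name looks_like_torchscript is hypothetical, and passing this check does not prove that the model will load in Triton.

```python
import zipfile
from pathlib import Path


def looks_like_torchscript(path):
    """Return True if `path` exists and is a zip archive.

    torch.jit.save writes a zip-based archive, so this is only a cheap
    sanity check, not a guarantee that the model is valid.
    """
    p = Path(path)
    return p.is_file() and zipfile.is_zipfile(p)


# Path used by the conversion script above.
print(looks_like_torchscript("/workspace/model/resnet50/model.pt"))
```

A False result usually means the script did not run to completion or wrote to a different path.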
  • 2.2. Next, clone the Triton Server repository.

    git clone -b r23.04 https://github.com/triton-inference-server/server.git

    Configurations for some common backends are under the server/docs/examples directory.

tree docs/examples -L 2
docs/examples
|-- README.md
|-- fetch_models.sh
|-- jetson
|   |-- README.md
|   `-- concurrency_and_dynamic_batching
`-- model_repository
    |-- densenet_onnx
    |-- inception_graphdef
    |-- simple
    |-- simple_dyna_sequence
    |-- simple_identity
    |-- simple_int8
    |-- simple_sequence
    `-- simple_string

11 directories, 3 files
  • 2.3. Clone the Triton Tutorials repository, which contains Triton tutorials and examples. This article follows the Quick_Deploy/PyTorch example, which deploys a PyTorch model.

    git clone https://github.com/triton-inference-server/tutorials.git

3. Development practice

  • 3.1. First, build a model repository on the host. The directory layout is as follows:
model_repository/
`-- resnet50
    |-- 1
    |   `-- model.pt
    `-- config.pbtxt

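Here, config.pbtxt is the model configuration file; 1 is the model version number; resnet50 is the model name, which must match the name field in config.pbtxt; and model.pt holds the model weights (the TorchScript model converted above).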
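Since a wrong layout is a common reason for a model failing to load, the structure described above can be checked before starting the server. This is a small sketch using only the standard library; the function name missing_entries and its default arguments are assumptions made for illustration, not part of Triton.

```python
from pathlib import Path


def missing_entries(repo_root, model_name="resnet50", version="1"):
    """List the expected files that are absent from the model repository."""
    root = Path(repo_root)
    model_dir = root / model_name
    expected = [
        model_dir / "config.pbtxt",        # model configuration file
        model_dir / version / "model.pt",  # TorchScript weights for this version
    ]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]


# An empty list means both files are where the layout above expects them.
print(missing_entries("model_repository"))
```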

  • 3.2. Edit the config.pbtxt file with the following content:
name: "resnet50"
platform: "pytorch_libtorch"
max_batch_size : 0
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1, 1000 ,1, 1]
    reshape { shape: [ 1, 1000 ] }
  }
]

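The key fields are:

  • name: the model name
  • platform: specifies the backend for the model, e.g. pytorch_libtorch, onnxruntime_onnx, tensorrt_plan
  • max_batch_size: the maximum batch size the model supports in batching mode; 0, as used here, means batching is disabled
  • input: configuration of the model's input attributes
  • output: configuration of the model's output attributes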
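In the configuration above, dims is the shape seen by the client and reshape is the shape seen by the model, so the two should describe the same number of elements. The sketch below checks this for the values in this config; the helper name reshape_is_consistent is hypothetical, and it assumes fixed dimensions only (no -1 entries).

```python
from math import prod


def reshape_is_consistent(dims, reshape):
    """True when `dims` and `reshape` hold the same number of elements."""
    return prod(dims) == prod(reshape)


# Values copied from the config.pbtxt above.
print(reshape_is_consistent([3, 224, 224], [1, 3, 224, 224]))  # input__0
print(reshape_is_consistent([1, 1000, 1, 1], [1, 1000]))       # output__0
```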

Once the model repository is built, the next step is to start the Triton inference server.

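4. Start the tritonserver inference service

There are two ways to start the inference service: launch docker and run the command in one step, or enter the container first and run the command manually.

Here we launch docker and run the command in one step: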

docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /D/chinasoft/shumei/triton/demo_first/model_repository:/models nvcr.io/nvidia/tritonserver:22.12-py3 tritonserver --model-repository=/models

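Parameter notes:

  • -p: port mapping between the host and the container
  • -v: mounts host storage into the container; here it mounts the model repository
  • --model-repository: the path of the Triton model repository
  • Note that the model_repository path must be correct and the model files must already be configured correctly; see the Model preparation section.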
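To make the role of each flag easier to see, the same command can be assembled from its parts. This is only an illustrative sketch: the function name triton_run_command is hypothetical, and the port roles (HTTP, gRPC, metrics) are taken from the server log in this section.

```python
import shlex


def triton_run_command(model_repo, image="nvcr.io/nvidia/tritonserver:22.12-py3"):
    """Build the docker argument list used in this section."""
    args = ["docker", "run", "--gpus", "all", "--rm"]
    for port in (8000, 8001, 8002):  # HTTP, gRPC, metrics
        args += ["-p", f"{port}:{port}"]
    # Mount the host model repository at /models and point tritonserver at it.
    args += ["-v", f"{model_repo}:/models", image,
             "tritonserver", "--model-repository=/models"]
    return args


cmd = triton_run_command("/D/chinasoft/shumei/triton/demo_first/model_repository")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd) if launching from Python is preferred over typing the command.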
(base) PS C:\Users\lenovo> docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /D/chinasoft/shumei/triton/demo_first/model_repository:/models nvcr.io/nvidia/tritonserver:22.12-py3 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 22.12 (build 50109463)
Triton Server Version 2.29.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 516.94 which has support for CUDA 11.7.  This container
  was built with CUDA 11.8 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I0804 01:46:15.003883 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x304800000' with size 268435456
I0804 01:46:15.004050 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0804 01:46:15.322720 1 model_lifecycle.cc:459] loading: resnet50:1
I0804 01:46:17.472054 1 libtorch.cc:1985] TRITONBACKEND_Initialize: pytorch
I0804 01:46:17.472105 1 libtorch.cc:1995] Triton TRITONBACKEND API version: 1.10
I0804 01:46:17.472587 1 libtorch.cc:2001] 'pytorch' TRITONBACKEND API version: 1.10
I0804 01:46:17.472634 1 libtorch.cc:2034] TRITONBACKEND_ModelInitialize: resnet50 (version 1)
W0804 01:46:17.473291 1 libtorch.cc:284] skipping model configuration auto-complete for 'resnet50': not supported for pytorch backend
I0804 01:46:17.473618 1 libtorch.cc:313] Optimized execution is enabled for model instance 'resnet50'
I0804 01:46:17.473624 1 libtorch.cc:332] Cache Cleaning is disabled for model instance 'resnet50'
I0804 01:46:17.473626 1 libtorch.cc:349] Inference Mode is disabled for model instance 'resnet50'
I0804 01:46:17.473640 1 libtorch.cc:444] NvFuser is not specified for model instance 'resnet50'
I0804 01:46:17.473699 1 libtorch.cc:2078] TRITONBACKEND_ModelInstanceInitialize: resnet50 (GPU device 0)
I0804 01:46:22.750763 1 model_lifecycle.cc:694] successfully loaded 'resnet50' version 1
I0804 01:46:22.750870 1 server.cc:563]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0804 01:46:22.750917 1 server.cc:590]
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path                                                    | Config                                                                                                                                                        |
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0804 01:46:22.750948 1 server.cc:633]
+----------+---------+--------+
| Model    | Version | Status |
+----------+---------+--------+
| resnet50 | 1       | READY  |
+----------+---------+--------+
I0804 01:46:22.810861 1 metrics.cc:864] Collecting metrics for GPU 0: NVIDIA GeForce GTX 1650
I0804 01:46:22.811494 1 metrics.cc:757] Collecting CPU metrics
I0804 01:46:22.811657 1 tritonserver.cc:2264]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                               |
| server_version                   | 2.29.0                                                                                                                                                                                               |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                              |
| model_control_mode               | MODE_NONE                                                                                                                                                                                            |
| strict_model_config              | 0                                                                                                                                                                                                    |
| rate_limit                       | OFF                                                                                                                                                                                                  |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                            |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                             |
| response_cache_byte_size         | 0                                                                                                                                                                                                    |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                  |
| strict_readiness                 | 1                                                                                                                                                                                                    |
| exit_timeout                     | 30                                                                                                                                                                                                   |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+I0804 01:46:22.813086 1 grpc_server.cc:4819] Started GRPCInferenceService at 0.0.0.0:8001
I0804 01:46:22.813243 1 http_server.cc:3477] Started HTTPService at 0.0.0.0:8000
I0804 01:46:22.890915 1 http_server.cc:184] Started Metrics Service at 0.0.0.0:8002
W0804 01:46:23.822499 1 metrics.cc:603] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0804 01:46:24.822769 1 metrics.cc:603] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0804 01:46:25.831221 1 metrics.cc:603] Unable to get power limit for GPU 0. Status:Success, value:0.000000
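Once the log shows the HTTP service listening on port 8000, readiness can also be probed programmatically instead of reading the log. The sketch below uses the health endpoints of the KServe v2 inference protocol that Triton's HTTP service implements; the helper names ready_url and is_ready are hypothetical, and localhost:8000 is assumed from the port mapping used above.

```python
import urllib.error
import urllib.request


def ready_url(base_url="http://localhost:8000", model_name=None):
    """Build the readiness URL for the server, or for one model if given."""
    base = base_url.rstrip("/")
    if model_name is None:
        return f"{base}/v2/health/ready"
    return f"{base}/v2/models/{model_name}/ready"


def is_ready(url, timeout=2.0):
    """Return True if the endpoint answers HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


if __name__ == "__main__":
    print(is_ready(ready_url(model_name="resnet50")))
```

A False result only means the endpoint did not answer with 200 within the timeout; the server log remains the place to look for the reason.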

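The resnet50 model is now in the READY state. However, the GPU is not being used, because, as the warning above reports, the driver version on my host does not match the CUDA version the image was built with: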

WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 516.94 which has support for CUDA 11.7.  This container
  was built with CUDA 11.8 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
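The mismatch reported in this warning can be pulled out of the text and compared directly. The following is a rough sketch that assumes the warning keeps the wording shown above; the function name cuda_versions is hypothetical and the regular expressions are tailored to this one message.

```python
import re

# Text copied from the warning above.
WARNING = (
    "Using driver version 516.94 which has support for CUDA 11.7.  "
    "This container was built with CUDA 11.8 and will be run in "
    "Minor Version Compatibility mode."
)


def cuda_versions(text):
    """Extract (driver_cuda, container_cuda) as (major, minor) tuples."""
    driver = re.search(r"support for CUDA (\d+)\.(\d+)", text)
    built = re.search(r"built with CUDA (\d+)\.(\d+)", text)
    if not driver or not built:
        return None
    return (tuple(map(int, driver.groups())), tuple(map(int, built.groups())))


driver_cuda, container_cuda = cuda_versions(WARNING)
# The driver supports an older CUDA release than the container was built with.
print(driver_cuda, container_cuda, driver_cuda >= container_cuda)
```

Here the driver supports CUDA 11.7 while the container was built with 11.8, so either a newer host driver or an older tritonserver image would remove the mismatch.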

5. Send an inference request

  • 5.1. First, create the client script client.py:
import numpy as np
from torchvision import transforms
from PIL import Image
import tritonclient.http as httpclient
from tritonclient.utils import triton_to_np_dtype

# Image preprocessing
def rn50_preprocess(img_path="img1.jpg"):
    img = Image.open(img_path)
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    return preprocess(img).numpy()

transformed_img = rn50_preprocess()

# Set up the connection to the Triton server
client = httpclient.InferenceServerClient(url="localhost:8000")

# Specify the input and output of the resnet50 model
inputs = httpclient.InferInput("input__0", transformed_img.shape, datatype="FP32")
inputs.set_data_from_numpy(transformed_img, binary_data=True)

# class_count requests the top-K classification results. If it is not set, the default is 0 and a 1000-dimensional vector is returned.
outputs = httpclient.InferRequestedOutput("output__0", binary_data=True, class_count=1000)

# Send an inference request to the Triton server
results = client.infer(model_name="resnet50", inputs=[inputs], outputs=[outputs])
inference_output = results.as_numpy('output__0')
print(inference_output[:5])
  • 5.2. Enter the client container: docker exec -it pytorch200_cu117_dev bash
  • 5.3. Download in advance the image to use for the inference request:
    wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
  • 5.4. Run the client script to send the request:
python client.py
[b'12.474869:90' b'11.527128:92' b'9.659309:14' b'8.408504:136'
 b'8.216769:11']

The output format is <confidence_score>:<classification_index>.
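Because each entry is a byte string in that format, it usually needs to be split before the scores and class indices can be used. A minimal sketch follows, using the five entries printed above as sample data; the function name parse_classification is hypothetical.

```python
def parse_classification(entries):
    """Turn b'<score>:<index>' entries into (score, index) tuples."""
    parsed = []
    for entry in entries:
        text = entry.decode("utf-8") if isinstance(entry, bytes) else str(entry)
        # Keep only the first two fields in case extra fields follow the index.
        score, index = text.split(":")[:2]
        parsed.append((float(score), int(index)))
    return parsed


# Sample data copied from the output above.
sample = [b"12.474869:90", b"11.527128:92", b"9.659309:14",
          b"8.408504:136", b"8.216769:11"]
for score, index in parse_classification(sample):
    print(f"class {index}: score {score:.3f}")
```

In client.py the same function could be applied to inference_output[:5], since iterating over the returned array yields byte strings of this form.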




http://www.chinasem.cn/article/908532
