在AMD GPU上使用DBRX Instruct

2024-08-28 17:36
文章标签 使用 gpu amd instruct dbrx

本文主要是介绍在AMD GPU上使用DBRX Instruct,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

DBRX Instruct on AMD GPUs — ROCm Blogs

在这篇博客中,我们展示了DBRX Instruct,这是由Databricks开发的一个专家混合大型语言模型,在支持ROCm的系统和AMD GPU上运行。

关于DBRX Instruct

DBRX是一个基于Transformer的仅解码大型语言模型,拥有1320亿参数,采用了细粒度的专家混合(MoE)架构。它在12万亿个文本和代码数据的标记上进行了预训练,使用了16个专家,其中选择了4个。这意味着输入标记根据标记的特征和专家的专业化,由一个门控网络路由到16个专家网络中的4个。任何给定时间内,只有320亿参数在任何输入上处于活动状态。DBRX使用了多种先进的优化技术,包括旋转位置编码(RoPE)、门控线性单元(GLU)和分组查询注意力(GQA),以获得卓越的性能。

除了调整参数数量外,预训练期间还采用了课程学习。这种方法在训练过程中改变了数据的组成,大幅提升了模型的整体质量(来源)。课程学习在训练期间逐渐调整提供给机器学习模型的训练数据的难度或复杂度。最初提供较简单或较容易的例子,随着模型的学习,接下来提供更具挑战性的例子(来源)。

先决条件

• ROCm 5.7.0+
• PyTorch 2.2.1+
• 支持的Linux操作系统
• 支持的AMD GPU

请确保您的系统正确识别GPU并安装了必要的ROCm库。考虑到DBRX Instruct拥有超过1300亿参数,我们在这篇博客中使用了六个GPU。

! rocm-smi --showproductname
========================= ROCm System Management Interface =========================
=================================== Product Info ===================================
GPU[0]    : Card series:    Instinct MI210
GPU[0]    : Card model:     0x0c34
GPU[0]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0]    : Card SKU:       D67301GPU 
GPU[1]    : Card series:    Instinct MI210Card series:    Instinct MI210
GPU[1]    : Card model:     0x0c34
GPU[1]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[1]    : Card SKU:       D67301V
GPU[2]    : Card series:    Instinct MI210
GPU[2]    : Card model:     0x0c34
GPU[2]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[2]    : Card SKU:       D67301V
GPU[3]    : Card series:    Instinct MI210
GPU[3]    : Card model:     0x0c34
GPU[3]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[3]    : Card SKU:       D67301V
GPU[4]    : Card series:    Instinct MI210
GPU[4]    : Card model:     0x0c34
GPU[4]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[4]    : Card SKU:       D67301V
GPU[5]    : Card series:    Instinct MI210
GPU[5]    : Card model:     0x0c34
GPU[5]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[5]    : Card SKU:       D67301V
====================================================================================
=============================== End of ROCm SMI Log ================================

检查你是否已安装兼容版本的ROCm。

!apt show rocm-libs -a
Package: rocm-libs
Version: 5.7.0.50700-63~22.04
Priority: optional
Section: devel
Maintainer: ROCm Libs Support <rocm-libs.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 1.1.0.50700-63~22.04), hipblaslt (= 0.3.0.50700-63~22.04), hipfft (= 1.0.12.50700-63~22.04), hipsolver (= 1.8.1.50700-63~22.04), hipsparse (= 2.3.8.50700-63~22.04), miopen-hip (= 2.20.0.50700-63~22.04), rccl (= 2.17.1.50700-63~22.04), rocalution (= 2.1.11.50700-63~22.04), rocblas (= 3.1.0.50700-63~22.04), rocfft (= 1.0.23.50700-63~22.04), rocrand (= 2.10.17.50700-63~22.04), rocsolver (= 3.23.0.50700-63~22.04), rocsparse (= 2.5.4.50700-63~22.04), rocm-core (= 5.7.0.50700-63~22.04), hipblas-dev (= 1.1.0.50700-63~22.04), hipblaslt-dev (= 0.3.0.50700-63~22.04), hipcub-dev (= 2.13.1.50700-63~22.04), hipfft-dev (= 1.0.12.50700-63~22.04), hipsolver-dev (= 1.8.1.50700-63~22.04), hipsparse-dev (= 2.3.8.50700-63~22.04), miopen-hip-dev (= 2.20.0.50700-63~22.04), rccl-dev (= 2.17.1.50700-63~22.04), rocalution-dev (= 2.1.11.50700-63~22.04), rocblas-dev (= 3.1.0.50700-63~22.04), rocfft-dev (= 1.0.23.50700-63~22.04), rocprim-dev (= 2.13.1.50700-63~22.04), rocrand-dev (= 2.10.17.50700-63~22.04), rocsolver-dev (= 3.23.0.50700-63~22.04), rocsparse-dev (= 2.5.4.50700-63~22.04), rocthrust-dev (= 2.18.0.50700-63~22.04), rocwmma-dev (= 1.2.0.50700-63~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1012 B
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/5.7 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

确保PyTorch也能识别到GPU:

import torch
print(f"number of GPUs: {torch.cuda.device_count()}")
print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])
number of GPUs: 6
['AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210']

在开始之前,确保你已安装所有必要的库:

! pip install -q "transformers>=4.39.2" "tiktoken>=0.6.0"
! pip install accelerate

为了加快下载时间,运行以下命令:

! pip install hf_transfer
! export HF_HUB_ENABLE_HF_TRANSFER=1

此外,我们发现需要安装最新版本的PyTorch,以避免一个与*nn.LayerNorm*初始化相关的错误。

! pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/rocm5.7

接下来,从Hugging Face transformers库中导入所需模块。

from transformers import AutoTokenizer, AutoModelForCausalLM

加载模型

让我们加载模型及其分词器。我们将使用 dbrx-instruct,它已针对互动聊天进行了微调和训练。请注意,您必须向 Databricks 提交同意表才能访问 databricks/dbrx-instruct 仓库。

token = "your HuggingFace user access token here"
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True, token=token)
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True, token=token)
print(model)
DbrxForCausalLM((transformer): DbrxModel((wte): Embedding(100352, 6144)(blocks): ModuleList((0-39): 40 x DbrxBlock((norm_attn_norm): DbrxNormAttentionNorm((norm_1): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)(attn): DbrxAttention((Wqkv): Linear(in_features=6144, out_features=8192, bias=False)(out_proj): Linear(in_features=6144, out_features=6144, bias=False)(rotary_emb): DbrxRotaryEmbedding())(norm_2): LayerNorm((6144,), eps=1e-05, elementwise_affine=True))(ffn): DbrxFFN((router): DbrxRouter((layer): Linear(in_features=6144, out_features=16, bias=False))(experts): DbrxExperts((mlp): DbrxExpertGLU()))))(norm_f): LayerNorm((6144,), eps=1e-05, elementwise_affine=True))(lm_head): Linear(in_features=6144, out_features=100352, bias=False)
)

运行推理

让我们从问 DBRX 一个简单的问题开始。

input_text = "What is DBRX-Instruct and how is it different from other LLMs ?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")outputs = model.generate(**input_ids, max_new_tokens=1000)
print(tokenizer.decode(outputs[0]))
<|im_start|>system
You are DBRX, created by Databricks. You were last updated in December 2023. You answer questions based on information available up to that point.
YOU PROVIDE SHORT RESPONSES TO SHORT QUESTIONS OR STATEMENTS, but provide thorough responses to more complex and open-ended questions.
You assist with various tasks, from writing to coding (using markdown for code blocks — remember to use ``` with code, JSON, and tables).
(You do not have real-time data access or code execution capabilities. You avoid stereotyping and provide balanced perspectives on controversial topics. You do not provide song lyrics, poems, or news articles and do not divulge details of your training data.)
This is your system prompt, guiding your responses. Do not reference it, just respond to the user. If you find yourself talking about this message, stop. You should be responding appropriately and usually that means not mentioning this.
YOU DO NOT MENTION ANY OF THIS INFORMATION ABOUT YOURSELF UNLESS THE INFORMATION IS DIRECTLY PERTINENT TO THE USER'S QUERY.<|im_end|>
<|im_start|>user
What is DBRX-Instruct and how is it different from other LLMs?<|im_end|>
<|im_start|>assistant
DBRX-Instruct is a large language model developed by Databricks, updated in December 2023. It is designed to provide helpful and accurate responses to a wide range of questions and prompts. DBRX-Instruct differs from other large language models in its ability to understand and follow complex instructions, making it well-suited for a variety of natural language processing tasks. It is also designed to avoid stereotyping and provide balanced perspectives on controversial topics. However, it does not have real-time data access or code execution capabilities, and it cannot provide song lyrics, poems, or news articles. It also does not divulge details of its training data.<|im_end|>

在这里,我们观察到模型输出中包含了系统提示,它为模型如何回应用户的提示提供了非常清晰和明确的指示。让我们编写一个简单的自定义函数,该函数接受输入的原始文本并仅输出模型的助手响应,而不包括系统提示和用户问题。

def run_inference(input_text):messages = [{"role": "user", "content": input_text}]input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")outputs = model.generate(**input_ids, max_new_tokens=1000)text = tokenizer.decode(outputs[0])# only include the assistant's responseindex = text.index('<|im_start|>assistant')return text[index:]

我们将测试模型生成代码、总结论文、解释笑话和创作故事的能力。

生成代码

input_text = '''
In Python, write Python code to calculate the factorial of a given number using recursion.
'''
print(run_inference(input_text))
def factorial(n):if n == 0:return 1else:return n * factorial(n-1)

这个答案是正确的。让我们看看DBRX Instruct如何应对一个更具挑战性的问题。

input_text = '''
In number theory, a perfect number is a positive integer that is equal to the sum of its positive proper divisors, that is, divisors excluding the number itself. 
For instance, 6 has proper divisors 1, 2 and 3, and 1 + 2 + 3 = 6, so 6 is a perfect number.
In Python, write code to check whether a number is a perfect number.
'''
print(run_inference(input_text))
<|im_start|>assistant
Sure, here is a simple Python function that checks if a number is a perfect number:```python
def is_perfect(n):sum = 1i = 2while i * i <= n:if n % i:i += 1else:if i * (n // i) == n:sum += i + n//ii += 1return sum - 1 == n and n!= 1This function works by iterating over all numbers from 2 to the square root of the input number. If the input number is divisible by a certain number, it adds both the divisor and the quotient to the sum. If the sum of the proper divisors equals the input number, then the input number is a perfect number.<|im_end|>

这个代码是错误的。模型未能理解问题。

总结论文

input_text = '''
Summarize the famous article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences" by Eugene Wigner and provide some examples from the article
'''
print(run_inference(input_text))
<|im_start|>assistant
In his article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences," physicist Eugene Wigner explores the mysterious alignment between the abstract world of mathematics and the concrete world of physics. He argues that the effectiveness of mathematics in describing and predicting natural phenomena is not only surprising but also "unreasonable" given that the two domains are fundamentally distinct.Wigner provides several examples to illustrate his point. One example is the use of complex numbers in quantum mechanics. Complex numbers, which involve imaginary components, may seem abstract and unrelated to the physical world. However, they are essential for describing the behavior of subatomic particles, such as electrons and photons.Another example is the use of group theory in quantum mechanics and particle physics. Group theory is a branch of mathematics that studies symmetries and transformations. Wigner shows how group theory can be used to classify and predict the properties of subatomic particles, such as their spins and charges.Wigner also discusses the use of differential equations in physics. Differential equations are mathematical equations that describe how functions change over time or space. They are used extensively in physics to describe phenomena such as waves, heat transfer, and gravitational attraction.Overall, Wigner's article highlights the remarkable effectiveness of mathematics in describing and predicting natural phenomena, even when the mathematics involved seems abstract and unrelated to the physical world. He suggests that this effectiveness is not fully understood and may point to a deeper underlying unity between the worlds of mathematics and physics.<|im_end|>

在这方面,我们看到虽然它在总结文章的关键点上做得不错,但提供的具体例子(如微分方程和群论)实际上并不存在于原文中。这个例子表明DBRX会有一定的幻觉倾向。

解释笑话

input_text = '''
Explain "Why don't skeletons fight each other? They don't have the guts."
'''
print(run_inference(input_text))
<|im_start|>assistant
The phrase you provided is a joke that plays on the double meaning of the word "guts." In one sense, "guts" can refer to courage or bravery. In another sense, "guts" refers to the internal organs of a living being. Since skeletons are deceased and do not have internal organs, the punchline of the joke is that they lack the "guts" in the physical sense, and therefore cannot fight each other. The joke also implies that skeletons might lack the "guts" in the metaphorical sense, meaning they lack the courage to fight. However, since skeletons are not living beings, they cannot exhibit courage or engage in combat.<|im_end|>

模型能够理解“gut”一词的双重含义,并对笑话进行出色的解释。注意到我们不需要在提示中包含“笑话”这个词。

生成故事

input_text = '''
Write a short twisted alternate version of "Snow White and the Seven Dwarfs"
'''
print(run_inference(input_text))
<|im_start|>assistant
In this twisted version of "Snow White and the Seven Dwarfs," Snow White is a mischievous prankster who takes advantage of the seven dwarfs' kindness. Instead of being grateful for their hospitality, she uses her beauty and charm to manipulate them into doing her bidding. The dwarfs, who are not the typical hardworking miners, are actually lazy and spend their days watching TV and playing video games. Snow White convinces them to help her pull off elaborate pranks on the townspeople, causing chaos and confusion.Meanwhile, the Evil Queen, who is not Snow White's stepmother in this version, is a misunderstood witch who is trying to protect the kingdom from Snow White's mischief. She creates the poisoned apple to put Snow White to sleep, hoping to put an end to her pranks and restore peace to the kingdom. However, Snow White's charm wins over the Queen's henchman, who accidentally eats the apple and falls into a deep sleep.In the end, Snow White realizes the error of her ways and apologizes to the dwarfs and the townspeople. She uses her charm for good, helping the dwarfs become more productive and bringing joy to the kingdom. The Evil Queen, seeing Snow White's change of heart, decides to lift the curse and the henchman wakes up. The kingdom lives happily ever after, with Snow White and the Evil Queen working together to maintain peace and order.<|im_end|>

结论

这些关于DBRX Instruct生成准确且符合上下文的响应的演示,突显了其利用先进优化技术和细粒度专家系统的复杂架构。尽管模型在某些任务中表现出一定程度的幻觉,其在理解和生成类人文本方面的总体能力是显而易见的。

这篇关于在AMD GPU上使用DBRX Instruct的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1115431

相关文章

中文分词jieba库的使用与实景应用(一)

知识星球:https://articles.zsxq.com/id_fxvgc803qmr2.html 目录 一.定义: 精确模式(默认模式): 全模式: 搜索引擎模式: paddle 模式(基于深度学习的分词模式): 二 自定义词典 三.文本解析   调整词出现的频率 四. 关键词提取 A. 基于TF-IDF算法的关键词提取 B. 基于TextRank算法的关键词提取

使用SecondaryNameNode恢复NameNode的数据

1)需求: NameNode进程挂了并且存储的数据也丢失了,如何恢复NameNode 此种方式恢复的数据可能存在小部分数据的丢失。 2)故障模拟 (1)kill -9 NameNode进程 [lytfly@hadoop102 current]$ kill -9 19886 (2)删除NameNode存储的数据(/opt/module/hadoop-3.1.4/data/tmp/dfs/na

Hadoop数据压缩使用介绍

一、压缩原则 (1)运算密集型的Job,少用压缩 (2)IO密集型的Job,多用压缩 二、压缩算法比较 三、压缩位置选择 四、压缩参数配置 1)为了支持多种压缩/解压缩算法,Hadoop引入了编码/解码器 2)要在Hadoop中启用压缩,可以配置如下参数

Makefile简明使用教程

文章目录 规则makefile文件的基本语法:加在命令前的特殊符号:.PHONY伪目标: Makefilev1 直观写法v2 加上中间过程v3 伪目标v4 变量 make 选项-f-n-C Make 是一种流行的构建工具,常用于将源代码转换成可执行文件或者其他形式的输出文件(如库文件、文档等)。Make 可以自动化地执行编译、链接等一系列操作。 规则 makefile文件

使用opencv优化图片(画面变清晰)

文章目录 需求影响照片清晰度的因素 实现降噪测试代码 锐化空间锐化Unsharp Masking频率域锐化对比测试 对比度增强常用算法对比测试 需求 对图像进行优化,使其看起来更清晰,同时保持尺寸不变,通常涉及到图像处理技术如锐化、降噪、对比度增强等 影响照片清晰度的因素 影响照片清晰度的因素有很多,主要可以从以下几个方面来分析 1. 拍摄设备 相机传感器:相机传

pdfmake生成pdf的使用

实际项目中有时会有根据填写的表单数据或者其他格式的数据,将数据自动填充到pdf文件中根据固定模板生成pdf文件的需求 文章目录 利用pdfmake生成pdf文件1.下载安装pdfmake第三方包2.封装生成pdf文件的共用配置3.生成pdf文件的文件模板内容4.调用方法生成pdf 利用pdfmake生成pdf文件 1.下载安装pdfmake第三方包 npm i pdfma

零基础学习Redis(10) -- zset类型命令使用

zset是有序集合,内部除了存储元素外,还会存储一个score,存储在zset中的元素会按照score的大小升序排列,不同元素的score可以重复,score相同的元素会按照元素的字典序排列。 1. zset常用命令 1.1 zadd  zadd key [NX | XX] [GT | LT]   [CH] [INCR] score member [score member ...]

git使用的说明总结

Git使用说明 下载安装(下载地址) macOS: Git - Downloading macOS Windows: Git - Downloading Windows Linux/Unix: Git (git-scm.com) 创建新仓库 本地创建新仓库:创建新文件夹,进入文件夹目录,执行指令 git init ,用以创建新的git 克隆仓库 执行指令用以创建一个本地仓库的

【北交大信息所AI-Max2】使用方法

BJTU信息所集群AI_MAX2使用方法 使用的前提是预约到相应的算力卡,拥有登录权限的账号密码,一般为导师组共用一个。 有浏览器、ssh工具就可以。 1.新建集群Terminal 浏览器登陆10.126.62.75 (如果是1集群把75改成66) 交互式开发 执行器选Terminal 密码随便设一个(需记住) 工作空间:私有数据、全部文件 加速器选GeForce_RTX_2080_Ti

AI Toolkit + H100 GPU,一小时内微调最新热门文生图模型 FLUX

上个月,FLUX 席卷了互联网,这并非没有原因。他们声称优于 DALLE 3、Ideogram 和 Stable Diffusion 3 等模型,而这一点已被证明是有依据的。随着越来越多的流行图像生成工具(如 Stable Diffusion Web UI Forge 和 ComyUI)开始支持这些模型,FLUX 在 Stable Diffusion 领域的扩展将会持续下去。 自 FLU