A Quick Visual Overview of BitNet: 1-bit LLM

2024-03-08 15:28
Tags: images, llm, quick overview, bit, bitnet


Input data

  • The model uses absmax quantization to quantize activations to $b$ bits, mapping the input into the range $[-Q_b, Q_b]$, where $Q_b = 2^{b-1}$:

    $$\widetilde{x}=\mathrm{Quant}(x)=\mathrm{Clip}\!\left(x\times\frac{Q_b}{\gamma},\,-Q_b+\epsilon,\,Q_b-\epsilon\right),$$
    $$\operatorname{Clip}(x,a,b)=\max(a,\min(b,x)),\quad\gamma=\|x\|_\infty$$

  • Here ε is a small floating-point number that prevents overflow when clipping is performed. A minimal code sketch of this quantization follows.
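
Below is a minimal sketch of the absmax activation quantization above; the function name and defaults are illustrative assumptions, not part of the BitNet repository.

# Minimal sketch of absmax activation quantization (illustrative, not from the repo)
import torch

def absmax_quantize_activations(x, b=8, eps=1e-5):
    Q_b = 2 ** (b - 1)
    # gamma: the absmax (infinity norm) of the input tensor
    gamma = x.abs().max()
    # Scale by Q_b / gamma, then clip to (-Q_b + eps, Q_b - eps);
    # eps is added to the denominator to avoid division by zero,
    # matching the BitLinear code later in this post
    return torch.clamp(x * Q_b / (gamma + eps), -Q_b + eps, Q_b - eps)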

  • For comparison, the BitNet b1.58 follow-up replaces binary weights with ternary weights in {-1, 0, +1}, quantized with an absmean function:

# https://github.com/kyegomez/BitNet/blob/main/bitnet/bitbnet_b158.py
import torch

def absmean_quantize_weights(weights):
    """
    Quantizes the weights to -1, 0, or +1 using an absmean quantization function.

    Parameters:
    - weights (Tensor): The weights of a neural network layer.

    Returns:
    - Tensor: The quantized weights.
    """
    # Calculate the average absolute value (γ) of the weights
    gamma = torch.mean(torch.abs(weights))
    # Scale weights by γ and round to the nearest integer among {-1, 0, +1}
    quantized_weights = torch.clamp(torch.round(weights / gamma), min=-1, max=1)
    return quantized_weights
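
A quick illustrative check that the result is ternary:

# Illustrative usage: every entry of the output is -1, 0, or +1
w = torch.randn(4, 4)
print(absmean_quantize_weights(w))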

Weights

  • Binarization of the weight matrix W can be formulated as follows (a minimal code sketch appears after the equations):

$$\alpha=\frac{1}{nm}\sum_{ij}W_{ij},$$
$$\widetilde{W}=\mathrm{Sign}(W-\alpha),$$
$$\operatorname{Sign}(W_{ij})=\begin{cases}+1,&\text{if }W_{ij}>0,\\-1,&\text{if }W_{ij}\leq 0\end{cases}$$
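
A minimal sketch of this binarization as a direct reading of the formula, forward pass only (the straight-through estimator used for training appears in the BitLinear code below):

# Minimal sketch of weight binarization (forward pass only, illustrative)
import torch

def binarize_weights(W):
    # alpha: mean over all nm entries of W
    alpha = W.mean()
    # Sign(W - alpha); torch.sign maps 0 to 0, but the formula assigns -1
    # to entries <= 0, so zeros are pushed to -1 explicitly
    binarized = torch.sign(W - alpha)
    binarized[binarized == 0] = -1
    return binarized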


Matrix multiplication

  • With the quantization equations above, the matrix multiplication can be written as:

$$y=\widetilde{W}\,\widetilde{x}$$

  • To preserve the variance after quantization, a LayerNorm function is introduced before activation quantization, so that the variance of the output y is estimated to be 1. A sketch of the full computation follows the equations below:

$$y=\widetilde{W}\widetilde{x}=\widetilde{W}\,\mathrm{Quant}(\mathrm{LN}(x))\times\frac{\beta\gamma}{Q_b},$$
$$\mathrm{LN}(x)=\frac{x-E(x)}{\sqrt{\mathrm{Var}(x)+\epsilon}},\quad\beta=\frac{1}{nm}\|W\|_1$$
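
Putting the pieces together, here is a minimal end-to-end sketch of this computation. The helper is hypothetical, not the repository's implementation (the repository's BitLinear layer follows below):

# Minimal sketch of y = W~ Quant(LN(x)) * beta*gamma/Q_b (illustrative)
import torch
import torch.nn.functional as F

def bitlinear_forward(x, W, b=8, eps=1e-5):
    Q_b = 2 ** (b - 1)
    # LN(x): LayerNorm over the feature dimension (no affine parameters)
    x = F.layer_norm(x, x.shape[-1:], eps=eps)
    # Quant(LN(x)): absmax quantization of the normalized activations
    gamma = x.abs().max()
    x_q = torch.clamp(x * Q_b / (gamma + eps), -Q_b + eps, Q_b - eps)
    # W~ = Sign(W - alpha), with zeros pushed to -1
    W_b = torch.sign(W - W.mean())
    W_b[W_b == 0] = -1
    # beta = (1/nm) * ||W||_1, i.e. the mean absolute value of W
    beta = W.abs().mean()
    # Dequantize the matmul output with the scale beta * gamma / Q_b
    return F.linear(x_q, W_b) * (beta * gamma / Q_b)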


# https://github.com/kyegomez/BitNet/blob/main/bitnet/bitlinear.py
import torch
from torch import Tensor, nn


class BitLinear(nn.Linear):
    """
    BitLinear is a custom linear layer that performs binarization of weights
    and quantization of activations in a group-wise manner.

    Args:
        in_features (int): Number of input features.
        out_features (int): Number of output features.
        bias (bool, optional): If set to False, the layer will not learn an additive bias. Default is True.
        num_groups (int, optional): Number of groups to divide the weights and activations into. Default is 1.
        b (int, optional): Number of bits used for activation quantization. Default is 8.
    """

    def __init__(
        self,
        in_features: int,
        out_features: int,
        bias: bool = True,
        num_groups: int = 1,
        b: int = 8,
    ):
        super().__init__(in_features, out_features, bias)
        self.in_features = in_features
        self.out_features = out_features
        self.b = b
        self.num_groups = num_groups
        self.eps = 1e-5
        self.norm = nn.LayerNorm(in_features)

    def ste(self, x):
        """
        Applies the sign function for binarization and uses the Straight-Through
        Estimator (STE) during the backward pass.
        """
        binarized_x = torch.sign(x)
        # Identity gradient: forward uses sign(x), backward sees x
        binarized_x = (binarized_x - x).detach() + x
        return binarized_x

    def binarize_weights_groupwise(self):
        """
        Binarizes the weights of the layer in a group-wise manner using STE.
        """
        group_size = self.weight.shape[0] // self.num_groups
        binarized_weights = torch.zeros_like(self.weight)

        for g in range(self.num_groups):
            start_idx = g * group_size
            end_idx = (g + 1) * group_size
            weight_group = self.weight[start_idx:end_idx]
            # alpha_g: mean of the group, subtracted before taking the sign
            alpha_g = weight_group.mean()
            binarized_weights[start_idx:end_idx] = self.ste(weight_group - alpha_g)

        return binarized_weights

    def quantize_activations_groupwise(self, x):
        """
        Quantizes the activations of the layer in a group-wise manner
        using self.b bits.
        """
        Q_b = 2 ** (self.b - 1)
        group_size = x.shape[0] // self.num_groups
        quantized_x = torch.zeros_like(x)

        for g in range(self.num_groups):
            start_idx = g * group_size
            end_idx = (g + 1) * group_size
            activation_group = x[start_idx:end_idx]
            # gamma_g: absmax of the group (the scaling factor)
            gamma_g = activation_group.abs().max()
            quantized_x[start_idx:end_idx] = torch.clamp(
                activation_group * Q_b / (gamma_g + self.eps),
                -Q_b + self.eps,
                Q_b - self.eps,
            )

        return quantized_x

    def dequantize_activations_groupwise(self, x):
        """
        Dequantizes the activations of the layer in a group-wise manner.
        """
        Q_b = 2 ** (self.b - 1)
        dequantized_x = torch.zeros_like(x)

        for g in range(self.num_groups):
            start_idx = g * x.shape[0] // self.num_groups
            end_idx = (g + 1) * x.shape[0] // self.num_groups
            quantized_group = x[start_idx:end_idx]
            gamma_g = quantized_group.abs().max()
            dequantized_x[start_idx:end_idx] = quantized_group * gamma_g / Q_b

        return dequantized_x

    def forward(self, x: Tensor) -> Tensor:
        """
        Forward pass of the BitLinear layer.
        """
        # Normalize input
        x = self.norm(x)

        # Binarize weights
        binarized_weights = self.binarize_weights_groupwise()

        # Perform linear transformation
        output = torch.nn.functional.linear(x, binarized_weights, self.bias)

        # Quantize activations
        output = self.quantize_activations_groupwise(output)

        # Dequantize activations
        output = self.dequantize_activations_groupwise(output)

        return output


# Example usage
bitlinear = BitLinear(10, 5, num_groups=2, b=8)
input_tensor = torch.randn(5, 10)  # Example input tensor
output = bitlinear(input_tensor)
print(output)  # Example output tensor

References

  • [NLP] [Large Models] BitNet: Training LLMs with a 1-bit Transformer

  • BitNet: Scaling 1-bit Transformers for Large Language Models

  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

  • Implementation of “BitNet: Scaling 1-bit Transformers for Large Language Models” in pytorch

  • DB-LLM: Accurate Dual-Binarization for Efficient LLMs

  • What to make of Microsoft's proposed BitNet b1.58?
