[Image Classification] 2022-UniFormer IEEE

2024-03-27 22:30

This article introduces the 2022 UniFormer (IEEE) paper for image classification; hopefully it provides a useful reference for interested developers.

Table of Contents

  • 2022-UniFormer IEEE
    • 1. Introduction
      • 1.1 Motivation
    • 2. Network Architecture
      • 2.1 Overall architecture
      • 2.2 UniFormer block
      • 2.3 MHRA
        • 2.3.1 Local MHRA
        • 2.3.2 Global MHRA
      • 2.4 Dynamic Position Embedding
    • 3. Code

2022-UniFormer IEEE

Paper title: UniFormer: Unifying Convolution and Self-attention for Visual Recognition

Paper link: https://arxiv.org/abs/2201.09450

Code: https://github.com/sense-x/uniformer

Published: January 2022

Citation: Li K, Wang Y, Zhang J, et al. UniFormer: Unifying convolution and self-attention for visual recognition[J]. arXiv preprint arXiv:2201.09450, 2022.

Citation count: 24

Chinese translation: https://blog.csdn.net/weixin_45782047/article/details/123952524

1. Introduction

1.1 Motivation

For representation learning on images and videos, there are two major pain points:

  • Local redundancy: visual data are highly similar within local spatial/temporal/spatiotemporal neighborhoods, and this locality makes it easy to waste a large amount of computation.
  • Global dependency: accurate recognition requires dynamically relating objects in different regions, i.e. modeling long-range dependencies.

The two mainstream model families, CNN and ViT, each tend to address only one of these problems. Convolution aggregates context only within a small local neighborhood, which naturally avoids redundant global computation, but its limited receptive field makes it hard to model global dependencies. Self-attention, by comparing similarities globally, naturally relates distant objects, but the visualizations below show that ViTs encode local features very inefficiently in their shallow layers.

[Figure: DeiT attention visualization]

DeiT visualization: even after three self-attention layers, the output features still retain a lot of local detail. Picking an arbitrary token as the query and visualizing its attention matrix, we find that the attended tokens are concentrated in a 3x3 neighborhood (deeper red means stronger attention).

[Figure: TimeSformer attention visualization]

TimeSformer visualization: likewise, even after three self-attention layers, the output features of each frame still retain a lot of local detail. Picking an arbitrary token as the query and visualizing both the spatial and the temporal attention matrices, we find that the attended tokens lie only in a local neighborhood (deeper red means stronger attention).

Whether it is spatial or temporal attention, in the shallow layers of a ViT it tends to attend only to the tokens adjacent to the query token. Yet the attention matrix is computed from global token-to-token similarities, so this clearly incurs a large amount of unnecessary computation. By comparison, convolution has a clear advantage in both quality and cost when extracting these shallow features. Why not, then, design different feature-learning operators for different layers of the network, combining convolution and self-attention so that each is used where it works best?

2. Network Architecture

2.1 Overall architecture

[Figure: Overall UniFormer architecture]

2.2 UniFormer block

The overall framework is shown above. Following the hierarchical design of CNNs, each stage contains several Transformer-style UniFormer blocks.
$$
\begin{aligned}
\mathbf{X} &= \operatorname{DPE}\left(\mathbf{X}_{in}\right) + \mathbf{X}_{in}, \\
\mathbf{Y} &= \operatorname{MHRA}(\operatorname{Norm}(\mathbf{X})) + \mathbf{X}, \\
\mathbf{Z} &= \operatorname{FFN}(\operatorname{Norm}(\mathbf{Y})) + \mathbf{Y}.
\end{aligned}
$$
Each UniFormer block consists of three main parts: a Dynamic Position Embedding (DPE), a Multi-Head Relation Aggregator (MHRA), and the feed-forward network (FFN) that every Transformer block carries.
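As a rough PyTorch sketch of these three equations (the names here are mine, not the official implementation; the real code is in Section 3, where the local and global variants are `CBlock` and `SABlock`):

import torch.nn as nn

class UniFormerBlockSketch(nn.Module):
    """Minimal sketch: X = DPE(X_in) + X_in; Y = MHRA(Norm(X)) + X; Z = FFN(Norm(Y)) + Y."""
    def __init__(self, dim, mhra, norm_layer=nn.BatchNorm2d, mlp_ratio=4):
        super().__init__()
        self.dpe = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)   # depth-wise conv as DPE
        self.norm1 = norm_layer(dim)
        self.mhra = mhra                                           # local or global relation aggregator
        self.norm2 = norm_layer(dim)
        self.ffn = nn.Sequential(                                  # point-wise FFN
            nn.Conv2d(dim, dim * mlp_ratio, 1), nn.GELU(), nn.Conv2d(dim * mlp_ratio, dim, 1))

    def forward(self, x):                  # x: (B, C, H, W)
        x = x + self.dpe(x)
        x = x + self.mhra(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x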

2.3 MHRA

The key component is the Multi-Head Relation Aggregator:
$$
\begin{aligned}
\mathrm{R}_{n}(\mathbf{X}) &= \mathbf{A}_{n} \mathrm{V}_{n}(\mathbf{X}), \\
\operatorname{MHRA}(\mathbf{X}) &= \operatorname{Concat}\left(\mathrm{R}_{1}(\mathbf{X}); \mathrm{R}_{2}(\mathbf{X}); \cdots; \mathrm{R}_{N}(\mathbf{X})\right) \mathbf{U}.
\end{aligned}
$$
As in multi-head attention, the relation aggregator is designed in a multi-head style, with each head handling its own group of channels. Each group of channels is first linearly transformed into context tokens $V_n(\mathbf{X})$, which are then aggregated under the token affinity $A_n(\mathbf{X})$.

2.3.1 Local MHRA

Based on the visualizations above, we argue that in the shallow layers of the network the token affinity should attend only to local neighborhood context, which coincides exactly with the design of convolution. We therefore design the local relation aggregator $A_n^{\text{local}}$ as a learnable parameter matrix:

$$
\mathrm{A}_{n}^{\text{local}}\left(\mathbf{X}_{i}, \mathbf{X}_{j}\right) = a_{n}^{i-j}, \quad \text{where } j \in \Omega_{i}^{t \times h \times w},
$$
where $X_i$ is the anchor token, $X_j$ is any token in the local neighborhood $\Omega_{i}^{t \times h \times w}$, $a_n$ is a learnable parameter matrix, and $i-j$ is their relative position, so the token affinity depends only on relative position. The local UniFormer block is therefore similar in style to a MobileNet block, i.e. PWConv-DWConv-PWConv (see the original paper for the detailed analysis); the difference is that it adds a position embedding and a feed-forward layer, and this particular combination effectively strengthens the token representations.

Fancy wording aside, this boils down to using a slightly larger depth-wise convolution (5x5) to implement the local relations.
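Read this way, the local aggregator is just a 1x1 conv, a 5x5 depth-wise conv, and another 1x1 conv; this mirrors the `conv1` / `attn` / `conv2` layers of `CBlock` in Section 3. A minimal sketch (the class name is my own):

import torch.nn as nn

class LocalMHRASketch(nn.Module):
    """Local MHRA: 1x1 conv (V_n) -> 5x5 depth-wise conv (learnable local affinity a_n) -> 1x1 conv (U)."""
    def __init__(self, dim):
        super().__init__()
        self.v = nn.Conv2d(dim, dim, 1)                                  # point-wise projection V_n
        self.affinity = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)    # local affinity, depends only on i-j
        self.u = nn.Conv2d(dim, dim, 1)                                  # point-wise output projection U

    def forward(self, x):                 # x: (B, C, H, W)
        return self.u(self.affinity(self.v(x)))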

2.3.2 Global MHRA

In the deep layers of the network, we need long-range relations over the whole feature space, which matches the idea of self-attention. We therefore build the token affinity by comparing global context similarities:
$$
\mathrm{A}_{n}^{\text{global}}\left(\mathbf{X}_{i}, \mathbf{X}_{j}\right) = \frac{e^{Q_{n}\left(\mathbf{X}_{i}\right)^{T} K_{n}\left(\mathbf{X}_{j}\right)}}{\sum_{j^{\prime} \in \Omega_{T \times H \times W}} e^{Q_{n}\left(\mathbf{X}_{i}\right)^{T} K_{n}\left(\mathbf{X}_{j^{\prime}}\right)}},
$$
where $Q_n(\cdot)$ and $K_n(\cdot)$ are different linear transformations. Previous video Transformers usually adopt spatially and temporally factorized attention to reduce the excessive dot-product computation caused by video input, but this factorization inevitably breaks the spatiotemporal relations between tokens. In contrast, because UniFormer uses local MHRA in the shallow layers to cut redundant computation, it can afford joint spatiotemporal attention in the deep layers, yielding more discriminative video representations.

In stage 3 and stage 4 of the network, the standard self-attention structure is used, with some scaling adjustments.
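For the image model this is ordinary multi-head self-attention over the flattened spatial tokens; the official code implements it by hand in the `Attention` class of Section 3, while the sketch below reuses PyTorch's built-in `nn.MultiheadAttention` just to illustrate the idea (the class name is mine):

import torch.nn as nn

class GlobalMHRASketch(nn.Module):
    """Global MHRA: softmax attention over all spatial tokens, multi-head style."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W)
        B, C, H, W = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C): one token per spatial position
        out, _ = self.attn(tokens, tokens, tokens)
        return out.transpose(1, 2).reshape(B, C, H, W)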

2.4 Dynamic Position Embedding

Popular ViTs usually adopt absolute or relative position embeddings. Absolute position embeddings have to be linearly interpolated and further fine-tuned when the input resolution grows, while relative position embeddings modify the form of self-attention itself. To handle inputs of different resolutions, we adopt the recently popular convolutional position encoding and design a dynamic position embedding:

$$
\operatorname{DPE}\left(\mathbf{X}_{in}\right) = \operatorname{DWConv}\left(\mathbf{X}_{in}\right),
$$
where DWConv is a zero-padded depth-wise convolution. On the one hand, convolution is friendly to any input size and easily extends along the spatial and temporal dimensions to encode spatiotemporal positions jointly. On the other hand, depth-wise convolution is very lightweight, and the zero padding helps each token determine its absolute position.
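In the code of Section 3 this is simply the `pos_embed` layer of each block, a zero-padded 3x3 depth-wise convolution; a one-line sketch:

import torch.nn as nn

dim = 64
# DPE(X_in) = DWConv(X_in): 3x3 depth-wise conv; groups=dim makes it per-channel, padding=1 keeps the size.
dpe = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)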

3. Code

# Copyright (c) 2015-present, Facebook, Inc.
# All rights reserved.
from collections import OrderedDict
import torch
import torch.nn as nn
from functools import partial
import torch.nn.functional as F
import math
from timm.models.vision_transformer import _cfg
from timm.models.registry import register_model
from timm.models.layers import trunc_normal_, DropPath, to_2tuple

layer_scale = False
init_value = 1e-6


class Mlp(nn.Module):
    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x


class CMlp(nn.Module):
    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Conv2d(in_features, hidden_features, 1)
        self.act = act_layer()
        self.fc2 = nn.Conv2d(hidden_features, out_features, 1)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x


class Attention(nn.Module):
    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0.):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        # NOTE scale factor was wrong in my original version, can set manually to be compat with prev weights
        self.scale = qk_scale or head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]   # make torchscript happy (cannot use tensor as tuple)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x


class CBlock(nn.Module):
    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm):
        super().__init__()
        self.pos_embed = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.norm1 = nn.BatchNorm2d(dim)
        self.conv1 = nn.Conv2d(dim, dim, 1)
        self.conv2 = nn.Conv2d(dim, dim, 1)
        self.attn = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.norm2 = nn.BatchNorm2d(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = CMlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

    def forward(self, x):
        x = x + self.pos_embed(x)
        x = x + self.drop_path(self.conv2(self.attn(self.conv1(self.norm1(x)))))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x


class SABlock(nn.Module):
    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm):
        super().__init__()
        self.pos_embed = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.norm1 = norm_layer(dim)
        self.attn = Attention(
            dim,
            num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale,
            attn_drop=attn_drop, proj_drop=drop)
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
        global layer_scale
        self.ls = layer_scale
        if self.ls:
            global init_value
            print(f"Use layer_scale: {layer_scale}, init_values: {init_value}")
            self.gamma_1 = nn.Parameter(init_value * torch.ones((dim)), requires_grad=True)
            self.gamma_2 = nn.Parameter(init_value * torch.ones((dim)), requires_grad=True)

    def forward(self, x):
        x = x + self.pos_embed(x)
        B, N, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)
        if self.ls:
            x = x + self.drop_path(self.gamma_1 * self.attn(self.norm1(x)))
            x = x + self.drop_path(self.gamma_2 * self.mlp(self.norm2(x)))
        else:
            x = x + self.drop_path(self.attn(self.norm1(x)))
            x = x + self.drop_path(self.mlp(self.norm2(x)))
        x = x.transpose(1, 2).reshape(B, N, H, W)
        return x


class head_embedding(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(head_embedding, self).__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_channels, out_channels // 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
            nn.BatchNorm2d(out_channels // 2),
            nn.GELU(),
            nn.Conv2d(out_channels // 2, out_channels, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        x = self.proj(x)
        return x


class middle_embedding(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(middle_embedding, self).__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        x = self.proj(x)
        return x


class PatchEmbed(nn.Module):
    """ Image to Patch Embedding
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        img_size = to_2tuple(img_size)
        patch_size = to_2tuple(patch_size)
        num_patches = (img_size[1] // patch_size[1]) * (img_size[0] // patch_size[0])
        self.img_size = img_size
        self.patch_size = patch_size
        self.num_patches = num_patches
        self.norm = nn.LayerNorm(embed_dim)
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        B, C, H, W = x.shape
        # FIXME look at relaxing size constraints
        assert H == self.img_size[0] and W == self.img_size[1], \
            f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."
        x = self.proj(x)
        B, C, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)
        x = self.norm(x)
        x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous()
        return x


class UniFormer(nn.Module):
    """ Vision Transformer
    A PyTorch impl of : `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale`  -
        https://arxiv.org/abs/2010.11929
    """
    def __init__(self, depth=[3, 4, 8, 3], img_size=224, in_chans=3, num_classes=1000, embed_dim=[64, 128, 320, 512],
                 head_dim=64, mlp_ratio=4., qkv_bias=True, qk_scale=None, representation_size=None,
                 drop_rate=0., attn_drop_rate=0., drop_path_rate=0., norm_layer=None, conv_stem=False):
        """
        Args:
            depth (list): depth of each stage
            img_size (int, tuple): input image size
            in_chans (int): number of input channels
            num_classes (int): number of classes for classification head
            embed_dim (list): embedding dimension of each stage
            head_dim (int): head dimension
            mlp_ratio (int): ratio of mlp hidden dim to embedding dim
            qkv_bias (bool): enable bias for qkv if True
            qk_scale (float): override default qk scale of head_dim ** -0.5 if set
            representation_size (Optional[int]): enable and set representation layer (pre-logits) to this value if set
            drop_rate (float): dropout rate
            attn_drop_rate (float): attention dropout rate
            drop_path_rate (float): stochastic depth rate
            norm_layer (nn.Module): normalization layer
            conv_stem (bool): whether use overlapped patch stem
        """
        super().__init__()
        self.num_classes = num_classes
        self.num_features = self.embed_dim = embed_dim  # num_features for consistency with other models
        norm_layer = norm_layer or partial(nn.LayerNorm, eps=1e-6)
        if conv_stem:
            self.patch_embed1 = head_embedding(in_channels=in_chans, out_channels=embed_dim[0])
            self.patch_embed2 = middle_embedding(in_channels=embed_dim[0], out_channels=embed_dim[1])
            self.patch_embed3 = middle_embedding(in_channels=embed_dim[1], out_channels=embed_dim[2])
            self.patch_embed4 = middle_embedding(in_channels=embed_dim[2], out_channels=embed_dim[3])
        else:
            self.patch_embed1 = PatchEmbed(
                img_size=img_size, patch_size=4, in_chans=in_chans, embed_dim=embed_dim[0])
            self.patch_embed2 = PatchEmbed(
                img_size=img_size // 4, patch_size=2, in_chans=embed_dim[0], embed_dim=embed_dim[1])
            self.patch_embed3 = PatchEmbed(
                img_size=img_size // 8, patch_size=2, in_chans=embed_dim[1], embed_dim=embed_dim[2])
            self.patch_embed4 = PatchEmbed(
                img_size=img_size // 16, patch_size=2, in_chans=embed_dim[2], embed_dim=embed_dim[3])

        self.pos_drop = nn.Dropout(p=drop_rate)
        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depth))]  # stochastic depth decay rule
        num_heads = [dim // head_dim for dim in embed_dim]
        self.blocks1 = nn.ModuleList([
            CBlock(
                dim=embed_dim[0], num_heads=num_heads[0], mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer)
            for i in range(depth[0])])
        self.blocks2 = nn.ModuleList([
            CBlock(
                dim=embed_dim[1], num_heads=num_heads[1], mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i + depth[0]], norm_layer=norm_layer)
            for i in range(depth[1])])
        self.blocks3 = nn.ModuleList([
            SABlock(
                dim=embed_dim[2], num_heads=num_heads[2], mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i + depth[0] + depth[1]], norm_layer=norm_layer)
            for i in range(depth[2])])
        self.blocks4 = nn.ModuleList([
            SABlock(
                dim=embed_dim[3], num_heads=num_heads[3], mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i + depth[0] + depth[1] + depth[2]], norm_layer=norm_layer)
            for i in range(depth[3])])
        self.norm = nn.BatchNorm2d(embed_dim[-1])

        # Representation layer
        if representation_size:
            self.num_features = representation_size
            self.pre_logits = nn.Sequential(OrderedDict([
                ('fc', nn.Linear(embed_dim, representation_size)),
                ('act', nn.Tanh())
            ]))
        else:
            self.pre_logits = nn.Identity()

        # Classifier head
        self.head = nn.Linear(embed_dim[-1], num_classes) if num_classes > 0 else nn.Identity()

        self.apply(self._init_weights)

    def _init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight, std=.02)
            if isinstance(m, nn.Linear) and m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.LayerNorm):
            nn.init.constant_(m.bias, 0)
            nn.init.constant_(m.weight, 1.0)

    @torch.jit.ignore
    def no_weight_decay(self):
        return {'pos_embed', 'cls_token'}

    def get_classifier(self):
        return self.head

    def reset_classifier(self, num_classes, global_pool=''):
        self.num_classes = num_classes
        self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity()

    def forward_features(self, x):
        x = self.patch_embed1(x)
        x = self.pos_drop(x)
        for blk in self.blocks1:
            x = blk(x)
        x = self.patch_embed2(x)
        for blk in self.blocks2:
            x = blk(x)
        x = self.patch_embed3(x)
        for blk in self.blocks3:
            x = blk(x)
        x = self.patch_embed4(x)
        for blk in self.blocks4:
            x = blk(x)
        x = self.norm(x)
        x = self.pre_logits(x)
        return x

    def forward(self, x):
        x = self.forward_features(x)
        x = x.flatten(2).mean(-1)
        x = self.head(x)
        return x


@register_model
def uniformer_small(pretrained=True, **kwargs):
    model = UniFormer(
        depth=[3, 4, 8, 3],
        embed_dim=[64, 128, 320, 512], head_dim=64, mlp_ratio=4, qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    model.default_cfg = _cfg()
    return model


@register_model
def uniformer_small_plus(pretrained=True, **kwargs):
    model = UniFormer(
        depth=[3, 5, 9, 3], conv_stem=True,
        embed_dim=[64, 128, 320, 512], head_dim=32, mlp_ratio=4, qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    model.default_cfg = _cfg()
    return model


@register_model
def uniformer_small_plus_dim64(pretrained=True, **kwargs):
    model = UniFormer(
        depth=[3, 5, 9, 3], conv_stem=True,
        embed_dim=[64, 128, 320, 512], head_dim=64, mlp_ratio=4, qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    model.default_cfg = _cfg()
    return model


@register_model
def uniformer_base(pretrained=True, **kwargs):
    model = UniFormer(
        depth=[5, 8, 20, 7],
        embed_dim=[64, 128, 320, 512], head_dim=64, mlp_ratio=4, qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    model.default_cfg = _cfg()
    return model


@register_model
def uniformer_base_ls(pretrained=True, **kwargs):
    global layer_scale
    layer_scale = True
    model = UniFormer(
        depth=[5, 8, 20, 7],
        embed_dim=[64, 128, 320, 512], head_dim=64, mlp_ratio=4, qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    model.default_cfg = _cfg()
    return model
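A quick usage sketch, assuming the listing above is saved as `uniformer.py` and `torch`/`timm` are installed (weights are randomly initialized here; this only checks shapes):

import torch
from uniformer import uniformer_small

model = uniformer_small()
x = torch.randn(1, 3, 224, 224)   # dummy ImageNet-sized input
with torch.no_grad():
    logits = model(x)
print(logits.shape)               # expected: torch.Size([1, 1000])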

References

ICLR 2022 high-scoring paper: UniFormer, an efficient unification of convolution and self-attention - Zhihu (zhihu.com)

Paper reading | UniFormer - xiaoweiyuya's blog on CSDN
