【OpenPCDet】Sparse Convolution SPConv-v1.2 Code Walkthrough (3)

2023-11-04 01:59

This post continues the walkthrough of the SPConv-v1.2 sparse convolution code used by OpenPCDet; hopefully it offers a useful reference for developers working through the same code.

【Building the rulebook】

Traditional convolution is implemented with im2col; sparse convolution is implemented with a rulebook. What is a rulebook? Essentially, it is a table. First, hash tables are built for the input and the output, mapping the input and output tensor coordinates to sequence numbers. Then the sequence numbers in the two hash tables are linked together, and that is essentially all sparse convolution needs, so building the rulebook is the key step of the implementation. In the project code this step calls the Python function get_indice_pairs, which in turn calls the C++ function getIndicePairs in the spconv shared module to do the actual work. Let us start with get_indice_pairs.
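Before reading the real code, the following is a minimal, self-contained sketch (not spconv's actual implementation) of what a rulebook records. It uses a 2D submanifold convolution so the example stays small; the active coordinates and the 3x3 kernel are made up for illustration.

# A toy rulebook for a 2D submanifold convolution (illustration only).
active_in = [(0, 1, 1), (0, 1, 2), (0, 3, 3)]   # (batch, y, x) of non-zero sites

# Submanifold conv: output sites are exactly the input sites, so both
# hash tables map coordinate -> sequence number.
in_hash = {c: i for i, c in enumerate(active_in)}
out_hash = dict(in_hash)

# rulebook[k] collects (input_idx, output_idx) pairs for kernel offset k.
offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]   # 3x3 kernel
rulebook = {k: [] for k in range(len(offsets))}
for (b, y, x), out_idx in out_hash.items():
    for k, (dy, dx) in enumerate(offsets):
        in_idx = in_hash.get((b, y + dy, x + dx))
        if in_idx is not None:               # only active inputs contribute
            rulebook[k].append((in_idx, out_idx))

for k, pairs in rulebook.items():
    if pairs:
        print(f"kernel offset {k}: {pairs}")

With that picture in mind, here is the real get_indice_pairs, which prepares the arguments for this pair-building step: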

def get_indice_pairs(indices,
                     batch_size,
                     spatial_shape,
                     ksize=3,
                     stride=1,
                     padding=0,
                     dilation=1,
                     out_padding=0,
                     subm=False,
                     transpose=False,
                     grid=None,
                     use_hash=False):
    ndim = indices.shape[1] - 1  # e.g. 4 -> 3
    if not isinstance(ksize, (list, tuple)):
        ksize = [ksize] * ndim  # e.g. 3 -> [3,3,3], a 3x3x3 kernel
    if not isinstance(stride, (list, tuple)):
        stride = [stride] * ndim  # e.g. 1 -> [1,1,1]
    if not isinstance(padding, (list, tuple)):
        padding = [padding] * ndim  # e.g. 0 -> [0,0,0]
    if not isinstance(dilation, (list, tuple)):
        dilation = [dilation] * ndim  # e.g. 1 -> [1,1,1]
    if not isinstance(out_padding, (list, tuple)):
        out_padding = [out_padding] * ndim
    # stride and dilation may not both be != 1
    for d, s in zip(dilation, stride):
        # any() is true as long as one element is true
        assert any([s == 1, d == 1]), "don't support this."
    if not subm:
        if transpose:
            out_shape = get_deconv_output_size(spatial_shape, ksize, stride,
                                               padding, dilation, out_padding)
        else:
            out_shape = get_conv_output_size(spatial_shape, ksize, stride,
                                             padding, dilation)
    else:
        out_shape = spatial_shape  # subm: output shape equals input shape
    if grid is None:
        res = torch.ops.spconv.get_indice_pairs(indices, batch_size, out_shape,
                                                spatial_shape, ksize, stride,
                                                padding, dilation, out_padding,
                                                int(subm), int(transpose),
                                                int(use_hash))
        return res
    else:
        # ...omitted...
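The helper get_conv_output_size called above is not reproduced in this post; presumably it implements the standard per-dimension convolution output-size formula, which the sketch below illustrates. The 41x1600x1408 grid and the kernel 3 / stride 2 / padding 1 values are only example numbers.

# Hedged sketch of the standard conv output-size formula (per dimension).
def conv_output_size(in_size, k, s, p, d):
    return (in_size + 2 * p - d * (k - 1) - 1) // s + 1

spatial_shape = [41, 1600, 1408]          # example voxel grid
out_shape = [conv_output_size(n, k=3, s=2, p=1, d=1) for n in spatial_shape]
print(out_shape)                          # [21, 800, 704]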

get_indice_pairs mainly performs parameter validation and preprocessing. First, for regular (non-submanifold) sparse convolution, the output spatial shape is computed from the input shape, kernel size, stride and the other parameters; for submanifold sparse convolution nothing needs to be computed, since the output shape equals the input shape. Computing the output shape matters, because building the rulebook is precisely about establishing the mapping between input and output. Once the parameters are ready, the core of get_indice_pairs is reached. Because spconv registers its operators by loading a .so file with torch.ops.load_library, the call is made as torch.ops.spconv.get_indice_pairs. In src/spconv/all.cc the underlying C++ API is registered through PyTorch's operator-registration mechanism (OP Register), so what actually gets called is the getIndicePairs function in src/spconv/spconv_ops.cc.

Excerpt from file: src/spconv/all.cc

#include <spconv/fused_spconv_ops.h>
#include <spconv/nms_ops.h>
#include <spconv/pillar_scatter_ops.h>
#include <spconv/pool_ops.h>
#include <spconv/spconv_ops.h>
#include <torch/script.h>

static auto registry =
    torch::RegisterOperators()
        .op("spconv::get_indice_pairs", &spconv::getIndicePairs)
        .op("spconv::indice_conv", &spconv::indiceConv)
        .op("spconv::indice_conv_batch", &spconv::indiceConvBatch)
        .op("spconv::indice_conv_backward", &spconv::indiceConvBackward)
        .op("spconv::fused_indice_conv_bn", &spconv::fusedIndiceConvBatchNorm)
        .op("spconv::indice_maxpool", &spconv::indiceMaxPool)
        .op("spconv::indice_maxpool_backward", &spconv::indiceMaxPoolBackward)
        .op("spconv::nms", &spconv::nonMaxSuppression<float>)
        .op("spconv::pillar_scatter_float", &spconv::pointPillarScatter<float>)
        .op("spconv::pillar_scatter_half", &spconv::pointPillarScatter<at::Half>);

【A note on OP Register】Like the C++ extension mechanism, OP Register is a low-level way provided by PyTorch to register custom operators. A registered operator can then be called as torch.xxx or tensor.xxx. This approach is likewise decoupled from the PyTorch source tree: adding or modifying operators does not require recompiling PyTorch. Registering a new operator this way is simple: write the C++ implementation of the operator, then register it through PyTorch's low-level registration interface (torch::RegisterOperators).
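To complete the picture from the Python side, a registered operator is consumed roughly as below. This is only a sketch: libmyops.so and myops::add_one are hypothetical placeholder names (not part of spconv), standing in for a shared library that performs a registration like the one in the excerpt above.

import torch

# Loading the shared library runs its static registration code.
torch.ops.load_library("libmyops.so")

# A registered op is then reachable as torch.ops.<namespace>.<op_name>.
x = torch.zeros(3)
y = torch.ops.myops.add_one(x)

spconv itself is consumed the same way: loading its .so is what makes torch.ops.spconv.get_indice_pairs available.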

Excerpt from: src/spconv/spconv_ops.cc

#include <spconv/spconv_ops.h>
namespace spconv {

std::vector<torch::Tensor>
getIndicePairs(torch::Tensor indices, int64_t batchSize,
               std::vector<int64_t> outSpatialShape,
               std::vector<int64_t> spatialShape,
               std::vector<int64_t> kernelSize, std::vector<int64_t> stride,
               std::vector<int64_t> padding, std::vector<int64_t> dilation,
               std::vector<int64_t> outPadding, int64_t _subM,
               int64_t _transpose, int64_t _useHash) {
  // auto timer = spconv::CudaContextTimer<>();
  bool subM = _subM != 0;
  bool transpose = _transpose != 0;
  auto NDim = kernelSize.size();
  // CPU always use hash (tsl::robin_map).
  bool useHash = _useHash != 0 || indices.device().type() == torch::kCPU;
  auto numAct = indices.size(0);       // e.g. torch.Size([N,4]) -> N
  auto coorDim = indices.size(1) - 1;
  TV_ASSERT_RT_ERR(NDim == coorDim, "error");
  TV_ASSERT_RT_ERR(kernelSize.size() == coorDim, "error");
  TV_ASSERT_RT_ERR(outSpatialShape.size() == coorDim, "error");
  TV_ASSERT_RT_ERR(stride.size() == coorDim, "error");
  TV_ASSERT_RT_ERR(padding.size() == coorDim, "error");
  TV_ASSERT_RT_ERR(outPadding.size() == coorDim, "error");
  TV_ASSERT_RT_ERR(dilation.size() == coorDim, "error");
  // e.g. [3,3,3] -> 3*3*3 -> 27
  auto kernelVolume = kernelSize[0];
  for (int i = 1; i < kernelSize.size(); ++i) {
    kernelVolume *= kernelSize[i];
  }
  TV_ASSERT_RT_ERR(kernelVolume <= 4096, "error");
  auto outputVolume = outSpatialShape[0];
  for (int i = 1; i < outSpatialShape.size(); ++i) {
    outputVolume *= outSpatialShape[i];
  }
  std::string msg = "due to limits of cuda hash, the volume of dense space "
                    "include batch size ";
  msg += "must less than std::numeric_limits<int>::max() = 2e9";
  TV_ASSERT_RT_ERR(batchSize * outputVolume < std::numeric_limits<int>::max(),
                   msg);
  // e.g. torch.Size([2,27,16000])
  torch::Tensor indicePairs = torch::full(
      {2, kernelVolume, numAct}, -1,
      torch::dtype(torch::kInt32).device(indices.device()));
  // e.g. torch.Size([27])
  torch::Tensor indiceNum = torch::zeros(
      {kernelVolume}, torch::dtype(torch::kInt32).device(indices.device()));
  auto gridSize = batchSize * outputVolume;
  if (useHash) {
    gridSize = batchSize;  // useHash was requested, or we are on CPU
  }
  torch::Tensor gridOut = torch::full(
      {gridSize}, -1, torch::dtype(torch::kInt32).device(indices.device()));
  gridOut = gridOut.view({batchSize, -1});
  int64_t numActOut = -1;
  for (int i = 0; i < NDim; ++i) {
    if (subM) {
      padding[i] = kernelSize[i] / 2;  // derive the padding from the kernel size
      stride[i] = 1;
    }
  }
  // tv::ssprint("prepare", timer.report() / 1000.0);
  if (subM) {
    if (indices.device().type() == torch::kCPU) {
      numActOut = create_submconv_indice_pair_cpu(
          indices, gridOut, indicePairs, indiceNum, kernelSize, stride, padding,
          dilation, outSpatialShape, transpose, false, useHash);
    }
#ifdef TV_CUDA
    else if (indices.device().type() == torch::kCUDA) {
      numActOut = create_submconv_indice_pair_cuda(
          indices, gridOut, indicePairs, indiceNum, kernelSize, stride, padding,
          dilation, outSpatialShape, transpose, false, useHash);
      // huh?? if the GPU path returns -1 we redo it on the CPU? why?
      // and why would the CPU version not return -1 as well??
      if (numActOut == -1) {
        auto device = indices.device();
        indicePairs = indicePairs.to({torch::kCPU});
        indiceNum = indiceNum.to({torch::kCPU});
        indices = indices.to({torch::kCPU});
        numActOut = create_submconv_indice_pair_cpu(
            indices, gridOut, indicePairs, indiceNum, kernelSize, stride,
            padding, dilation, outSpatialShape, transpose, false, useHash);
        return {indices.to(device), indicePairs.to(device),
                indiceNum.to(device)};
      }
    }
#endif
    else {
      TV_THROW_INVALID_ARG("unknown device type");
    }
    // tv::ssprint("subm", timer.report() / 1000.0);
    return {indices, indicePairs, indiceNum};
  } else {
    // for a regular sparse conv (spconv), initialize indicePairUnique and outInds
    auto indicePairUnique = torch::full(
        {indicePairs.numel() / 2 + 1}, std::numeric_limits<int>::max(),
        torch::dtype(torch::kInt32).device(indices.device()));
    // e.g. torch.Size([N*27, 3+1])
    torch::Tensor outInds =
        torch::zeros({numAct * kernelVolume, coorDim + 1},
                     torch::dtype(torch::kInt32).device(indices.device()));
    if (indices.device().type() == torch::kCPU) {
      numActOut = create_conv_indice_pair_cpu(
          indices, outInds, gridOut, indicePairs, indiceNum, kernelSize, stride,
          padding, dilation, outSpatialShape, transpose, false, useHash);
    }
#ifdef TV_CUDA
    else if (indices.device().type() == torch::kCUDA) {
      numActOut = create_conv_indice_pair_p1_cuda(
          indices, indicePairs, indiceNum, indicePairUnique, kernelSize, stride,
          padding, dilation, outSpatialShape, transpose);
      if (numActOut > 0) {
        auto res = torch::_unique(indicePairUnique);
        indicePairUnique = std::get<0>(res);
        numActOut = create_conv_indice_pair_p2_cuda(
            indices, outInds, gridOut, indicePairs, indiceNum, indicePairUnique,
            outSpatialShape, transpose, false, useHash);
        if (numActOut == -1) {
          auto device = indices.device();
          outInds = outInds.to({torch::kCPU});
          indicePairs = indicePairs.to({torch::kCPU});
          indiceNum = indiceNum.to({torch::kCPU});
          indices = indices.to({torch::kCPU});
          numActOut = create_conv_indice_pair_cpu(
              indices, outInds, gridOut, indicePairs, indiceNum, kernelSize,
              stride, padding, dilation, outSpatialShape, transpose, false,
              useHash);
          return {outInds.to(device).slice(0, 0, numActOut),
                  indicePairs.to(device), indiceNum.to(device)};
        }
      }
    }
#endif
    else {
      TV_THROW_INVALID_ARG("unknown device type");
    }
    return {outInds.slice(0, 0, numActOut), indicePairs, indiceNum};
  }
}

For simplicity, when analyzing how getIndicePairs builds the rulebook we only discuss the logic of the GPU path, and we treat submanifold 3D sparse convolution and regular 3D sparse convolution separately, starting with the submanifold case. The three most important variables in the code are indicePairs, indiceNum and gridOut; they are created as follows.

auto outputVolume = outSpatialShape[0];
for (int i = 1; i < outSpatialShape.size(); ++i) {
  outputVolume *= outSpatialShape[i];
}
// e.g. torch.Size([2,27,16000])
torch::Tensor indicePairs = torch::full(
    {2, kernelVolume, numAct}, -1,
    torch::dtype(torch::kInt32).device(indices.device()));
// e.g. torch.Size([27])
torch::Tensor indiceNum = torch::zeros(
    {kernelVolume}, torch::dtype(torch::kInt32).device(indices.device()));
auto gridSize = batchSize * outputVolume;
torch::Tensor gridOut = torch::full(
    {gridSize}, -1, torch::dtype(torch::kInt32).device(indices.device()));
gridOut = gridOut.view({batchSize, -1});

If you already know the rulebook idea, it is not hard to see that indicePairs is what ultimately encodes the input-output mapping rules of the sparse convolution. Its shape is {2, kernelVolume, numAct}: the 2 stands for the two directions, input and output; kernelVolume is the volume of the convolution kernel, e.g. 27 (3*3*3) for a 3x3x3 kernel; and numAct is the number of active input features. indiceNum stores, for each kernel position, the total number of computations performed at that position; because the convolution is sparse, each kernel element may take part in a different number of operations with the active data.
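To make the layout concrete, here is a small, simplified sketch (not spconv's code; the indices are made up) of how (input, output) pairs would be packed into tensors shaped like indicePairs and indiceNum:

import numpy as np

kernel_volume, num_act = 27, 5
indice_pairs = np.full((2, kernel_volume, num_act), -1, dtype=np.int32)
indice_num = np.zeros(kernel_volume, dtype=np.int32)

def add_pair(k, in_idx, out_idx):
    # record that kernel position k connects active input in_idx to output out_idx
    slot = indice_num[k]
    indice_pairs[0, k, slot] = in_idx    # gather index into the input features
    indice_pairs[1, k, slot] = out_idx   # scatter index into the output features
    indice_num[k] = slot + 1

add_pair(13, 0, 0)   # e.g. the kernel center (offset 13 of 27) maps input 0 to output 0
add_pair(13, 1, 1)
add_pair(14, 2, 1)
print(indice_num[13], indice_num[14])    # 2 1

Roughly speaking, during the actual convolution the first row of indicePairs is used to gather input features for kernel offset k, the gathered features are multiplied by the k-th kernel weight, and the second row is used to scatter-add the partial results into the output.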

This is the rulebook that ultimately gets built. In the code, the GPU path builds the rulebook by calling the create_submconv_indice_pair_cuda function.

Excerpt from: src/spconv/indice.cu

int create_submconv_indice_pair_cuda(
    torch::Tensor indicesIn,    // e.g. torch.Size([N,4])
    torch::Tensor gridsOut,     // e.g. torch.Size([bs, gridOutVolume])
    torch::Tensor indicePairs,  // e.g. torch.Size([2, kernelVolume, numAct])
    torch::Tensor indiceNum, std::vector<int64_t> kernelSize,
    std::vector<int64_t> stride, std::vector<int64_t> padding,
    std::vector<int64_t> dilation, std::vector<int64_t> outSpatialShape,
    bool transpose, bool resetGrid, bool useHash) {
  auto stream = at::cuda::getCurrentCUDAStream();
  auto ndim = outSpatialShape.size();  // 3
  auto numActIn = indicesIn.size(0);
  int batchSize = gridsOut.size(0);
  auto kernelVolume = indiceNum.size(0);  // e.g. 3x3x3 => 27
  if (numActIn == 0)
    return 0;
  bool failed = false;
  tv::dispatch_torch<int32_t>(indicesIn.scalar_type(), [&](auto IndexValue) {
    using Index = TV_DECLTYPE(IndexValue);  // type deduction
    using IndexGrid = int32_t;
    tv::dispatch_int<2, 3, 4>(ndim, [&](auto I) {
      constexpr int NDim = TV_DECLTYPE(I)::value;
      tv::SimpleVector<Index, NDim> ks(kernelSize.begin(), kernelSize.end());
      tv::SimpleVector<Index, NDim> st(stride.begin(), stride.end());
      tv::SimpleVector<Index, NDim> pa(padding.begin(), padding.end());
      tv::SimpleVector<Index, NDim> di(dilation.begin(), dilation.end());
      tv::SimpleVector<Index, NDim> ou(outSpatialShape.begin(),
                                       outSpatialShape.end());
      Index spatialVolume = 1;
      for (int i = 0; i < NDim; ++i) {
        spatialVolume *= outSpatialShape[i];
      }
      if (useHash) {
        // ...omitted...
      } else {
        // auto timer = spconv::CudaContextTimer<>();
        prepareSubMGridKernel<Index, IndexGrid, NDim>
            <<<tv::cuda::getBlocks(numActIn), tv::cuda::CUDA_NUM_THREADS, 0,
               stream>>>(tv::torch2tv<Index>(indicesIn),
                         tv::torch2tv<IndexGrid>(gridsOut), ou, spatialVolume);
        // tv::ssprint("prepareSubMGridKernel", timer.report() / 1000.0);
        TV_CHECK_CUDA_ERR_V2("prepareSubMGridKernel failed");
        // when dilation all one, we use a simple kernel to calc result
        bool dilation_one = true;
        for (int i = 0; i < NDim; ++i) {
          dilation_one &= di[i] == 1;
        }
        auto found = false;
        if (dilation_one && (NDim == 2 || NDim == 3)) {
          auto indiceNumCpu = indiceNum.cpu();  // do what?? no use!
          if (NDim == 2) {
            // ...omitted...
          } else if (NDim == 3) {
            tv::SimpleVector<Index, 3> ou_(outSpatialShape.begin(),
                                           outSpatialShape.end());
            tv::dispatch_int_noexcept<1, 3, 5>(kernelSize[0], [&](auto K0C) {
              tv::dispatch_int_noexcept<1, 3, 5>(kernelSize[1], [&](auto K1C) {
                tv::dispatch_int_noexcept<1, 3, 5>(kernelSize[2], [&](auto K2C) {
                  constexpr int K0 = TV_DECLTYPE(K0C)::value;
                  constexpr int K1 = TV_DECLTYPE(K1C)::value;
                  constexpr int K2 = TV_DECLTYPE(K2C)::value;
                  found = true;
                  getSubMIndicePairsKernel3<Index, IndexGrid, K0, K1, K2>
                      <<<tv::cuda::getBlocks(numActIn),
                         tv::cuda::CUDA_NUM_THREADS, 0, stream>>>(
                          tv::torch2tv<Index>(indicesIn),
                          tv::torch2tv<IndexGrid>(gridsOut),
                          tv::torch2tv<Index>(indicePairs),
                          tv::torch2tv<Index>(indiceNum), ou_, spatialVolume);
                });
              });
            });
          }
        }
        if (!found) {
          // ...omitted...
        }
        // tv::ssprint("getSubMIndicePairsKernel", timer.report() / 1000.0);
      }
      if (resetGrid && (!useHash)) {
        resetGridSubMKernel<Index, IndexGrid, NDim>
            <<<tv::cuda::getBlocks(numActIn), tv::cuda::CUDA_NUM_THREADS, 0,
               stream>>>(indicesIn.data_ptr<Index>(),
                         tv::torch2tv<IndexGrid>(gridsOut), ou, numActIn);
        TV_CHECK_CUDA_ERR_V2("resetGridKernel failed");
      }
    });
  });
  if (failed) {
    return -1;
  }
  return numActIn;
}

Inside create_submconv_indice_pair_cuda we need not dig into how the dynamic dispatch mechanism below actually works:

tv::dispatch_torch<int32_t>(indicesIn.scalar_type(), [&](auto IndexValue) {
    ....
    tv::dispatch_int<2, 3, 4>(ndim, [&](auto I) {
        ....
    });
});

We can focus directly on the kernel:

prepareSubMGridKernel<Index, IndexGrid, NDim>
            <<<tv::cuda::getBlocks(numActIn), tv::cuda::CUDA_NUM_THREADS, 0, stream>>>(
                tv::torch2tv<Index>(indicesIn), 
                tv::torch2tv<IndexGrid>(gridsOut), 
                ou, spatialVolume);

Recall that the launch configuration of a CUDA kernel has the form:
<<<grid_size,block_size>>>

Here grid_size (the grid size) and block_size (the thread-block size) are in general variables of a struct type (dim3), but they can also be plain integers. In prepareSubMGridKernel both are in fact plain integers: block_size is tv::cuda::CUDA_NUM_THREADS, defined in include/tensorview/cuda_utils.h with a value of 1024, and grid_size is computed by tv::cuda::getBlocks(numActIn), where numActIn is the number of active input elements.

Excerpt from: include/tensorview/cuda_utils.h

template <typename T1, typename T2> inline int DivUp(const T1 a, const T2 b) {
  return (a + b - 1) / b;
}

// Use 1024 threads per block, which requires cuda sm_2x or above
constexpr int CUDA_NUM_THREADS = 1024;

// CUDA: number of blocks for threads.
inline int getNumThreads(const int N) {
  if (N > CUDA_NUM_THREADS) {
    return CUDA_NUM_THREADS;
  }
  return DivUp(N, 32) * 32;
}

inline int getBlocks(const int N) {
  TV_ASSERT_RT_ERR(N > 0,
                   "CUDA kernel launch blocks must be positive, but got N=", N);
  return DivUp(N, getNumThreads(N));
}
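Plugging in a concrete number makes the resulting configuration easy to read off. The sketch below re-implements the same arithmetic in Python just to show it; numActIn = 5000 is an arbitrary example value.

CUDA_NUM_THREADS = 1024

def div_up(a, b):
    return (a + b - 1) // b

def get_num_threads(n):                  # threads per block
    return CUDA_NUM_THREADS if n > CUDA_NUM_THREADS else div_up(n, 32) * 32

def get_blocks(n):                       # number of blocks
    assert n > 0
    return div_up(n, get_num_threads(n))

num_act_in = 5000
print(get_num_threads(num_act_in), get_blocks(num_act_in))   # 1024 5

So 5000 active inputs would be covered by 5 blocks of 1024 threads, one thread per active input, with a few idle threads in the last block.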

The job of prepareSubMGridKernel is, roughly, to build a hash-table-like mapping from output tensor coordinates (encoded as a linear index) to output sequence numbers.

Excerpt from: include/spconv/indice.cu.h

template <typename Index, typename IndexGrid, unsigned NDim>
__global__ void prepareSubMGridKernel(
    tv::TensorView<const Index> indicesIn, tv::TensorView<IndexGrid> gridsOut,
    const tv::SimpleVector<Index, NDim> outSpatialShape, Index spatialVolume) {
  auto numActIn = indicesIn.dim(0);  // e.g. torch.Size([N,4]) => N
  Index index = 0;
  for (int ix : tv::KernelLoopX<int>(numActIn)) {
    index = tv::ArrayIndexRowMajor<NDim, NDim>::runPtrs(
                indicesIn.data() + ix * (NDim + 1) + 1, outSpatialShape.data(),
                0) +
            spatialVolume * indicesIn(ix, 0);
    gridsOut[index] = ix;
  }
}
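In plain terms, each thread takes one active input row (batch, z, y, x), flattens (z, y, x) in row-major order, offsets the result into that batch's sub-grid, and writes the row number there. The NumPy sketch below reproduces the same computation for a hypothetical tiny grid (batch_size = 2, outSpatialShape = [4, 8, 8]); the coordinates are made up.

import numpy as np

out_spatial_shape = [4, 8, 8]                       # D, H, W
spatial_volume = int(np.prod(out_spatial_shape))    # 256
batch_size = 2
grids_out = np.full(batch_size * spatial_volume, -1, dtype=np.int32)

indices_in = np.array([[0, 1, 2, 3],                # (batch, z, y, x) per active voxel
                       [1, 0, 5, 7]], dtype=np.int32)

for ix, (b, z, y, x) in enumerate(indices_in):
    # row-major flattening of (z, y, x), then offset by the batch's sub-grid
    index = (z * out_spatial_shape[1] + y) * out_spatial_shape[2] + x \
            + spatial_volume * b
    grids_out[index] = ix        # coordinate -> row number of this active voxel

print(np.flatnonzero(grids_out >= 0))   # [ 83 303]

gridsOut then serves as the coordinate-to-row lookup table that the pair-building kernel (getSubMIndicePairsKernel3) queries for each kernel offset.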

The first time I saw the definition of tv::ArrayIndexRowMajor, the author's fancy trick threw me off; in the end it is just the usual row-major index computation, written as a template with a recursive call.

Excerpt from: include/tensorview/tensorview.h

template <int N, int Ndim> struct ArrayIndexRowMajor {
  // ...omitted...
  template <typename TShape, typename Tinit>
  TV_HOST_DEVICE_INLINE static unsigned
  runPtrs(const TShape *indexes, const TShape *shape, Tinit start) {
    return ArrayIndexRowMajor<N - 1, Ndim>::runPtrs(
        indexes, shape, (indexes[Ndim - N] + start) * shape[Ndim - N + 1]);
  }
};

template <int Ndim> struct ArrayIndexRowMajor<1, Ndim> {
  // ...omitted...
  template <typename TShape, typename Tinit>
  TV_HOST_DEVICE_INLINE static unsigned
  runPtrs(const TShape *indexes, const TShape *shape, Tinit start) {
    return start + indexes[Ndim - 1];
  }
};
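Unrolling the recursion for NDim = 3 makes this obvious; the loop below is a Python rendering of the same computation (a sketch for illustration, not code from the repository).

# ArrayIndexRowMajor<3,3>::runPtrs(indexes, shape, 0) expands to
#   ((indexes[0] + 0) * shape[1] + indexes[1]) * shape[2] + indexes[2],
# i.e. the familiar row-major index.
def array_index_row_major(indexes, shape, start=0):
    for dim in range(len(shape) - 1):
        start = (indexes[dim] + start) * shape[dim + 1]
    return start + indexes[-1]

print(array_index_row_major([1, 2, 3], [4, 8, 8]))   # 83, i.e. (1*8 + 2)*8 + 3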

