kernel3专题

train_gpt2_fp32.cu - layernorm_forward_kernel3

源码 __global__ void layernorm_forward_kernel3(float* __restrict__ out, float* __restrict__ mean, float* __restrict__ rstd,const float* __restrict__ inp, const float* __restrict__ weight,const float*