Universal Transformer Topic

Universal Transformer with Adaptive Computation Time (ACT)
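Because ACT is the focus here, it may help to sketch its halting rule (from Graves, 2016) before the code. The sketch below is a minimal illustration, not the author's implementation. It assumes scalar states. `halt_fn` and `transition_fn` are hypothetical stand-ins for the learned halting unit and the shared transition layer. Each position is processed independently, whereas the real Universal Transformer couples positions through self-attention at every step. `eps = 0.01` matches the threshold 1 - eps = 0.99 commonly used with ACT.

```python
from typing import Callable, List, Tuple


def act_position(
    s0: float,
    halt_fn: Callable[[float], float],        # hypothetical halting unit
    transition_fn: Callable[[float], float],  # hypothetical shared transition
    eps: float = 0.01,
    max_steps: int = 8,
) -> Tuple[float, int, float]:
    """ACT pondering for one position (scalar state is a simplifying assumption).

    Returns (output, N, R): the halting-weighted mean of the intermediate
    states, the number of steps taken, and the remainder R used at step N.
    """
    cumulative = 0.0  # sum of halting probabilities h^1 .. h^(n-1)
    output = 0.0      # running weighted sum of intermediate states
    s = s0
    for n in range(1, max_steps + 1):
        s = transition_fn(s)  # one application of the shared transition
        h = halt_fn(s)        # halting probability, assumed in (0, 1)
        # Halt once the accumulated probability would reach 1 - eps,
        # or when the step budget is exhausted.
        if cumulative + h >= 1.0 - eps or n == max_steps:
            remainder = 1.0 - cumulative  # R: leftover probability mass
            output += remainder * s
            return output, n, remainder
        cumulative += h
        output += h * s
    raise ValueError("max_steps must be at least 1")


def act_sequence(
    states: List[float],
    halt_fn: Callable[[float], float],
    transition_fn: Callable[[float], float],
    eps: float = 0.01,
    max_steps: int = 8,
) -> Tuple[List[float], float]:
    """Apply ACT to each position independently and total the ponder cost."""
    results = [act_position(s, halt_fn, transition_fn, eps, max_steps) for s in states]
    outputs = [out for out, _, _ in results]
    # Ponder cost rho = N + R per position, summed over the sequence.
    ponder_cost = sum(n + r for _, n, r in results)
    return outputs, ponder_cost
```

For example, `act_sequence([0.0, 10.0], lambda s: 0.3, lambda s: s + 1.0)` halts both positions after 4 steps with remainder 0.1. The weights used before halting sum to `cumulative`, and the halting step receives `1 - cumulative`, so each output is a convex combination of the intermediate states. The summed `N + R` is the ponder cost that ACT adds to the training loss.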

Original paper: https://arxiv.org/abs/1807.03819

Main code:

import torch
import numpy as np

class PositionTimestepEmbedding(torch.nn.Module):
    def forward(self, x, t):
        device = x.device
        sequence_length = x.si