luts专题

LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Gener

文章目录 abstractintroduction背景量化二值量化BCQ LUT-GEMM二值量化扩展Group-wise 𝛼分配基于LUT的量化矩阵乘法LUT-GEMM内存占用实验Simple Comparisons with Various KernelsComparison with FP16 Tensor ParallelismExploration of Compressio