Sparsity topic
Why is L1 regularization supposed to lead to sparsity more than L2?
Just because of its geometric shape. Here is some intentionally non-rigorous intuition: suppose you have a linear system Ax = b for which you know there exists a sparse solution x*, and that…
Read more...
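A minimal sketch of the point the excerpt is heading toward, not taken from the linked post: recover a sparse solution of Ax = b with an L1 penalty (Lasso) versus an L2 penalty (Ridge) and count how many coefficients land exactly at zero. The problem sizes and regularization strengths below are illustrative assumptions.

```python
# L1 vs L2 regularization on a linear system Ax = b with a known sparse solution.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n_samples, n_features, n_nonzero = 60, 200, 5

# Build A and a ground-truth x with only a few nonzero entries.
A = rng.standard_normal((n_samples, n_features))
x_true = np.zeros(n_features)
x_true[rng.choice(n_features, n_nonzero, replace=False)] = rng.standard_normal(n_nonzero)
b = A @ x_true

# L1 penalty: many coefficients are driven exactly to zero.
lasso = Lasso(alpha=0.05, fit_intercept=False, max_iter=10000).fit(A, b)
# L2 penalty: coefficients shrink toward zero but almost never hit it exactly.
ridge = Ridge(alpha=0.05, fit_intercept=False).fit(A, b)

print("nonzeros (true): ", n_nonzero)
print("nonzeros (lasso):", int(np.sum(lasso.coef_ != 0)))
print("nonzeros (ridge):", int(np.sum(ridge.coef_ != 0)))
```

Running this typically shows the Lasso estimate with only a handful of nonzero coefficients, while the Ridge estimate keeps essentially all 200 nonzero, which is the behavior the geometric argument explains.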
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
ICLR 2024 oral, reviewer scores 6/8/8. 1 Intro: The LLM community now commonly uses GELU and SiLU as replacement activation functions, which can improve prediction accuracy in some cases. From the standpoint of saving computation, however, the paper argues that the classic ReLU has a negligible effect on convergence and performance while significantly reducing computation and weight I/O. 2 Do activation functions affect accuracy? The open-source LLMs OPT and Llama were chosen…
Read more...
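A minimal sketch of what "activation sparsity" means in this setting, not the paper's code: ReLU outputs exact zeros for negative pre-activations, so the corresponding weight rows can be skipped downstream, whereas GELU outputs are merely small but nonzero. The tensor shape and the tanh-approximate GELU below are assumptions for illustration.

```python
# Compare the fraction of exactly-zero activations produced by ReLU vs GELU.
import numpy as np

def gelu(x):
    # tanh approximation of GELU, as commonly used in LLM implementations
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
pre_act = rng.standard_normal((4096, 11008))  # hypothetical FFN pre-activations

relu_out = np.maximum(pre_act, 0.0)
gelu_out = gelu(pre_act)

print("exact-zero fraction, ReLU:", np.mean(relu_out == 0.0))  # ~0.5 for zero-mean inputs
print("exact-zero fraction, GELU:", np.mean(gelu_out == 0.0))  # ~0.0
```

The exact-zero entries are what allow the compute and weight-I/O savings the excerpt refers to; smooth activations like GELU/SiLU do not produce them.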