
Why is L1 regularization supposed to lead to more sparsity than L2?

It comes down to the geometry of the two norm balls. Here is some intentionally non-rigorous intuition: suppose you have a linear system Ax = b for which you know there exists a sparse solution x∗, and that you pick the feasible solution with the smallest norm. The L1 ball is a cross-polytope whose corners sit on the coordinate axes; if you inflate it until it first touches the affine set {x : Ax = b}, the first contact typically happens at a corner or a low-dimensional face, where most coordinates are exactly zero. The L2 ball is round, so it usually touches that set at a generic point whose coordinates are small but nonzero.
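
To see the effect numerically, here is a minimal sketch (assuming scikit-learn and NumPy are available; the synthetic data and the alpha values are arbitrary choices for illustration) that fits an L1-penalized (Lasso) and an L2-penalized (Ridge) regression on the same data and counts coefficients that are exactly zero:

```python
# Minimal sketch: compare how many coefficients L1 (Lasso) vs L2 (Ridge)
# drive to exactly zero on the same synthetic regression problem.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n_samples, n_features = 100, 50
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = rng.normal(size=5)          # only 5 features are truly active
y = X @ true_coef + 0.1 * rng.normal(size=n_samples)

lasso = Lasso(alpha=0.1).fit(X, y)           # L1 penalty
ridge = Ridge(alpha=0.1).fit(X, y)           # L2 penalty

print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))
# Typically the Lasso zeroes out most of the 45 irrelevant features,
# while Ridge only shrinks them toward (but not exactly to) zero.
```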

[nlp] How to fix the error: RuntimeError: Llama is supposed to be a BPE model!

Change the original call tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL) so that it passes legacy=False and use_fast=False:

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, legacy=False, use_fast=False)
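
For context, here is a minimal self-contained sketch of the reported fix (assuming the Hugging Face transformers library is installed; BASE_MODEL is a placeholder you would replace with your actual Llama checkpoint path or Hub id):

```python
from transformers import AutoTokenizer

BASE_MODEL = "your-llama-checkpoint"  # placeholder: local path or Hub model id

# tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
# The call above reportedly raises
# "RuntimeError: Llama is supposed to be a BPE model!" on some checkpoints.
tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL,
    legacy=False,    # use the non-legacy tokenizer behavior
    use_fast=False,  # fall back to the slow (SentencePiece) tokenizer
)
```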