Large Language Models (LLMs) Concepts

2024-09-03 13:12

This article introduces the core concepts of Large Language Models (LLMs) and is intended as a reference for developers.

1、Introduction to Large Language Models (LLMs)

1.1、Definition of LLMs

  • Large: Training data and resources.
  • Language: Human-like text.
  • Models: Learn complex patterns using text data.

LLMs are widely considered a defining moment in the history of AI.

Some applications:

  • Sentiment analysis
  • Identifying themes
  • Translating text or speech
  • Generating code
  • Next-word prediction

1.2、Real-world application

  • Transforming finance industry: 
    [Investment outlook] | [Annual reports] | [News articles] | [Social media posts] --> LLM --> [Market analysis] | [Portfolio management] | [Investment opportunities]

  • Revolutionizing healthcare sector:
    - Analyze patient data to offer personalized recommendations.
    - Must adhere to privacy laws.

  • Education:
    - Personalized coaching and feedback.
    - Interactive learning experience.
    - AI-powered tutor:
      - Ask questions.
      - Receive guidance.
      - Discuss ideas.

  • Visual question answering:
    Defining multimodal:
    - Multimodal: many types of processing or generation.
    - Non-multimodal: one type of processing or generation.
    Visual question answering:
    - Answers to questions about visual content
    - Object identification & relationships
    - Scene description

1.3、Challenges of language modeling

  • Sequence matters
  • Context modeling
  • Long-range dependency
  • Single-task learning

2、Building Blocks of LLMs

2.1、Novelty of LLMs

  • Overcome data's unstructured nature
  • Outperform traditional models
  • Understand linguistic subtleties

The building blocks are described in the sections below.

2.2、Generalized overview of NLP

2.2.1、Text Pre-processing

These steps are independent of each other, so they can be applied in any order.

  • Tokenization: Splits text into individual words, or tokens.

  • Stop word removal: Removes very common words (such as "the" or "is") that add little meaning.

  • Lemmatization: Groups slightly different forms of a word by reducing them to a common base form, for example mapping "running" and "ran" to "run".
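The three pre-processing steps above can be sketched in a few lines of plain Python. This is only a minimal illustration: the stop-word list `STOP_WORDS` and the lookup table `LEMMAS` are tiny, made-up stand-ins for the much larger resources that a real NLP library would provide.

```python
import re

# Illustrative stop-word list and lemma table (assumptions for this sketch);
# real pipelines use far larger resources.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were", "on", "in", "of", "and", "to"}
LEMMAS = {"cats": "cat", "sitting": "sit", "sat": "sit", "mats": "mat", "running": "run"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Map each token to its base form when the lookup table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(text):
    """Run tokenization, stop word removal, and lemmatization in sequence."""
    return lemmatize(remove_stop_words(tokenize(text)))


print(preprocess("The cats are sitting on the mats"))  # ['cat', 'sit', 'mat']
```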

2.2.2、Text Representation

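  • Converts text data into numerical form.
  • Bag-of-words: represents a text by how often each word occurs.

    Limitation:
    - Does not capture the order or context.
    - Does not capture the semantics between the words.

  • Word embeddings: represent each word as a numeric vector, so that words with similar meanings get similar vectors.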
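To make the bag-of-words limitation concrete, here is a minimal sketch using only the standard library. The two example sentences are made up for illustration; they mean different things but produce identical count vectors, because word order is discarded.

```python
from collections import Counter


def bag_of_words(sentence, vocabulary):
    """Return word counts for `sentence`, ordered by `vocabulary`."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


s1 = "the dog chased the cat"
s2 = "the cat chased the dog"
vocabulary = sorted(set(s1.split()) | set(s2.split()))

v1 = bag_of_words(s1, vocabulary)
v2 = bag_of_words(s2, vocabulary)
print(vocabulary)   # ['cat', 'chased', 'dog', 'the']
print(v1, v2)       # [1, 1, 1, 2] [1, 1, 1, 2]
print(v1 == v2)     # True: the order information is lost
```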

2.3、Fine-tuning

Fine-tuning:
- Addresses some of these challenges.
- Adapts a pre-trained model.

Pre-trained model:
- Learned from general-purpose datasets.
- Not optimized for specific tasks.
- Can be fine-tuned for a specific problem.

2.4、Learning techniques

N-shot learning: zero-shot, few-shot, and multi-shot.

2.4.1、Zero-shot learning

  • No explicit training.
  • Uses language understanding and context.
  • Generalizes without any prior examples.

2.4.2、Few-shot learning

  • Learn a new task with a few examples.

2.4.3、Multi-shot learning

  • Requires more examples than few-shot.
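One common way to apply N-shot learning with an LLM is to place the examples directly in the prompt. The sketch below assumes that setup; the function `build_prompt`, the sentiment task, and the example texts are all hypothetical and only show how zero-shot and few-shot prompts differ in the number of examples they carry.

```python
def build_prompt(task, examples, query):
    """Build an n-shot prompt; an empty `examples` list gives a zero-shot prompt."""
    lines = [task]
    for text, label in examples:
        lines.append(f"Text: {text}\nSentiment: {label}")
    # The final entry leaves the label blank for the model to fill in.
    lines.append(f"Text: {query}\nSentiment:")
    return "\n\n".join(lines)


task = "Classify the sentiment of each text as positive or negative."
examples = [
    ("I loved this film.", "positive"),
    ("The service was terrible.", "negative"),
]

zero_shot = build_prompt(task, [], "The food was great.")
few_shot = build_prompt(task, examples, "The food was great.")
print(few_shot)
```

A multi-shot prompt would use the same function with a longer `examples` list.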

3、Training Methodology and Techniques

3.1、Building blocks to train LLMs

3.1.1、Generative pre-training

Trained using generative pre-training:
- Input data of text tokens.
- Trained to predict the tokens within the dataset.

Types:
- Next word prediction.
- Masked language modeling.

3.1.2、Next word prediction

  • Supervised learning technique; the labels come from the text itself, so it is often described as self-supervised.
  • Predicts next word and generates coherent text.
  • Captures the dependencies between words.
  • Training data consist of pairs of input and output examples.
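The input/output pairs mentioned above can be built directly from raw text. The following is a deliberately small sketch: it uses a bigram count table instead of a neural network, and the three-sentence corpus is made up. It is meant only to show how each position in a sentence yields one (context, next word) training example.

```python
from collections import Counter, defaultdict


def make_pairs(tokens):
    """Turn a token sequence into (context, next word) training pairs."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def train_bigram(corpus):
    """Count which word follows each word across the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for context, target in make_pairs(tokens):
            # A bigram model only looks at the last word of the context.
            follows[context[-1]][target] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]


corpus = ["the cat sat on the mat", "the cat ate the fish", "the dog sat on the rug"]
model = train_bigram(corpus)
print(make_pairs("the cat sat".split()))
print(predict_next(model, "sat"))  # on
```

An LLM replaces the count table with a neural network that conditions on the whole context rather than only the last word.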

3.1.3、Masked language modeling

  • Hides (masks) a selected word in the input.
  • Trained model predicts the masked word.
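A masked language modeling training example can be sketched as follows. The `[MASK]` placeholder string and the helper `mask_one_token` are assumptions made for this illustration; the sketch only builds the training example and does not train a model.

```python
import random

MASK = "[MASK]"  # placeholder token, chosen for this sketch


def mask_one_token(tokens, rng):
    """Hide one randomly chosen token; return (masked tokens, position, hidden word)."""
    position = rng.randrange(len(tokens))
    masked = list(tokens)
    label = masked[position]
    masked[position] = MASK
    return masked, position, label


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tokens = "the cat sat on the mat".split()
masked, position, label = mask_one_token(tokens, rng)

# The model sees `masked` as input and is trained to output `label`.
print(masked, "->", label)
```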

3.2、Introducing the transformer

3.2.1、Transformer architecture

  • Models the relationships between words.
  • Components: Pre-processing, Positional Encoding, Encoders, and Decoders.

3.2.2、Inside the transformer

(1) Text pre-processing and representation:

  • Text preprocessing: tokenization, stop word removal, lemmatization.
  • Text representation: word embedding.

(2) Positional encoding:

  • Adds information on the position of each word.
  • Helps the model relate words that are far apart.
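The text above does not say which positional encoding scheme is used. As one possibility, the sketch below assumes the sinusoidal scheme from the original Transformer design, where even dimensions use a sine and odd dimensions use a cosine of the position scaled by a dimension-dependent wavelength. The function names are chosen for this illustration.

```python
import math


def positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position vectors."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one wavelength.
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(embeddings):
    """Add the position vector to each word embedding, element by element."""
    encoding = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, enc)] for emb, enc in zip(embeddings, encoding)]


pe = positional_encoding(4, 6)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print([round(x, 3) for x in pe[1]])
```

Because every position gets a distinct vector, the same word at two different positions ends up with two different inputs to the encoder.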

(3) Encoders:

  • Attention mechanism: directs attention to specific words and relationships.
  • Neural network: processes specific features.

(4) Decoders:

  • Includes attention and neural networks.
  • Generates the output.

3.2.3、Transformers and long-range dependencies

  • Initial challenge: long-range dependency.
  • Attention: focus on different parts of the input.

3.2.4、Processes multiple parts simultaneously

  • Limitation of traditional language models: Sequential - one word at a time.
  • Transformers: Process multiple parts simultaneously (Faster processing).

3.3、Attention mechanisms

3.3.1、Attention mechanisms

  • Understand complex structures.
  • Focus on important words.

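3.3.2、Two primary types: Self-attention and multi-head attention

Self-attention weighs each word against every other word in the same input. Multi-head attention runs several such attention computations in parallel and combines their results.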
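As a rough illustration of self-attention, the sketch below computes scaled dot-product attention over three toy word vectors. It is a simplification under stated assumptions: queries, keys, and values are all the raw vectors (a real transformer first applies learned projection matrices), only a single head is shown, and the vector values are made up.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors."""
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    weights = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in vectors]
        weights.append(softmax(scores))
    # Each output is a weighted average of all value vectors.
    outputs = [
        [sum(w * value[d] for w, value in zip(row, vectors)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Toy 2-dimensional "embeddings" for three words; the numbers are made up.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(vectors)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one word attends to every word in the input. The first word gives more weight to the second word, whose vector is similar, than to the third.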

3.4、Advanced fine-tuning

3.4.1、Three steps of LLM training

  • Pre-training
  • Fine-tuning
  • RLHF (reinforcement learning from human feedback)

    (1) Why RLHF?

    (2) Starts with the need to fine-tune

3.4.2、Simplifying RLHF

  • Model output reviewed by human.
  • Updates model based on the feedback.

Step 1:

  • Receives a prompt.
  • Generates multiple responses.

Step 2:

  • Human expert checks these responses.
  • Ranks the responses based on quality: accuracy, relevance, coherence.

Step 3:

  • Learns from the expert's ranking.
  • Aligns its future responses with the expert's preferences.

And it goes on:

  • Continues to generate responses.
  • Receives expert's rankings.
  • Adjusts the learning.
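The feedback loop above can be sketched in miniature. The code below is a toy, not a real RLHF pipeline: instead of training a reward model and updating a language model, it keeps one preference score per response and nudges the scores with pairwise updates (a gradient step on a logistic preference loss) so that they agree with the expert's ranking. The response labels, the ranking, and the learning rate are all made up for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def update_from_ranking(scores, ranking, learning_rate=0.5):
    """One pass of pairwise updates: higher-ranked responses should outscore lower ones."""
    for i, winner in enumerate(ranking):
        for loser in ranking[i + 1:]:
            # Gradient step on -log(sigmoid(score[winner] - score[loser])).
            step = learning_rate * (1.0 - sigmoid(scores[winner] - scores[loser]))
            scores[winner] += step
            scores[loser] -= step
    return scores


# Step 1: the model produced several responses to one prompt (hypothetical labels).
responses = ["response_a", "response_b", "response_c"]
scores = {r: 0.0 for r in responses}

# Step 2: a human expert ranks them from best to worst (made-up ranking).
expert_ranking = ["response_b", "response_c", "response_a"]

# Step 3 and onward: learn from the ranking, repeated over several rounds.
for _ in range(20):
    update_from_ranking(scores, expert_ranking)

learned_order = sorted(scores, key=scores.get, reverse=True)
print(learned_order)  # ['response_b', 'response_c', 'response_a']
```

In a full system, scores like these would come from a learned reward model, and the language model itself would then be updated to produce responses that score highly.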

3.4.3、Recap

4、Concerns and Considerations

4.1、Data concerns and considerations

  • Data volume and compute power.
  • Data quality.
  • Labeling.
  • Bias.
  • Privacy.

4.1.1、Data volume and compute power

  • LLMs need a lot of data.
  • Extensive computing power.
  • Can cost millions of dollars.

4.1.2、Data quality

  • Quality data is essential.

4.1.3、Labeled data

  • Requires correct data labels.
  • Labor-intensive.
  • Incorrect labels impact model performance.
  • Address errors: identify >>> analyze >>> iterate.

4.1.4、Data bias

  • Influenced by societal stereotypes.
  • Lack of diversity in training data.
  • Discrimination and unfair outcomes.

Spot and deal with the biased data:

  • Evaluate data imbalances.
  • Promote diversity.
  • Bias mitigation techniques: more diverse examples.
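Evaluating data imbalance can start with something as simple as counting. The sketch below uses made-up group labels and two hypothetical helper functions to report each group's share of a dataset and the ratio between the largest and smallest group. What counts as an acceptable ratio depends on the task and is not specified here.

```python
from collections import Counter


def label_distribution(labels):
    """Return each label's share of the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the most common label count to the least common one."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# Made-up labels standing in for a group attribute in a training set.
labels = ["group_a"] * 80 + ["group_b"] * 15 + ["group_c"] * 5
print(label_distribution(labels))  # {'group_a': 0.8, 'group_b': 0.15, 'group_c': 0.05}
print(imbalance_ratio(labels))     # 16.0
```

A high ratio is a signal to collect more diverse examples for the under-represented groups, in line with the mitigation technique listed above.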

4.1.5、Data privacy

  • Compliance with data protection and privacy regulations.
  • Sensitive or personally identifiable information (PII).
  • Privacy is a concern.
  • Get permission.
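One practical step for handling PII is to redact it before text enters a training set. The sketch below is a minimal illustration using two regular expressions; the patterns and the placeholder strings are assumptions for this example and will miss many real-world formats, so they are not a substitute for a proper compliance process.

```python
import re

# Deliberately simple patterns; they will miss many real-world formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


sample = "Contact Jane at jane.doe@example.com or +1 555-123-4567."
print(redact_pii(sample))  # Contact Jane at [EMAIL] or [PHONE].
```

Note that the name "Jane" is left untouched: detecting names and other free-form identifiers needs more than pattern matching.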

4.2、Ethical and environmental concerns

4.2.1、Ethical concerns

  • Transparency risk - Challenging to understand the output.
  • Accountability risk - Responsibility for LLMs' actions.
  • Information hazards - Disseminating harmful information.

4.2.2、Environmental concerns

  • Ecological footprint of LLMs.
  • Substantial energy resources to train.
  • Impact through carbon emissions.

4.3、Where are LLMs heading?

  • Model explainability.
  • Efficiency.
  • Unsupervised bias handling.
  • Enhanced creativity.




http://www.chinasem.cn/article/1133051
