Large Language Models(LLMs) Concepts


1、Introduction to Large Language Models(LLM)

1.1、Definition of LLMs

Large: Training data and resources.
Language: Human-like text.
Models: Learn complex patterns using text data.

The rise of LLMs is widely considered a defining moment in the history of AI.

Some applications:

Sentiment analysis
Identifying themes
Translating text or speech
Generating code
Next-word prediction

1.2、Real-world application

Transforming finance industry:

[Investment outlook] | [Annual reports] | [News articles] | [Social media posts] --> LLM --> [Market analysis] | [Portfolio management] | [Investment opportunities]

Revolutionizing healthcare sector:

- Analyze patient data to offer personalized recommendations.
- Must adhere to privacy laws.

Education:

- Personalized coaching and feedback.
- Interactive learning experience.
- AI-powered tutor:
  - Ask questions.
  - Receive guidance.
  - Discuss ideas.

Visual question answering:

Defining multimodal:

Multimodal:
- Many types of processing or generation.

Non-multimodal:
- One type of processing or generation.

Visual question answering:
- Answers to questions about visual content
- Object identification & relationships
- Scene description

1.3、Challenges of language modeling

Sequence matters
Context modeling
Long-range dependency
Single-task learning

2、Building Blocks of LLMs

2.1、Novelty of LLMs

Overcome data's unstructured nature
Outperform traditional models
Understand linguistic subtleties

The building blocks are described below:

2.2、Generalized overview of NLP

2.2.1、Text Pre-processing

These steps can be done in a different order, as they are independent of one another.

Tokenization: Splits text into individual words, or tokens.
Stop word removal: Removes stop words, which add little meaning.
Lemmatization: Groups slightly different words with similar meaning by reducing them to their basic form, for example by mapping them to their root word.
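The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the stop-word list and the lemma table are tiny hand-written assumptions, and the function names are hypothetical. Real projects usually rely on an NLP library for these resources.

```python
import re

# Tiny illustrative stop-word list and lemma table (hand-written assumptions).
STOP_WORDS = {"the", "a", "an", "is", "are", "on", "and", "of", "to"}
LEMMAS = {"cats": "cat", "sitting": "sit", "sat": "sit", "mats": "mat"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Map each token to its base form when the table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(text):
    """Run tokenization, stop word removal, and lemmatization in sequence."""
    return lemmatize(remove_stop_words(tokenize(text)))


print(preprocess("The cats are sitting on the mats"))  # ['cat', 'sit', 'mat']
```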

2.2.2、Text Representation

Converts text data into numerical form.

Bag-of-words:

Limitation:
- Does not capture the order or context.
- Does not capture the semantics between the words.
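The order limitation is easy to see in a small sketch. The helper name `bag_of_words` is hypothetical, and splitting on whitespace is a simplifying assumption. Two sentences with opposite meanings end up with identical representations.

```python
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word occurs, ignoring order."""
    return Counter(text.lower().split())


a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

print(a["the"])  # 2
print(a == b)    # True: the word order is lost
```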

Word embeddings:
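Word embeddings represent each word as a vector of numbers, so that words with related meanings end up close together. This addresses the semantic limitation of bag-of-words. The sketch below uses hand-picked 3-dimensional vectors purely for illustration (real embeddings are learned from data and have many more dimensions) and compares them with cosine similarity.

```python
import math

# Hand-picked toy vectors (assumption for illustration; real embeddings are learned).
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


cat_dog = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])
print(cat_dog > cat_car)  # True: "cat" is closer to "dog" than to "car"
```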

2.3、Fine-tuning

Fine-tuning:
- Addresses some of these challenges.
- Adapts a pre-trained model.

Pre-trained model:
- Learned from general-purpose datasets.
- Not optimized for specific tasks.
- Can be fine-tuned for a specific problem.

2.4、Learning techniques

N-shot learning: zero-shot, few-shot, and multi-shot.

2.4.1、Zero-shot learning

No explicit training.
Uses language understanding and context.
Generalizes without any prior examples.

2.4.2、Few-shot learning

Learn a new task with a few examples.

2.4.3、Multi-shot learning

Requires more examples than few-shot.
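One common place these ideas show up is in how a prompt is written: the number of worked examples included before the question determines the "shot" count. The sketch below assumes a sentiment task and a hypothetical `build_prompt` helper. It only builds the prompt text and does not call any model.

```python
def build_prompt(task, examples, query):
    """Build a prompt with zero or more worked examples before the query."""
    parts = [task]
    for text, label in examples:
        parts.append(f"Text: {text}\nSentiment: {label}")
    parts.append(f"Text: {query}\nSentiment:")
    return "\n\n".join(parts)


task = "Classify the sentiment of the text as positive or negative."
examples = [
    ("I loved this film.", "positive"),
    ("The service was terrible.", "negative"),
]

# Zero-shot: no examples. Few-shot: a handful of examples.
zero_shot = build_prompt(task, [], "The food was great.")
few_shot = build_prompt(task, examples, "The food was great.")
print(few_shot)
```

A multi-shot prompt would follow the same pattern with a longer `examples` list.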

3、Training Methodology and Techniques

3.1、Building blocks to train LLMs

3.1.1、Generative pre-training

Trained using generative pre-training:
- Input data of text tokens.
- Trained to predict the tokens within the dataset.

Types:
- Next word prediction.
- Masked language modeling.

3.1.2、Next word prediction

Supervised learning technique; the labels come from the text itself, so it is often called self-supervised.
Predicts next word and generates coherent text.
Captures the dependencies between words.
Training data consist of pairs of input and output examples.
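To make the input/output pairs concrete, the sketch below builds them from one sentence and then fits a toy bigram counter as a stand-in for the model. The bigram counter is an assumption chosen for brevity: an actual LLM uses a neural network, not frequency counts, and all function names here are hypothetical.

```python
from collections import Counter, defaultdict


def make_training_pairs(tokens):
    """Pair each prefix of the sequence with the word that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def train_bigram_model(tokens):
    """Count which word follows each word (toy stand-in for a neural model)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word)
    return followers.most_common(1)[0][0] if followers else None


tokens = "the quick brown fox jumps over the quick dog".split()
pairs = make_training_pairs(tokens)
model = train_bigram_model(tokens)

print(pairs[1])                    # (['the', 'quick'], 'brown')
print(predict_next(model, "the"))  # quick
```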

3.1.3、Masked language modeling

Hides a selected word.
Trained model predicts the masked word.
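A training example for masked language modeling can be sketched as follows. The `[MASK]` placeholder string and the `mask_one_token` helper are assumptions for illustration; the sketch only prepares the example and does not train anything.

```python
import random


def mask_one_token(tokens, rng, mask_token="[MASK]"):
    """Replace one randomly chosen token with a mask and return the target."""
    index = rng.randrange(len(tokens))
    masked = list(tokens)  # copy so the original sentence is untouched
    target = masked[index]
    masked[index] = mask_token
    return masked, index, target


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tokens = "the cat sat on the mat".split()
masked, index, target = mask_one_token(tokens, rng)

# The model would see `masked` as input and be trained to output `target`.
print(masked, "->", target)
```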

3.2、Introducing the transformer

3.2.1、Transformer architecture

Captures relationships between words.
Components: Pre-processing, Positional Encoding, Encoders, and Decoders.

3.2.2、Inside the transformer

(1) Text pre-processing and representation:

Text preprocessing: tokenization, stop word removal, lemmatization.
Text representation: word embedding.

(2) Positional encoding:

Adds information on the position of each word.
Helps the model understand relationships between distant words.
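The notes do not say which positional encoding scheme is meant. As one well-known option, the sketch below assumes the sinusoidal scheme from the original transformer paper, where each position gets a vector of sines and cosines at different frequencies. The function name and the sizes are hypothetical.

```python
import math


def positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position vectors."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = positional_encoding(num_positions=4, d_model=8)
print(pe[0])  # position 0: every sine is 0.0 and every cosine is 1.0
```

These vectors are added to the word embeddings, so the same word at two different positions gets two different inputs.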

(3) Encoders:

Attention mechanism: directs attention to specific words and relationships.
Neural network: processes specific features.

(4) Decoders:

Includes attention and neural networks.
Generates the output.

3.2.3、Transformers and long-range dependencies

Initial challenge: long-range dependency.
Attention: focus on different parts of the input.

3.2.4、Processes multiple parts simultaneously

Limitation of traditional language models: Sequential - one word at a time.
Transformers: Process multiple parts simultaneously (Faster processing).

3.3、Attention mechanisms

3.3.1、Attention mechanisms

Understand complex structures.
Focus on important words.
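How attention "focuses" can be sketched with scaled dot-product self-attention in plain Python. As a simplifying assumption, the word vectors are used directly as queries, keys, and values; a real transformer first applies learned projection matrices. The vectors and function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score the query against every key, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        output = [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy 2-dimensional word vectors (hypothetical values).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(vectors)
print([round(w, 3) for w in weights[0]])
```

The first word puts more weight on the second word, which points in a similar direction, than on the third. Multi-head attention runs several such computations in parallel, each with its own projections, and combines the results.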

3.3.2、Two primary types: Self-attention and multi-head attention

For example:

3.4、Advanced fine-tuning

3.4.1、LLM training three steps:

Pre-training:
Fine-tuning:
RLHF:

(1) Why RLHF?

(2) Starts with the need to fine-tune

3.4.2、Simplifying RLHF

Model output is reviewed by a human.
The model is updated based on the feedback.

Step 1:

Receives a prompt.
Generates multiple responses.

Step 2:

Human expert checks these responses.
Ranks the responses based on quality: accuracy, relevance, coherence.

Step 3:

Learns from the expert's ranking to align its future responses with their preferences.

And it goes on:

Continues to generate responses.
Receives expert's rankings.
Adjusts the learning.
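The loop above can be sketched as a toy example. This is a heavily simplified assumption, not how RLHF is implemented in practice: real systems typically train a reward model on the preference pairs and then update the language model with reinforcement learning. Here a plain score table stands in for both, and all names and responses are hypothetical.

```python
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))


def update_scores(scores, pairs, step=1.0):
    """Toy feedback update: raise preferred responses, lower rejected ones."""
    for preferred, rejected in pairs:
        scores[preferred] = scores.get(preferred, 0.0) + step
        scores[rejected] = scores.get(rejected, 0.0) - step
    return scores


# Step 1: the model generates several responses to one prompt.
responses = ["response A", "response B", "response C"]
# Step 2: a human expert ranks them from best to worst.
ranking = [responses[1], responses[0], responses[2]]
# Step 3: the ranking becomes a learning signal.
pairs = ranking_to_pairs(ranking)
scores = update_scores({}, pairs)

print(max(scores, key=scores.get))  # response B
```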

3.4.3、Recap

4、Concerns and Considerations

4.1、Data concerns and considerations

Data volume and compute power.
Data quality.
Labeling.
Bias.
Privacy.

4.1.1、Data volume and compute power

LLMs need a lot of data.
Extensive computing power.
Can cost millions of dollars.

4.1.2、Data quality

Quality data is essential.

4.1.3、Labeled data

Requires correct data labels.
Labor-intensive.
Incorrect labels impact model performance.
Address errors: identify >>> analyze >>> iterate.

4.1.4、Data bias

Influenced by societal stereotypes.
Lack of diversity in training data.
Discrimination and unfair outcomes.

Spot and deal with the biased data:

Evaluate data imbalances.
Promote diversity.
Bias mitigation techniques: more diverse examples.
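Evaluating data imbalances can start with something as simple as counting labels. The sketch below assumes a hypothetical sentiment dataset and reports each label's share along with the ratio between the most and least common label. It is a first check only and does not detect subtler forms of bias.

```python
from collections import Counter


def label_distribution(labels):
    """Return each label's share of the dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the most common label count to the least common one."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


labels = ["positive"] * 8 + ["negative"] * 2  # hypothetical dataset
print(label_distribution(labels))  # {'positive': 0.8, 'negative': 0.2}
print(imbalance_ratio(labels))     # 4.0
```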

4.1.5、Data privacy

Compliance with data protection and privacy regulations.
Sensitive or personally identifiable information (PII).
Privacy is a concern.
Get permission.
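One small, concrete privacy measure is to redact obvious PII before text is used for training. The sketch below handles only email addresses with a simplified regular expression. The pattern, placeholder, and function name are assumptions for illustration, and this is no substitute for a proper compliance process.

```python
import re

# Simplified pattern: catches common email shapes, not every valid address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text, placeholder="[EMAIL]"):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)


print(redact_emails("Contact jane.doe@example.com for details."))
# Contact [EMAIL] for details.
```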

4.2、Ethical and environmental concerns

4.2.1、Ethical concerns

Transparency risk - Challenging to understand the output.
Accountability risk - Responsibility for LLMs' actions.
Information hazards - Disseminating harmful information.

4.2.2、Environmental concerns

Ecological footprint of LLMs.
Substantial energy resources to train.
Impact through carbon emissions.

4.3、Where are LLMs heading?

Model explainability.
Efficiency.
Unsupervised bias handling.
Enhanced creativity.
