Assembly Language

2024-01-27 15:48

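Assembly language is any low-level language for electronic computers, microprocessors, or microcontrollers; it is also called symbolic language. In assembly language, mnemonics stand in for the opcodes of machine instructions, and address symbols or labels stand in for the addresses of instructions or operands. On different devices, assembly language corresponds to different machine instruction sets and is converted into machine instructions by the assembly process. A particular assembly language corresponds one-to-one with a particular machine instruction set, so code cannot be ported directly between platforms.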

Contents

  • 0. Inline Assembler
  • 1. General-Purpose Registers
    • 1.1 Name of the registers
    • 1.2 Some Specialized Register Uses
  • 2. Data Types
  • 3. Comments
  • 4. Instructions
    • 4.1 Labels
    • 4.2 Mnemonics and Operands
    • 4.3 Structure of instructions
    • 4.4 Basic instructions
      • 4.4.1 MOV
      • 4.4.2 XCHG Instruction
      • 4.4.3 INC and DEC Instructions
      • 4.4.4 ADD and SUB Instructions
      • 4.4.5 NEG Instruction
      • 4.4.6 Implementing Arithmetic Expressions
    • 4.5 Flow Controls
      • 4.5.1 JMP
      • 4.5.2 JCXZ and JECXZ
      • 4.5.3 Other Jump Instructions
      • 4.5.4 CMP
      • 4.5.5 LOOP
      • 4.5.6 LOOPNE
    • 4.6 Basic Data Structure
      • 4.6.1 Stack
    • 4.7 IO Instructions
      • 4.7.1 Input
      • 4.7.2 Output
  • 5. Addressing modes
  • 6. Flags Affected by Arithmetic
    • 6.1 Zero Flag (ZF)
    • 6.2 Sign Flag (SF)
  • 7. Useful Operators
    • 7.1 OFFSET
    • 7.2 TYPE
    • 7.3 LENGTHOF
    • 7.4 SIZEOF
  • 8. Spanning Multiple Lines
  • 9. LABEL Directive
  • 10. Subroutine
    • 10.1 Value Parameters
    • 10.2 Reference Parameters
    • 10.3 Stack Frame
    • 10.4 Recursive subroutines




0. Inline Assembler

Here we use Microsoft Visual C++ to learn how to write assembly. In 32-bit (x86) builds you can add assembly code inside a C++ program with the _asm keyword (Microsoft documents the equivalent spelling __asm).

e.g.

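#include <stdio.h>
#include <iostream>
using namespace std;

int main(void)
{
    // Variables referenced by the assembly below (values chosen for illustration)
    int var = 0, var1 = 1, var2 = 2;

    _asm MOV EAX, var
    // or add multiple lines
    _asm {
        MOV EAX, var1
        MOV EAX, var2
    }
    return 0;
}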


1. General-Purpose Registers

1.1 Name of the registers

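Most arithmetic and data-movement instructions operate on registers, so the architecture gives each register a fixed name.
EAX, EBX, ECX, and EDX can each be accessed by 8-bit names (e.g. AH and AL), a 16-bit name (AX), and a 32-bit name (EAX).
ESI, EDI, EBP, and ESP only have a 16-bit name for their lower half (SI, DI, BP, SP).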

1.2 Some Specialized Register Uses

  • General-Purpose
    • EAX : accumulator
    • ECX : loop counter
    • ESP : stack pointer
    • ESI, EDI : index registers
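    • EBP : extended frame pointer (stack). ESP always points to the top of the stack, while EBP marks the base of the current stack frame.
  • Segment
    • CS : Code Segment. The program's code is stored in a contiguous range of memory addresses called the code segment; CS identifies where it starts.
    • DS : Data Segment
    • SS : Stack Segment
    • ES, FS, GS : additional segments
  • EIP : instruction pointer. EIP holds the address of the next instruction the CPU will fetch; without it the CPU could not find the following instruction. After each instruction executes, EIP advances accordingly.
  • EFLAGS : status and control flags (see section 6)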


2. Data Types

Correspondence between data types and registers:

  • BYTE, SBYTE : AH, AL, BH, BL, CH, CL, DH, DL
  • WORD, SWORD : AX, BX, CX, DX, SI, DI
  • DWORD, SDWORD : EAX, EBX, ECX, EDX, ESI, EDI


3. Comments

  • Single-line comments : begin with semicolon
  • Multi-line comments : begin with COMMENT directive and a programmer-chosen character. End with the same programmer-chosen character.


4. Instructions

We use the Intel IA-32 instruction set.

An instruction contains :

  • Label (optional)
  • Mnemonic (required)
  • Operand (depends on the instruction)
  • Comment (optional)

4.1 Labels

  • Act as place markers: a label marks the address of code or data
  • Follow identifier rules
  • Data label :
    • must be unique
    • not followed by colon
  • Code label :
    • target of jump and loop instructions
    • followed by colon

4.2 Mnemonics and Operands

  • Instruction Mnemonics
    • memory aid
    • e.g. MOV, ADD, SUB, MUL, INC, DEC
  • Operands
    • constant
    • constant expression : Constants and constant expressions are often called immediate values
    • register
    • memory (data label)

4.3 Structure of instructions

The binary codes of almost all instructions contain three pieces of information:

  • The action or operation of the instruction
  • The operands involved (where to find the information to operate with)
  • Where the result is to go

Machine instructions are encoded with distinct bit fields that carry information about:

  • The operation required
  • The location of operands and results
  • The data type of the operand

The length of an instruction depends on:

  • The operation required
  • The addressing modes employed

(Pentium instructions can be from 1 to 15 bytes long.)


4.4 Basic instructions

4.4.1 MOV

MOV destination, source

Move from source to destination.

Operand rules:

  • No more than one memory operand is permitted, so memory-to-memory moves are not allowed.
  • CS, EIP, and IP cannot be the destination.
  • An immediate value cannot be moved directly into a segment register.

Zero Extension
When you copy a smaller value into a larger destination, the MOVZX instruction fills the remaining upper bits of the destination with zeros.

Sign Extension
The MOVSX instruction fills the remaining upper bits of the destination with a copy of the source operand's sign bit.
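
The difference between the two extensions is easiest to see on a concrete byte. The following is a minimal C++ sketch that models an 8-bit to 32-bit extension with plain integer arithmetic; the function names movzx8 and movsx8 are made up for illustration and are not part of any library.

```cpp
#include <cstdint>

// Models MOVZX r32, r/m8: the upper 24 bits of the result are zero.
std::uint32_t movzx8(std::uint8_t src) {
    return static_cast<std::uint32_t>(src);
}

// Models MOVSX r32, r/m8: the upper 24 bits are copies of the source's
// sign bit (bit 7).
std::uint32_t movsx8(std::uint8_t src) {
    std::uint32_t value = src;
    if ((src & 0x80u) != 0) {
        value |= 0xFFFFFF00u;  // sign bit set: fill the upper bits with ones
    }
    return value;
}
```

For the byte 8Fh, whose sign bit is set, movzx8 yields 0000008Fh while movsx8 yields FFFFFF8Fh; for 7Fh both yield 0000007Fh.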


4.4.2 XCHG Instruction

Exchanges the values of two operands. At least one operand must be a register. No immediate operands are permitted.


4.4.3 INC and DEC Instructions

  • INC : Add 1 to the destination operand (register | memory)
  • DEC : Subtract 1 from the destination operand (register | memory)

4.4.4 ADD and SUB Instructions

ADD | SUB destination, source

Same operand rules as for the MOV instruction


4.4.5 NEG Instruction

NEG source

Reverses the sign of an operand. Operand can be a register or memory operand.

NEG Instruction and the Flags
Any nonzero operand causes the Carry flag to be set.

  • CF : Carry flag
  • OF : Overflow flag

4.4.6 Implementing Arithmetic Expressions

HLL compilers translate mathematical expressions into assembly language.
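
As an illustration of such a translation, the sketch below evaluates a hypothetical expression, Rval = -Xval + (Yval - Zval), one register-sized step at a time. The expression, the variable names, and the choice of EAX and EBX are assumptions made for this example; each C++ statement is annotated with the IA-32 instruction it stands for.

```cpp
#include <cstdint>

// Evaluates Rval = -Xval + (Yval - Zval) in the order a compiler might emit.
// Unsigned 32-bit arithmetic is used so that wrap-around matches the registers.
std::int32_t evaluate(std::int32_t Xval, std::int32_t Yval, std::int32_t Zval) {
    std::uint32_t eax = static_cast<std::uint32_t>(Xval);  // MOV EAX, Xval
    eax = 0u - eax;                                        // NEG EAX
    std::uint32_t ebx = static_cast<std::uint32_t>(Yval);  // MOV EBX, Yval
    ebx -= static_cast<std::uint32_t>(Zval);               // SUB EBX, Zval
    eax += ebx;                                            // ADD EAX, EBX
    return static_cast<std::int32_t>(eax);                 // MOV Rval, EAX
}
```

With Xval = 26, Yval = 30, and Zval = 40 the function returns -36, the same value as the high-level expression.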

4.5 Flow Controls

4.5.1 JMP

JMP is an unconditional jump to a label that is usually within the same procedure.

target:
    ...
    JMP target

Logic: EIP ← target


4.5.2 JCXZ and JECXZ

There are more than 30 jump instructions. JCXZ and JECXZ are two of them: they are conditional jumps that test whether CX or ECX, respectively, is zero. The remaining conditional jumps test the status flags. Each one jumps if its condition is true and falls through to the next instruction if it is false.

    JCXZ target
    ...
target:

4.5.3 Other Jump Instructions

  • JC / JB : Jump if Carry flag is set
  • JNC / JNB : Jump if Carry flag is clear
  • JE / JZ : Jump if Zero flag is set
  • JNE / JNZ : Jump if Zero flag is clear
  • JS : Jump if Sign flag is set
  • JNS : Jump if Sign flag is clear
  • JO : Jump if Overflow flag is set
  • JNO : Jump if Overflow flag is clear

4.5.4 CMP

The CMP instruction is the most common way to test for conditional jumps.

CMP EAX, EBX

CMP subtracts the second operand from the first, discards the result, and sets the flags accordingly. For example, it sets the Zero flag (ZF = 1) if EAX and EBX are equal.

Jumps based on CMP

Assuming the jump executes immediately after CMP (JG, JGE, JL, and JLE treat the operands as signed values):

  • JE : Jump if the first operand (in CMP) is equal to the second operand
  • JNE : Jump if the first and second operands are not equal
  • JGE : Jump if the first operand is greater or equal
  • JG : Jump if the first operand is greater
  • JLE : Jump if the first operand is less or equal
  • JL : Jump if the first operand is less
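
How these jumps follow from the flags can be sketched in C++. The model below computes the flags CMP would produce for two 32-bit operands and then evaluates the signed conditions from them; the Flags struct and the function names are illustrative, not a real API.

```cpp
#include <cstdint>

struct Flags {
    bool zf;  // Zero flag
    bool sf;  // Sign flag
    bool cf;  // Carry flag (a borrow, for subtraction)
    bool of;  // Overflow flag
};

// Models CMP a, b: computes a - b and keeps only the flags.
Flags cmp(std::uint32_t a, std::uint32_t b) {
    std::uint32_t r = a - b;
    Flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) != 0;
    f.cf = a < b;
    // Signed overflow: the operands differ in sign and the result's sign
    // differs from the first operand's sign.
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

// Conditions tested by the signed jumps.
bool je(const Flags& f)  { return f.zf; }
bool jg(const Flags& f)  { return !f.zf && f.sf == f.of; }
bool jge(const Flags& f) { return f.sf == f.of; }
bool jl(const Flags& f)  { return f.sf != f.of; }
bool jle(const Flags& f) { return f.zf || f.sf != f.of; }
```

Comparing FFFFFFFFh (-1 as a signed value) with 1 makes jl true even though the Carry flag is clear, which is why signed and unsigned comparisons need different jump instructions.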

4.5.5 LOOP

target:
    ...
    LOOP target

Logic: ECX ← ECX - 1; if ECX != 0, jump to target


4.5.6 LOOPNE

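LOOPNE decrements ECX and jumps to the target only if ECX is not zero and the Zero flag is clear.

e.g. Repeat while EAX is not equal to EBX, up to 200 times:

    MOV ECX, 200
target:
    ...
    CMP EAX, EBX
    LOOPNE target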

4.6 Basic Data Structure

4.6.1 Stack

Runtime Stack

Managed by the CPU, using two registers

  • SS (stack segment)
  • ESP (stack pointer)

PUSH Operation

A 32-bit push operation decrements ESP by 4 and copies a value into the location pointed to by ESP. The stack grows downward. The area below ESP is always available (unless the stack has overflowed)

PUSH EAX

POP Operation

Copies the value at the top of the stack into a register or variable, then adds 2 or 4 to ESP, depending on the size of the operand receiving the data.

POP EAX
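
The effect of PUSH and POP on ESP can be modelled in a few lines of C++. This is a toy sketch, assuming 32-bit operands only and no overflow checking; the Stack type and its members are invented for the example.

```cpp
#include <cstdint>
#include <vector>

// A toy runtime stack: 32-bit slots addressed by byte offset, growing downward.
struct Stack {
    std::vector<std::uint32_t> mem;
    std::uint32_t esp;  // byte offset of the current top of the stack

    // ESP starts one past the highest address, i.e. the stack is empty.
    explicit Stack(std::uint32_t bytes) : mem(bytes / 4, 0), esp(bytes) {}

    // PUSH: ESP <- ESP - 4, then store the value at [ESP].
    void push(std::uint32_t value) {
        esp -= 4;
        mem[esp / 4] = value;
    }

    // POP: load the value at [ESP], then ESP <- ESP + 4.
    std::uint32_t pop() {
        std::uint32_t value = mem[esp / 4];
        esp += 4;
        return value;
    }
};
```

Pushing two values onto a 64-byte stack moves esp from 64 to 56, and popping returns them in reverse order, restoring esp to 64.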

4.7 IO Instructions

4.7.1 Input

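CALL scanf : It takes two parameters from the stack: the address of the format string and the address of the variable that will store the input.

4.7.2 Output

CALL printf : In the example below it takes two parameters from the stack: the address of the format string and the value to print (the value itself, not its address). In both calls the parameters are pushed right to left, and the caller removes them afterwards by adding their total size to ESP.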

e.g.

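#include <stdio.h>
#include <iostream>
using namespace std;

int main(void) {
    char message[] = "The input number is %d\n";
    char format[] = "%d";
    int input;
    _asm {
        LEA  EAX, input
        PUSH EAX            // second argument: address of input
        LEA  EAX, format
        PUSH EAX            // first argument: address of the format string
        CALL scanf
        ADD  ESP, 8         // remove the two 4-byte arguments
        PUSH input          // second argument: the value of input
        LEA  EAX, message
        PUSH EAX            // first argument: address of the message
        CALL printf
        ADD  ESP, 8
    }
    return 0;
}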

5. Addressing modes

An addressing mode is the way an operand's address is formed. Offering a variety of addressing modes better supports HLLs when they need to manipulate large data structures.

Immediate mode
MOV EAX, 104
Part of the binary code here is the value (= 104) of the operand

Data Register Direct
MOV EAX, EBX
This is the fastest to execute

Memory Direct
MOV EAX, a
a is a variable, stored in memory and the instruction contains the address of this variable.

Address Register Direct
LEA EAX, message
The instruction contains the address of the message variable, which is loaded into the EAX register when the instruction executes.

Register Indirect
MOV EAX, [EBX]
The instruction copies to the EAX register the content of a memory location with the address stored in EBX

Indexed Register Indirect with displacement
MOV EAX, [array + ESI]
MOV EAX, array[ESI]

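Processing an array in a loop

.data
array WORD 100h, 200h, 300h, 400h
.code
    MOV EDI, OFFSET array
    MOV ECX, LENGTHOF array
    MOV AX, 0
L1:
    ADD AX, [EDI]
    ADD EDI, TYPE array
    LOOP L1

Here [EDI] is an indirect reference: instead of using the value in the register directly, the instruction treats that value as an address and reads the contents stored at that address.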
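
For readers more at home in C++, the EDI loop above corresponds roughly to walking a pointer through the array. The sketch below is an analogy rather than what a compiler would emit; the function name sum_words is made up, and the local variables are named after the registers they stand for.

```cpp
#include <cstddef>
#include <cstdint>

// Sums 16-bit words the way the assembly loop does: a pointer plays the role
// of EDI, a counter plays the role of ECX, and a 16-bit total plays AX.
std::uint16_t sum_words(const std::uint16_t* array, std::size_t length) {
    const std::uint16_t* edi = array;                // MOV EDI, OFFSET array
    std::size_t ecx = length;                        // MOV ECX, LENGTHOF array
    std::uint16_t ax = 0;                            // MOV AX, 0
    while (ecx != 0) {
        ax = static_cast<std::uint16_t>(ax + *edi);  // ADD AX, [EDI]
        ++edi;                                       // ADD EDI, TYPE array
        --ecx;                                       // LOOP L1
    }
    return ax;
}
```

One difference is deliberate: LOOP always runs the body once before testing ECX, whereas the while loop here skips the body for an empty array. For the four words 100h, 200h, 300h, 400h the total is A00h.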

.data
source BYTE "This is the source string", 0
target BYTE SIZEOF source DUP(0)
.code
    MOV ESI, 0
    MOV ECX, SIZEOF source
L1:
    MOV AL, source[ESI]
    MOV target[ESI], AL
    INC ESI
    LOOP L1

Here source[ESI] selects one element of source, much like array indexing in C++.

For jump and loop instructions, the assembler calculates the distance between the offset of the following instruction and the target label. This distance is called the relative offset. At run time the relative offset is added to EIP.
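
The relative-offset arithmetic can be written out as a small C++ sketch. The function names and the example addresses are assumptions chosen for illustration.

```cpp
#include <cstdint>

// Relative offset stored in the instruction: the target address minus the
// address of the instruction that follows the jump.
std::int32_t relative_offset(std::uint32_t jump_address,
                             std::uint32_t jump_length,
                             std::uint32_t target_address) {
    std::uint32_t next = jump_address + jump_length;
    return static_cast<std::int32_t>(target_address - next);
}

// What happens when the jump is taken: EIP already points at the next
// instruction, and the relative offset is added to it.
std::uint32_t take_jump(std::uint32_t next_eip, std::int32_t rel) {
    return next_eip + static_cast<std::uint32_t>(rel);
}
```

For a 2-byte jump located at offset 10h whose target is at offset 0Ah, the next instruction is at 12h, so the relative offset is -8; adding -8 to 12h gives back 0Ah.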



6. Flags Affected by Arithmetic

The ALU has a number of status flags that reflect the outcome of arithmetic (and bitwise) operations based on the contents of the destination operand.
The MOV instruction never affects the flags.


6.1 Zero Flag (ZF)

The Zero Flag is set when the result of an operation produces zero in the destination operand.
In other words, ZF is 1 when the result of the operation is 0.


6.2 Sign Flag (SF)

The Sign Flag is set when the destination operand is negative and clear when the destination operand is positive.
The most significant bit of the data is the sign bit, and SF is a copy of that bit.
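
Both flags can be derived from the result alone. The following C++ sketch models an 8-bit ADD (such as ADD AL, BL) and reports ZF and SF; the AddResult struct and the add8 function are invented for this example.

```cpp
#include <cstdint>

struct AddResult {
    std::uint8_t value;  // the 8-bit result left in the destination
    bool zf;             // Zero flag: the result is zero
    bool sf;             // Sign flag: copy of the result's most significant bit
};

// Models ADD on 8-bit operands.
AddResult add8(std::uint8_t a, std::uint8_t b) {
    unsigned wide = static_cast<unsigned>(a) + b;  // compute without truncation
    AddResult r;
    r.value = static_cast<std::uint8_t>(wide);     // keep only the low 8 bits
    r.zf = (r.value == 0);
    r.sf = (r.value & 0x80u) != 0;
    return r;
}
```

Adding 1 to FFh wraps to 00h, so ZF is set and SF is clear; adding 1 to 7Fh gives 80h, so SF is set and ZF is clear.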



7. Useful Operators

7.1 OFFSET

Returns the distance, in bytes, of a label from the beginning of its enclosing segment.
In other words, it returns the offset of a variable within its segment (that is, its address), comparable to a pointer in C / C++.

7.2 TYPE

Returns the size, in bytes, of a single element of a data declaration.


7.3 LENGTHOF

Counts the number of elements in a single data declaration.
x DUP(y) : repeats y x times


7.4 SIZEOF

Returns a value that is equivalent to multiplying LENGTHOF by TYPE.
This is somewhat like sizeof in C++.
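
The relationship between the three operators has a close C++ analogue. The sketch below declares an array comparable to a WORD declaration with four initializers and derives the three values with sizeof; the names words, type_of, length_of, and size_of are chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// C++ counterpart of a declaration such as: WORD 100h, 200h, 300h, 400h
constexpr std::uint16_t words[] = {0x100, 0x200, 0x300, 0x400};

constexpr std::size_t type_of   = sizeof(words[0]);                  // TYPE
constexpr std::size_t length_of = sizeof(words) / sizeof(words[0]);  // LENGTHOF
constexpr std::size_t size_of   = sizeof(words);                     // SIZEOF

// SIZEOF is LENGTHOF multiplied by TYPE.
static_assert(size_of == length_of * type_of, "SIZEOF = LENGTHOF * TYPE");
```

Here the element size is 2 bytes, the element count is 4, and the total size is 8 bytes.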



8. Spanning Multiple Lines

A data declaration spans multiple lines if each line (except the last) ends with a comma. The LENGTHOF and SIZEOF operators include all lines belonging to the declaration.



9. LABEL Directive

Assigns an alternate label name and type to an existing storage location. LABEL does not allocate any storage of its own.

A detailed explanation of the LABEL directive: https://blog.csdn.net/deniece1/article/details/103213681


10. Subroutine

label PROC
    ...
    RET
label ENDP

The procedure can be called by the instruction CALL label

  • CALL : Pushes the current value of EIP (the address of the instruction that follows the CALL) onto the stack as the return address, then places the address of the subroutine into EIP.

  • RET : Returns control to the caller by popping the most recently pushed address off the stack into EIP, so execution continues from the point following the CALL.


10.1 Value Parameters

Ordinary pass-by-value: the subroutine receives copies of the values.

e.g. return the larger of two numbers

    MOV EAX, first
    MOV EBX, second
    CALL bigger
    MOV max, EAX

bigger PROC
    MOV save1, EAX
    MOV save2, EBX
    CMP EAX, EBX
    JG first_big
    MOV EAX, save2
    RET
first_big:
    MOV EAX, save1
    RET
bigger ENDP

10.2 Reference Parameters

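Pass-by-reference: the address is passed rather than the value, so the subroutine directly changes the original variables.

e.g. swap two variables (ECX and EDX serve as scratch registers, because MOV cannot copy memory to memory)

    LEA EAX, first
    LEA EBX, second
    CALL swap

swap PROC
    MOV ECX, [EAX]      ; ECX = first
    MOV EDX, [EBX]      ; EDX = second
    MOV [EAX], EDX      ; first = old value of second
    MOV [EBX], ECX      ; second = old value of first
    RET
swap ENDP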

10.3 Stack Frame

In 10.1 and 10.2 we passed values in registers, but that approach is too limited. A stack frame can be used instead, which makes parameter passing more flexible.

Just before and during the call of a subroutine the following happens:

  • The parameters are pushed on the stack
  • The return address is pushed on the stack
  • The address stored in EBP is pushed on the stack
  • A new stack frame is created
  • The current address of the top of the new stack frame is saved in EBP
  • The local variables are installed on the new stack

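Once the subroutine has done its job:

  • Pop all local variables off the stack
  • Pop the previous EBP address from the top of the stack and restore it in EBP
  • Pop the return address into EIP (RET)
  • Clean up the parameters on the stack, either by the caller after the return (ADD ESP, n) or by the callee as part of the return (RET n)

The popping order is crucial: values must come off the stack in the reverse of the order in which they were pushed.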
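
The life cycle of a stack frame can be traced with a toy model in C++. This sketch assumes a convention in which the caller removes the parameters after the return; the Machine type, the hypothetical add(a, b) subroutine, and the placeholder return address are all invented for illustration.

```cpp
#include <cstdint>
#include <vector>

struct Machine {
    std::vector<std::uint32_t> mem;  // stack memory, one slot per 4 bytes
    std::uint32_t esp;               // stack pointer (byte offset)
    std::uint32_t ebp;               // frame pointer (byte offset)

    Machine() : mem(64, 0), esp(256), ebp(0) {}

    void push(std::uint32_t v) { esp -= 4; mem[esp / 4] = v; }
    std::uint32_t pop() { std::uint32_t v = mem[esp / 4]; esp += 4; return v; }
    std::uint32_t& at(std::uint32_t address) { return mem[address / 4]; }
};

// Walks through a call to a hypothetical add(a, b) that stores its result in
// one local variable and returns it in EAX.
std::uint32_t call_add(Machine& m, std::uint32_t a, std::uint32_t b) {
    m.push(b);            // caller: PUSH b (parameters, right to left)
    m.push(a);            // caller: PUSH a
    m.push(0x00401000u);  // CALL: push the return address (placeholder value)

    m.push(m.ebp);        // callee: PUSH EBP (save the caller's frame pointer)
    m.ebp = m.esp;        // callee: MOV EBP, ESP (base of the new frame)
    m.esp -= 4;           // callee: SUB ESP, 4 (room for one local variable)

    // Parameters sit above EBP, locals below it.
    m.at(m.ebp - 4) = m.at(m.ebp + 8) + m.at(m.ebp + 12);  // local = a + b
    std::uint32_t eax = m.at(m.ebp - 4);                   // MOV EAX, [EBP - 4]

    m.esp = m.ebp;        // callee: MOV ESP, EBP (discard the locals)
    m.ebp = m.pop();      // callee: POP EBP (restore the caller's frame pointer)
    m.pop();              // RET: pop the return address into EIP
    m.esp += 8;           // caller: ADD ESP, 8 (remove the two parameters)
    return eax;
}
```

After call_add returns, esp and ebp hold the same values they had before the call, which is what makes the strict push and pop ordering matter.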


10.4 Recursive subroutines

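Definition of recursion: see recursion.

A recursive subroutine is one that calls itself.

e.g. factorial (n is passed in EAX and must be at least 1; the result is returned in EAX)

factorial PROC
    PUSH EAX                    ; save n
    DEC  EAX
    JZ   finish                 ; n == 1: base case
    CALL factorial              ; EAX = (n-1)!
    PUSH EAX                    ; stack now holds (n-1)! on top of n
    CALL multiply               ; EAX = n * (n-1)!
    RET
finish:
    POP  EAX                    ; restore n = 1 as the result
    RET
factorial ENDP

multiply PROC
    MOV  EAX, [ESP + 4]         ; (n-1)!, just above the return address
    MUL  DWORD PTR [ESP + 8]    ; EDX:EAX = EAX * n
    RET  8                      ; return and discard both 4-byte arguments
multiply ENDP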
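
For comparison, the same recursion in C++ is shown below as a minimal sketch. The compiler generates the saving of n and the call and return bookkeeping that the assembly version spells out by hand; unlike the assembly, this version also accepts n = 0.

```cpp
#include <cstdint>

// Recursive factorial: recurse on n - 1, then multiply the result by n.
// The result fits in 32 bits only for n <= 12.
std::uint32_t factorial(std::uint32_t n) {
    if (n <= 1) {
        return 1;                 // base case, the counterpart of `finish`
    }
    return n * factorial(n - 1);  // each call gets its own stack frame
}
```

Calling factorial(5) nests five calls and returns 120.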
