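Assembly language is any low-level language for electronic computers, microprocessors and microcontrollers; it is also called a symbolic language. In assembly language, mnemonics replace the opcodes of machine instructions, and address symbols or labels replace the addresses of instructions or operands. Different devices use different assembly languages, each matching a different machine instruction set, and the assembly process translates the source into machine instructions. Because a given assembly language is tied to a particular machine instruction set, assembly programs cannot be ported directly between platforms.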
Contents
- 0. Inline Assembler
- 1. General-Purpose Registers
  - 1.1 Name of the registers
  - 1.2 Some Specialized Register Uses
- 2. Data Types
- 3. Comments
- 4. Instructions
  - 4.1 Labels
  - 4.2 Mnemonics and Operands
  - 4.3 Structure of instructions
  - 4.4 Basic instructions
    - 4.4.1 MOV
    - 4.4.2 XCHG Instruction
    - 4.4.3 INC and DEC Instructions
    - 4.4.4 ADD and SUB Instructions
    - 4.4.5 NEG Instruction
    - 4.4.6 Implementing Arithmetic Expressions
  - 4.5 Flow Controls
    - 4.5.1 JMP
    - 4.5.2 JCXZ and JECXZ
    - 4.5.3 Other Jump Instructions
    - 4.5.4 CMP
    - 4.5.5 LOOP
    - 4.5.6 LOOPNE
  - 4.6 Basic Data Structure
    - 4.6.1 Stack
  - 4.7 IO Instructions
    - 4.7.1 Input
    - 4.7.2 Output
- 5. Addressing modes
- 6. Flags Affected by Arithmetic
  - 6.1 Zero Flag (ZF)
  - 6.2 Sign Flag (SF)
- 7. Useful Operators
  - 7.1 OFFSET
  - 7.2 TYPE
  - 7.3 LENGTHOF
  - 7.4 SIZEOF
- 8. Spanning Multiple Lines
- 9. LABEL Directive
- 10. Subroutine
  - 10.1 Value Parameters
  - 10.2 Reference Parameters
  - 10.3 Stack Frame
  - 10.4 Recursive subroutines
0. Inline Assembler
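Here we use Microsoft Visual C++ to learn how to write assembly. You can add assembly code inside a C++ program with the _asm keyword (a synonym for __asm). Note that MSVC supports this inline assembler only when compiling for 32-bit x86, not for x64.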
e.g.
    #include <stdio.h>
    #include <iostream>
    using namespace std;

    int main(void)
    {
        int var = 1, var1 = 2, var2 = 3;
        _asm MOV EAX, var       // a single instruction
        // or several instructions in one block
        _asm {
            MOV EAX, var1
            MOV EAX, var2
        }
        return 0;
    }
1. General-Purpose Registers
1.1 Name of the registers
Most computations in assembly are carried out in registers, and for convenience the assembler predefines names for them.
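EAX, EBX, ECX and EDX have a 32-bit name, a 16-bit name and 8-bit names: for example, EAX is the full 32-bit register, AX is its lower 16 bits, and AH and AL are the upper and lower bytes of AX.
ESI, EDI, EBP and ESP only have a 16-bit name for their lower half (SI, DI, BP, SP).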
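A small C++ model may make the overlap concrete. It is an illustration only (a hypothetical Reg32 struct, not how the CPU is built): one 32-bit value plays the role of EAX, and the 16-bit name AX and the 8-bit names AH and AL are views of its low bits, so writing AL changes EAX.

```cpp
#include <cstdint>

// Hypothetical model of EAX and its sub-register names (AX, AH, AL).
// All names are views of the same 32-bit storage.
struct Reg32 {
    uint32_t e = 0;  // EAX: all 32 bits

    uint16_t x() const { return static_cast<uint16_t>(e & 0xFFFFu); }      // AX: bits 0-15
    uint8_t  h() const { return static_cast<uint8_t>((e >> 8) & 0xFFu); }  // AH: bits 8-15
    uint8_t  l() const { return static_cast<uint8_t>(e & 0xFFu); }         // AL: bits 0-7

    // Writing a sub-register replaces only its own bits of EAX.
    void set_x(uint16_t v) { e = (e & 0xFFFF0000u) | v; }
    void set_h(uint8_t v)  { e = (e & 0xFFFF00FFu) | (static_cast<uint32_t>(v) << 8); }
    void set_l(uint8_t v)  { e = (e & 0xFFFFFF00u) | v; }
};
```

For example, with EAX = 12345678h, AX reads 5678h, AH 56h and AL 78h; after set_l(0xFF), EAX becomes 123456FFh.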
1.2 Some Specialized Register Uses
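- General-Purpose
  - EAX: accumulator
  - ECX: loop counter
  - ESP: stack pointer
  - ESI, EDI: index registers
  - EBP: extended frame pointer (stack)
  ESP always points to the top of the stack, while EBP points to the base of the current stack frame.
- Segment
  - CS: Code Segment. Assembly programs keep their code in a contiguous region of memory called the code segment, and the CS register identifies this segment.
  - DS: Data Segment
  - SS: Stack Segment
  - ES, FS, GS: additional segments
- EIP: instruction pointer. EIP holds the address of the next instruction the CPU will fetch; without it the CPU could not locate the following instruction. After each instruction executes, EIP advances to the next one (unless a jump, call or return changes it).
- EFLAGS: the flags register, which holds the status flags (such as CF, OF, ZF and SF) set by arithmetic instructions.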
2. Data Types
Data types and the registers of the same size (the S versions are signed):
- BYTE, SBYTE: AH, AL, BH, BL, CH, CL, DH, DL
- WORD, SWORD: AX, BX, CX, DX, SI, DI
- DWORD, SDWORD: EAX, EBX, ECX, EDX, ESI, EDI
3. Comments
- Single-line comments: begin with a semicolon.
- Multi-line comments: begin with the COMMENT directive followed by a programmer-chosen character, and end with the same character.
4. Instructions
We use the Intel IA-32 instruction set.
An instruction contains:
- Label (optional)
- Mnemonic (required)
- Operand (depends on the instruction)
- Comment (optional)
4.1 Labels
- Act as place markers: a label marks the address of code or data.
- Follow identifier rules.
- Data label:
  - must be unique
  - not followed by a colon
- Code label:
  - target of jump and loop instructions
  - followed by a colon
4.2 Mnemonics and Operands
- Instruction Mnemonics
  - a memory aid: a short name for the operation
  - e.g. MOV, ADD, SUB, MUL, INC, DEC
- Operands
  - constant
  - constant expression: constants and constant expressions are often called immediate values
  - register
  - memory (data label)
4.3 Structure of instructions
The binary codes of almost all instructions contain three pieces of information:
- The action, or operation, of the instruction
- The operands involved (where to find the information to operate on)
- Where the result is to go
Machine instructions are encoded with distinct bit fields that contain information about:
- The operation required
- The location of operands and results
- The data type of the operand
The length of an instruction depends on:
- The operation required
- The addressing modes employed
(Pentium instructions can be from 1 to 15 bytes long.)
4.4 Basic instructions
4.4.1 MOV
MOV destination, source
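Copies the source operand into the destination operand. Rules:
- Both operands must be the same size.
- At most one operand can be a memory operand, so memory-to-memory moves are not permitted.
- CS, EIP and IP cannot be the destination.
- An immediate value cannot be moved directly into a segment register.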
Zero Extension
When you copy a smaller value into a larger destination, the MOVZX instruction fills the remaining upper bits of the destination with zeros.
Sign Extension
The MOVSX instruction fills the remaining upper bits of the destination with copies of the source operand's sign bit.
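The difference between the two extensions can be sketched in C++. The function names movzx8to32 and movsx8to32 are hypothetical, and only the 8-bit to 32-bit case is shown:

```cpp
#include <cstdint>

// Model of MOVZX: copy an 8-bit value into a 32-bit destination,
// filling the upper 24 bits with zeros.
uint32_t movzx8to32(uint8_t src) {
    return static_cast<uint32_t>(src);
}

// Model of MOVSX: copy an 8-bit value into a 32-bit destination,
// filling the upper 24 bits with copies of the sign bit (bit 7).
uint32_t movsx8to32(uint8_t src) {
    uint32_t fill = (src & 0x80u) ? 0xFFFFFF00u : 0u;  // all ones if bit 7 is set
    return fill | src;
}
```

For the byte 9Bh, zero extension gives 0000009Bh while sign extension gives FFFFFF9Bh, because bit 7 of 9Bh is 1.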
4.4.2 XCHG Instruction
Exchanges the values of two operands. At least one operand must be a register. No immediate operands are permitted.
4.4.3 INC and DEC Instructions
INC: adds 1 to the destination operand (register or memory).
DEC: subtracts 1 from the destination operand.
4.4.4 ADD and SUB Instructions
ADD | SUB destination, source
Same operand rules as for the MOV instruction.
4.4.5 NEG Instruction
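NEG operand
Reverses the sign of an operand by replacing it with its two's complement. The operand can be a register or a memory operand.
NEG Instruction and the Flags
Any nonzero operand causes the Carry flag to be set. The Overflow flag is set when the operand is the most negative value of its size (e.g. -128 for a byte), because its negation cannot be represented.
- CF: Carry Flag
- OF: Overflow Flag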
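A C++ sketch of how NEG sets CF and OF on a 32-bit operand: CF is set for any nonzero operand, and OF is set only when the operand is the most negative value, whose negation does not fit. The names neg32 and NegResult are hypothetical, and the other flags NEG updates are left out:

```cpp
#include <cstdint>
#include <limits>

// Result of the hypothetical neg32 model: the negated value plus two flags.
struct NegResult {
    int32_t value;  // operand after NEG
    bool cf;        // Carry flag
    bool of;        // Overflow flag
};

NegResult neg32(int32_t x) {
    NegResult r;
    // Two's complement negation done on unsigned bits, so negating the most
    // negative value wraps back to itself (as on the CPU) instead of being UB.
    uint32_t bits = 0u - static_cast<uint32_t>(x);
    r.value = static_cast<int32_t>(bits);
    r.cf = (x != 0);                                    // any nonzero operand sets CF
    r.of = (x == std::numeric_limits<int32_t>::min());  // -2^31 cannot be negated
    return r;
}
```

neg32(5) gives -5 with CF = 1 and OF = 0, neg32(0) gives 0 with CF = 0, and negating -2147483648 leaves it unchanged with OF = 1.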
4.4.6 Implementing Arithmetic Expressions
High-level language (HLL) compilers translate mathematical expressions into sequences of simple assembly instructions such as MOV, NEG, ADD and SUB.
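As an illustration (the expression and the names Xval, Yval, Zval and eval_expression are our own choice, not from the text), the C++ below evaluates Rval = -Xval + (Yval - Zval) one step at a time, in the order a straightforward translation might use; each comment shows the assumed matching instruction:

```cpp
#include <cstdint>

// Hypothetical example: Rval = -Xval + (Yval - Zval), evaluated
// step by step with two "registers", like a MOV/NEG/SUB/ADD sequence.
int32_t eval_expression(int32_t Xval, int32_t Yval, int32_t Zval) {
    int32_t eax, ebx;
    eax = Xval;        // MOV EAX, Xval
    eax = -eax;        // NEG EAX
    ebx = Yval;        // MOV EBX, Yval
    ebx = ebx - Zval;  // SUB EBX, Zval
    eax = eax + ebx;   // ADD EAX, EBX
    return eax;        // MOV Rval, EAX
}
```

With Xval = 26, Yval = 30 and Zval = 40 the result is -26 + (30 - 40) = -36.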
4.5 Flow Controls
4.5.1 JMP
JMP is an unconditional jump to a label that is usually within the same procedure.
    target:
        ...
        JMP target
Logic: EIP ← target
4.5.2 JCXZ and JECXZ
There are more than 30 jump instructions. JCXZ and JECXZ are two of them: they are conditional jumps that test whether CX (JCXZ) or ECX (JECXZ) is zero. The remaining conditional jumps test the status flags. A conditional jump is taken if its condition is true; otherwise execution continues with the next instruction.
        JCXZ target     ; jump if CX = 0
        ...
    target:
4.5.3 Other Jump Instructions
- JC / JB: jump if the Carry flag is set
- JNC / JNB: jump if the Carry flag is clear
- JE / JZ: jump if the Zero flag is set
- JNE / JNZ: jump if the Zero flag is clear
- JS: jump if the Sign flag is set
- JNS: jump if the Sign flag is clear
- JO: jump if the Overflow flag is set
- JNO: jump if the Overflow flag is clear
4.5.4 CMP
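The CMP instruction is the most common way to set the flags before a conditional jump. It subtracts the second operand from the first, discards the result, and sets the flags accordingly.
    CMP EAX, EBX
It sets the Zero flag (ZF = 1) if EAX and EBX are equal.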
Jumps based on CMP
Assuming execution just after CMP:
- JE: jump if the first operand (in CMP) is equal to the second operand
- JNE: jump if the first and second operands are not equal
- JGE: jump if the first operand is greater than or equal to the second
- JG: jump if the first operand is greater
- JLE: jump if the first operand is less than or equal to the second
- JL: jump if the first operand is less
JG, JGE, JL and JLE compare signed integers; for unsigned comparisons use JA, JAE, JB and JBE.
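To show how the jumps read the flags that CMP leaves behind, here is a C++ model (cmp32, Flags and the jump functions are hypothetical names). CMP computes first minus second and keeps only the flags; the conditions are the standard ones: JG jumps when ZF = 0 and SF = OF, JL when SF ≠ OF, JA when CF = 0 and ZF = 0, and JB when CF = 1.

```cpp
#include <cstdint>

// Flags produced by CMP a, b (computed from a - b, result discarded).
struct Flags {
    bool zf, sf, cf, of;
};

Flags cmp32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;                  // wraps modulo 2^32, like the CPU
    Flags f;
    f.zf = (r == 0);
    f.sf = ((r >> 31) & 1u) != 0;        // sign bit of the result
    f.cf = a < b;                        // unsigned borrow
    // Signed overflow: a and b have different signs and the result's
    // sign differs from the sign of a.
    f.of = ((((a ^ b) & (a ^ r)) >> 31) & 1u) != 0;
    return f;
}

// Conditions tested by the jumps right after CMP a, b.
bool je(Flags f)  { return f.zf; }
bool jne(Flags f) { return !f.zf; }
bool jg(Flags f)  { return !f.zf && (f.sf == f.of); }  // signed a > b
bool jge(Flags f) { return f.sf == f.of; }              // signed a >= b
bool jl(Flags f)  { return f.sf != f.of; }              // signed a < b
bool jle(Flags f) { return f.zf || (f.sf != f.of); }    // signed a <= b
bool jb(Flags f)  { return f.cf; }                      // unsigned a < b
bool ja(Flags f)  { return !f.cf && !f.zf; }            // unsigned a > b
```

Comparing FFFFFFFFh with 1 shows why signed and unsigned jumps differ: as signed values this is -1 < 1, so JL would jump, but as unsigned values FFFFFFFFh > 1, so JA would jump.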
4.5.5 LOOP
    target:
        ...
        LOOP target
Logic: ECX ← ECX - 1; if ECX != 0, jump to target.
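The LOOP logic can be modeled in C++ to count how often the body runs (loop_iterations is a hypothetical helper). Because LOOP decrements ECX before testing it, the body always runs at least once:

```cpp
#include <cstdint>

// Hypothetical helper: how many times does the body run for
//     MOV ECX, count
// target:
//     ...              (loop body)
//     LOOP target
uint64_t loop_iterations(uint32_t count) {
    uint32_t ecx = count;
    uint64_t body_runs = 0;
    do {
        ++body_runs;     // the body runs before LOOP is reached
        ecx = ecx - 1;   // LOOP: ECX <- ECX - 1 (0 wraps to FFFFFFFFh)
    } while (ecx != 0);  // jump back to target only if ECX != 0
    return body_runs;
}
```

loop_iterations(5) returns 5. Starting with ECX = 0 is a classic mistake: the first decrement wraps ECX to FFFFFFFFh, so the body runs 2^32 times.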
4.5.6 LOOPNE
LOOPNE (loop if not equal) decrements ECX and jumps to the target only if ECX != 0 and ZF = 0.
e.g. loop while EAX is not equal to EBX, at most 200 times:
        MOV ECX, 200
    target:
        ...
        CMP EAX, EBX
        LOOPNE target
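A C++ model of the LOOPNE example, assuming a hypothetical loop body that simply increments EAX (the text leaves the body unspecified). Decrementing ECX inside LOOPNE does not change ZF, so the test sees the flag set by CMP; loopne_demo and LoopneResult are hypothetical names:

```cpp
#include <cstdint>

struct LoopneResult {
    uint32_t eax;         // final value of EAX
    uint32_t iterations;  // how many times the body ran
};

// Model of:
//     MOV ECX, 200
// target:
//     INC EAX          ; hypothetical loop body
//     CMP EAX, EBX
//     LOOPNE target    ; ECX <- ECX - 1; jump if ECX != 0 and ZF = 0
LoopneResult loopne_demo(uint32_t eax, uint32_t ebx) {
    uint32_t ecx = 200;
    uint32_t iterations = 0;
    bool zf;
    do {
        ++eax;              // INC EAX (hypothetical body)
        ++iterations;
        zf = (eax == ebx);  // CMP EAX, EBX sets ZF when they are equal
        --ecx;              // LOOPNE decrements ECX without touching ZF
    } while (ecx != 0 && !zf);
    return {eax, iterations};
}
```

Starting from EAX = 0 with EBX = 10, the loop stops after 10 iterations because ZF becomes 1; with EBX = 1000 it stops after 200 iterations because ECX reaches 0.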
4.6 Basic Data Structure
4.6.1 Stack
Runtime Stack
Managed by the CPU, using two registers
- SS (stack segment)
- ESP (stack pointer)
PUSH Operation
A 32-bit push operation decrements ESP by 4 and copies a value into the location pointed to by ESP. The stack grows downward. The area below ESP is always available (unless the stack has overflowed).
PUSH EAX
POP Operation
Copies the value at the top of the stack into a register or variable, then adds 2 or 4 to ESP, depending on the size of the operand receiving the data.
POP EAX
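The PUSH and POP steps can be sketched as a small C++ class. Stack32 is a hypothetical name; it models only 32-bit pushes and pops over a byte array, with ESP as an offset into that array:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the runtime stack. The stack grows toward
// lower offsets; an empty stack has ESP at the end of the memory block.
class Stack32 {
public:
    explicit Stack32(uint32_t size_bytes) : mem_(size_bytes, 0), esp_(size_bytes) {}

    void push(uint32_t value) {
        assert(esp_ >= 4 && "stack overflow");
        esp_ -= 4;          // 1) decrement ESP by 4
        store(esp_, value); // 2) copy the value to [ESP]
    }

    uint32_t pop() {
        assert(esp_ + 4 <= mem_.size() && "stack underflow");
        uint32_t value = load(esp_);  // 1) copy the value at [ESP]
        esp_ += 4;                    // 2) add 4 to ESP
        return value;
    }

    uint32_t esp() const { return esp_; }

private:
    // Little-endian 32-bit store and load, as on x86.
    void store(uint32_t addr, uint32_t v) {
        for (int i = 0; i < 4; ++i) mem_[addr + i] = static_cast<uint8_t>(v >> (8 * i));
    }
    uint32_t load(uint32_t addr) const {
        uint32_t v = 0;
        for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(mem_[addr + i]) << (8 * i);
        return v;
    }

    std::vector<uint8_t> mem_;
    uint32_t esp_;  // offset of the top of the stack within mem_
};
```

After two pushes ESP has moved down by 8; the pops return the values in reverse order and bring ESP back to where it started.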
4.7 IO Instructions
4.7.1 Input
CALL scanf: takes two parameters from the stack: the address of the format string and the address of the variable that receives the input.
4.7.2 Output
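CALL printf: takes the address of the format string and the value to print (the variable's value, not its address) from the stack.
In both calls the parameters are pushed in reverse order (last parameter first), and the caller removes them afterwards with ADD ESP, 8.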
e.g.
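    #include <stdio.h>
    #include <iostream>
    using namespace std;

    int main(void) {
        char message[] = "The input number is %d\n";
        char format[] = "%d";
        int input;
        _asm {
            LEA EAX, input
            PUSH EAX            // 2nd parameter: address of input
            LEA EAX, format
            PUSH EAX            // 1st parameter: address of the format string
            CALL scanf
            ADD ESP, 8          // caller removes the two parameters
            PUSH input          // 2nd parameter: value of input
            LEA EAX, message
            PUSH EAX            // 1st parameter: address of the message
            CALL printf
            ADD ESP, 8
        }
        return 0;
    }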
5. Addressing modes
Addressing modes are the ways an instruction forms the addresses of its operands. Offering a variety of addressing modes better supports the needs of HLLs, which must manipulate large data structures.
Immediate mode
MOV EAX, 104
The operand value (104) is encoded in the instruction's binary code itself.
Data Register Direct
MOV EAX, EBX
This is the fastest mode to execute, since no memory access is needed.
Memory Direct
MOV EAX, a
a is a variable stored in memory; the instruction contains the address of this variable.
Address Register Direct
LEA EAX, message
The instruction contains the address of the message variable; after it executes, EAX holds that address (not the contents of message).
Register Indirect
MOV EAX, [EBX]
The instruction copies into EAX the content of the memory location whose address is stored in EBX.
Indexed Register Indirect with displacement
MOV EAX, [array + ESI]
MOV EAX, array[ESI]
Traversing an array with a loop:
    .data
    array WORD 100h, 200h, 300h, 400h
    .code
        MOV EDI, OFFSET array
        MOV ECX, LENGTHOF array
        MOV AX, 0
    L1:
        ADD AX, [EDI]
        ADD EDI, TYPE array
        LOOP L1
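Here [EDI] is an indirect reference: instead of using the value in the register directly, the CPU treats that value as an address and reads the value stored at that address.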
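For comparison, a C++ version of the array-sum loop (sum_words is a hypothetical name). The byte pointer edi plays the role of EDI: it advances by TYPE array bytes (2 for WORD), and dereferencing it as a 16-bit value corresponds to [EDI]. The assert reflects that LOOP starting with ECX = 0 would not stop after zero iterations:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical C++ counterpart of the WORD-array loop:
// edi holds a byte address (like EDI), ecx counts down (like ECX).
uint16_t sum_words(const uint16_t* words, std::size_t length) {
    assert(length > 0);  // LOOP with ECX = 0 would wrap around instead of stopping
    const unsigned char* edi =
        reinterpret_cast<const unsigned char*>(words);  // MOV EDI, OFFSET array
    std::size_t ecx = length;                           // MOV ECX, LENGTHOF array
    uint16_t ax = 0;                                    // MOV AX, 0
    do {
        // ADD AX, [EDI]: read the WORD stored at the address held in edi
        ax = static_cast<uint16_t>(ax + *reinterpret_cast<const uint16_t*>(edi));
        edi += sizeof(uint16_t);                        // ADD EDI, TYPE array
    } while (--ecx != 0);                               // LOOP L1
    return ax;
}
```

For the array 100h, 200h, 300h, 400h the result is A00h.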
    .data
    source BYTE "This is the source string", 0
    target BYTE SIZEOF source DUP(0)
    .code
        MOV ESI, 0
        MOV ECX, SIZEOF source
    L1:
        MOV AL, source[ESI]
        MOV target[ESI], AL
        INC ESI
        LOOP L1
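Here source[ESI] selects an element of source, similar to array indexing in C++. Strictly, ESI is a byte offset; it equals the element index here only because the elements are bytes.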
For LOOP L1 (as for jumps), the assembler calculates the distance between the offset of the following instruction and the target label. This distance is called the relative offset; when the jump is taken, the CPU adds it to EIP.
6. Flags Affected by Arithmetic
The ALU has a number of status flags that reflect the outcome of arithmetic (and bitwise) operations based on the contents of the destination operand.
The MOV instruction never affects the flags.
6.1 Zero Flag (ZF)
The Zero Flag is set when the result of an operation produces zero in the destination operand.
In other words, ZF = 1 when the result of the operation is 0.
6.2 Sign Flag (SF)
The Sign Flag is set when the result in the destination operand is negative and clear when it is positive or zero.
The most significant bit of the data is the sign bit, and SF is a copy of that bit.
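A C++ sketch of how ZF and SF depend only on the bits left in the destination, so the same addition can set SF in an 8-bit register but not in a 32-bit one. zf_sf is a hypothetical helper, and bits is assumed to be 8, 16 or 32:

```cpp
#include <cstdint>

struct ZsFlags {
    bool zf;  // Zero flag
    bool sf;  // Sign flag
};

// ZF and SF for a result stored in a destination that is `bits` wide.
// Only the low `bits` bits of `result` are kept, as the register would hold them.
ZsFlags zf_sf(uint32_t result, unsigned bits) {
    uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    uint32_t dest = result & mask;
    ZsFlags f;
    f.zf = (dest == 0);                       // set when the stored result is zero
    f.sf = ((dest >> (bits - 1)) & 1u) != 0;  // copy of the highest (sign) bit
    return f;
}
```

Adding 1 to 7Fh in an 8-bit register gives 80h, so SF = 1; the same addition in a 32-bit register gives 00000080h, so SF = 0. Adding 1 to FFh in an 8-bit register leaves 00h, so ZF = 1.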
7. Useful Operators
7.1 OFFSET
Returns the distance in bytes of a label from the beginning of its enclosing segment, i.e. the variable's offset (its address) within that segment, comparable to a pointer in C/C++.
7.2 TYPE
Returns the size in bytes of a single element of a variable (e.g. 2 for a WORD variable).
7.3 LENGTHOF
Counts the number of elements in a single data declaration.
x DUP(y): repeats the value y x times; each copy counts as one element.
7.4 SIZEOF
Returns a value that is equivalent to multiplying LENGTHOF by TYPE, somewhat like sizeof in C++.
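Because TYPE, LENGTHOF and SIZEOF are evaluated by the assembler, their closest C++ counterparts are compile-time sizeof expressions. This is a rough analogy; words and buffer are example names standing in for a WORD list and a BYTE ... DUP declaration:

```cpp
#include <cstddef>
#include <cstdint>

// Example declarations (hypothetical names) mirroring
//     words  WORD 100h, 200h, 300h, 400h
//     buffer BYTE 10 DUP(0)
const uint16_t words[] = {0x100, 0x200, 0x300, 0x400};
const uint8_t buffer[10] = {};  // like 10 DUP(0): ten zero bytes

// TYPE words: size in bytes of one element
constexpr std::size_t type_words = sizeof(words[0]);
// LENGTHOF words: number of elements in the declaration
constexpr std::size_t lengthof_words = sizeof(words) / sizeof(words[0]);
// SIZEOF words: LENGTHOF * TYPE, i.e. the size of the whole declaration
constexpr std::size_t sizeof_words = lengthof_words * type_words;

static_assert(sizeof_words == sizeof(words), "SIZEOF equals the size of the whole array");
```

Here TYPE is 2, LENGTHOF is 4 and SIZEOF is 8 for the WORD list, and the 10 DUP(0) buffer occupies 10 bytes.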
8. Spanning Multiple Lines
A data declaration spans multiple lines if each line (except the last) ends with a comma. The LENGTHOF and SIZEOF operators include all lines belonging to the declaration.
9. LABEL Directive
Assigns an alternate label name and type to an existing storage location. LABEL does not allocate any storage of its own.
A detailed explanation of the LABEL directive: https://blog.csdn.net/deniece1/article/details/103213681
10. Subroutine
    label PROC
        ...
        RET
    label ENDP
The procedure can be called with the instruction CALL label.
- CALL: pushes the return address onto the stack (the current value of EIP, which at that point is the address of the instruction following the CALL), then places the address of the subroutine into EIP.
- RET: pops the return address from the top of the stack into EIP, so execution continues from the instruction following the CALL.
10.1 Value Parameters
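Ordinary pass-by-value: the values themselves are passed (here in EAX and EBX).
e.g. return the larger of two numbers:
        MOV EAX, first
        MOV EBX, second
        CALL bigger         ; the larger value comes back in EAX
        MOV max, EAX
        ...
    bigger PROC
        MOV save1, EAX
        MOV save2, EBX
        CMP EAX, EBX
        JG first_big
        MOV EAX, save2
        RET
    first_big:
        MOV EAX, save1
        RET
    bigger ENDP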
10.2 Reference Parameters
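Pass-by-reference: the address is passed instead of the value, so the subroutine can change the original variable directly.
e.g. swap two variables (memory-to-memory MOV is not allowed, so registers hold the values in between):
        LEA EAX, first      ; EAX = address of first
        LEA EBX, second     ; EBX = address of second
        CALL swap
        ...
    swap PROC
        MOV ECX, [EAX]      ; ECX = first
        MOV EDX, [EBX]      ; EDX = second
        MOV [EAX], EDX      ; first = old second
        MOV [EBX], ECX      ; second = old first
        RET
    swap ENDP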
10.3 Stack Frame
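In 10.1 and 10.2 we passed values through registers, but that approach is very limited: there are only a few registers, and the subroutine may need them for its own work. Passing parameters on the stack, in a stack frame, is more flexible.
Just before and during the call of a subroutine, the following happens:
- The parameters are pushed on the stack.
- The return address is pushed on the stack (by CALL).
- The address stored in EBP is pushed on the stack, and a new stack frame is created.
- The current address of the top of the new stack frame is saved in EBP.
- Space for the local variables is reserved on the stack.
Once the subroutine has done its job:
- Remove all local variables from the stack.
- Pop the previous EBP address from the top of the stack and restore it in EBP.
- Pop the return address and put it in EIP (this is what RET does).
- Clean up the parameters on the stack (either the caller with ADD ESP, n, or the callee with RET n).
The popping order is crucial: an item can only be removed after everything pushed on top of it has been removed.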
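The steps above can be traced with a small C++ model. It assumes the cdecl convention used in the scanf/printf example (parameters pushed last-to-first, caller removes them with ADD ESP, n) and a single local variable; Machine, call_with_frame and return_from_frame are hypothetical names, and the starting addresses are arbitrary:

```cpp
#include <cstdint>
#include <map>

// Hypothetical machine state: registers plus memory as address -> 32-bit value.
struct Machine {
    uint32_t esp = 0x1000;  // arbitrary initial stack top
    uint32_t ebp = 0x2000;  // arbitrary caller frame pointer
    uint32_t eip = 0;
    std::map<uint32_t, uint32_t> mem;

    void push(uint32_t v) { esp -= 4; mem[esp] = v; }
    uint32_t pop() { uint32_t v = mem[esp]; esp += 4; return v; }
};

// Caller side plus the callee's prologue for f(a, b) with one local.
void call_with_frame(Machine& m, uint32_t a, uint32_t b,
                     uint32_t return_addr, uint32_t callee_addr) {
    m.push(b);            // PUSH b: parameters pushed last-to-first
    m.push(a);            // PUSH a
    m.push(return_addr);  // CALL f: push the return address...
    m.eip = callee_addr;  // ...and jump to f
    m.push(m.ebp);        // PUSH EBP: save the caller's frame pointer
    m.ebp = m.esp;        // MOV EBP, ESP: the new frame starts here
    m.esp -= 4;           // SUB ESP, 4: room for one local at [EBP-4]
}

// Callee's epilogue plus the caller's cleanup, in reverse order.
void return_from_frame(Machine& m, uint32_t param_bytes) {
    m.esp = m.ebp;          // MOV ESP, EBP: discard the locals
    m.ebp = m.pop();        // POP EBP: restore the caller's frame pointer
    m.eip = m.pop();        // RET: pop the return address into EIP
    m.esp += param_bytes;   // ADD ESP, 8: caller removes the parameters
}
```

Inside the callee, the first parameter is at [EBP+8], the second at [EBP+12], the return address at [EBP+4], the saved EBP at [EBP] and the local variable at [EBP-4]. The epilogue undoes the prologue in reverse order and leaves ESP and EBP as they were before the call.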
10.4 Recursive subroutines
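Definition: recursion: see recursion.
A function that calls itself.
e.g. factorial:
    factorial PROC          ; input: EAX = n, output: EAX = n!
        CMP EAX, 1
        JA recurse          ; n > 1: compute n * (n - 1)!
        MOV EAX, 1          ; base case: 0! = 1! = 1
        RET
    recurse:
        PUSH EAX            ; save n on the stack
        DEC EAX             ; EAX = n - 1
        CALL factorial      ; EAX = (n - 1)!
        POP ECX             ; restore n into ECX
        IMUL EAX, ECX       ; EAX = n * (n - 1)!
        RET
    factorial ENDP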