This article walks through MIT 6.828 Lab 1.
Part 1
Exercise 1
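Get familiar with x86 assembly language.
Reference: https://pdos.csail.mit.edu/6.828/2018/readings/pcasm-book.pdf
This book describes the assembly syntax supported by the NASM assembler (Intel syntax), but the lab actually uses the GNU assembler (AT&T syntax). The two syntaxes are compared in:
http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html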
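The operand-order difference between the two syntaxes shows up in a single instruction: Intel syntax writes the destination first (add eax, ebx), while AT&T syntax writes the source first and prefixes registers with % (addl %ebx, %eax). Below is a minimal sketch using GCC-style inline assembly, which takes AT&T syntax by default. The function name add_att is made up for illustration, and the non-x86 fallback is only there so the file builds on other hosts.

```c
/* Add b to a with one AT&T-syntax instruction on x86; plain C elsewhere. */
int add_att(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    /* AT&T order is "addl source, destination": %1 is b, %0 is a. */
    __asm__ ("addl %1, %0" : "+r" (a) : "r" (b) : "cc");
#else
    a += b;  /* fallback so the sketch still compiles on non-x86 hosts */
#endif
    return a;
}
```

Calling add_att(2, 3) returns 5; the compiler picks the registers that stand in for %0 and %1.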
Exercise 2
Debug JOS with GDB: run make qemu-gdb in the lab directory in one terminal, then run make gdb in a second terminal.
Use GDB's si command to trace the instructions executed by the BIOS.
Part 2
Exercise 3
Read the lab tools guide, which includes some GDB techniques specific to OS debugging:
See the GDB manual for a full guide to GDB commands. Here are some particularly useful commands for 6.828, some of which don't typically come up outside of OS development.
Ctrl-c
Halt the machine and break in to GDB at the current instruction. If QEMU has multiple virtual CPUs, this halts all of them.
c (or continue)
Continue execution until the next breakpoint or Ctrl-c.
si (or stepi)
Execute one machine instruction.
b function or b file:line (or breakpoint)
Set a breakpoint at the given function or line.
b *addr (or breakpoint)
Set a breakpoint at the EIP addr.
set print pretty
Enable pretty-printing of arrays and structs.
info registers
Print the general purpose registers, eip, eflags, and the segment selectors. For a much more thorough dump of the machine register state, see QEMU's own info registers command.
x/Nx addr
Display a hex dump of N words starting at virtual address addr. If N is omitted, it defaults to 1. addr can be any expression.
x/Ni addr
Display the N assembly instructions starting at addr. Using $eip as addr will display the instructions at the current instruction pointer.
symbol-file file
(Lab 3+) Switch to symbol file file. When GDB attaches to QEMU, it has no notion of the process boundaries within the virtual machine, so we have to tell it which symbols to use. By default, we configure GDB to use the kernel symbol file, obj/kern/kernel. If the machine is running user code, say hello.c, you can switch to the hello symbol file using symbol-file obj/user/hello.
QEMU represents each virtual CPU as a thread in GDB, so you can use all of GDB's thread-related commands to view or manipulate QEMU's virtual CPUs.
thread n
GDB focuses on one thread (i.e., CPU) at a time. This command switches that focus to thread n, numbered from zero.
info threads
List all threads (i.e., CPUs), including their state (active or halted) and what function they're in.
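- Set a breakpoint at address 0x7c00 with b *0x7c00.
- Use c to run the program up to the breakpoint.
- Single-step with si and compare each instruction with the source in boot.S.
- Use x/8i $eip to display the 8 instructions starting at the current address.
- Set a breakpoint on bootmain() in boot/main.c (not done yet: the breakpoint on this function could not be set). A likely cause is that GDB is using the kernel symbol file, which has no boot loader symbols; setting the breakpoint by address with b *addr, using the address of bootmain listed in obj/boot/boot.asm, should work.
Answer the following questions:
- At what point does the processor start executing 32-bit code? What exactly causes the switch from 16-bit to 32-bit mode?
The processor starts executing 32-bit code at the instruction below. The switch is caused by setting the protected-mode enable bit in %cr0 (the mov %eax,%cr0 just before it); the ljmp then reloads %cs with a 32-bit code segment selector.
# Jump to next instruction, but in 32-bit code segment.
# Switches processor into 32-bit mode.
ljmp $PROT_MODE_CSEG, $protcseg
- What is the last instruction the boot loader executes, and what is the first instruction of the kernel it just loaded?
- What is the address of the first instruction of the kernel?
- How does the boot loader decide how many sectors it must read in order to load the entire kernel from disk into memory? Where does it find this information?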
Exercise 4
- It is recommended to understand every detail of this program, to make sure your C is solid enough for the labs that follow.
pointer.c
#include <stdio.h>
#include <stdlib.h>

void
f(void)
{
    int a[4];
    int *b = malloc(16);
    int *c;
    int i;

    printf("1: a = %p, b = %p, c = %p\n", a, b, c);

    c = a;
    for (i = 0; i < 4; i++)
        a[i] = 100 + i;
    c[0] = 200;
    printf("2: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], a[3]);

    c[1] = 300;
    *(c + 2) = 301;
    3[c] = 302;   // same as c[3]: x[y] is defined as *(x + y), so the operands can be swapped
    printf("3: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], 3[a]);

    c = c + 1;
    *c = 400;
    printf("4: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], a[3]);

    c = (int *) ((char *) c + 1);
    *c = 500;
    printf("5: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], a[3]);

    b = (int *) a + 1;             // address of a + sizeof(int) * 1 (pointer arithmetic)
    c = (int *) ((char *) a + 1);  // address of a + sizeof(char) * 1
    printf("6: a = %p, b = %p, c = %p\n", a, b, c);
}

int
main(int ac, char **av)
{
    f();
    return 0;
}
Output:
1: a = 0x7fffb9dbf680, b = 0x117f010, c = 0x1
2: a[0] = 200, a[1] = 101, a[2] = 102, a[3] = 103
3: a[0] = 200, a[1] = 300, a[2] = 301, a[3] = 302
4: a[0] = 200, a[1] = 400, a[2] = 301, a[3] = 302
5: a[0] = 200, a[1] = 128144, a[2] = 256, a[3] = 302
6: a = 0x7fffb9dbf680, b = 0x7fffb9dbf684, c = 0x7fffb9dbf681
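- In C, the array and the index in a subscript expression are interchangeable. This follows from the pointer definition of subscripting: x[y] means *(x + y), and since addition is commutative, it only matters that one operand is a pointer and the other an integer, not which comes first. So a[3] is equivalent to 3[a], *(a + 3), and *(3 + a).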
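Line 5 of the output (a[1] = 128144, a[2] = 256) comes from the store through c after it was advanced by a single byte: the four bytes of 500 land on the upper three bytes of a[1] and the lowest byte of a[2]. The sketch below reproduces that store with memcpy, which avoids the misaligned pointer dereference. It assumes a 4-byte int and a little-endian host, as on x86; the function name misaligned_store is made up for illustration.

```c
#include <string.h>

/*
 * Overwrite the 4 bytes that start 1 byte past &a[1] with the value 500.
 * Assumes sizeof(int) == 4 and little-endian byte order.
 */
void misaligned_store(int a[4])
{
    int v = 500;                                /* bytes: f4 01 00 00 */
    memcpy((char *) &a[1] + 1, &v, sizeof v);   /* touches a[1] and a[2] */
}
```

Starting from a[1] = 400 (bytes 90 01 00 00) and a[2] = 301 (bytes 2d 01 00 00), the store leaves a[1] as 90 f4 01 00 = 0x0001f490 = 128144 and a[2] as 00 01 00 00 = 0x100 = 256.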
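When a C program is compiled and linked, the compiler turns each C source file into an object file (.o) containing machine instructions in binary form, and the linker combines all the object files into a single binary image in ELF (Executable and Linkable Format).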
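- ELF
- An ELF file consists of four parts: the ELF header, the program header table, the sections, and the section header table.
ELF sections:
- .text: the program's executable instructions
- .rodata: read-only data, such as C string constants
- .data: the program's initialized data, such as initialized global variables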
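The ELF header is what a loader reads first: it carries a magic number, the entry point, and the location and count of the program headers that describe what to load. The following is a minimal sketch of reading those fields from a 32-bit little-endian ELF header. The struct and function names (elf32_summary, elf32_parse_header) are hypothetical; the byte offsets are those of the standard ELF32 header layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of interest from a 32-bit little-endian ELF header. */
struct elf32_summary {
    uint32_t entry;   /* e_entry: address of the first instruction */
    uint32_t phoff;   /* e_phoff: file offset of the program header table */
    uint16_t phnum;   /* e_phnum: number of program headers */
};

/* Read a little-endian 32-bit value byte by byte (works on any host). */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t) p[0] | (uint32_t) p[1] << 8 |
           (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* Returns 0 on success, -1 if buf is too short or lacks the ELF magic. */
int elf32_parse_header(const unsigned char *buf, size_t len,
                       struct elf32_summary *out)
{
    if (len < 52)                       /* size of an ELF32 header */
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    out->entry = le32(buf + 24);
    out->phoff = le32(buf + 28);
    out->phnum = (uint16_t) (buf[44] | buf[45] << 8);
    return 0;
}
```

A loader would then walk phnum program headers starting at file offset phoff and load each segment they describe.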
List information about all sections of obj/kern/kernel:
$ objdump -h obj/kern/kernel
obj/kern/kernel:     file format elf32-i386
Sections:
Idx Name                 Size      VMA       LMA       File off  Algn
  0 .text                000019e9  f0100000  00100000  00001000  2**4
                         CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata              000006c0  f0101a00  00101a00  00002a00  2**5
                         CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .stab                00003b95  f01020c0  001020c0  000030c0  2**2
                         CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .stabstr             00001948  f0105c55  00105c55  00006c55  2**0
                         CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .data                00009300  f0108000  00108000  00009000  2**12
                         CONTENTS, ALLOC, LOAD, DATA
  5 .got                 00000008  f0111300  00111300  00012300  2**2
                         CONTENTS, ALLOC, LOAD, DATA
  6 .got.plt             0000000c  f0111308  00111308  00012308  2**2
                         CONTENTS, ALLOC, LOAD, DATA
  7 .data.rel.local      00001000  f0112000  00112000  00013000  2**12
                         CONTENTS, ALLOC, LOAD, DATA
  8 .data.rel.ro.local   00000044  f0113000  00113000  00014000  2**2
                         CONTENTS, ALLOC, LOAD, DATA
  9 .bss                 00000648  f0113060  00113060  00014060  2**5
                         CONTENTS, ALLOC, LOAD, DATA
 10 .comment             0000002b  00000000  00000000  000146a8  2**0
                         CONTENTS, READONLY
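The load address (LMA) of a section is the memory address at which the section is loaded; the link address (VMA) is the memory address from which the section expects to execute. For the kernel the two differ: .text, for example, is linked at 0xf0100000 but loaded at 0x00100000.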
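In the objdump listing for obj/kern/kernel, every loaded section has a VMA exactly 0xf0000000 above its LMA. A small sketch of that relationship follows; KERNBASE is the name JOS uses for this constant, while the helper names vma_to_lma and lma_to_vma are made up here.

```c
#include <stdint.h>

/* Offset between the kernel's link (VMA) and load (LMA) addresses,
 * e.g. 0xf0100000 - 0x00100000 for .text. */
#define KERNBASE 0xf0000000u

/* Address a kernel section is loaded at, given the address it is linked at. */
uint32_t vma_to_lma(uint32_t vma) { return vma - KERNBASE; }

/* Address the kernel expects to run at, given where it was loaded. */
uint32_t lma_to_vma(uint32_t lma) { return lma + KERNBASE; }
```

For instance, vma_to_lma(0xf0100000) gives 0x00100000, matching the .text row of the table.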
Exercise 5
Change the boot loader's link address in boot/Makefrag from 0x7c00 to 0x7c10, rebuild the project, and set a breakpoint at 0x7c2a:
(gdb) b *0x7c2a
Breakpoint 1 at 0x7c2a
(gdb) c
Continuing.
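[ 0:7c2a] => 0x7c2a: mov %eax,%cr0
Breakpoint 1, 0x00007c2a in ?? ()
(gdb) si
[ 0:7c2d] => 0x7c2d: ljmp $0x8,$0x7c42
0x00007c2d in ?? ()
Execution goes wrong at this ljmp: its target 0x7c42 was computed for the new link address, but the BIOS still loads the boot sector at 0x7c00, so the jump does not land on the intended instruction.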
Exercise 6
Contents of memory at 0x00100000 before the kernel is loaded:
(gdb) x /8wx 0x00100000
0x100000: 0x00000000 0x00000000 0x00000000 0x00000000
0x100010: 0x00000000 0x00000000 0x00000000 0x00000000
Contents of memory at 0x00100000 after the kernel is loaded:
(gdb) b *0x10000c
Breakpoint 2 at 0x10000c
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0x10000c: movw $0x1234,0x472
Breakpoint 2, 0x0010000c in ?? ()
(gdb) x /8wx 0x00100000
0x100000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0x100010: 0x34000004 0x2000b812 0x220f0011 0xc0200fd8
(gdb)
Exercise 7
Trace up to the movl %eax, %cr0 instruction and examine memory at 0x00100000 and at 0xf0100000:
(gdb) b *0x100025
Breakpoint 1 at 0x100025
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0x100025: mov %eax,%cr0
Breakpoint 1, 0x00100025 in ?? ()
(gdb) x/8x 0x00100000
0x100000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0x100010: 0x34000004 0x2000b812 0x220f0011 0xc0200fd8
(gdb) x/8x 0xf0100000
0xf0100000 <_start+4026531828>: 0x00000000 0x00000000 0x00000000 0x00000000
0xf0100010 <entry+4>: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) si
=> 0x100028: mov $0xf010002f,%eax
0x00100028 in ?? ()
(gdb) x/8x 0xf0100000
0xf0100000 <_start+4026531828>: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0xf0100010 <entry+4>: 0x34000004 0x2000b812 0x220f0011 0xc0200fd8
(gdb) x/8x 0x00100000
0x100000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0x100010: 0x34000004 0x2000b812 0x220f0011 0xc0200fd8
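With movl %eax, %cr0 commented out and the project rebuilt, paging is never enabled, so the high address the kernel jumps to is not mapped and the run fails with the following error: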
qemu: fatal: Trying to execute code outside RAM or ROM at 0xf010002c
Exercise 8
Using variable arguments in C
- The variadic argument macros are defined in stdarg.h.
- va_list declares a pointer into the argument list.
- void va_start(va_list ap, last_arg); initializes the va_list; last_arg is the last named parameter before the ellipsis.
- type va_arg(va_list ap, type); fetches the next argument and returns it as type.
- void va_end(va_list ap); cleans up the argument list.
example
#include <cstdarg>
#include <iostream>

int sum(const char *msg, ...);

int main()
{
    // The trailing 0 is a sentinel: sum() stops when it reads it.
    int total = sum("hello world", 1, 2, 3, 0);
    std::cout << "total = " << total << std::endl;
    return 0;
}

int sum(const char *msg, ...)
{
    va_list vaList;          // cursor over the variable arguments
    va_start(vaList, msg);   // start right after msg, the last named parameter
    std::cout << msg << std::endl;
    int sumNum = 0;
    int step;
    // va_arg returns the next argument as the given type and advances the cursor;
    // a non-zero value means there are more arguments to read.
    while (0 != (step = va_arg(vaList, int))) {
        sumNum += step;
    }
    va_end(vaList);          // done with the argument list
    return sumNum;
}
- Add support for %o output
case 'o':
    // Replace this with your code.
    num = getuint(&ap, lflag);   // octal prints an unsigned value, as %u and %x do
    base = 8;
    goto number;
This is handled like %x, except that base is 8.
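The only thing that separates octal, decimal, and hexadecimal output is the base used when peeling digits off the number. Here is a standalone sketch of that digit loop; it is not the lab's printnum, and the name format_unsigned and its buffer-based interface are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Write the digits of num in the given base into buf (NUL-terminated).
 * Returns the number of digits written, or 0 if buf is too small or
 * base is outside 2..16.
 */
size_t format_unsigned(unsigned long long num, unsigned base,
                       char *buf, size_t buflen)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[64];                 /* enough for 64 binary digits */
    size_t n = 0, i;

    if (base < 2 || base > 16)
        return 0;
    do {                          /* least significant digit first */
        tmp[n++] = digits[num % base];
        num /= base;
    } while (num != 0);
    if (n + 1 > buflen)
        return 0;
    for (i = 0; i < n; i++)       /* reverse into the output buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

With base 8, the value 6828 comes out as 15254; with base 16, 57616 comes out as e110.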
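- question 1:
Explain the interface between printf.c and console.c. Specifically, what function does console.c export? How is this function used by printf.c?
Answer: console.c exports cputchar(). In printf.c, putch() calls cputchar() to print one character, and putch() is the callback that vcprintf() hands to vprintfmt().
- question 2:
Explain the following from console.c:
if (crt_pos >= CRT_SIZE) {
    int i;
    memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
    for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
        crt_buf[i] = 0x0700 | ' ';
    crt_pos -= CRT_COLS;
}
Answer: when the cursor runs past the end of the screen buffer, the screen scrolls up by one row. Rows 1 through the last row are moved up to rows 0 through the second-to-last row, every character in the last row is set to ' ', and the cursor moves back by one row (CRT_COLS positions).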
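The scrolling logic is easier to check on a tiny screen. The sketch below applies the same memmove-and-blank steps to a hypothetical 3-row by 4-column buffer; the names COLS, ROWS, SIZE, and scroll_if_full are made up here and stand in for CRT_COLS, CRT_SIZE, and the snippet from console.c.

```c
#include <stdint.h>
#include <string.h>

/* Small hypothetical screen so the effect is easy to inspect. */
enum { COLS = 4, ROWS = 3, SIZE = COLS * ROWS };

/*
 * If pos has run off the end of buf, scroll up one row, blank the last
 * row, and return the adjusted cursor position; otherwise return pos.
 */
int scroll_if_full(uint16_t buf[SIZE], int pos)
{
    if (pos >= SIZE) {
        int i;
        /* rows 1..ROWS-1 move up to rows 0..ROWS-2 */
        memmove(buf, buf + COLS, (SIZE - COLS) * sizeof(uint16_t));
        /* blank the last row: attribute 0x07 in the high byte, ' ' in the low byte */
        for (i = SIZE - COLS; i < SIZE; i++)
            buf[i] = 0x0700 | ' ';
        pos -= COLS;
    }
    return pos;
}
```

If the three rows hold 'a', 'b', and 'c' and the cursor sits at SIZE, the call leaves rows 'b', 'c', and a blank row, and returns SIZE - COLS, the start of the blank row.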
- question 3
int x = 1, y = 3, z = 4;
cprintf("x %d, y %x, z %d\n", x, y, z);
In the call to cprintf(), to what does fmt point? To what does ap point?
List (in order of execution) each call to cons_putc, va_arg, and vcprintf. For cons_putc, list its argument as well. For va_arg, list what ap points to before and after the call. For vcprintf list the values of its two arguments.
Answer: fmt points to the format string "x %d, y %x, z %d\n"; ap points to the variable arguments that follow it (x, y, z), starting at x.
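The relationship between fmt and ap can be sketched with the standard library: an outer variadic function starts a va_list right after fmt and hands it to an inner function that consumes it, the same split as between cprintf and vcprintf. The names format_to and vformat_to are hypothetical, and vsnprintf stands in for the kernel's formatting code.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Plays the role of vcprintf: consumes an already-started va_list. */
static int vformat_to(char *buf, size_t len, const char *fmt, va_list ap)
{
    return vsnprintf(buf, len, fmt, ap);
}

/* Plays the role of cprintf: starts the va_list after fmt and forwards it. */
int format_to(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);   /* ap now refers to the first argument after fmt */
    n = vformat_to(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

Calling format_to(buf, sizeof buf, "x %d, y %x, z %d\n", 1, 3, 4) fills buf with "x 1, y 3, z 4\n".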
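- question 4
Run the following code.
unsigned int i = 0x00646c72;
cprintf("H%x Wo%s", 57616, &i);
After it runs, the screen shows:
He110 World
Reason: 57616 in hexadecimal is 0xe110, and ASCII(0x72) = 'r', ASCII(0x6c) = 'l', ASCII(0x64) = 'd', ASCII(0x00) = '\0'. x86 is little-endian, so the bytes of i sit in memory in the order 0x72, 0x6c, 0x64, 0x00, and %s prints "rld".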
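The same effect can be reproduced in user space with snprintf. This sketch assumes a little-endian host with a 4-byte unsigned int; the function name hello_from_int is made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format "H%x Wo%s" using the bytes of i as the string argument.
 * The result depends on the host's byte order.
 */
int hello_from_int(char *buf, size_t len)
{
    unsigned int i = 0x00646c72;   /* little-endian memory order: 72 6c 64 00 */
    return snprintf(buf, len, "H%x Wo%s", 57616, (const char *) &i);
}
```

On a little-endian machine buf ends up holding "He110 World". On a big-endian machine the first byte of i would be 0x00, so i would have to be 0x726c6400 to get the same text, while 57616 would not need to change.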
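- question 5
In the following code, what is going to be printed after 'y='? (note: the answer is not a specific value.) Why does this happen?
cprintf("x=%d y=%d", 3);
Answer: no argument is supplied for the second %d, so va_arg reads whatever happens to follow 3 in the argument area, and the value printed after y= is indeterminate.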
Exercise 9
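The stack is initialized in entry.S, line 75:
movl $0x0,%ebp # nuke frame pointer
# Set the stack pointer
movl $(bootstacktop),%esp
# now to C code
call i386_init
These instructions set %ebp to 0x0, which marks the outermost stack frame, and point %esp at bootstacktop. The call to i386_init() then pushes the return address (eip), and the function's prologue pushes ebp and updates ebp.
The kernel reserves its stack space with the .space directive at bootstack in entry.S:
bootstack:
.space KSTKSIZE # 8 * 4096, reserves 32KB
bootstacktop is the top (highest address) of the reserved stack; the stack grows downward from there.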
Exercise 10
Use GDB to set a breakpoint at test_backtrace() (its code is listed in obj/kern/kernel.asm) and look at the details of a function call. Step 1: the call instruction pushes the return address. Step 2: the callee pushes ebp. Step 3: the callee sets ebp to the value of esp (mov %esp,%ebp).
disas test_backtrace
   0xf0100040 <+0>: push %ebp
=> 0xf0100041 <+1>: mov %esp,%ebp
   0xf0100043 <+3>: push %esi
   0xf0100044 <+4>: push %ebx
   0xf0100045 <+5>: call 0xf01001bc <__x86.get_pc_thunk.bx>
   0xf010004a <+10>: add $0x112be,%ebx
   0xf0100050 <+16>: mov 0x8(%ebp),%esi
   0xf0100053 <+19>: sub $0x8,%esp
   0xf0100056 <+22>: push %esi
   0xf0100057 <+23>: lea -0xf868(%ebx),%eax
   0xf010005d <+29>: push %eax
   0xf010005e <+30>: call 0xf0100a79 <cprintf>
   0xf0100063 <+35>: add $0x10,%esp
   0xf0100066 <+38>: test %esi,%esi
   0xf0100068 <+40>: jg 0xf0100095 <test_backtrace+85>
   0xf010006a <+42>: sub $0x4,%esp
   0xf010006d <+45>: push $0x0
   0xf010006f <+47>: push $0x0
   0xf0100071 <+49>: push $0x0
   0xf0100073 <+51>: call 0xf0100883 <mon_backtrace>
   0xf0100078 <+56>: add $0x10,%esp
   0xf010007b <+59>: sub $0x8,%esp
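Because every prologue pushes the caller's ebp and then points ebp at it, the saved frame pointers form a linked list that a backtrace can follow. The sketch below walks such a chain in a simulated stack; it is an illustration rather than the lab's mon_backtrace, the name walk_frames is hypothetical, and array indices stand in for addresses.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a chain of saved frame pointers in a simulated stack.
 * A "frame pointer" here is an index into stack[], with
 * stack[fp] = the caller's frame pointer and stack[fp + 1] = return address.
 * fp == 0 marks the outermost frame, like the ebp = 0 set in entry.S.
 * Stores up to max return addresses in eips and returns the frame count.
 */
size_t walk_frames(const uint32_t *stack, uint32_t fp,
                   uint32_t *eips, size_t max)
{
    size_t n = 0;
    while (fp != 0 && n < max) {
        eips[n++] = stack[fp + 1];   /* return address sits just above the saved fp */
        fp = stack[fp];              /* follow the link to the caller's frame */
    }
    return n;
}
```

On real hardware the same walk reads the current ebp, treats it as a pointer to 32-bit words, and stops when the saved ebp is 0.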
Exercise 11
Exercise 12
Running objdump -G obj/kern/kernel > output.md writes the kernel's symbol table information to output.md, where the following fragment can be found:
Symnum n_type n_othr n_desc n_value n_strx String
118 FUN 0 0 f01000a6 2987 i386_init:F(0,25)
119 SLINE 0 24 00000000 0
120 SLINE 0 34 00000012 0
121 SLINE 0 36 00000017 0
122 SLINE 0 39 0000002b 0
123 SLINE 0 43 0000003a 0
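What does this fragment mean? Start with the meaning of each column in the header row:
Symnum is the symbol index: viewing the whole symbol table as an array, Symnum is the symbol's index in that array.
n_type is the symbol type: FUN is a function name, and SLINE is a line number in the text segment.
n_othr is currently unused and is always 0.
n_desc is the line number in the source file.
n_value is an address. Note that only FUN symbols carry an absolute address; the address of an SLINE symbol is an offset, and its actual address is the function's entry address plus that offset. For example, the third row (Symnum 120) says that address f01000b8 (= 0xf01000a6 + 0x00000012) corresponds to line 34 of the file.
Once the meaning of each stabs record is clear, calling stab_binsearch finds the line number for a given address. The preceding code has already found which function contains the address and that function's entry address, so subtracting the entry address from the original address gives the offset, which is then looked up among the records in the corresponding range of the symbol table.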
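The lookup rule can be sketched on the SLINE records of i386_init from the fragment: the line for an address is the one given by the last record whose offset does not exceed address minus function entry. This is a simplified stand-in for stab_binsearch that uses a linear scan; struct sline and line_for_addr are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One SLINE record: offset from the function entry and the source line. */
struct sline {
    uint32_t offset;   /* n_value */
    int line;          /* n_desc  */
};

/*
 * Return the source line for addr inside a function starting at fn_addr,
 * i.e. the last record whose offset is <= addr - fn_addr, or -1 if none.
 * recs must be sorted by offset; a linear scan keeps the sketch short.
 */
int line_for_addr(const struct sline *recs, size_t n,
                  uint32_t fn_addr, uint32_t addr)
{
    int line = -1;
    size_t i;

    if (addr < fn_addr)
        return -1;
    for (i = 0; i < n && recs[i].offset <= addr - fn_addr; i++)
        line = recs[i].line;
    return line;
}
```

With the five records from the fragment and fn_addr = 0xf01000a6, the address 0xf01000b8 maps to line 34, and 0xf01000c0 (offset 0x1a) maps to line 36.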
objdump -h kernel
kernel:     file format elf32-i386
Sections:
Idx Name                 Size      VMA       LMA       File off  Algn
  0 .text                00001ad9  f0100000  00100000  00001000  2**4
                         CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata              00000714  f0101ae0  00101ae0  00002ae0  2**5
                         CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .stab                00003cd9  f01021f4  001021f4  000031f4  2**2
                         CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .stabstr             0000196b  f0105ecd  00105ecd  00006ecd  2**0
                         CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .data                00009300  f0108000  00108000  00009000  2**12
                         CONTENTS, ALLOC, LOAD, DATA
  5 .got                 00000008  f0111300  00111300  00012300  2**2
                         CONTENTS, ALLOC, LOAD, DATA
  6 .got.plt             0000000c  f0111308  00111308  00012308  2**2
                         CONTENTS, ALLOC, LOAD, DATA
  7 .data.rel.local      00001000  f0112000  00112000  00013000  2**12
                         CONTENTS, ALLOC, LOAD, DATA
  8 .data.rel.ro.local   00000044  f0113000  00113000  00014000  2**2
                         CONTENTS, ALLOC, LOAD, DATA
  9 .bss                 00000648  f0113060  00113060  00014060  2**5
                         CONTENTS, ALLOC, LOAD, DATA
 10 .comment             0000002b  00000000  00000000  000146a8  2**0
                         CONTENTS, READONLY
Note this printf usage: printf("%.*s", length, string) prints at most length characters of string.
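This form matters for stabs strings because the function name is not NUL-terminated on its own: in "i386_init:F(0,25)" the name is only the first 9 characters. A minimal sketch with snprintf follows; the helper name copy_name is made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Copy the first namelen characters of a stabs string such as "i386_init:F(0,25)". */
int copy_name(char *buf, size_t len, const char *stab_str, int namelen)
{
    /* "%.*s" takes the precision (maximum characters to print) from an int argument. */
    return snprintf(buf, len, "%.*s", namelen, stab_str);
}
```

With namelen = 9 the buffer receives "i386_init" and the ":F(0,25)" suffix is dropped.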
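- Getting the full 50 points for lab1
Work out what must be printed from the regular expressions in the grading script.
Code added to mon_backtrace (p appears to be the current frame pointer, so p[1] is the saved return address):
struct Eipdebuginfo info;
if (debuginfo_eip(p[1], &info) == 0) {
    cprintf("\t%s:%d: %.*s+%d\n", info.eip_file, info.eip_line,
            info.eip_fn_namelen, info.eip_fn_name, p[1] - info.eip_fn_addr);
}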
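Related knowledge
MMU (memory management unit): translates virtual addresses to physical addresses and enforces memory access permissions in hardware.
- Inline ASM in GCC
The __volatile__ qualifier tells the compiler not to optimize the instruction away.
- The in and out instructions in assembly
IN AL,21H ; read one byte from port 21H into AL
IN AX,21H ; read one byte from port 21H into AL and one byte from port 22H into AH
MOV DX,379H
IN AL,DX ; read one byte from port 379H into AL
OUT 21H,AL ; write the value of AL to port 21H
OUT 21H,AX ; write the value of AX to the two consecutive ports starting at 21H (port[21H]=AL, port[22H]=AH)
MOV DX,378H
OUT DX,AX ; write AH and AL to ports 379H and 378H respectively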
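- cld and the DF flag
In string instructions, DF controls whether si and di are incremented or decremented after each operation.
When DF=0, si and di are incremented after each operation.
When DF=1, si and di are decremented after each operation.
The cld instruction clears DF to 0.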
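The direction matters most when source and destination overlap. The following sketch simulates a byte-wise string copy (in the spirit of rep movsb) over a single buffer, with di and si as indices that step up when df is 0 and down when df is 1. The function name and the index-based interface are assumptions made for illustration, not how the instruction is used in the lab.

```c
#include <stddef.h>

/*
 * Simulate a byte-wise string copy inside mem: copy count bytes from index si
 * to index di, stepping both indices by +1 when df == 0 and by -1 when df == 1.
 */
void rep_movsb(unsigned char *mem, size_t di, size_t si,
               size_t count, int df)
{
    while (count-- > 0) {
        mem[di] = mem[si];
        if (df) { di--; si--; }   /* DF=1: walk toward lower addresses */
        else    { di++; si++; }   /* DF=0: walk toward higher addresses */
    }
}
```

Moving "abcdef" two bytes to the right within the same buffer works when copying backward from the last byte (df = 1), but a forward copy (df = 0) overwrites source bytes before they are read and produces "abababab".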
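- On memory usage
- On the GDT and LDT
There are two tables in the machine, the GDT and the LDT. They are the same kind of table: the former is the Global Descriptor Table and the latter the Local Descriptor Table. Both hold segment information for programs running in memory, for example where a program's code segment starts and how large it is, and where its data segment starts and how large it is. The GDT is globally visible, that is, every program running in memory can see it, so the segment information of the operating system kernel is stored there. The LDT, by contrast, is per program: each program in memory has its own, describing that program's segments. An entry in either table has the following fields:
Base: 32 bits, the base address of the segment.
Limit: 20 bits, the size of the segment.
Flags: 12 bits, the access rights and attributes of the segment.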
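In the hardware format these three fields are not stored contiguously: the 8-byte x86 segment descriptor scatters the base and limit bits around the flag bits. The sketch below packs them, treating the 12 flag bits as an 8-bit access byte plus a 4-bit flags nibble. The function name make_descriptor is hypothetical; the bit positions follow the standard x86 descriptor layout.

```c
#include <stdint.h>

/*
 * Pack a 32-bit base, a 20-bit limit, an 8-bit access byte, and a 4-bit
 * flags nibble into the 8-byte x86 segment descriptor layout.
 */
uint64_t make_descriptor(uint32_t base, uint32_t limit,
                         uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t) (limit & 0xffff);               /* limit 15:0  -> bits 0-15  */
    d |= (uint64_t) (base & 0xffffff) << 16;        /* base 23:0   -> bits 16-39 */
    d |= (uint64_t) access << 40;                   /* access byte -> bits 40-47 */
    d |= (uint64_t) ((limit >> 16) & 0xf) << 48;    /* limit 19:16 -> bits 48-51 */
    d |= (uint64_t) (flags & 0xf) << 52;            /* flags       -> bits 52-55 */
    d |= (uint64_t) (base >> 24) << 56;             /* base 31:24  -> bits 56-63 */
    return d;
}
```

A flat 4GB code segment (base 0, limit 0xfffff, access 0x9a, flags 0xc) packs to 0x00cf9a000000ffff.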