MIT 6.828 Lab 1


Part 1

Exercise 1

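Get familiar with x86 assembly language.
Reference: https://pdos.csail.mit.edu/6.828/2018/readings/pcasm-book.pdf
The book uses the assembly syntax supported by the NASM assembler (Intel syntax), whereas this lab uses the GNU assembler (AT&T syntax). For the differences between the two, see:
http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html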

Exercise 2

Debug JOS with GDB: run make qemu-gdb in one terminal in the lab directory, then run make gdb in a second terminal.
Use GDB's si command to trace the instructions the BIOS executes.

Part 2

Exercise 3

Read the lab tools guide, which covers GDB techniques specific to OS debugging:

See the GDB manual for a full guide to GDB commands. Here are some particularly useful commands for 6.828, some of which don't typically come up outside of OS development.
Ctrl-c
Halt the machine and break in to GDB at the current instruction. If QEMU has multiple virtual CPUs, this halts all of them.
c (or continue)
Continue execution until the next breakpoint or Ctrl-c.
si (or stepi)
Execute one machine instruction.
b function or b file:line (or breakpoint)
Set a breakpoint at the given function or line.
b *addr (or breakpoint)
Set a breakpoint at the EIP addr.
set print pretty
Enable pretty-printing of arrays and structs.
info registers
Print the general purpose registers, eip, eflags, and the segment selectors. For a much more thorough dump of the machine register state, see QEMU's own info registers command.
x/Nx addr
Display a hex dump of N words starting at virtual address addr. If N is omitted, it defaults to 1. addr can be any expression.
x/Ni addr
Display the N assembly instructions starting at addr. Using $eip as addr will display the instructions at the current instruction pointer.
symbol-file file
(Lab 3+) Switch to symbol file file. When GDB attaches to QEMU, it has no notion of the process boundaries within the virtual machine, so we have to tell it which symbols to use. By default, we configure GDB to use the kernel symbol file, obj/kern/kernel. If the machine is running user code, say hello.c, you can switch to the hello symbol file using symbol-file obj/user/hello.
QEMU represents each virtual CPU as a thread in GDB, so you can use all of GDB's thread-related commands to view or manipulate QEMU's virtual CPUs.
thread n
GDB focuses on one thread (i.e., CPU) at a time. This command switches that focus to thread n, numbered from zero.
info threads
List all threads (i.e., CPUs), including their state (active or halted) and what function they're in.

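Set a breakpoint at address 0x7c00 with b *0x7c00.
Run to the breakpoint with c.
Single-step with si and compare the executed instructions against boot.S.
Use x/8i $eip to display the 8 instructions starting at the current address.

Set a breakpoint on bootmain() in boot/main.c (not done yet: a breakpoint on this function name could not be set). A likely cause is that GDB has only the kernel symbol file loaded, so boot loader symbols are unknown; breaking on the address of bootmain listed in obj/boot/boot.asm with b *addr should work instead.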

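Answer the following questions:

At what point does the processor start executing 32-bit code? What exactly causes the switch from 16-bit to 32-bit mode?
The processor starts executing 32-bit code at the ljmp below. Setting the PE bit in CR0 enables protected mode, and the far jump then reloads CS with a 32-bit code segment selector:

  # Jump to next instruction, but in 32-bit code segment.
  # Switches processor into 32-bit mode.
  ljmp    $PROT_MODE_CSEG, $protcseg

What is the last instruction of the boot loader, and what is the first instruction of the kernel it just loaded?
The boot loader ends by calling the kernel entry point stored in the ELF header (e_entry). The first kernel instruction is movw $0x1234,0x472, as the Exercise 6 trace shows.
Where is the first instruction of the kernel?
At address 0x10000c (see the breakpoint in Exercise 6).
How does the boot loader decide how many sectors it must read in order to load the entire kernel from disk? Where does it find this information?
It reads the ELF header of the kernel image from disk; the program header table it points to gives the file offset, load address, and size of every segment.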

Exercise 4

It is worth understanding every detail of this program, to make sure your C is solid enough for the labs that follow.
pointer.c

#include <stdio.h>
#include <stdlib.h>

void
f(void)
{
    int a[4];
    int *b = malloc(16);
    int *c;
    int i;

    printf("1: a = %p, b = %p, c = %p\n", a, b, c);

    c = a;
    for (i = 0; i < 4; i++)
        a[i] = 100 + i;
    c[0] = 200;
    printf("2: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], a[3]);

    c[1] = 300;
    *(c + 2) = 301;
    3[c] = 302;    // same as c[3]: subscripting is defined as *(c + 3), and addition commutes
    printf("3: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], 3[a]);

    c = c + 1;
    *c = 400;
    printf("4: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], a[3]);

    c = (int *) ((char *) c + 1);
    *c = 500;
    printf("5: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], a[3]);

    b = (int *) a + 1;               // address = a + sizeof(int) * 1 (pointer arithmetic)
    c = (int *) ((char *) a + 1);    // address = a + sizeof(char) * 1
    printf("6: a = %p, b = %p, c = %p\n", a, b, c);
}

int
main(int ac, char **av)
{
    f();
    return 0;
}

Output

1: a = 0x7fffb9dbf680, b = 0x117f010, c = 0x1
2: a[0] = 200, a[1] = 101, a[2] = 102, a[3] = 103
3: a[0] = 200, a[1] = 300, a[2] = 301, a[3] = 302
4: a[0] = 200, a[1] = 400, a[2] = 301, a[3] = 302
5: a[0] = 200, a[1] = 128144, a[2] = 256, a[3] = 302
6: a = 0x7fffb9dbf680, b = 0x7fffb9dbf684, c = 0x7fffb9dbf681

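In C, the array and the index in a subscript expression can be swapped. This follows from the definition of subscripting in terms of pointers: because addition is commutative, it only matters that one operand is a pointer and the other an integer, not which comes first. So a[3] is equivalent to 3[a], to *(a+3), and to *(3+a).

Output line 5 follows from the unaligned write: c points one byte past &a[1], so *c = 500 overwrites the upper three bytes of a[1] and the lowest byte of a[2]. On little-endian x86 this turns a[1] from 400 (0x190) into 0x1f490 = 128144 and a[2] from 301 (0x12d) into 0x100 = 256.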

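When a C program is compiled and linked, the compiler turns each C source file into an object file (.o) containing machine instructions in binary form, and the linker combines all the object files into a single binary image in ELF (Executable and Linkable Format).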

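An ELF file has four parts: the ELF header, the program header table, the sections, and the section header table.
ELF sections:

.text: the program's executable instructions
.rodata: read-only data, such as C string constants
.data: the program's initialized data, such as initialized global variables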

List the information for all sections of obj/kern/kernel:

$ objdump -h obj/kern/kernel

obj/kern/kernel:     file format elf32-i386

Sections:
Idx Name               Size      VMA       LMA       File off  Algn
  0 .text              000019e9  f0100000  00100000  00001000  2**4
                       CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata            000006c0  f0101a00  00101a00  00002a00  2**5
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .stab              00003b95  f01020c0  001020c0  000030c0  2**2
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .stabstr           00001948  f0105c55  00105c55  00006c55  2**0
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .data              00009300  f0108000  00108000  00009000  2**12
                       CONTENTS, ALLOC, LOAD, DATA
  5 .got               00000008  f0111300  00111300  00012300  2**2
                       CONTENTS, ALLOC, LOAD, DATA
  6 .got.plt           0000000c  f0111308  00111308  00012308  2**2
                       CONTENTS, ALLOC, LOAD, DATA
  7 .data.rel.local    00001000  f0112000  00112000  00013000  2**12
                       CONTENTS, ALLOC, LOAD, DATA
  8 .data.rel.ro.local 00000044  f0113000  00113000  00014000  2**2
                       CONTENTS, ALLOC, LOAD, DATA
  9 .bss               00000648  f0113060  00113060  00014060  2**5
                       CONTENTS, ALLOC, LOAD, DATA
 10 .comment           0000002b  00000000  00000000  000146a8  2**0
                       CONTENTS, READONLY

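The load address (LMA) of a section is the memory address at which the section should be loaded. The link address (VMA) is the memory address from which the section expects to execute. For the kernel the two differ: .text is linked at 0xf0100000 but loaded at 0x00100000.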

Exercise 5

In boot/Makefrag, change the link address from 0x7c00 to 0x7c10, rebuild the project, and set a breakpoint at 0x7c2a:

(gdb) b *0x7c2a
Breakpoint 1 at 0x7c2a
(gdb) c
Continuing.
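[   0:7c2a] => 0x7c2a:	mov    %eax,%cr0

Breakpoint 1, 0x00007c2a in ?? ()
(gdb) si
[   0:7c2d] => 0x7c2d:	ljmp   $0x8,$0x7c42
0x00007c2d in ?? ()   // fails here

The BIOS still loads the boot sector at 0x7c00, but the linker computed absolute addresses such as the ljmp target as if the code started at 0x7c10, so the jump does not land on the intended instruction.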

Exercise 6

Contents at address 0x00100000 before the kernel is loaded:

(gdb) x /8wx 0x00100000
0x100000:	0x00000000	0x00000000	0x00000000	0x00000000
0x100010:	0x00000000	0x00000000	0x00000000	0x00000000

Contents at address 0x00100000 after the kernel is loaded:

(gdb) b *0x10000c
Breakpoint 2 at 0x10000c
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0x10000c:	movw   $0x1234,0x472

Breakpoint 2, 0x0010000c in ?? ()
(gdb) x /8wx 0x00100000
0x100000:	0x1badb002	0x00000000	0xe4524ffe	0x7205c766
0x100010:	0x34000004	0x2000b812	0x220f0011	0xc0200fd8
(gdb)

Exercise 7

Trace to the movl %eax, %cr0 instruction and examine memory at 0x00100000 and 0xf0100000:

(gdb) b *0x100025
Breakpoint 1 at 0x100025
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0x100025:	mov    %eax,%cr0

Breakpoint 1, 0x00100025 in ?? ()
(gdb) x/8x 0x00100000
0x100000:	0x1badb002	0x00000000	0xe4524ffe	0x7205c766
0x100010:	0x34000004	0x2000b812	0x220f0011	0xc0200fd8
(gdb) x/8x 0xf0100000
0xf0100000 <_start+4026531828>:	0x00000000	0x00000000	0x00000000	0x00000000
0xf0100010 <entry+4>:	0x00000000	0x00000000	0x00000000	0x00000000
(gdb) si
=> 0x100028:	mov    $0xf010002f,%eax
0x00100028 in ?? ()
(gdb) x/8x 0xf0100000
0xf0100000 <_start+4026531828>:	0x1badb002	0x00000000	0xe4524ffe	0x7205c766
0xf0100010 <entry+4>:	0x34000004	0x2000b812	0x220f0011	0xc0200fd8
(gdb) x/8x 0x00100000
0x100000:	0x1badb002	0x00000000	0xe4524ffe	0x7205c766
0x100010:	0x34000004	0x2000b812	0x220f0011	0xc0200fd8

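Before the instruction executes, 0xf0100000 reads as zeros. After it executes, paging is on and 0xf0100000 shows the same contents as 0x00100000, because the high virtual address is now mapped onto the low physical memory.

After commenting out movl %eax, %cr0 and rebuilding, running the kernel produces the following error: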

qemu: fatal: Trying to execute code outside RAM or ROM at 0xf010002c

Exercise 8

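Using variadic arguments in C

The variadic-argument macros are defined in stdarg.h.
va_list declares a pointer into the argument list.
void va_start(va_list ap, last_arg): initializes ap; last_arg is the last named parameter before the ellipsis.
type va_arg(va_list ap, type): fetches the next argument and returns it as type.
void va_end(va_list ap): cleans up the argument list.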

Example

#include <cstdarg>
#include <iostream>

int sum(const char* msg, ...);

int main()
{
    int total = 0;
    total = sum("hello world", 1, 2, 3, 0);  // the trailing 0 marks the end of the list
    std::cout << "total = " << total << std::endl;
    return 0;
}

int sum(const char* msg, ...)
{
    va_list vaList;          // cursor over the variadic arguments
    va_start(vaList, msg);   // start right after msg, the last named parameter
    std::cout << msg << std::endl;
    int sumNum = 0;
    int step;
    // va_arg returns the next argument as the given type and advances the cursor.
    // A value of 0 is the sentinel that ends the list.
    while (0 != (step = va_arg(vaList, int))) {
        sumNum += step;
    }
    va_end(vaList);          // done with the argument list
    return sumNum;
}

Adding %o output

case 'o':
    // Replace this with your code.
    num = getuint(&ap, lflag);
    base = 8;
    goto number;

This is handled like %u, except that base is 8. %o takes an unsigned argument, so no sign handling is needed.

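Question 1:
Explain the interface between printf.c and console.c. Specifically, what function does console.c export? How is this function used by printf.c?
Answer: console.c exports cputchar(). In printf.c, putch() calls cputchar() to print each character, and vcprintf() passes putch to vprintfmt() as the output callback.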
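Question 2:
Explain the following from console.c:

      if (crt_pos >= CRT_SIZE) {
              int i;
              memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
              for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
                      crt_buf[i] = 0x0700 | ' ';
              crt_pos -= CRT_COLS;
      }

Answer: when the cursor moves past the end of the screen, the display scrolls up by one row. On the 80x25 text screen, rows 1-24 are copied to rows 0-23, every character of the last row (row 24) is set to ' ', and crt_pos moves back by one row.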

Question 3:
int x = 1, y = 3, z = 4;
cprintf("x %d, y %x, z %d\n", x, y, z);
In the call to cprintf(), to what does fmt point? To what does ap point?
List (in order of execution) each call to cons_putc, va_arg, and vcprintf. For cons_putc, list its argument as well. For va_arg, list what ap points to before and after the call. For vcprintf list the values of its two arguments.
Answer: fmt points to the format string "x %d, y %x, z %d\n". ap points to the first variadic argument on the stack, here x, and advances to y and then z with each va_arg call.
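Question 4:
Run the following code.

    unsigned int i = 0x00646c72;
    cprintf("H%x Wo%s", 57616, &i);

Output on the screen:

He110 World

Reason: 57616 in decimal is 0xe110 in hex, so %x prints e110. ASCII 0x72 = 'r', 0x6c = 'l', 0x64 = 'd', and 0x00 = '\0'. Because x86 is little-endian, the bytes of i are stored in memory as 72 6c 64 00, so %s prints rld.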

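Question 5:
In the following code, what is going to be printed after 'y='? (note: the answer is not a specific value.) Why does this happen?

   cprintf("x=%d y=%d", 3);

Answer: no argument is supplied for the second %d, so va_arg reads whatever value happens to sit on the stack next to the 3, and an indeterminate value is printed.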

Exercise 9

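The stack is initialized in entry.S, around line 75:

	movl	$0x0,%ebp			# nuke frame pointer

	# Set the stack pointer
	movl	$(bootstacktop),%esp

	# now to C code
	call	i386_init

These instructions clear ebp to 0, which marks the outermost stack frame, and point esp at bootstacktop. When i386_init() is called, the call instruction pushes the return address (eip); the function prologue then pushes ebp and updates ebp.
The kernel reserves its stack space with the .space directive at the bootstack label in entry.S:

bootstack:
	.space		KSTKSIZE	# 8 * 4096 bytes, reserves 32 KB

bootstacktop is the top (highest address) of the reserved stack; the stack grows downward from there.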

Exercise 10

Use GDB to set a breakpoint on test_backtrace() (its address is listed in obj/kern/kernel.asm) and examine the details of the function call. The call instruction pushes the return address; the callee's prologue then pushes ebp and sets ebp to the value of esp (mov %esp,%ebp).
disas test_backtrace

   0xf0100040 <+0>:	push   %ebp
=> 0xf0100041 <+1>:	mov    %esp,%ebp
   0xf0100043 <+3>:	push   %esi
   0xf0100044 <+4>:	push   %ebx
   0xf0100045 <+5>:	call   0xf01001bc <__x86.get_pc_thunk.bx>
   0xf010004a <+10>:	add    $0x112be,%ebx
   0xf0100050 <+16>:	mov    0x8(%ebp),%esi
   0xf0100053 <+19>:	sub    $0x8,%esp
   0xf0100056 <+22>:	push   %esi
   0xf0100057 <+23>:	lea    -0xf868(%ebx),%eax
   0xf010005d <+29>:	push   %eax
   0xf010005e <+30>:	call   0xf0100a79 <cprintf>
   0xf0100063 <+35>:	add    $0x10,%esp
   0xf0100066 <+38>:	test   %esi,%esi
   0xf0100068 <+40>:	jg     0xf0100095 <test_backtrace+85>
   0xf010006a <+42>:	sub    $0x4,%esp
   0xf010006d <+45>:	push   $0x0
   0xf010006f <+47>:	push   $0x0
   0xf0100071 <+49>:	push   $0x0
   0xf0100073 <+51>:	call   0xf0100883 <mon_backtrace>
   0xf0100078 <+56>:	add    $0x10,%esp
   0xf010007b <+59>:	sub    $0x8,%esp

Exercise 11

Stack frame structure

Exercise 12

Running objdump -G obj/kern/kernel > output.md writes the kernel's stab symbol table to output.md, which contains the following fragment:

Symnum n_type n_othr n_desc n_value  n_strx String
118    FUN    0      0      f01000a6 2987   i386_init:F(0,25)
119    SLINE  0      24     00000000 0      
120    SLINE  0      34     00000012 0      
121    SLINE  0      36     00000017 0      
122    SLINE  0      39     0000002b 0      
123    SLINE  0      43     0000003a 0

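What does this fragment mean? Start with the columns named in the header row:

Symnum is the symbol index: treating the whole symbol table as an array, Symnum is the symbol's position in that array.
n_type is the symbol type: FUN is a function name, and SLINE is a line-number entry in the text segment.
n_othr is currently unused and is always 0.
n_desc is the line number in the source file.
n_value is an address. Note that only FUN symbols carry an absolute address; for SLINE symbols the value is an offset, and the actual address is the function entry address plus that offset. For example, the third entry (Symnum 120) means that address f01000b8 (= 0xf01000a6 + 0x00000012) corresponds to line 34 of the file.
Once the meaning of each stab record is clear, calling stab_binsearch finds the line number for a given address. The preceding code has already found the containing function and its entry address, so subtracting the entry address from the target address gives the offset, which is then looked up among the records in that function's range of the symbol table.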
objdump -h kernel

kernel:     file format elf32-i386

Sections:
Idx Name               Size      VMA       LMA       File off  Algn
  0 .text              00001ad9  f0100000  00100000  00001000  2**4
                       CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata            00000714  f0101ae0  00101ae0  00002ae0  2**5
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .stab              00003cd9  f01021f4  001021f4  000031f4  2**2
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .stabstr           0000196b  f0105ecd  00105ecd  00006ecd  2**0
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .data              00009300  f0108000  00108000  00009000  2**12
                       CONTENTS, ALLOC, LOAD, DATA
  5 .got               00000008  f0111300  00111300  00012300  2**2
                       CONTENTS, ALLOC, LOAD, DATA
  6 .got.plt           0000000c  f0111308  00111308  00012308  2**2
                       CONTENTS, ALLOC, LOAD, DATA
  7 .data.rel.local    00001000  f0112000  00112000  00013000  2**12
                       CONTENTS, ALLOC, LOAD, DATA
  8 .data.rel.ro.local 00000044  f0113000  00113000  00014000  2**2
                       CONTENTS, ALLOC, LOAD, DATA
  9 .bss               00000648  f0113060  00113060  00014060  2**5
                       CONTENTS, ALLOC, LOAD, DATA
 10 .comment           0000002b  00000000  00000000  000146a8  2**0
                       CONTENTS, READONLY

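Note this printf usage: printf("%.*s", length, string) prints at most length characters of string. This matters here because stab function names are not NUL-terminated at the end of the name.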

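Lab 1 passes with a score of 50.
The exact output that must be printed can be inferred from the regular expressions in the grading script.
Code added to mon_backtrace: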

struct Eipdebuginfo info;
if (debuginfo_eip(p[1], &info) == 0) {
    cprintf("\t%s:%d: %.*s+%d\n", info.eip_file, info.eip_line,
            info.eip_fn_namelen, info.eip_fn_name, p[1] - info.eip_fn_addr);
}