编译原理Lab2-用bison完成语法分析器

本文主要是介绍编译原理Lab2-用bison完成语法分析器，希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

HNU编译原理lab2实验–在 Lab1 已完成的 flex 词法分析器的基础上，进一步使用 bison 完成语法分析器。也就是补全两个文件。（其实我也是抄的，什么也不会 >.>）

本文没有添加任何图片，但是以复制输出的形式展现出来了实验结果。

实验要求

本次实验需要各位同学首先将自己的 lab1 的词法部分复制到 /src/parser 目录的 lexical_analyzer.l并合理修改相应部分，然后根据 cminus-f 的语法补全 syntax_analyer.y 文件，完成语法分析器，要求最终能够输出解析树。如：
输入：

int bar;
float foo(void) { return 1.0; }

则 parser 将输出如下解析树：

>--+ program
|  >--+ declaration-list
|  |  >--+ declaration-list
|  |  |  >--+ declaration
|  |  |  |  >--+ var-declaration
|  |  |  |  |  >--+ type-specifier
|  |  |  |  |  |  >--* int
|  |  |  |  |  >--* bar
|  |  |  |  |  >--* ;
|  |  >--+ declaration
|  |  |  >--+ fun-declaration
|  |  |  |  >--+ type-specifier
|  |  |  |  |  >--* float
|  |  |  |  >--* foo
|  |  |  |  >--* (
|  |  |  |  >--+ params
|  |  |  |  |  >--* void
|  |  |  |  >--* )
|  |  |  |  >--+ compound-stmt
|  |  |  |  |  >--* {
|  |  |  |  |  >--+ local-declarations
|  |  |  |  |  |  >--* epsilon
|  |  |  |  |  >--+ statement-list
|  |  |  |  |  |  >--+ statement-list
|  |  |  |  |  |  |  >--* epsilon
|  |  |  |  |  |  >--+ statement
|  |  |  |  |  |  |  >--+ return-stmt
|  |  |  |  |  |  |  |  >--* return
|  |  |  |  |  |  |  |  >--+ expression
|  |  |  |  |  |  |  |  |  >--+ simple-expression
|  |  |  |  |  |  |  |  |  |  >--+ additive-expression
|  |  |  |  |  |  |  |  |  |  |  >--+ term
|  |  |  |  |  |  |  |  |  |  |  |  >--+ factor
|  |  |  |  |  |  |  |  |  |  |  |  |  >--+ float
|  |  |  |  |  |  |  |  |  |  |  |  |  |  >--* 1.0
|  |  |  |  |  |  |  |  >--* ;
|  |  |  |  |  >--* }

请注意，上述解析树含有每个解析规则的所有子成分，包括诸如 ; { } 这样的符号，请在编写规则时务必不要忘了它们。

实验难点

掌握flex和bison的基本原理和使用方法，怎么将文法产生式转换为bison语句，如何联合flex和bison
了解bison源程序的写法
熟悉语法分析器的工作流程
看懂yylval.node的含义

实验设计

bisonn源程序的写法

bison 是一个语法分析器的生成器，网址为：http://www.delorie.com/gnu/docs/bison/bison_toc.html。
bison 和 flex 配合使用，它可以将用户提供的语法规则转化成一个语法分析器。简单来说，从语法的产生式开始，通过一系列的复杂的构造流程，最终得到了一个动作表，然后又利用这个动作表去解析句子。 bison 就是干这个事情的，它读取用户提供的语法的产生式，生成一个 C 语言格式的 LALR(1) 动作表，并将其包含进一个名为 yyparse 的 C 函数，这个函数的作用就是利用这个动作表来解析 token stream ，而这个 token stream 是由 flex 生成的词法分析器扫描源程序得到。
bison 的自定义语法文件的格式和 flex 分词模式文件的格式非常相似，共分为 4 段，如下：

%{ Declarations %}
Definitions
%% 
Productions 
%% 
User subroutines

其中的 Declarations 段和 User subroutines 和 flex 文件中是一样的， bison 会将这些代码原样的拷贝到 y.tab.c 文件中； Definitions 段和 flex 中的功能也差不多，也是在这个段定义一些 bison 专有的变量，稍后再解释这个文件中的这个段里的代码；最重要的是 Productions 段，这里面是用户编写的语法产生式，比如以下定义的产生式用常规的方式书写的格式如下：

S : S E '\n' { printf("ans = %d\n", $2); } | /* empty */ { /* empty */ } ; 
E : E '+' E { $$ = $1 + $3; } | E '-' E { $$ = $1 - $3; } | E '*' E { $$ = $1 * $3; } | E '/' E { $$ = $1 / $3; } | T_NUM { $$ = $1; } | '(' E ')' { $$ = $2; } 
S -> S E \n | ε 
E -> E + E | E - E | E * E | E / E | T_NUM | ( E )

bison 里面 ”:” 代表一个 “->” ，同一个非终结符的不同产生式用 “|” 隔开，用 ”;” 结束表示一个非终结符产生式的结束；每条产生式的后面花括号内是一段 C 代码、这些代码将在该产生式被应用时执行，这些代码被称为 action ，产生式的中间以及 C 代码内部可以插入注释；产生式右边是 ε 时，不需要写任何符号，一般用一个注释 /* empty */ 代替。

补齐lexical_analyzer.l文件

根据样例的修改参考

/* Example for you :-) */
\+  { pos_start = pos_end; pos_end += 1; pass_node(yytext); return ADD; }

post_start和post_end等部分不变，然后添加pass_node函数，观察到pass_node函数，这个函数用于flex和bison之间的协作。
yytest是匹配到正则表达式的那转字符串
这里对yylval进行了复制，yylval的类型是接下来要补充的union，用来记录token的内容。
所以在词法分析中，每个token匹配时都需要在原基础上添加一句 pass_node(yytext) ，相当于在语法分析树中创建了一个新的语法单元节点。
但是，因为语法分析树不考虑制表符TAB，注释COMMENT，换行EOL以及空格BLANK，所以不用建立节点，但仍需要读取识别，故保持原有的正则表达式以及 pos_start、pos_end 等的代码，但删去return 代码（即不返回，仅读取识别），也不添加 pass_node（因为无需建立节点，也无需传到 bison 中）。
对于语法错误的直接return ERROR即可。

void pass_node(char *text){yylval.node = new_syntax_tree_node(text);
}

因此词法分析部分修改增加为：

/***************TO STUDENTS: Copy your Lab1 here. Make adjustments if necessary.*/
%%
\+  { pos_start = pos_end; pos_end += 1; pass_node(yytext); return ADD; }
\- { pos_start = pos_end;pos_end++; pass_node(yytext); return SUB;}
\* { pos_start = pos_end;pos_end++; pass_node(yytext); return MUL;}
\/ { pos_start = pos_end;pos_end++; pass_node(yytext); return DIV;}
\< { pos_start = pos_end;pos_end++; pass_node(yytext); return LT;}
\<\= { pos_start = pos_end;pos_end += 2; pass_node(yytext); return LTE;}
\> { pos_start = pos_end;pos_end++; pass_node(yytext); return GT;}
\>\= { pos_start = pos_end;pos_end += 2; pass_node(yytext); return GTE;}
\=\= { pos_start = pos_end;pos_end += 2; pass_node(yytext); return EQ;}
\!\= { pos_start = pos_end;pos_end += 2; pass_node(yytext); return NEQ;}
\= { pos_start = pos_end;pos_end++; pass_node(yytext); return ASSIN;}
\; { pos_start = pos_end;pos_end++; pass_node(yytext); return SEMICOLON;}
\, { pos_start = pos_end;pos_end++; pass_node(yytext); return COMMA;}
\( { pos_start = pos_end;pos_end++; pass_node(yytext); return LPARENTHESE;}
\) { pos_start = pos_end;pos_end++; pass_node(yytext); return RPARENTHESE;}
\[ { pos_start = pos_end;pos_end++; pass_node(yytext); return LBRACKET;}
\] { pos_start = pos_end;pos_end++; pass_node(yytext); return RBRACKET;}
\{ { pos_start = pos_end;pos_end++; pass_node(yytext); return LBRACE;}
\} { pos_start = pos_end;pos_end++; pass_node(yytext); return RBRACE;}
else { pos_start = pos_end;pos_end += 4; pass_node(yytext); return ELSE;}
if { pos_start = pos_end;pos_end += 2; pass_node(yytext); return IF;}
int { pos_start = pos_end;pos_end += 3; pass_node(yytext); return INT;}
float { pos_start = pos_end;pos_end += 5; pass_node(yytext); return FLOAT;}
return { pos_start = pos_end;pos_end += 6; pass_node(yytext); return RETURN;}
void { pos_start = pos_end;pos_end += 4; pass_node(yytext); return VOID;}
while { pos_start = pos_end;pos_end += 5; pass_node(yytext); return WHILE;}[a-zA-Z]+ { pos_start = pos_end;pos_end += strlen(yytext); pass_node(yytext); return IDENTIFIER;}
[0-9]+ { pos_start = pos_end;pos_end += strlen(yytext); pass_node(yytext); return INTEGER;}
[0-9]+\.|[0-9]*\.[0-9]+ { pos_start = pos_end;pos_end += strlen(yytext); pass_node(yytext); return FLOATPOINT;}
\[\] { pos_start = pos_end;pos_end += 2; pass_node(yytext); return ARRAY;}
[a-zA-Z] { pos_start = pos_end;pos_end++; pass_node(yytext); return LETTER;}
[\n] { pos_start = 1;pos_end = 1;lines++;}
\/\*([*]*(([^*/])+([/])*)*)*\*\/ {}
[ \f\n\r\t\v] { pos_end++;pos_start = pos_end;}%%
/*Note: don't modify the prologue unless you know what you are doing.
***************/

补全syntax_analyer.y文件

先考虑%union，因为在语法树上，每个节点都对应一个语义值，这个值的类型是 YYSTYPE。YYSTYPE 的具体内容是由 %union 构造指出的。如图所示，不管是%token还是%type，都应该是syntax_tree_node * node类型，这样才能构建解析树。

/* TODO: Complete this definition. */
%union {syntax_tree_node * node;}

根据lexical_analyzer.l中的token定义%token<node> ,以及lab2的要求定义%type<node>。

/* TODO: Your tokens here. */
%token <node> ADD SUB MUL DIV LT LTE GT GTE EQ NEQ ASSINSEMICOLON COMMA LPARENTHESE RPARENTHESE LBRACKET RBRACKETLBRACE RBRACE ELSE IF INT FLOAT RETURN VOID WHILEIDENTIFIER INTEGER FLOATPOINT ARRAY LETTER%type <node> program declaration-list declaration var-declarationtype-specifier fun-declaration params param-list paramcompound-stmt local-declarations statement-liststatement expression-stmt selection-stmt iteration-stmtreturn-stmt expression var simple-expression relopadditive-expression addop term mulop factor integer float callargs arg-list%start program
%%

根据实验中所给的Cminus-f语法，补充每个%type<node>的文法解析。
注意的是每个条文法解析后面都要有分号（除了本身给出的第一条），{}里写与该解析对应的处理代码。例如declaration-list ,其第一条解析所得到的是两个解析符号declaration-list declaration，所以对于该解析要执行的操作是

program : declaration-list
{ $$ = node("program", 1, $1); gt->root = $$; }

再查看 node 函数的函数原型syntax_tree_node *node(const char node_name, int children_num, …)，结合实验资料 README.md 中对于 bison 语法的介绍，可以看出 program 为语法分析的起始，该结点类型为 syntax_tree_node。
==KaTeX parse error: Can't use function '$' in math mode at position 23: …("program", 1, $̲1)表示为当前识别到的 pro…，将 program 结点作为根节点，使其作为语法分析树的起始。
因此可以看出，对于语法的分析结构应该为如下的形式：

parent : son				{ $$ = node("parent", 1, $1);}| daughter			{ $$ = node("parent", 1, $1);}| son AND daughter	{ $$ = node("parent", 3, $1, $2, $3);}

其中 parent、son、daughter 都是 syntax_tree_node* 类型，都为非终结符；AND 也是 syntax_tree_node* 类型，为终结符。

再结合token和Cminus-f语法补全剩下的代码：

/* TODO: Your rules here. */
program : declaration-list { $$ = node("program", 1, $1); gt->root = $$; }declaration-list : declaration-list declaration { $$ = node("declaration-list", 2, $1, $2);}| declaration { $$ = node("declaration-list", 1, $1); }declaration : var-declaration { $$ = node("declaration", 1, $1); }| fun-declaration { $$ = node("declaration", 1, $1); }var-declaration : type-specifier IDENTIFIER SEMICOLON { $$ = node("var-declaration", 3, $1, $2, $3); }| type-specifier IDENTIFIER LBRACKET INTEGER RBRACKET SEMICOLON { $$ = node("var-declaration", 6, $1, $2, $3, $4, $5, $6); }type-specifier : INT { $$ = node("type-specifier", 1, $1); }| FLOAT { $$ = node("type-specifier", 1, $1); }| VOID { $$ = node("type-specifier", 1, $1); }fun-declaration : type-specifier IDENTIFIER LPARENTHESE params RPARENTHESE compound-stmt { $$ = node("fun-declaration", 6, $1, $2, $3, $4, $5, $6); }params : param-list { $$ = node("params", 1, $1); }| VOID { $$ = node("params", 1, $1); }param-list : param-list COMMA param { $$ = node("param-list", 3, $1, $2, $3); }| param { $$ = node("param-list", 1, $1); }param : type-specifier IDENTIFIER { $$ = node("param", 2, $1, $2); }| type-specifier IDENTIFIER ARRAY { $$ = node("param", 3, $1, $2, $3); }compound-stmt : LBRACE local-declarations statement-list RBRACE { $$ = node("compound-stmt", 4, $1, $2, $3, $4); }local-declarations : { $$ = node("local-declarations", 0); }| local-declarations var-declaration { $$ = node("local-declarations", 2, $1, $2); }statement-list : { $$ = node("statement-list", 0); }| statement-list statement { $$ = node("statement-list", 2, $1, $2); }statement : expression-stmt { $$ = node("statement", 1, $1); }| compound-stmt { $$ = node("statement", 1, $1); }| selection-stmt { $$ = node("statement", 1, $1); }| iteration-stmt { $$ = node("statement", 1, $1); }| return-stmt { $$ = node("statement", 1, $1); }expression-stmt : expression SEMICOLON { $$ = node("expression-stmt", 2, $1, $2); }| SEMICOLON { $$ = node("expression-stmt", 1, $1); }selection-stmt : IF LPARENTHESE expression RPARENTHESE statement { $$ = node("selection-stmt", 5, $1, $2, $3, $4, $5); }| IF LPARENTHESE expression RPARENTHESE statement ELSE statement { $$ = node("selection-stmt", 7, $1, $2, $3, $4, $5, $6, $7); }iteration-stmt : WHILE LPARENTHESE expression RPARENTHESE statement { $$ = node("iteration-stmt", 5, $1, $2, $3, $4, $5); }return-stmt : RETURN SEMICOLON { $$ = node("return-stmt", 2, $1, $2); }| RETURN expression SEMICOLON { $$ = node("return-stmt", 3, $1, $2, $3); }expression : var ASSIN expression { $$ = node("expression", 3, $1, $2, $3); }| simple-expression { $$ = node("expression", 1, $1); }var : IDENTIFIER { $$ = node("var", 1, $1); }| IDENTIFIER LBRACKET expression RBRACKET { $$ = node("var", 4, $1, $2, $3, $4); }simple-expression : additive-expression relop additive-expression { $$ = node("simple-expression", 3, $1, $2, $3); }| additive-expression { $$ = node("simple-expression", 1, $1); }relop : LTE { $$ = node("relop", 1, $1); }| LT { $$ = node("relop", 1, $1); }| GT { $$ = node("relop", 1, $1); }| GTE { $$ = node("relop", 1, $1); }| EQ { $$ = node("relop", 1, $1); }| NEQ { $$ = node("relop", 1, $1); }additive-expression : additive-expression addop term { $$ = node("additive-expression", 3, $1, $2, $3); }| term { $$ = node("additive-expression", 1, $1); }addop : ADD { $$ = node("addop", 1, $1); }| SUB { $$ = node("addop", 1, $1); }term : term mulop factor { $$ = node("term", 3, $1, $2, $3); }| factor { $$ = node("term", 1, $1); }mulop : MUL { $$ = node("mulop", 1, $1); }| DIV { $$ = node("mulop", 1, $1); }factor : LPARENTHESE expression RPARENTHESE { $$ = node("factor", 3, $1, $2, $3); }| var { $$ = node("factor", 1, $1); }| call { $$ = node("factor", 1, $1); }| integer { $$ = node("factor", 1, $1); }| float { $$ = node("factor", 1, $1); }integer : INTEGER { $$ = node("integer", 1, $1); }float : FLOATPOINT { $$ = node("float", 1, $1); }call : IDENTIFIER LPARENTHESE args RPARENTHESE { $$ = node("call", 4, $1, $2, $3, $4); }args : { $$ = node("args", 0); }| arg-list { $$ = node("args", 1, $1); }arg-list : arg-list COMMA expression { $$ = node("arg-list", 3, $1, $2, $3); }| expression { $$ = node("arg-list", 1, $1); }%%

在编译的过程中，发现yyin变量未声明。经过查找对应资料后，对 yyin 添加声明extern FILE *yyin;。再次编译，成功编译。

实验结果及验证

编译过程与lab1的编译过程相同

make build
cd build
cmake ../
make parser

输入指令 make parser

sunny2004@sunny2004-VirtualBox:~/lab1/cminus_compiler-2023-fall$ cd build
sunny2004@sunny2004-VirtualBox:~/lab1/cminus_compiler-2023-fall/build$ make parser
[ 36%] Built target common
[ 81%] Built target syntax
[100%] Built target parser
sunny2004@sunny2004-VirtualBox:~/lab1/cminus_compiler-2023-fall/build$

![[Snipaste_2023-11-23_13-59-55.png]]
运行，灵活使用重定向输入

cd cminus_compiler-2023-fall
./build/parser               # 交互式使用（不进行输入重定向）
<在这里输入 Cminus-f 代码，如果遇到了错误，将程序将报错并退出。>
<输入完成后按 ^D 结束输入，此时程序将输出解析树。>
./build/parser < test.cminus # 重定向标准输入
<此时程序从 test.cminus 文件中读取输入，因此不需要输入任何内容。>
<如果遇到了错误，将程序将报错并退出；否则，将输出解析树。>
./build/parser test.cminus  # 不使用重定向，直接从 test.cminus 中读入
./build/parser < test.cminus > out
<此时程序从 test.cminus 文件中读取输入，因此不需要输入任何内容。>
<如果遇到了错误，将程序将报错并退出；否则，将输出解析树到 out 文件中。>

不使用重定向输入运行验证：输入./build/parser后直接输入Cminus-f代码，输入完成以后按^d结束输入，此时程序将输出解析树，如果遇到了报错，程序将报错。

sunny2004@sunny2004-VirtualBox:~/lab2/cminus_compiler-2023-fall$ ./build/parser
int a;
>--+ program
|  >--+ declaration-list
|  |  >--+ declaration
|  |  |  >--+ var-declaration
|  |  |  |  >--+ type-specifier
|  |  |  |  |  >--* int
|  |  |  |  >--* a
|  |  |  |  >--* ;
sunny2004@sunny2004-VirtualBox:~/lab2/cminus_compiler-2023-fall$

重定向运行：

cd cminus_compiler-2023-fall
./build/parser < test.cminus # 重定向标准输入
<此时程序从 test.cminus 文件中读取输入，因此不需要输入任何内容。>
<如果遇到了错误，将程序将报错并退出；否则，将输出解析树。>
./build/parser test.cminus  # 不使用重定向，直接从 test.cminus 中读入
./build/parser < test.cminus > out
<此时程序从 test.cminus 文件中读取输入，因此不需要输入任何内容。>
<如果遇到了错误，将程序将报错并退出；否则，将输出解析树到 out 文件中。>

验证过程：
本次试验测试案例较多，为此我们将这些测试分为两类：

easy: 这部分测试均比较简单且单纯，适合开发时调试。
normal: 较为综合，适合完成实验后系统测试。
我们使用 diff 命令进行验证。将自己的生成结果和助教提供的 xxx_std 进行比较

验证，输入指令：./tests/lab2/test_syntax.sh easy yes
![[Snipaste_2023-11-23_20-25-10.png]]
验证，输入指令./tests/lab2/test_syntax.sh normal yes
输出：

sunny2004@sunny2004-VirtualBox:~/lab2/cminus_compiler-2023-fall$ ./tests/lab2/test_syntax.sh normal yes
[info] Analyzing array1.cminus
[info] Analyzing array2.cminus
[info] Analyzing func.cminus
[info] Analyzing gcd.cminus
[info] Analyzing if.cminus
[info] Analyzing selectionsort.cminus
[info] Analyzing tap.cminus
[info] Analyzing You_Should_Pass.cminus
[info] Comparing...
[info] No difference! Congratulations!
sunny2004@sunny2004-VirtualBox:~/lab2/cminus_compiler-2023-fall$

使用diff命令验证
没有任何输出结果，表示正确

sunny2004@sunny2004-VirtualBox:~/lab1/cminus_compiler-2023-fall$ diff ./tests/lab2/syntree_easy ./tests/lab2/syntree_easy_std
sunny2004@sunny2004-VirtualBox:~/lab1/cminus_compiler-2023-fall$ diff ./tests/lab2/syntree_normal ./tests/lab2/syntree_normal_std
sunny2004@sunny2004-VirtualBox:~/lab1/cminus_compiler-2023-fall$

设计自己的testcase，编写cminus程序如下

/****/
/*absys***/
void hahah(int a){}
float temp[10];
int main(void) {int j;j = 10;while (j >= 3) {temp[j] = j * (j + 2);j = j - 1;}
}
void ohuuhu(int a) {
a = a + 3;
}

执行指令./build/parser <tests/lab2/easy/mytest.cminus > tests/lab2/syntree_easy_std/mytest.syntax_tree，
在没有语法错误的情况下，将解析树定向输出到mytest.syntax_tree中。在相应的文件夹中能够正确地生成语法树。

实验反馈

对于本次实验，我的总结如下：

注意bison的语法规则和如何与flex联合进行语法分析，产生式中的词法单元是与.l文件中定义的词法单元对应的
对于空串也要正确地进行处理，将子节点数量设置为0
对于EOL、COMMENT、BLANK，在语法分析中需要识别，但不需要分析，所以删去return和 pass_node部分，且因为是需要被识别到的，所以不能直接将.l文件中的相关语句全部删去

这篇关于编译原理Lab2-用bison完成语法分析器的文章就介绍到这儿，希望我们推荐的文章对编程师们有所帮助！