Validating Lua Script Syntax with ANTLR4 in Java
This article shows how to use ANTLR4 from Java to check Lua scripts for syntax errors.
What is ANTLR?
https://www.antlr.org/
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
A first example
https://github.com/antlr/antlr4/blob/master/doc/getting-started.md#a-first-example
1. Create a file named Hello.g4:
// Define a grammar called Hello
grammar Hello;
r  : 'hello' ID ;         // match keyword hello followed by an identifier
ID : [a-z]+ ;             // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
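2. Install the IDEA plugin
ANTLR v4: https://plugins.jetbrains.com/plugin/7358-antlr-v4
3. Open ANTLR Preview
Right-click the line r : 'hello' ID ; // match keyword hello followed by an identifier and choose Test Rule r.
Enter hello world and the ID is correctly recognized as world.
Enter hello World and the ID is no longer recognized as world, because ID : [a-z]+ only matches lower-case letters.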
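To make the case-sensitivity point concrete, here is a minimal hand-written sketch that approximates rule r with a regular expression. It is not what ANTLR generates; the class name HelloRuleSketch and the method matchId are made up for illustration, and the sketch simply rejects any input that ANTLR would report an error for.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written approximation of the Hello grammar; NOT code generated by ANTLR.
class HelloRuleSketch {
    // 'hello', then whitespace (skipped by WS), then ID ([a-z]+).
    private static final Pattern RULE_R =
            Pattern.compile("[ \\t\\r\\n]*hello[ \\t\\r\\n]+([a-z]+)[ \\t\\r\\n]*");

    /** Returns the matched ID, or null when the input does not match rule r. */
    static String matchId(String input) {
        Matcher m = RULE_R.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String id = m.group(1);
        // The text "hello" is lexed as the keyword, so it cannot serve as the ID.
        return id.equals("hello") ? null : id;
    }

    public static void main(String[] args) {
        System.out.println(matchId("hello world")); // world
        System.out.println(matchId("hello World")); // null
    }
}
```

The upper-case W falls outside [a-z], so the second input does not match, which mirrors what the ANTLR Preview shows.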
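ANTLR4 workflow
Lexer: converts a sequence of characters into tokens. The lexer is normally invoked by the parser.
Parser: usually part of a compiler or interpreter. It checks the syntax and builds a tree-shaped data structure from the input tokens. A parser typically uses a lexer to split the input character stream into individual tokens and takes the resulting token stream as its input. In practice a parser can be written by hand or generated automatically by a tool.
Parse tree (called the abstract syntax tree in this article): a tree-shaped representation of the syntactic structure of the source code. Strictly speaking, ANTLR4 builds a parse tree that mirrors the grammar rules rather than a simplified abstract syntax tree. Such a tree can be used for syntax checking, style checking, code formatting, syntax highlighting, error reporting, and auto-completion.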
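The three roles above can be sketched by hand for the Hello grammar from the first example. This is a toy illustration in plain Java with no ANTLR dependency; the names MiniPipelineSketch, Token, tokenize, and parse are hypothetical, and the generated ANTLR classes are far more general (for instance, they recover from errors instead of throwing).

```java
import java.util.ArrayList;
import java.util.List;

// Toy lexer and parser for the Hello grammar, written by hand for illustration.
class MiniPipelineSketch {
    /** A token: its type (HELLO or ID) and the matched text. */
    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    /** Lexer step: character sequence -> tokens; whitespace is skipped. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') { i++; continue; }
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
            int start = i;
            while (i < input.length() && input.charAt(i) >= 'a' && input.charAt(i) <= 'z') i++;
            String text = input.substring(start, i);
            tokens.add(new Token(text.equals("hello") ? "HELLO" : "ID", text));
        }
        return tokens;
    }

    /** Parser step: checks the tokens against r : 'hello' ID and returns the tree as text. */
    static String parse(List<Token> tokens) {
        if (tokens.size() != 2
                || !tokens.get(0).type.equals("HELLO")
                || !tokens.get(1).type.equals("ID")) {
            throw new IllegalArgumentException("syntax error: expected 'hello' ID but got " + tokens);
        }
        return "(r hello " + tokens.get(1).text + ")";
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize("hello world");
        System.out.println(tokens);        // [HELLO(hello), ID(world)]
        System.out.println(parse(tokens)); // (r hello world)
    }
}
```

The parser never looks at raw characters; it only sees the token stream, which is the division of labour described above.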
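The usual way of working with antlr4 is:
- Write the antlr4 lexer and parser rules
- Run the antlr4 tool on those rules to generate Lexer and Parser code in the target language
- Call the generated Lexer and Parser classes from your own code to turn the raw input text into a parse tree
- Use an antlr4 visitor to process the tree and implement whatever functionality you need
Besides the visitor, antlr4 offers a second way of processing the tree, called Listener. Listener is the default: the tool generates listener code unless told otherwise, whereas visitor generation has to be switched on. Both can be used to process a ParseTree. When visitor or listener generation is enabled, antlr4 generates the corresponding Visitor and Listener code in addition to the Lexer and Parser code. The differences between Listener and Visitor are as follows: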
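Aspect | Listener | Visitor (my personal preference)
---|---|---
Visits all nodes? | Visits every node | Visits only the nodes you explicitly choose
How nodes are visited | Through enter and exit methods | Through visit methods
Methods return a value? | No return value | Has a return value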
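The differences in the table can be illustrated on a toy tree in plain Java. This is only a sketch of the two styles, not the ANTLR runtime interfaces; Node, Listener, walk, evaluate, and countNodes are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy tree with a hand-written listener and visitor; NOT the ANTLR runtime API.
class ListenerVsVisitorSketch {
    /** A node is either a number leaf (no children) or the sum of its children. */
    static final class Node {
        final int value;
        final List<Node> children;
        Node(int value) { this.value = value; this.children = new ArrayList<>(); }
        Node(Node... children) { this.value = 0; this.children = Arrays.asList(children); }
        boolean isLeaf() { return children.isEmpty(); }
    }

    /** Listener style: callbacks return nothing; the walker controls the traversal. */
    interface Listener {
        void enter(Node node);
        void exit(Node node);
    }

    /** The walker reaches every node and fires enter/exit around its children. */
    static void walk(Node node, Listener listener) {
        listener.enter(node);
        for (Node child : node.children) walk(child, listener);
        listener.exit(node);
    }

    /** Visitor style: each visit returns a value and decides which children to descend into. */
    static int evaluate(Node node) {
        if (node.isLeaf()) return node.value;
        int sum = 0;
        for (Node child : node.children) sum += evaluate(child);
        return sum;
    }

    /** Counts nodes with a listener; the result is kept outside the void callbacks. */
    static int countNodes(Node root) {
        final int[] count = {0};
        walk(root, new Listener() {
            @Override public void enter(Node node) { count[0]++; }
            @Override public void exit(Node node) { /* nothing to do on exit */ }
        });
        return count[0];
    }

    public static void main(String[] args) {
        // Represents 1 + (2 + 3)
        Node tree = new Node(new Node(1), new Node(new Node(2), new Node(3)));
        System.out.println(evaluate(tree));   // 6
        System.out.println(countNodes(tree)); // 5
    }
}
```

Because visit methods return values, a visitor suits computations that combine results from child nodes, while a listener suits tasks that only need to react as each node is entered or left.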
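With the differences between Listener and Visitor in mind, the overall antlr4 workflow can be summarized as follows:
In the diagram, the dotted flow on the left shows ANTLR4 turning the original .g4 rules into a Lexer, Parser, Listener, and Visitor. The dashed flow on the right shows the raw input stream being turned into Tokens by the Lexer, the Tokens being turned into a parse tree by the Parser, and finally the ParseTree being traversed by a Listener or Visitor to produce the final result.
Lua script syntax validation
Prepare a Lua grammar file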
https://github.com/antlr/grammars-v4/tree/master/lua
/*
BSD License

Copyright (c) 2013, Kazunori Sakamoto
Copyright (c) 2016, Alexander Alexeev
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. Neither the NAME of Rainer Schuster nor the NAMEs of its contributors
   may be used to endorse or promote products derived from this software
   without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

This grammar file derived from:

    Lua 5.3 Reference Manual
    http://www.lua.org/manual/5.3/manual.html

    Lua 5.2 Reference Manual
    http://www.lua.org/manual/5.2/manual.html

    Lua 5.1 grammar written by Nicolai Mainiero
    http://www.antlr3.org/grammar/1178608849736/Lua.g

Tested by Kazunori Sakamoto with Test suite for Lua 5.2 (http://www.lua.org/tests/5.2/)

Tested by Alexander Alexeev with Test suite for Lua 5.3 http://www.lua.org/tests/lua-5.3.2-tests.tar.gz
*/

grammar Lua;

chunk
    : block EOF
    ;

block
    : stat* retstat?
    ;

stat
    : ';'
    | varlist '=' explist
    | functioncall
    | label
    | 'break'
    | 'goto' NAME
    | 'do' block 'end'
    | 'while' exp 'do' block 'end'
    | 'repeat' block 'until' exp
    | 'if' exp 'then' block ('elseif' exp 'then' block)* ('else' block)? 'end'
    | 'for' NAME '=' exp ',' exp (',' exp)? 'do' block 'end'
    | 'for' namelist 'in' explist 'do' block 'end'
    | 'function' funcname funcbody
    | 'local' 'function' NAME funcbody
    | 'local' attnamelist ('=' explist)?
    ;

attnamelist
    : NAME attrib (',' NAME attrib)*
    ;

attrib
    : ('<' NAME '>')?
    ;

retstat
    : 'return' explist? ';'?
    ;

label
    : '::' NAME '::'
    ;

funcname
    : NAME ('.' NAME)* (':' NAME)?
    ;

varlist
    : var_ (',' var_)*
    ;

namelist
    : NAME (',' NAME)*
    ;

explist
    : exp (',' exp)*
    ;

exp
    : 'nil' | 'false' | 'true'
    | number
    | string
    | '...'
    | functiondef
    | prefixexp
    | tableconstructor
    | <assoc=right> exp operatorPower exp
    | operatorUnary exp
    | exp operatorMulDivMod exp
    | exp operatorAddSub exp
    | <assoc=right> exp operatorStrcat exp
    | exp operatorComparison exp
    | exp operatorAnd exp
    | exp operatorOr exp
    | exp operatorBitwise exp
    ;

prefixexp
    : varOrExp nameAndArgs*
    ;

functioncall
    : varOrExp nameAndArgs+
    ;

varOrExp
    : var_ | '(' exp ')'
    ;

var_
    : (NAME | '(' exp ')' varSuffix) varSuffix*
    ;

varSuffix
    : nameAndArgs* ('[' exp ']' | '.' NAME)
    ;

nameAndArgs
    : (':' NAME)? args
    ;

/*
var_
    : NAME | prefixexp '[' exp ']' | prefixexp '.' NAME
    ;

prefixexp
    : var_ | functioncall | '(' exp ')'
    ;

functioncall
    : prefixexp args | prefixexp ':' NAME args
    ;
*/

args
    : '(' explist? ')' | tableconstructor | string
    ;

functiondef
    : 'function' funcbody
    ;

funcbody
    : '(' parlist? ')' block 'end'
    ;

parlist
    : namelist (',' '...')? | '...'
    ;

tableconstructor
    : '{' fieldlist? '}'
    ;

fieldlist
    : field (fieldsep field)* fieldsep?
    ;

field
    : '[' exp ']' '=' exp | NAME '=' exp | exp
    ;

fieldsep
    : ',' | ';'
    ;

operatorOr
    : 'or';

operatorAnd
    : 'and';

operatorComparison
    : '<' | '>' | '<=' | '>=' | '~=' | '==';

operatorStrcat
    : '..';

operatorAddSub
    : '+' | '-';

operatorMulDivMod
    : '*' | '/' | '%' | '//';

operatorBitwise
    : '&' | '|' | '~' | '<<' | '>>';

operatorUnary
    : 'not' | '#' | '-' | '~';

operatorPower
    : '^';

number
    : INT | HEX | FLOAT | HEX_FLOAT
    ;

string
    : NORMALSTRING | CHARSTRING | LONGSTRING
    ;

// LEXER

NAME
    : [a-zA-Z_][a-zA-Z_0-9]*
    ;

NORMALSTRING
    : '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

CHARSTRING
    : '\'' ( EscapeSequence | ~('\''|'\\') )* '\''
    ;

LONGSTRING
    : '[' NESTED_STR ']'
    ;

fragment
NESTED_STR
    : '=' NESTED_STR '='
    | '[' .*? ']'
    ;

INT
    : Digit+
    ;

HEX
    : '0' [xX] HexDigit+
    ;

FLOAT
    : Digit+ '.' Digit* ExponentPart?
    | '.' Digit+ ExponentPart?
    | Digit+ ExponentPart
    ;

HEX_FLOAT
    : '0' [xX] HexDigit+ '.' HexDigit* HexExponentPart?
    | '0' [xX] '.' HexDigit+ HexExponentPart?
    | '0' [xX] HexDigit+ HexExponentPart
    ;

fragment
ExponentPart
    : [eE] [+-]? Digit+
    ;

fragment
HexExponentPart
    : [pP] [+-]? Digit+
    ;

fragment
EscapeSequence
    : '\\' [abfnrtvz"'\\]
    | '\\' '\r'? '\n'
    | DecimalEscape
    | HexEscape
    | UtfEscape
    ;

fragment
DecimalEscape
    : '\\' Digit
    | '\\' Digit Digit
    | '\\' [0-2] Digit Digit
    ;

fragment
HexEscape
    : '\\' 'x' HexDigit HexDigit
    ;

fragment
UtfEscape
    : '\\' 'u{' HexDigit+ '}'
    ;

fragment
Digit
    : [0-9]
    ;

fragment
HexDigit
    : [0-9a-fA-F]
    ;

COMMENT
    : '--[' NESTED_STR ']' -> channel(HIDDEN)
    ;

LINE_COMMENT
    : '--'
    (                                               // --
    | '[' '='*                                      // --[==
    | '[' '='* ~('='|'['|'\r'|'\n') ~('\r'|'\n')*   // --[==AA
    | ~('['|'\r'|'\n') ~('\r'|'\n')*                // --AAA
    ) ('\r\n'|'\r'|'\n'|EOF)
    -> channel(HIDDEN)
    ;

WS
    : [ \t\u000C\r\n]+ -> skip
    ;

SHEBANG
    : '#' '!' ~('\n'|'\r')* -> channel(HIDDEN)
    ;
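Maven configuration
Note for JDK 8 users: the newest antlr4 version you can use is 4.9.3, because from 4.10 onward the ANTLR tool itself requires Java 11:
Source: https://github.com/antlr/antlr4/releases/tag/4.10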
Increasing minimum Java version
Going forward, we are using Java 11 for the source code and the compiled .class files for the ANTLR tool. The Java runtime target, however, and the associated runtime tests use Java 8 (bumping up from Java 7).
<dependencies>
    <dependency>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-runtime</artifactId>
        <version>${antlr.version}</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.antlr</groupId>
            <artifactId>antlr4-maven-plugin</artifactId>
            <version>${antlr.version}</version>
            <configuration>
                <visitor>true</visitor>
                <listener>true</listener>
            </configuration>
            <executions>
                <execution>
                    <goals>
                        <goal>antlr4</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

<properties>
    <!-- https://mvnrepository.com/artifact/org.antlr/antlr4-runtime -->
    <!-- Antlr4 4.9.3 is the last version compatible with Java 8 -->
    <antlr.version>4.9.3</antlr.version>
</properties>
Generate the Lexer, Parser, Listener, and Visitor code
mvn clean compile
Create the model classes
Syntax error: the error found on a given line.
package com.baeldung.antlr.lua.model;

/**
 * A single syntax error.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorEntry {
    // Line number on which the error was reported
    private Integer lineNum;
    // Human-readable description of the error
    private String errorInfo;

    public Integer getLineNum() {
        return lineNum;
    }

    public void setLineNum(Integer lineNum) {
        this.lineNum = lineNum;
    }

    public String getErrorInfo() {
        return errorInfo;
    }

    public void setErrorInfo(String errorInfo) {
        this.errorInfo = errorInfo;
    }
}
Syntax error report: the collection of errors found, one entry per reported error.
package com.baeldung.antlr.lua.model;

import java.util.LinkedList;
import java.util.List;

/**
 * Syntax error report.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorReportEntry {
    private final List<SyntaxErrorEntry> syntaxErrorList = new LinkedList<>();

    // Records one error together with its position and the offending symbol
    public void addError(int line, int charPositionInLine, Object offendingSymbol, String msg) {
        SyntaxErrorEntry syntaxErrorEntry = new SyntaxErrorEntry();
        syntaxErrorEntry.setLineNum(line);
        syntaxErrorEntry.setErrorInfo("line " + line + ", column " + charPositionInLine
                + ", at symbol " + offendingSymbol + ": syntax error: " + msg);
        syntaxErrorList.add(syntaxErrorEntry);
    }

    public List<SyntaxErrorEntry> getSyntaxErrorReport() {
        return syntaxErrorList;
    }
}
Lua syntax visitor
package com.baeldung.antlr.lua;

import com.baeldung.antlr.LuaParser;
import com.baeldung.antlr.LuaVisitor;
import org.antlr.v4.runtime.tree.ErrorNode;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.RuleNode;
import org.antlr.v4.runtime.tree.TerminalNode;

/**
 * Lua syntax visitor.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class LuaSyntaxVisitor implements LuaVisitor<Object> {
    // In IDEA, press Ctrl+O and override every LuaVisitor method (bodies omitted here).
}
Syntax error listener
package com.baeldung.antlr.lua;

import com.baeldung.antlr.lua.model.SyntaxErrorReportEntry;
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

/**
 * Syntax error listener.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorListener extends BaseErrorListener {
    private final SyntaxErrorReportEntry reporter;

    public SyntaxErrorListener(SyntaxErrorReportEntry reporter) {
        this.reporter = reporter;
    }

    // Called by ANTLR for every syntax error; forwards the details to the report
    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        this.reporter.addError(line, charPositionInLine, offendingSymbol, msg);
    }
}
Unit test
package com.baeldung.antlr;

import com.baeldung.antlr.lua.LuaSyntaxVisitor;
import com.baeldung.antlr.lua.SyntaxErrorListener;
import com.baeldung.antlr.lua.model.SyntaxErrorEntry;
import com.baeldung.antlr.lua.model.SyntaxErrorReportEntry;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.junit.Test;

import java.util.List;

import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

public class LuaSyntaxErrorUnitTest {

    public static List<SyntaxErrorEntry> judgeLuaSyntax(String luaScript) {
        // Create a CharStream that reads the script text
        CharStream charStreams = CharStreams.fromString(luaScript);
        // The lexer groups the input characters into tokens
        LuaLexer luaLexer = new LuaLexer(charStreams);
        // Token buffer that stores the tokens produced by the lexer
        CommonTokenStream tokenStream = new CommonTokenStream(luaLexer);
        // The parser analyses the tokens held in the token buffer
        LuaParser luaParser = new LuaParser(tokenStream);

        // Collect the syntax errors reported by the parser
        SyntaxErrorReportEntry syntaxErrorReporter = new SyntaxErrorReportEntry();
        SyntaxErrorListener errorListener = new SyntaxErrorListener(syntaxErrorReporter);
        luaParser.addErrorListener(errorListener);

        // Parse from the chunk rule and walk the resulting tree
        LuaSyntaxVisitor luaSyntaxVisitor = new LuaSyntaxVisitor();
        luaSyntaxVisitor.visit(luaParser.chunk());

        return syntaxErrorReporter.getSyntaxErrorReport();
    }

    @Test
    public void testGood() throws Exception {
        List<SyntaxErrorEntry> errorEntryList = judgeLuaSyntax("if a~=1 then print(1) end");
        assertThat(errorEntryList.size(), is(0));
    }

    @Test
    public void testBad() throws Exception {
        // '!=' is not a Lua operator; Lua uses '~=' for inequality
        List<SyntaxErrorEntry> errorEntryList = judgeLuaSyntax("if a!=1 then print(1) end");
        assertThat(errorEntryList.size(), is(2));
    }
}
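By the way: think of antlr4 as a language in its own right, at the same level as java. The same applies when using groovy.
The final directory layout and unit test results are as follows: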