The SPECIALIST Lexicon API


Using the SPECIALIST Lexicon Java API

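An affix is classified by position as a prefix or a suffix,
and by function as inflectional or derivational.
Derivation can use a prefix or a suffix, e.g. happy plus a suffix gives happily, and a prefix plus happy gives unhappy.
Inflection adds an affix only at the end of a word to mark tense, number, case, and so on, e.g. ask, asks, asking, asked.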

Derivation produces a derived word and can change the part of speech and the meaning.

Inflection is a purely grammatical change.

LvgCmdApi

Documentation for all flow components: lvg2021/docs/designDoc/UDF/flow/index.html

-f:a

Expand acronyms and abbreviations.

-f:b

Uninflect a term (reduce a word to its base form).

It turns plural nouns into singular nouns, inflected verbs into their infinitive forms, and adjectives and adverbs into their positive forms. It does not convert words into nouns.

-f:An

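Anti-Normalize (Approximate Match)

Takes a normalized term as input and returns the matching un-normalized terms from the lexicon. It can be used as a basic approximate match.

Because it finds approximate matches in the lexicon, it is useful for converting non-standard terms.

Results are sorted alphabetically, then by EUI, category, and inflection.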

String outputFromLvg = null;
LvgCmdApi lvgApi = new LvgCmdApi("-f:An", "D:/lvg2021/data/config/lvg.properties");
// ---------------------------------
// process each term
// ---------------------------------
outputFromLvg = lvgApi.MutateToString("term");

-f:d

Generate derivational variants.

Derivation rules documentation: lvg2021/docs/designDoc/UDF/derivations/index.html

Derivational variants are generated from FACTs (a pre-computed derivational table) and from morphological rules (RULEs). FACTs are stored in a database and retrieved by SQL query. RULEs are stored in and retrieved from a trie.
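To make the trie idea concrete, here is a minimal sketch of suffix rules stored in a trie and looked up by walking a word from its last character. It is not LVG's implementation: the class name SuffixRuleTrie and the two rules ("ness" and "ly" are simply stripped) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SuffixRuleTrie {
    /** One trie node; children are keyed by the next character, reading the suffix right to left. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final List<String[]> rules = new ArrayList<>(); // each rule: {inSuffix, outSuffix}
    }

    private final Node root = new Node();

    /** Registers a (hypothetical) rule that rewrites inSuffix to outSuffix. */
    public void addRule(String inSuffix, String outSuffix) {
        Node node = root;
        for (int i = inSuffix.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(inSuffix.charAt(i), c -> new Node());
        }
        node.rules.add(new String[] {inSuffix, outSuffix});
    }

    /** Walks the word from its last character and applies every rule whose suffix matches. */
    public List<String> apply(String word) {
        List<String> results = new ArrayList<>();
        Node node = root;
        for (int i = word.length() - 1; i >= 0; i--) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                break; // no stored suffix continues with this character
            }
            for (String[] rule : node.rules) {
                String stem = word.substring(0, word.length() - rule[0].length());
                results.add(stem + rule[1]);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        SuffixRuleTrie trie = new SuffixRuleTrie();
        trie.addRule("ness", ""); // illustrative rule only
        trie.addRule("ly", "");   // illustrative rule only
        System.out.println(trie.apply("kindness")); // [kind]
        System.out.println(trie.apply("quickly"));  // [quick]
    }
}
```

Storing the suffixes reversed means a single pass over the end of the word finds every applicable rule, however many rules are registered.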

-f:dc~NUMBER

Specify the categories (parts of speech) of the generated derivations by number.

Category	Value
adj	1
adv	2
aux	4
compl	8
conj	16
det	32
modal	64
noun	128
prep	256
pron	512
verb	1024

// w is the input word; derivword is a List<String> that collects the derived terms
LvgCmdApi lvgApi = new LvgCmdApi("-f:dc~128", "D:/lvg2021/data/config/lvg.properties");
String outputFromLvg = lvgApi.MutateToString(w);
if (outputFromLvg.length() > 0) {
    // one result per line; field 2 of each '|'-separated line is the output term
    for (String out : outputFromLvg.split("\n")) {
        derivword.add(out.split("\\|")[1]);
    }
}
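The number after dc~ comes from the category table above. The sketch below builds that option string from category names instead of hand-computed numbers. It assumes that several categories are combined by adding their values, as the powers of two in the table suggest; the class and method names are hypothetical and not part of LVG.

```java
import java.util.EnumSet;

public class CategoryOption {
    /** Category names and values copied from the category table. */
    public enum Category {
        ADJ(1), ADV(2), AUX(4), COMPL(8), CONJ(16), DET(32),
        MODAL(64), NOUN(128), PREP(256), PRON(512), VERB(1024);

        public final long value;

        Category(long value) {
            this.value = value;
        }
    }

    /** Sums the selected category values and appends the sum to the dc flow (assumed to be additive). */
    public static String derivationOption(EnumSet<Category> categories) {
        long sum = 0;
        for (Category c : categories) {
            sum += c.value;
        }
        return "-f:dc~" + sum;
    }

    public static void main(String[] args) {
        System.out.println(derivationOption(EnumSet.of(Category.NOUN)));                // -f:dc~128
        System.out.println(derivationOption(EnumSet.of(Category.NOUN, Category.VERB))); // -f:dc~1152
    }
}
```

The resulting string can be passed as the first argument of the LvgCmdApi constructor in place of the literal "-f:dc~128".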

-f:d -kdt:STR

Restrict the derivation type.

Z (zeroD): restricts the output to zero derivations of the input (no change in form).
S (suffixD): restricts the output to suffix derivations of the input.
P (prefixD): restricts the output to prefix derivations of the input.
ZS (zeroD and suffixD): restricts the output to zero and suffix derivations of the input. This is one of the most used options for query expansion in CUI mapping.
ZP (zeroD and prefixD): restricts the output to zero and prefix derivations of the input.
SP (suffixD and prefixD): restricts the output to suffix and prefix derivations of the input.
ZSP (all): no restriction on derivation type. All zeroD (Z), suffixD (S), and prefixD (P) derivations are output. This is the default option.

-f:f

Filter output to contain only forms from the lexicon.

Forms that are not in the lexicon are filtered out, and only one record is returned.

Inflection output filter: -k:i:1

Derivation output filter: -k:d:1

-f:i

Generate inflectional variants.

-f:Ln

Retrieve category (part of speech) and inflection information for a word from the database.

-f:nom

Retrieve the nominalizations (noun forms) of an input term.

-f:N3

Possibly the same as LuiNorm.

Normalize non-ASCII Unicode characters to ASCII, remove genitives, then remove parenthetic plural forms, then replace punctuation with spaces, then remove stop words, then lowercase, then uninflect each word, then map each uninflected word to its canonical form, then strip or map non-ASCII Unicode characters to ASCII, and finally sort the words.

-f:r

Recursively generate synonyms.

Norm API

lvg2021/docs/userDoc/examples/norm.html

Equivalent to the flow -f:q0:g:rs:o:t:l:B:Ct:q7:q8:w

q0: map Unicode symbols and punctuation to ASCII
g: remove genitives,
rs: then remove parenthetic plural forms of (s), (es), (ies), (S), (ES), and (IES),
o: then replace punctuation with spaces,
t: then remove stop words,
l: then lowercase,
B: then uninflect each word,
Ct: then get citation form for each base form,
q7: then Unicode Core Norm
q8: then strip or map non-ASCII Unicode characters,
w: and finally sort the words in alphabetic order.
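As a rough illustration of how such a pipeline composes, the sketch below chains simplified stand-ins for four of the steps (g, o, l and w). It is a toy under stated assumptions, not the NormApi implementation: stop-word removal, uninflection, citation forms and the Unicode steps all need LVG's data and are left out, and the class name ToyNorm is hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

public class ToyNorm {
    /** Applies simplified versions of steps g, o, l and w only. */
    public static String normalize(String term) {
        String s = term.replaceAll("'s\\b", "");  // g: drop the genitive 's (simplified)
        s = s.replaceAll("\\p{Punct}", " ");      // o: replace punctuation with spaces
        s = s.toLowerCase(Locale.ROOT);           // l: lowercase
        String[] words = s.trim().split("\\s+");
        Arrays.sort(words);                       // w: sort the words alphabetically
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hodgkin's Disease")); // disease hodgkin
    }
}
```

Because the words are sorted at the end, terms that differ only in word order, case, or punctuation end up with the same normalized string, which is what makes the result usable as a lookup key.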

The generated words may not exist in the lexicon.

For example, norm turns "right" into "ride".

import java.util.*;
import gov.nih.nlm.nls.lvg.Api.*;

public class Normalization
{
    // test driver
    public static void main(String[] args)
    {
        // instantiate a NormApi object by config file
        String lvgConfigFile = "/export/home/lu/Projects/LVG/lvg2012/data/config/lvg.properties";
        NormApi normApi = new NormApi(lvgConfigFile);

        // the term to normalize
        String in = "left";

        try
        {
            // typed Vector so the for-each loop below compiles
            Vector<String> outs = normApi.Mutate(in);

            // print out the result
            for (String out : outs)
            {
                System.out.println(in + "|" + out);
            }

            // clean up
            normApi.CleanUp();
        }
        catch (Exception e)
        {
            System.err.println("** ERR: " + e.toString());
        }
    }
}

Output format

Field 1	Field 2	Field 3	Field 4	Field 5	Field 6	Field 7+
Input	Output Term	Categories	Inflections	Flow History	Flow Number	Additional Information
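A result line in this layout can be split into named fields as sketched below. This assumes the fields are separated by '|' (as in the splitting code earlier) and that categories and inflections are printed as numbers; the class name LvgOutputLine is hypothetical, and the sample line is hand-written in the same layout rather than captured from LVG.

```java
import java.util.ArrayList;
import java.util.List;

public class LvgOutputLine {
    public final String input;
    public final String outputTerm;
    public final long categories;
    public final long inflections;
    public final String flowHistory;
    public final int flowNumber;
    public final List<String> additional = new ArrayList<>(); // fields 7 and up, if any

    private LvgOutputLine(String[] f) {
        input = f[0];
        outputTerm = f[1];
        categories = Long.parseLong(f[2]);  // assumes the numeric form of the field
        inflections = Long.parseLong(f[3]); // assumes the numeric form of the field
        flowHistory = f[4];
        flowNumber = Integer.parseInt(f[5]);
        for (int i = 6; i < f.length; i++) {
            additional.add(f[i]);
        }
    }

    /** Splits one '|'-separated result line; trailing empty fields are dropped by split. */
    public static LvgOutputLine parse(String line) {
        String[] fields = line.split("\\|");
        if (fields.length < 6) {
            throw new IllegalArgumentException("expected at least 6 fields: " + line);
        }
        return new LvgOutputLine(fields);
    }

    public static void main(String[] args) {
        LvgOutputLine r = parse("dogs|dog|128|1|b|1|"); // hand-written sample line
        System.out.println(r.outputTerm + " " + r.categories + " " + r.inflections); // dog 128 1
    }
}
```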

Output term: the transformed term.

Categories:

Bit	Value	Variant	Other Symbols	Example
0	1	adj	adjective ADJ	red
1	2	adv	adverb ADV	quickly
2	4	aux	auxiliary	be is are do have has
3	8	compl	complementizer	that
4	16	conj	conjunction CON con	and or but
5	32	det	determiner DET	a the some each
6	64	modal	.	can dare may must ought shall will
7	128	noun	NOM NPR	dog
8	256	prep	preposition PRE pre	to on in at by
9	512	pron	pronoun	it he they
10	1024	verb	VER ver	break

inflection:

Bit	Value	Variant	Other Symbols	Example
0	1	base	.	dog break red quickly
1	2	comparative	.	redder
2	4	superlative	.	reddest
3	8	plural	p	dogs
4	16	presPart	ing	breaking
5	32	past	.	broke
6	64	pastPart	.	broken
7	128	pres3s	.	breaks
8	256	positive	.	red
9	512	singular	s	dog
10	1024	infinitive	inf	break
11	2048	pres123p	.	break
12	4096	pastNeg	.	didn't couldn't wouldn't shouldn't
13	8192	pres123pNeg	.	don't won't
14	16384	pres1s	.	am
15	32768	past1p23pNeg	.	weren't
16	65536	past1p23p	.	were
17	131072	past1s3sNeg	.	wasn't
18	262144	pres1p23p	.	are
19	524288	pres1p23pNeg	.	aren't
20	1048576	past1s3s	.	was
21	2097152	pres	.	can
22	4194304	pres3sNeg	.	isn't hasn't
23	8388608	presNeg	.	can't cannot

where:

pres: present
past: past
Part: participle
1: first person
2: second person
3: third person
s: singular
p: plural
Neg: Negative
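Since each inflection occupies one bit, a numeric inflection field can be decoded back into names by testing the bits. The sketch below does this with the names from the inflection table in bit order; the class name InflectionBits is hypothetical and not part of LVG.

```java
import java.util.ArrayList;
import java.util.List;

public class InflectionBits {
    // Names in bit order (bit 0 first), copied from the inflection table.
    private static final String[] NAMES = {
        "base", "comparative", "superlative", "plural", "presPart", "past",
        "pastPart", "pres3s", "positive", "singular", "infinitive", "pres123p",
        "pastNeg", "pres123pNeg", "pres1s", "past1p23pNeg", "past1p23p",
        "past1s3sNeg", "pres1p23p", "pres1p23pNeg", "past1s3s", "pres",
        "pres3sNeg", "presNeg"
    };

    /** Returns the name of every inflection whose bit is set in value. */
    public static List<String> decode(long value) {
        List<String> names = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((value & (1L << bit)) != 0) {
                names.add(NAMES[bit]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(1));   // [base]
        System.out.println(decode(96));  // [past, pastPart]
        System.out.println(decode(513)); // [base, singular]
    }
}
```

The same loop works for the category field if NAMES is replaced with the eleven category names in bit order.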

Additional information: shown when the -m option is used.

Sub-Term Mapping Tools (STMT)

Sub-Term Mapping Tools (nih.gov)

LexItem Sub-Term Finder (LSF):

to find whether a term is in the Lexicon
to find all sub-terms that are in the Lexicon
to find all prefix sub-terms that are in the Lexicon
to find the longest prefix sub-term in the Lexicon

// check whether the term exists in the corpus
LsfApi lsfApi = new LsfApi("D:/stmt2015/data/config/lsf.properties");
String isincorpus = lsfApi.CheckInCorpus("alis");

// prefixes? for a single standalone word, prefixes do not seem to be identified
Vector<String> prefixes = lsfApi.FindPrefixes("cricoarytenoid");
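One possible reading of the observation above is that a prefix sub-term is a leading sequence of whole words in a multi-word term, not a morphological prefix inside one word. The sketch below illustrates that reading only; it is an assumption, not a description of LsfApi. A small in-memory set stands in for the Lexicon, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrefixSubTerms {
    /** Returns every leading word sequence of term that is in the lexicon, shortest first. */
    public static List<String> findPrefixes(String term, Set<String> lexicon) {
        List<String> found = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String word : term.trim().split("\\s+")) {
            if (prefix.length() > 0) {
                prefix.append(' ');
            }
            prefix.append(word);
            if (lexicon.contains(prefix.toString())) {
                found.add(prefix.toString());
            }
        }
        return found;
    }

    /** Returns the longest prefix sub-term, or null if none is in the lexicon. */
    public static String longestPrefix(String term, Set<String> lexicon) {
        List<String> found = findPrefixes(term, lexicon);
        return found.isEmpty() ? null : found.get(found.size() - 1);
    }

    public static void main(String[] args) {
        // toy stand-in for the Lexicon
        Set<String> lexicon = new HashSet<>(Arrays.asList("heart", "heart attack", "attack"));
        System.out.println(findPrefixes("heart attack risk", lexicon));  // [heart, heart attack]
        System.out.println(longestPrefix("heart attack risk", lexicon)); // heart attack
    }
}
```

Under this reading a single word has only itself as a prefix sub-term, which would be consistent with the behaviour noted in the comment above.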
