The SPECIALIST Lexicon API

2023-10-28 03:18
Tags: api, specialist, lexicon


Using the SPECIALIST Lexicon Java API

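An affix is classified by position as a prefix or a suffix, and by function as inflectional or derivational.
Derivation uses both prefixes and suffixes. For example, adding a suffix to happy gives happily, and adding a prefix gives unhappy.
Inflection only adds affixes at the end of a word, marking changes in tense, number, case, and so on. For example: ask, asks, asking, asked.

Derivation can change the part of speech and the meaning of a word.

Inflection is a purely grammatical change.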

LvgCmdApi

Documentation for all flow components: lvg2021/docs/designDoc/UDF/flow/index.html

-f:a

Expand abbreviations and acronyms.

-f:b

Uninflect a term, reducing it to its base form.

It converts plural nouns into singular nouns, inflected verbs into their infinitive forms, and adjectives and adverbs into their positive forms. It does not change a word into a noun.

-f:An 

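Anti-Normalize (Approximate Match)

Takes a normalized term as input and returns the corresponding terms from the Lexicon. It can be used as a basic approximate match.

Because it finds approximate matches in the Lexicon, it is useful for mapping non-standard terms to Lexicon entries.

Results are sorted alphabetically, then by EUI, category, and inflection.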

String outputFromLvg = null;
LvgCmdApi lvgApi = new LvgCmdApi("-f:An", "D:/lvg2021/data/config/lvg.properties");
// ---------------------------------
// process each term
// ---------------------------------
outputFromLvg = lvgApi.MutateToString("term");

-f:d

Generate derivational variants.

Derivation rule documentation: lvg2021/docs/designDoc/UDF/derivations/index.html

Derivational variants are generated from FACTs (a pre-computed derivational table) and morphology rules (RULEs). FACTs are stored in a database and retrieved by SQL query. RULEs are stored in and retrieved from a trie.
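The FACT-then-RULE idea can be sketched in a few lines. This is a toy illustration only, not LVG's implementation: the class name, the single fact, and the two suffix rules are all made up for the example, and an in-memory map stands in for the database. Rules are stored in a trie keyed on the suffix read from right to left, so one walk from the end of the word finds every matching rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ToyDerivations {
    // FACTs: precomputed pairs, looked up directly (a map stands in for the database).
    static final Map<String, List<String>> FACTS = new HashMap<>();

    // RULEs: suffix rewrites stored in a trie keyed on the suffix, read right to left.
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final List<String> replacements = new ArrayList<>();
    }

    static final Node ROOT = new Node();

    static void addRule(String inSuffix, String outSuffix) {
        Node node = ROOT;
        for (int i = inSuffix.length() - 1; i >= 0; i--) {
            node = node.next.computeIfAbsent(inSuffix.charAt(i), c -> new Node());
        }
        node.replacements.add(outSuffix);
    }

    static {
        FACTS.put("hear", List.of("hearing"));  // hypothetical fact
        addRule("ize", "ization");              // hypothetical rule
        addRule("ness", "");                    // hypothetical rule
    }

    static List<String> derive(String word) {
        // Facts first, then every rule whose suffix matches the end of the word.
        List<String> out = new ArrayList<>(FACTS.getOrDefault(word, List.of()));
        Node node = ROOT;
        for (int i = word.length() - 1; i >= 0; i--) {
            node = node.next.get(word.charAt(i));
            if (node == null) break;
            String stem = word.substring(0, i);
            for (String replacement : node.replacements) {
                String candidate = stem + replacement;
                if (!out.contains(candidate)) out.add(candidate);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(derive("hear"));       // [hearing]
        System.out.println(derive("normalize"));  // [normalization]
        System.out.println(derive("kindness"));   // [kind]
    }
}
```

Unlike this sketch, the real rules also carry category information, so treat the output here as a picture of the lookup strategy rather than of LVG's results.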

-f:dc~NUMBER

Generate derivational variants, with the output category (part of speech) specified by number.

Category  Value
adj       1
adv       2
aux       4
compl     8
conj      16
det       32
modal     64
noun      128
prep      256
pron      512
verb      1024
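Each category value in the table is a single bit, so a set of categories can be represented as one number. The sketch below converts between names and values; the class and method names are made up for the example. That several categories can be requested at once by summing their values in -f:dc~ is an assumption based on the bit layout, so check it against the LVG documentation before relying on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CategoryMask {
    // Names in bit order, copied from the table: the name at index i has value 1 << i.
    static final String[] NAMES = {
        "adj", "adv", "aux", "compl", "conj", "det",
        "modal", "noun", "prep", "pron", "verb"
    };

    // Combine the values of the named categories into one number.
    static long encode(String... names) {
        long mask = 0;
        for (String name : names) {
            int bit = Arrays.asList(NAMES).indexOf(name);
            if (bit < 0) {
                throw new IllegalArgumentException("unknown category: " + name);
            }
            mask |= 1L << bit;
        }
        return mask;
    }

    // List the category names whose bits are set in the given value.
    static List<String> decode(long mask) {
        List<String> out = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((mask & (1L << bit)) != 0) out.add(NAMES[bit]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(encode("noun"));            // 128
        System.out.println(encode("noun", "verb"));    // 1152
        System.out.println(decode(1152));              // [noun, verb]
        System.out.println("-f:dc~" + encode("noun")); // -f:dc~128
    }
}
```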

String outputFromLvg = null;
LvgCmdApi lvgApi = new LvgCmdApi("-f:dc~128", "D:/lvg2021/data/config/lvg.properties");
outputFromLvg = lvgApi.MutateToString(w);
String[] outs = outputFromLvg.split("\n");
if (outputFromLvg.length() > 0) {
    for (String out : outs) {
        // field 2 of each pipe-delimited record is the output term
        derivword.add(out.split("\\|")[1]);
    }
}

-f:d -kdt:STR

Restrict the derivation type.

  • Z (zeroD): restricts the output to zero derivations of the input (no change in form).
  • S (suffixD): restricts the output to suffix derivations of the input.
  • P (prefixD): restricts the output to prefix derivations of the input.
  • ZS (zeroD and suffixD): restricts the output to zero and suffix derivations of the input. This is one of the most used options with query expansion for CUI mapping.
  • ZP (zeroD and prefixD): restricts the output to zero and prefix derivations of the input.
  • SP (suffixD and prefixD): restricts the output to suffix and prefix derivations of the input.
  • ZSP (all): no restriction on derivation type. All zeroD (Z), suffixD (S), and prefixD (P) derivations are returned. This is the default option.

-f:f

Filter the output to contain only forms found in the Lexicon. Forms that are not in the Lexicon are dropped, and only one record is returned.

Inflection output filter: -k:i:1

Derivation output filter: -k:d:1

-f:i

Generate inflectional variants.

-f:Ln

Retrieve a word's category (part of speech) and inflection information from the database.

-f:nom

Retrieve the nominalizations of an input term, that is, the noun forms related to it.

-f:N3

This appears to correspond to LuiNorm.

It normalizes non-ASCII Unicode characters to ASCII, removes genitives, removes parenthetic plural forms, replaces punctuation with spaces, removes stop words, lowercases, uninflects each word, maps each uninflected word to its canonical form, strips or maps any remaining non-ASCII Unicode characters to ASCII, and finally sorts the words.

-f:r

Recursively generate synonyms.

Norm API

lvg2021/docs/userDoc/examples/norm.html

Equivalent to the flow -f:q0:g:rs:o:t:l:B:Ct:q7:q8:w

  1. q0: map Unicode symbols and punctuation to ASCII
  2. g: remove genitives,
  3. rs: then remove parenthetic plural forms of (s), (es), (ies), (S), (ES), and (IES),
  4. o: then replace punctuation with spaces,
  5. t: then remove stop words,
  6. l: then lowercase,
  7. B: then uninflect each word,
  8. Ct: then get citation form for each base form,
  9. q7: then Unicode Core Norm
  10. q8: then strip or map non-ASCII Unicode characters,
  11. w: and finally sort the words in alphabetic order.
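To make the order of the steps concrete, here is a toy pipeline that mimics only the steps that need no lexicon: g, rs, o, t, l, and w. It is not NormApi and will not reproduce its output, because uninflection (B), citation forms (Ct), and the Unicode steps (q0, q7, q8) are skipped. The class name and the stop-word list are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class ToyNorm {
    // Illustrative stop-word list; the list shipped with LVG may differ.
    static final Set<String> STOP_WORDS =
        Set.of("of", "and", "with", "for", "to", "in", "by", "on", "the");

    static String normalize(String term) {
        // g: remove genitives ('s and a trailing s')
        String s = term.replaceAll("(?i)'s\\b", "").replaceAll("s'(?=\\s|$)", "s");
        // rs: remove parenthetic plural forms
        s = s.replaceAll("\\((s|es|ies|S|ES|IES)\\)", "");
        // o: replace punctuation with spaces
        s = s.replaceAll("\\p{Punct}", " ");
        // t and l: remove stop words, then lowercase
        List<String> words = new ArrayList<>();
        for (String word : s.trim().split("\\s+")) {
            String lower = word.toLowerCase();
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                words.add(lower);
            }
        }
        // w: sort the words alphabetically
        Collections.sort(words);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // "legs" stays plural because the uninflection step is not implemented here.
        System.out.println(normalize("Fracture(s) of the Dog's Legs")); // dog fracture legs
    }
}
```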

The resulting words may not exist in the Lexicon.

For example, norm turns right into ride.

import java.util.*;
import gov.nih.nlm.nls.lvg.Api.*;

public class Normalization
{
    // test driver
    public static void main(String[] args)
    {
        // instantiate a NormApi object by config file
        String lvgConfigFile = "/export/home/lu/Projects/LVG/lvg2012/data/config/lvg.properties";
        NormApi normApi = new NormApi(lvgConfigFile);

        // the term to normalize
        String in = "left";
        try
        {
            Vector<String> outs = normApi.Mutate(in);
            // print out the result
            for (String out : outs)
            {
                System.out.println(in + "|" + out);
            }
            // clean up
            normApi.CleanUp();
        }
        catch (Exception e)
        {
            System.err.println("** ERR: " + e.toString());
        }
    }
}

Output format

Field     Content
Field 1   Input
Field 2   Output Term
Field 3   Categories
Field 4   Inflections
Field 5   Flow History
Field 6   Flow Number
Field 7+  Additional Information
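A record in this layout can be split into named fields as follows. This is a minimal sketch: the class name is made up, the sample line is hand-written rather than captured from LVG, and it assumes that fields 3 and 4 are printed as numbers.

```java
import java.util.ArrayList;
import java.util.List;

final class LvgRecord {
    final String input;
    final String outputTerm;
    final long categories;
    final long inflections;
    final String flowHistory;
    final int flowNumber;
    final List<String> additional = new ArrayList<>();

    private LvgRecord(String[] f) {
        input = f[0];
        outputTerm = f[1];
        categories = Long.parseLong(f[2]);   // assumes a numeric category field
        inflections = Long.parseLong(f[3]);  // assumes a numeric inflection field
        flowHistory = f[4];
        flowNumber = Integer.parseInt(f[5]);
        for (int i = 6; i < f.length; i++) {
            additional.add(f[i]);            // field 7 and beyond
        }
    }

    static LvgRecord parse(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] fields = line.split("\\|", -1);
        if (fields.length < 6) {
            throw new IllegalArgumentException("expected at least 6 fields: " + line);
        }
        return new LvgRecord(fields);
    }

    public static void main(String[] args) {
        String sample = "dogs|dog|128|512|b|1";  // hand-written sample, not real LVG output
        LvgRecord r = parse(sample);
        System.out.println(r.outputTerm + " categories=" + r.categories
            + " inflections=" + r.inflections);
    }
}
```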

output term: the transformed term

categories:

Bit  Value  Variant  Other Symbols          Example
0    1      adj      adjective, ADJ         red
1    2      adv      adverb, ADV            quickly
2    4      aux      auxiliary              be, is, are, do, have, has
3    8      compl    complementizer         that
4    16     conj     conjunction, CON, con  and, or, but
5    32     det      determiner, DET        a, the, some, each
6    64     modal    -                      can, dare, may, must, ought, shall, will
7    128    noun     NOM, NPR               dog
8    256    prep     preposition, PRE, pre  to, on, in, at, by
9    512    pron     pronoun                it, he, they
10   1024   verb     VER, ver               break

inflection:

Bit  Value    Variant                                 Other Symbols  Example
0    1        base                                    -              dog, break, red, quickly
1    2        comparative                             -              redder
2    4        superlative                             -              reddest
3    8        plural                                  p              dogs
4    16       presPart (present participle)           ing            breaking
5    32       past (past tense)                       -              broke
6    64       pastPart (past participle)              -              broken
7    128      pres3s (present third-person singular)  -              breaks
8    256      positive                                -              red
9    512      singular                                s              dog
10   1024     infinitive                              inf            break
11   2048     pres123p                                -              break
12   4096     pastNeg                                 -              didn't, couldn't, wouldn't, shouldn't
13   8192     pres123pNeg                             -              don't, won't
14   16384    pres1s                                  -              am
15   32768    past1p23pNeg                            -              weren't
16   65536    past1p23p                               -              were
17   131072   past1s3sNeg                             -              wasn't
18   262144   pres1p23p                               -              are
19   524288   pres1p23pNeg                            -              aren't
20   1048576  past1s3s                                -              was
21   2097152  pres                                    -              can
22   4194304  pres3sNeg                               -              isn't, hasn't
23   8388608  presNeg                                 -              can't, cannot

where:

  • pres: present
  • past: past
  • Part: participle
  • 1: first person
  • 2: second person
  • 3: third person
  • s: singular
  • p: plural
  • Neg: Negative

additional information: shown when the -m option is given

 

Sub-Term Mapping Tools (STMT)

Sub-Term Mapping Tools (nih.gov)

LexItem Sub-Term Finder (LSF):

  • to find whether a term is in the Lexicon
  • to find all sub-terms that are in the Lexicon
  • to find all prefix sub-terms that are in the Lexicon
  • to find the longest prefix sub-term in the Lexicon

// check whether the word exists in the corpus
LsfApi lsfApi = new LsfApi("D:/stmt2015/data/config/lsf.properties");
String isincorpus = lsfApi.CheckInCorpus("alis");
// prefixes? it does not seem to recognize a prefix inside a single standalone word
Vector<String> prefixes = lsfApi.FindPrefixes("cricoarytenoid");
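One possible reading of the note above is that sub-terms are sequences of whole words, so a single word has no shorter prefix sub-term to find. The toy below illustrates prefix sub-term lookup under that assumption. It does not use LsfApi; the class name and the four-entry lexicon are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ToySubTermFinder {
    // Hypothetical mini-lexicon standing in for the real one.
    static final Set<String> LEXICON =
        Set.of("heart", "heart attack", "attack", "risk");

    static boolean inLexicon(String term) {
        return LEXICON.contains(term.trim().toLowerCase());
    }

    // All prefix sub-terms (the first k words) that are in the lexicon, shortest first.
    static List<String> findPrefixes(String term) {
        List<String> found = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String word : term.trim().toLowerCase().split("\\s+")) {
            if (prefix.length() > 0) prefix.append(' ');
            prefix.append(word);
            if (LEXICON.contains(prefix.toString())) {
                found.add(prefix.toString());
            }
        }
        return found;
    }

    // The longest prefix sub-term in the lexicon, or null if there is none.
    static String longestPrefix(String term) {
        List<String> prefixes = findPrefixes(term);
        return prefixes.isEmpty() ? null : prefixes.get(prefixes.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(inLexicon("heart attack"));           // true
        System.out.println(findPrefixes("heart attack risk"));   // [heart, heart attack]
        System.out.println(longestPrefix("heart attack risk"));  // heart attack
    }
}
```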

 
