CoNLL 2017 - Multi-Model and Crosslingual Dependency Analysis

ref: Multi-Model and Crosslingual Dependency Analysis

code: https://github.com/CoNLL-UD-2017/Orange-Deskin

proceedings: http://universaldependencies.org/conll17/proceedings/

results: http://universaldependencies.org/conll17/results.html

Setting up the environment for running the code: VirtualBox + CentOS 7

1. get the source code

git clone https://github.com/CoNLL-UD-2017/Orange-Deskin

2. get cnn-v1

cd Orange-Deskin
git clone https://github.com/clab/cnn-v1

3. get eigen

hg clone https://bitbucket.org/eigen/eigen

4. replace cnn/model.h with the file in cnn-modifs and compile cnn:

cp cnn-modifs/model.h cnn-v1/cnn/
cd cnn-v1
mkdir build
cd build
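cmake .. -DEIGEN3_INCLUDE_DIR=/root/Orange-Deskin/eigen
make

Note: EIGEN3_INCLUDE_DIR needs to be set to an absolute path (here /root/Orange-Deskin/eigen, adjust it to your own checkout) rather than the relative ../../eigen.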

If cmake fails and the reported error is "Undefined reference to pthread_create in Linux", the fix is to install the boost-devel package. On CentOS:

yum install boost-devel

5. modify pycnn/setup.py (directory "../../cnn" should be "../../cnn-v1") and compile the python interface:

cd cnn-v1/pycnn
make install

Train models

1. to run training, set LD_LIBRARY_PATH so that the cnn Python library can be found

export LD_LIBRARY_PATH=PATH/TO/cnn-v1/pycnn

2. the training command is shown below. Before running it, we need "train-projective.conllu", "dev-projective.conllu", "word2vec.cbow.bin" and "train-words-to-load.txt" (steps 3 to 6).

python bistparser/barchybrid/src/parser.py \
    --cnn-mem 4000 \
    --outdir /PATH/TO/OUTDIR \
    --train train-projective.conllu \
    --dev dev-projective.conllu \
    --epochs 20 --lstmdims 125 \
    --lstmlayers 2 --bibi-lstm \
    --k 3 --usehead --userl \
    --extrn word2vec.cbow.bin \
    --extrnFilter train-words-to-load.txt \
    [--hidden 50]

3. get "train-projective.conllu"

py/projectivise.py -c train.conllu > train-projective.conllu

Here "train.conllu" is a training set chosen from a treebank. The Universal Dependencies 2.0 treebank can be downloaded from "https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2184"; the file name is ud-test-v2.0-conll2017.tgz. One possible choice is ud-test-v2.0-conll2017/input/conll17-ud-trial-2017-03-19/en-udpipe.conllu.

4. get "dev-projective.conllu"

py/projectivise.py -c dev.conllu > dev-projective.conllu
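
To see what projectivising is about: a dependency tree is projective when no two of its arcs cross. The following is a minimal sketch of such a check for one CoNLL-U sentence. It is not the code of py/projectivise.py and it does not transform anything; the function names are made up for illustration, and it assumes the HEAD column (column 7) holds an integer on every regular token line.

```python
def read_heads(sentence_lines):
    """Return {token_id: head_id} for one CoNLL-U sentence.

    Comment lines, multiword-token ranges (ID like 1-2) and empty nodes
    (ID like 1.1) are skipped.
    """
    heads = {}
    for line in sentence_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        heads[int(cols[0])] = int(cols[6])
    return heads


def is_projective(heads):
    """Return True if no two arcs cross (the root arc from 0 is included)."""
    arcs = [(min(dep, head), max(dep, head)) for dep, head in heads.items()]
    for a, b in arcs:
        for c, d in arcs:
            # Two arcs cross when exactly one end of the second arc
            # lies strictly inside the span of the first.
            if a < c < b < d:
                return False
    return True
```

For example, is_projective({1: 3, 2: 4, 3: 0, 4: 3}) is False because the arcs 1-3 and 2-4 cross, while is_projective({1: 2, 2: 3, 3: 0}) is True.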

5. get "word2vec.cbow.bin"

Pretrained vectors such as "freebase-vectors-skipgram1000.bin.gz", "freebase-vectors-skipgram1000-en.bin.gz" or "GoogleNews-vectors-negative300.bin.gz" can be downloaded from https://code.google.com/archive/p/word2vec/

However, the word2vec.cbow.bin used here was not trained on GoogleNews!

So we need vectors trained on other corpora to obtain the file "word2vec.cbow.bin".

The word embeddings were calculated on corpora taken from "https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1989". Download the file word-embeddings-conll17.tar from there and use word-embeddings-conll17/English/en.vectors from it as word2vec.cbow.bin.

6. get "train-words-to-load.txt"

cut -f2 train-projective.conllu | sort -u > forms.txt
cut -f3 train-projective.conllu | sort -u > lemmas.txt
cat forms.txt lemmas.txt | perl -CSD -ne 'print lc' | sort -u > train-words-to-load.txt
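
The same word list can also be built in Python. This is only a sketch of the pipeline above, with a made-up function name, and it differs in two ways: it skips comment and blank lines (cut passes lines without a tab through unchanged, so the shell version keeps them), and it sorts by code point rather than by the current locale. Python's lower() and Perl's lc may also disagree on a few unusual characters.

```python
def words_to_load(conllu_lines):
    """Collect lowercased forms (column 2) and lemmas (column 3).

    Returns a sorted list without duplicates.
    """
    words = set()
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # comment or sentence separator
        cols = line.split("\t")
        if len(cols) < 3:
            continue  # not a token line
        words.add(cols[1].lower())
        words.add(cols[2].lower())
    return sorted(words)
```

To write the file, open train-projective.conllu with encoding="utf-8", pass the file object to words_to_load, and write the returned words one per line to train-words-to-load.txt.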

7. start training

python bistparser/barchybrid/src/parser.py \
    --cnn-mem 4000 \
    --outdir /PATH/TO/OUTDIR \
    --train train-projective.conllu \
    --dev dev-projective.conllu \
    --epochs 20 --lstmdims 125 \
    --lstmlayers 2 --bibi-lstm \
    --k 3 --usehead --userl \
    --extrn word2vec.cbow.bin \
    --extrnFilter train-words-to-load.txt \
    [--hidden 50]

Note: /PATH/TO/OUTDIR should be replaced with the actual output directory.
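
Because the training command takes several input files, it can help to check that they all exist before starting a long run. The sketch below assembles the same argument list as the command in this post and reports missing files. The two helper names are invented for this example, and the flags are copied from the command as given here, not checked against the parser's option list.

```python
import os


def training_command(outdir, train, dev, extrn, extrn_filter, hidden=None):
    """Build the argument list for the training run described in this post."""
    cmd = [
        "python", "bistparser/barchybrid/src/parser.py",
        "--cnn-mem", "4000",
        "--outdir", outdir,
        "--train", train,
        "--dev", dev,
        "--epochs", "20", "--lstmdims", "125",
        "--lstmlayers", "2", "--bibi-lstm",
        "--k", "3", "--usehead", "--userl",
        "--extrn", extrn,
        "--extrnFilter", extrn_filter,
    ]
    if hidden is not None:
        # --hidden is the optional part of the command.
        cmd += ["--hidden", str(hidden)]
    return cmd


def missing_inputs(paths):
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]
```

If missing_inputs returns an empty list for the four input files, the list from training_command can be passed to subprocess.run(cmd, check=True).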

Use Models

python bistparser/barchybrid/src/parser.py \
    --cnn-mem 4000 --predict \
    --outfile result.conllu \
    --test test-projective.conllu \
    --model /PATH/TO/OUTDIR/barchhybrid.model_NNN \
    --params /PATH/TO/OUTDIR/params.pickle \
    --k 3 --usehead --userl \
    --extrn word2vec.cbow.bin \
    --extrnFilter train-words-to-load.txt \
    --extrnFilterNew test-words-to-load.txt

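The first run of this command failed with an error. After checking several times, the cause turned out to be that the model file name, barchhybrid.model_002, was wrong.

After correcting the name, a second error appeared. It was caused by not having replaced cnn-v1/cnn/model.h (step 4 of the setup).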
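
Since a wrong model file name was the cause of the first error, it is worth looking at what training actually wrote before filling in --model. A minimal sketch, with a made-up helper name, that simply lists the files in the output directory whose name contains ".model"; it makes no assumption about the exact naming pattern.

```python
import os


def list_model_files(outdir):
    """Return the file names in outdir that contain '.model', sorted."""
    return sorted(
        name
        for name in os.listdir(outdir)
        if ".model" in name and os.path.isfile(os.path.join(outdir, name))
    )
```

Calling list_model_files("/PATH/TO/OUTDIR") shows the candidate names; one of them is then used in the --model argument.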

Finally, de-projectivise output if you have projectivised:

py/projectivise.py -d result.conllu > result-deprojectivised.conllu
