CoNLL 2017 - Multi-Model and Crosslingual Dependency Analysis

2024-04-27 04:08

ref: Multi-Model and Crosslingual Dependency Analysis



Setting up the runtime environment: VirtualBox + CentOS 7

1. get the source code

git clone

2. get cnn-v1

cd Orange-Deskin
git clone

3. get eigen

hg clone

4. replace cnn/model.h with the file in cnn-modifs and compile cnn:

cp cnn-modifs/model.h cnn-v1/cnn/
cd cnn-v1
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=../../eigen
make

Note: EIGEN3_INCLUDE_DIR needs to be set to an absolute path, for example /root/Orange-Deskin/eigen.

If cmake fails with the error "Undefined reference to pthread_create in Linux", the fix is to install the boost-devel package. On CentOS:

yum install boost-devel

5. modify pycnn/ (directory "../../cnn" should be "../../cnn-v1") and compile the python interface:

cd cnn-v1/pycnn
make install

Train models

1. to run training, set the LD_LIBRARY_PATH environment variable so that the cnn Python library can be found

export LD_LIBRARY_PATH=PATH/TO/cnn-v1/pycnn

2. the training command is shown below; before running it we need to get "train-projective.conllu", "dev-projective.conllu", "word2vec.cbow.bin" and "train-words-to-load.txt".

python bistparser/barchybrid/src/ \
  --cnn-mem 4000 \
  --outdir /PATH/TO/OUTDIR \
  --train train-projective.conllu \
  --dev dev-projective.conllu \
  --epochs 20 --lstmdims 125 \
  --lstmlayers 2 --bibi-lstm \
  --k 3 --usehead --userl \
  --extrn word2vec.cbow.bin \
  --extrnFilter train-words-to-load.txt \
  [--hidden 50]

3. get "train-projective.conllu"

py/ -c train.conllu > train-projective.conllu

Here, "train.conllu" is a training set chosen from the treebank. The Universal Dependencies 2.0 treebank can be downloaded from "" under the file name ud-test-v2.0-conll2017.tgz. For example, we can choose: ud-test-v2.0-conll2017/input/conll17-ud-trial-2017-03-19/en-udpipe.conllu.

4. get "dev-projective.conllu"

py/ -c dev.conllu > dev-projective.conllu

Here, "dev.conllu" is a development set chosen from the treebank. The Universal Dependencies 2.0 treebank can be downloaded from "" under the file name ud-test-v2.0-conll2017.tgz. For example, we can choose: ud-test-v2.0-conll2017/input/conll17-ud-development-2017-03-19/en-udpipe.conllu.
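Before projectivising, it can be useful to know how many trees in train.conllu or dev.conllu are actually non-projective. The following is a minimal sketch of such a check, not the script from the repository: the function names are made up for illustration, and it assumes well-formed CoNLL-U input with the token ID in column 1 and the head in column 7. A tree is treated as non-projective when two arcs cross, with the artificial root at position 0.

```python
def read_heads(conllu_text):
    """Yield one {token_id: head_id} dict per sentence of CoNLL-U text."""
    heads = {}
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if heads:
                yield heads
                heads = {}
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2"), empty nodes ("1.1")
        # and rows without a numeric head.
        if len(cols) < 7 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        heads[int(cols[0])] = int(cols[6])
    if heads:
        yield heads


def is_projective(heads):
    """Return True if no two dependency arcs cross (root is position 0)."""
    arcs = [(min(dep, head), max(dep, head)) for dep, head in heads.items()]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:
                return False
    return True


def count_non_projective(conllu_text):
    """Return (number of non-projective trees, total number of trees)."""
    flags = [is_projective(h) for h in read_heads(conllu_text)]
    return flags.count(False), len(flags)
```

Reading a file with open("train.conllu", encoding="utf-8").read() and passing the text to count_non_projective gives a quick idea of how much the projectivisation step will change.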

5. get "word2vec.cbow.bin"

downloaded "freebase-vectors-skipgram1000.bin.gz" or "freebase-vectors-skipgram1000-en.bin.gz" or "GoogleNews-vectors-negative300.bin.gz" from

However, the word2vec.cbow.bin used here was not trained on GoogleNews, so we need embeddings trained on a different corpus to obtain the file "word2vec.cbow.bin".

Word embeddings have been calculated on corpora taken from "". Download the file named word-embeddings-conll17.tar and use the file word-embeddings-conll17/English/en.vectors inside it as word2vec.cbow.bin.

6. get "train-words-to-load.txt"

cut -f2 train-projective.conllu | sort -u > forms.txt
cut -f3 train-projective.conllu | sort -u > lemmas.txt
cat forms.txt lemmas.txt | perl -CSD -ne 'print lc' | sort -u > train-words-to-load.txt
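The same word list can also be built in Python, which may be convenient when cut or perl is not available. This is a rough equivalent and an illustration only: the function name is hypothetical, and unlike the shell pipeline it skips comment lines and blank lines (cut passes lines without a tab through unchanged) and sorts by Unicode code point rather than by the system locale.

```python
def words_to_load(conllu_text):
    """Return the sorted, lowercased set of forms and lemmas in CoNLL-U text."""
    words = set()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or sentence-level comment
        cols = line.split("\t")
        if len(cols) < 3:
            continue  # not a token row
        words.add(cols[1].lower())  # column 2: word form
        words.add(cols[2].lower())  # column 3: lemma
    return sorted(words)


def write_word_list(conllu_path, out_path):
    """Read a CoNLL-U file and write one lowercased word per line."""
    with open(conllu_path, encoding="utf-8") as src:
        words = words_to_load(src.read())
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(words) + "\n")
```

For example, write_word_list("train-projective.conllu", "train-words-to-load.txt") would produce the training word list. Assuming the test word list is built the same way, the same call on the test file could produce test-words-to-load.txt.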

7. start training

python bistparser/barchybrid/src/ \
  --cnn-mem 4000 \
  --outdir /PATH/TO/OUTDIR \
  --train train-projective.conllu \
  --dev dev-projective.conllu \
  --epochs 20 --lstmdims 125 \
  --lstmlayers 2 --bibi-lstm \
  --k 3 --usehead --userl \
  --extrn word2vec.cbow.bin \
  --extrnFilter train-words-to-load.txt \
  [--hidden 50]

Note: /PATH/TO/OUTDIR should be replaced with the actual output directory.
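Because training depends on several files produced in the earlier steps, a small pre-flight check can save a failed run. The sketch below is only an illustration: the helper names are hypothetical, the script path is left as a parameter because these notes do not give the script's file name, and the flags are copied from the training command above.

```python
import os


def missing_inputs(paths):
    """Return the paths from `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


def training_args(script, outdir, train, dev, vectors, word_filter, hidden=None):
    """Assemble the training command as an argument list."""
    args = [
        "python", script,
        "--cnn-mem", "4000",
        "--outdir", outdir,
        "--train", train,
        "--dev", dev,
        "--epochs", "20",
        "--lstmdims", "125",
        "--lstmlayers", "2",
        "--bibi-lstm",
        "--k", "3",
        "--usehead", "--userl",
        "--extrn", vectors,
        "--extrnFilter", word_filter,
    ]
    if hidden is not None:
        # Optional flag, shown in brackets in the command above.
        args += ["--hidden", str(hidden)]
    return args
```

If missing_inputs returns an empty list for the four input files, the list from training_args can be passed to subprocess.run(args, check=True) to start training.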

Use Models

python bistparser/barchybrid/src/ \
  --cnn-mem 4000 --predict \
  --outfile result.conllu \
  --test test-projective.conllu \
  --model /PATH/TO/OUTDIR/barchhybrid.model_NNN \
  --params /PATH/TO/OUTDIR/params.pickle \
  --k 3 --usehead --userl \
  --extrn word2vec.cbow.bin \
  --extrnFilter train-words-to-load.txt \
  --extrnFilterNew test-words-to-load.txt





Finally, de-projectivise output if you have projectivised:

py/ -d result.conllu > result-deprojectivised.conllu

