This post describes how to set up and run the code for "Multi-Model and Crosslingual Dependency Analysis" (CoNLL 2017).
ref: Multi-Model and Crosslingual Dependency Analysis
code: https://github.com/CoNLL-UD-2017/Orange-Deskin
proceedings: http://universaldependencies.org/conll17/proceedings/
results: http://universaldependencies.org/conll17/results.html
Setting up the environment to run the code: VirtualBox + CentOS 7
1. get the source code
git clone https://github.com/CoNLL-UD-2017/Orange-Deskin
2. get cnn-v1
cd Orange-Deskin
git clone https://github.com/clab/cnn-v1
3. get eigen
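hg clone https://bitbucket.org/eigen/eigen
Note: Bitbucket has since dropped Mercurial hosting and Eigen has moved to https://gitlab.com/libeigen/eigen, so this command may no longer work as written.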
4. replace cnn-v1/cnn/model.h with the file in cnn-modifs and compile cnn:
cp cnn-modifs/model.h cnn-v1/cnn/
cd cnn-v1
mkdir build
cd build
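cmake .. -DEIGEN3_INCLUDE_DIR=/root/Orange-Deskin/eigen
make
Note: EIGEN3_INCLUDE_DIR needs to be an absolute path (the relative path ../../eigen is not enough); /root/Orange-Deskin/eigen above is an example, use the location of your own eigen checkout.
If cmake fails and the error is "Undefined reference to pthread_create in Linux", the fix is to install the boost-devel package. On CentOS: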
yum install boost-devel
5. modify pycnn/setup.py (directory "../../cnn" should be "../../cnn-v1") and compile the python interface:
cd cnn-v1/pycnn
make install
Train models
1. in order to run training, we need to set an environment variable so that the cnn Python library can be found
export LD_LIBRARY_PATH=PATH/TO/cnn-v1/pycnn
2. the training command is shown below; before running it we need to get "train-projective.conllu", "dev-projective.conllu", "word2vec.cbow.bin" and "train-words-to-load.txt" (steps 3 to 6).
python bistparser/barchybrid/src/parser.py \
    --cnn-mem 4000 \
    --outdir /PATH/TO/OUTDIR \
    --train train-projective.conllu \
    --dev dev-projective.conllu \
    --epochs 20 --lstmdims 125 \
    --lstmlayers 2 --bibi-lstm \
    --k 3 --usehead --userl \
    --extrn word2vec.cbow.bin \
    --extrnFilter train-words-to-load.txt \
    [--hidden 50]
3. get "train-projective.conllu"
py/projectivise.py -c train.conllu > train-projective.conllu
Here, "train.conllu" is a training set picked from the treebank. The Universal Dependencies 2.0 treebank can be downloaded from "https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2184"; the file name is ud-test-v2.0-conll2017.tgz. We can choose: ud-test-v2.0-conll2017/input/conll17-ud-trial-2017-03-19/en-udpipe.conllu.
4. get "dev-projective.conllu"
py/projectivise.py -c dev.conllu > dev-pojective.conllu
其中,“train.conllu”是在treebank中选中一个训练集。Treebank: Universal Dependencies 2.0可以从“https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2184”下载,其文件名为:ud-test-v2.0-conll2017.tgz。我们可以选择:ud-test-v2.0-conll2017/input/conll17-ud-development-2017-03-19/en-udpipe.conllu.
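The internals of py/projectivise.py are not covered here. As a rough illustration of what "projective" means, the sketch below checks whether a tree has crossing arcs. It assumes the heads are given as a plain list taken from the HEAD column of a CoNLL-U sentence; the function name is_projective is hypothetical and not part of the Orange-Deskin code.

```python
def is_projective(heads):
    """Return True if no two dependency arcs cross.

    heads[i] is the head of token i + 1. Tokens are numbered from 1 and
    0 stands for the artificial root, as in the HEAD column of CoNLL-U.
    """
    # Store every arc as (left, right) with left < right.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a_left, a_right in arcs:
        for b_left, b_right in arcs:
            # Arc b starts strictly inside arc a and ends strictly outside it.
            if a_left < b_left < a_right < b_right:
                return False
    return True
```

The check is quadratic in the sentence length, which is fine for inspecting a treebank by hand but is only a sketch, not a replacement for the projectivise script.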
5. get "word2vec.cbow.bin"
Pre-trained vectors such as "freebase-vectors-skipgram1000.bin.gz", "freebase-vectors-skipgram1000-en.bin.gz" or "GoogleNews-vectors-negative300.bin.gz" can be downloaded from https://code.google.com/archive/p/word2vec/
However, the word2vec.cbow.bin used here was not trained on GoogleNews!
So we need embeddings trained on another corpus in order to get the file "word2vec.cbow.bin".
The word embeddings were calculated on corpora taken from "https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1989". Download the file named word-embeddings-conll17.tar and use word-embeddings-conll17/English/en.vectors from it as word2vec.cbow.bin.
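To inspect the embeddings before training, a small reader can help. The sketch below assumes the vectors file is in the plain-text word2vec format (first line: vocabulary size and dimension, then one word and its values per line); this is an assumption about the file, and the function name load_text_vectors and its keep parameter are hypothetical. It does not reproduce how the parser itself reads the file.

```python
def load_text_vectors(path, keep=None):
    """Read a plain-text word2vec file into a dict {word: [float, ...]}.

    If keep is a set of words, only those words are loaded.
    """
    vectors = {}
    with open(path, encoding="utf-8", errors="replace") as f:
        # Header line: "<vocabulary size> <dimension>"
        header = f.readline().split()
        dim = int(header[1])
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            # Skip malformed lines instead of failing on them.
            if len(values) != dim:
                continue
            if keep is not None and word not in keep:
                continue
            vectors[word] = [float(v) for v in values]
    return vectors
```

Passing the word list from train-words-to-load.txt as keep avoids holding the whole vocabulary in memory.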
6. get "train-words-to-load.txt"
cut -f2 train-projective.conllu | sort -u > forms.txt
cut -f3 train-projective.conllu | sort -u > lemmas.txt
cat forms.txt lemmas.txt | perl -CSD -ne 'print lc' | sort -u > train-words-to-load.txt
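The same word list can be built in Python. This is a sketch under two assumptions: the input is a standard CoNLL-U file (FORM in column 2, LEMMA in column 3), and comment or blank lines should be skipped, which the cut-based pipeline does not do. The function name collect_words is hypothetical.

```python
def collect_words(conllu_path):
    """Return the sorted, lowercased set of forms and lemmas in a CoNLL-U file."""
    words = set()
    with open(conllu_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            # Skip sentence separators and comment lines such as "# text = ...".
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            if len(cols) < 3:
                continue
            words.add(cols[1].lower())  # FORM
            words.add(cols[2].lower())  # LEMMA
    return sorted(words)


# Usage (paths are examples):
# with open("train-words-to-load.txt", "w", encoding="utf-8") as out:
#     out.write("\n".join(collect_words("train-projective.conllu")) + "\n")
```

Python sorts by code point while sort -u follows the current locale, so the line order may differ from the shell version even when the set of words is the same.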
7. start training
python bistparser/barchybrid/src/parser.py \
    --cnn-mem 4000 \
    --outdir /PATH/TO/OUTDIR \
    --train train-projective.conllu \
    --dev dev-projective.conllu \
    --epochs 20 --lstmdims 125 \
    --lstmlayers 2 --bibi-lstm \
    --k 3 --usehead --userl \
    --extrn word2vec.cbow.bin \
    --extrnFilter train-words-to-load.txt \
    [--hidden 50]
Note: /PATH/TO/OUTDIR should be set to the actual output directory.
Use Models
python bistparser/barchybrid/src/parser.py \
    --cnn-mem 4000 --predict \
    --outfile result.conllu \
    --test test-projective.conllu \
    --model /PATH/TO/OUTDIR/barchhybrid.model_NNN \
    --params /PATH/TO/OUTDIR/params.pickle \
    --k 3 --usehead --userl \
    --extrn word2vec.cbow.bin \
    --extrnFilter train-words-to-load.txt \
    --extrnFilterNew test-words-to-load.txt
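Running this command first failed with an error.
After checking several times, the cause turned out to be that the model file name barchhybrid.model_002 was wrong.
After correcting the name, a second error appeared.
That one was caused by not having replaced cnn-v1/cnn/model.h (step 4 of the setup).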
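Since a wrong model file name is easy to get, it can help to list what training actually wrote to the output directory and copy the name from there. The sketch below makes no assumption about the exact naming scheme; it simply returns every file whose name contains "model". The function name list_model_files is hypothetical.

```python
import os


def list_model_files(outdir):
    """Return the names of files in outdir that contain 'model', sorted by name."""
    names = []
    for name in os.listdir(outdir):
        full_path = os.path.join(outdir, name)
        # Ignore sub-directories and unrelated files such as params.pickle.
        if os.path.isfile(full_path) and "model" in name:
            names.append(name)
    return sorted(names)


# Usage (the path is an example):
# for name in list_model_files("/PATH/TO/OUTDIR"):
#     print(name)
```

One of the printed names can then be passed to --model in the prediction command.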
Finally, de-projectivise output if you have projectivised:
py/projectivise.py -d result.conllu > result-deprojectivised.conllu