NLP: fastText

2024-02-18 09:32

Tags: nlp fasttext


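1. Introduction

fastText is a library that Facebook open-sourced in 2016. It can be used to learn word vectors and to classify text. For text classification it is reported to reach accuracy close to that of deep learning models while training much faster; one reason is that it can use hierarchical softmax to speed up the computation over the output classes. The model takes a sentence together with its n-gram features as input and outputs a class label.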
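To make "a sentence together with its n-gram features" concrete, here is a minimal sketch of how word n-grams can be enumerated from a tokenized sentence. The function name word_ngrams is made up for illustration and is not part of fastText; the library itself hashes n-grams into a fixed number of buckets rather than keeping them as strings, which this sketch does not attempt to reproduce.

```python
def word_ngrams(tokens, max_n):
    """Return all contiguous word n-grams of length 1..max_n, joined by spaces.

    Illustrative helper only (not a fastText function).
    """
    features = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens over the sentence
        for start in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[start:start + n]))
    return features


tokens = "the movie was great".split()
print(word_ngrams(tokens, 2))
# ['the', 'movie', 'was', 'great', 'the movie', 'movie was', 'was great']
```

With max_n set to 1 only the individual words remain, which corresponds to the default value of the wordNgrams option.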

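2. Corpus (text classification)

The input data needs some light preprocessing, because fastText expects its own format: each line has the form '__label__<class> text', that is, the label prefix, the class name, a space, and then the text. When fastText is used for supervised learning, training and prediction work much like in other machine learning libraries: you call fasttext.supervised(...) to train. (This is the entry point of the older Python wrapper; the current official fasttext module names it train_supervised.) One thing to note: the training call takes the path to the file that holds the training text, not the training text itself. Prediction, by contrast, takes a sequence of text strings.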
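As a sketch of the preprocessing step, the snippet below turns (label, text) pairs into lines in the '__label__<class> text' format and writes them to a file whose path can then be handed to the training call. The helper names, the sample sentences, and the temporary file location are all assumptions made for the example.

```python
import os
import tempfile


def to_fasttext_line(label, text, label_prefix="__label__"):
    """Format one example as '<prefix><label> <text>' on a single line."""
    # Collapse newlines and repeated whitespace so each example stays on one line
    cleaned = " ".join(text.split())
    return f"{label_prefix}{label} {cleaned}"


def write_training_file(examples, path):
    """Write (label, text) pairs to `path`, one example per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for label, text in examples:
            handle.write(to_fasttext_line(label, text) + "\n")


# Toy data, purely illustrative
examples = [
    ("positive", "the movie was great"),
    ("negative", "boring and\ntoo long"),
]
train_path = os.path.join(tempfile.mkdtemp(), "train.txt")
write_training_file(examples, train_path)

with open(train_path, encoding="utf-8") as handle:
    print(handle.read())
```

The resulting train_path is what would be passed as the input argument for training; the raw strings themselves are only passed at prediction time.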

3. Common parameters of fasttext.supervised:

input: path to the training text
output: path where the model is saved

The following parameters are optional (default values in square brackets). Note that some defaults differ in supervised mode; for example, the command-line tool uses softmax as the default loss for classification.

-lr: learning rate [0.05]
-lrUpdateRate: change the rate of updates for the learning rate [100]
-dim: size of word vectors [100]
-ws: size of the context window [5]
-epoch: number of epochs [5] (select with bucket)
-minCount: minimal number of word occurrences [1]
-neg: number of negatives sampled [5]
-wordNgrams: max length of word ngram [1]
-loss: loss function {ns, hs, softmax} [ns]
-bucket: number of buckets [2000000]
-minn: min length of char ngram [3]
-maxn: max length of char ngram [6]
-thread: number of threads [12]
-t: sampling threshold [0.0001]
-label: labels prefix [__label__]
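The dash-prefixed names above are the flags of the fastText command-line tool. As a rough illustration of how they fit together, the sketch below assembles a "fasttext supervised" command line from a set of options. The helper build_supervised_command is hypothetical, and the binary location ./fasttext, the file names, and the option values are placeholders; the command is only printed here, not executed.

```python
def build_supervised_command(input_path, output_path, **options):
    """Assemble an argument list for the fastText CLI in supervised mode.

    Hypothetical helper: it only builds the list and does not validate flags.
    """
    command = ["./fasttext", "supervised", "-input", input_path, "-output", output_path]
    for name, value in options.items():
        # Each option becomes a '-name value' pair, e.g. '-epoch 25'
        command += [f"-{name}", str(value)]
    return command


command = build_supervised_command(
    "train.txt", "model", lr=0.1, epoch=25, wordNgrams=2, loss="hs"
)
print(" ".join(command))
# ./fasttext supervised -input train.txt -output model -lr 0.1 -epoch 25 -wordNgrams 2 -loss hs
```

Here loss="hs" selects hierarchical softmax and wordNgrams=2 adds word bigrams to the features. The list could be passed to subprocess.run if a compiled fasttext binary is available at that path.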

4. Evaluation & prediction:

Evaluation:

    result = model.test('test.txt')
    print('precision:{}'.format(result.precision))
    print('recall:{}'.format(result.recall))
    print('Number of examples', result.nexamples)

Prediction:

(1) model.predict(texts)        # labels only
(2) model.predict_proba(texts)  # labels with probabilities, e.g. [[('0', 0.998047)]]
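To clarify what the reported precision and recall mean, here is a small sketch that computes them from gold labels and top-k predictions, assuming the usual definitions of precision and recall at k (correct predictions divided by the number of predictions made, and by the number of gold labels, respectively). The function and the toy data are illustrative only and do not call fastText.

```python
def precision_recall_at_k(gold_labels, predicted_labels, k=1):
    """Compute precision@k and recall@k over a set of examples.

    gold_labels: list of sets of true labels, one set per example.
    predicted_labels: list of ranked label lists, one list per example.
    """
    n_correct = 0
    n_predicted = 0
    n_gold = 0
    for gold, predicted in zip(gold_labels, predicted_labels):
        top_k = predicted[:k]
        n_correct += sum(1 for label in top_k if label in gold)
        n_predicted += len(top_k)
        n_gold += len(gold)
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    return precision, recall


# Toy data: the third example has two gold labels
gold = [{"sports"}, {"politics"}, {"sports", "health"}]
predicted = [["sports"], ["sports"], ["health"]]
precision, recall = precision_recall_at_k(gold, predicted, k=1)
print("precision: {:.3f}".format(precision))  # 2 correct out of 3 predictions
print("recall: {:.3f}".format(recall))        # 2 correct out of 4 gold labels
```

When every example has exactly one gold label and k is 1, the two numbers coincide and equal plain accuracy.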

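5. Other notes

(1) A property of word vectors: the distance between two word vectors can be used to measure the semantic similarity of the corresponding words.
(2) The fastText model resembles the CBOW model of word2vec, but CBOW uses the context to predict the target word, whereas fastText predicts the class label. For details, see the following blog post: FastText: A Fast Text Classifier https://blog.csdn.net/john_bh/article/details/79268850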
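The comparison with CBOW can be sketched in a few lines: the word vectors of a sentence are averaged into one hidden vector, and a linear layer followed by a softmax turns that vector into class probabilities. This is a simplified illustration with hand-picked toy numbers and a plain softmax; it is not fastText's implementation and leaves out n-gram features, hashing, and hierarchical softmax. All names and values are assumptions.

```python
import math


def average_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def predict_proba(tokens, embeddings, class_weights):
    """Average the known word vectors, then score each class linearly."""
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        raise ValueError("no token of the input has an embedding")
    hidden = average_vectors(known)
    labels = list(class_weights)
    scores = [
        sum(w * h for w, h in zip(class_weights[label], hidden))
        for label in labels
    ]
    return dict(zip(labels, softmax(scores)))


# Toy parameters; a trained model would learn these values
embeddings = {"great": [1.0, 0.0], "boring": [0.0, 1.0], "movie": [0.5, 0.5]}
class_weights = {"positive": [2.0, -2.0], "negative": [-2.0, 2.0]}

print(predict_proba(["great", "movie"], embeddings, class_weights))
```

In CBOW the same averaged vector would be scored against every vocabulary word instead of a handful of class labels, which is the only structural difference this sketch is meant to show.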

