NLTK (Natural Language Toolkit) is one of the most commonly used Python libraries for natural language processing (NLP).
1. Install nltk
pip install --upgrade nltk
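2. Install nltk_data
import nltk
nltk.download('punkt') # tokenizer models used for English word and sentence tokenization
nltk.download('stopwords') # English stop word list
I tried to download the datasets with the Python code above, but it kept failing with this error: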
[nltk_data] Error loading punkt: <urlopen error [Errno 8] nodename nor
[nltk_data] servname provided, or not known>
[nltk_data] Error loading stopwords: <urlopen error [Errno 8] nodename
[nltk_data] nor servname provided, or not known>
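In the end I downloaded the data manually from GitHub (the nltk/nltk_data repository), taking everything under the packages directory.
After downloading, put the contents in a local folder. I used /Users/sunwenjun/anaconda3/envs/python310/nltk_data/. Note that some of the archives have to be unzipped.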
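The unzip step can be scripted instead of done by hand. Below is a minimal sketch that uses only the standard library. The function name unzip_nltk_packages is hypothetical, and the sketch assumes each archive should be extracted into the folder that contains it, which matches the usual nltk_data layout but is not confirmed by the original post.

```python
import zipfile
from pathlib import Path


def unzip_nltk_packages(nltk_data_dir):
    """Extract every .zip found under nltk_data_dir into the folder that contains it.

    Hypothetical helper: returns the list of archives that were extracted.
    """
    extracted = []
    # sorted() materializes the list first, so files created during
    # extraction are not picked up by the directory walk.
    for archive in sorted(Path(nltk_data_dir).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted
```

Calling unzip_nltk_packages on the nltk_data folder chosen above would, for example, unpack corpora/stopwords.zip into corpora/stopwords/. The original .zip files are left in place.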
from nltk.data import find
print(find('punkt')) # /Users/sunwenjun/anaconda3/envs/python310/nltk_data/punkt
print(find('tokenizers')) # /Users/sunwenjun/anaconda3/envs/python310/nltk_data/tokenizers
3. Using nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

input_string = 'Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks'

# Tokenize
word_tokens = word_tokenize(input_string)
print(word_tokens) # ['Retrieval-Augmented', 'Generation', 'for', 'Knowledge-Intensive', 'NLP', 'Tasks']

# Remove stop words
stop_words = set(stopwords.words('english'))
filtered_words = [w for w in word_tokens if w.lower() not in stop_words]
print(filtered_words) # ['Retrieval-Augmented', 'Generation', 'Knowledge-Intensive', 'NLP', 'Tasks']

# Stem each word
ps = PorterStemmer()
ps_words = [ps.stem(w) for w in filtered_words]
print(ps_words) # ['retrieval-aug', 'gener', 'knowledge-intens', 'nlp', 'task']
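4. Where nltk_data can be stored
If a resource cannot be found, NLTK raises a LookupError that lists every directory it searched. nltk_data can be placed in any of them.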
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/punkt

  Searched in:
    - '/Users/sunwenjun/nltk_data'
    - '/Users/sunwenjun/anaconda3/envs/python310/nltk_data'
    - '/Users/sunwenjun/anaconda3/envs/python310/share/nltk_data'
    - '/Users/sunwenjun/anaconda3/envs/python310/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
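To see which of the searched directories actually holds a resource, the lookup can be approximated with a few lines of standard-library Python. This is a simplified sketch, not NLTK's own implementation: locate_resource is a hypothetical helper that only checks for unpacked files and folders, while NLTK's real lookup can also read from zip archives.

```python
import os


def locate_resource(resource, search_dirs):
    """Return the first path under search_dirs that contains resource, or None.

    resource is a slash-separated name such as 'tokenizers/punkt'.
    Hypothetical helper: it only checks that the path exists on disk.
    """
    for base in search_dirs:
        candidate = os.path.join(base, *resource.split("/"))
        if os.path.exists(candidate):
            return candidate
    return None
```

With NLTK installed, the list nltk.data.path can be passed as search_dirs. A custom folder can be added to that list with nltk.data.path.append(...), or exposed through the NLTK_DATA environment variable before Python starts.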