CoreNLP notes

Some parameter settings

ner-server.properties

annotators = tokenize,ssplit,pos,lemma,ner
ner.applyFineGrained = false
ner.useSUTime = false

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -serverProperties ner-server.properties
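Once the server started by the command above is listening on port 9000, it can also be queried over plain HTTP without any wrapper library. The following is a minimal sketch using only the Python standard library. The helper names `build_corenlp_request` and `annotate` are hypothetical, and sending the text with a `text/plain` Content-Type is an assumption about what the server accepts. The client timeout of 15 seconds mirrors the `-timeout 15000` (milliseconds) server flag.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_request(text, properties, base_url="http://localhost:9000"):
    """Build (but do not send) a POST request for a CoreNLP server.

    The annotation properties travel as a JSON string in the `properties`
    query parameter; the raw text is the request body.
    """
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        url=f"{base_url}/?{query}",
        data=text.encode("utf-8"),
        # Assumption: plain-text body, so the server does not URL-decode it.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def annotate(text, properties, base_url="http://localhost:9000", timeout=15):
    """Send the request and parse the JSON response (needs a running server)."""
    request = build_corenlp_request(text, properties, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `annotate("Stanford is in California.", {"annotators": "tokenize,ssplit,pos,lemma,ner", "outputFormat": "json"})` should return the parsed annotation as a dict.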

CoreNLP 3.7 differs in some details from the current latest release, 4.4. The documentation describes the differences.

https://stanfordnlp.github.io/CoreNLP/download.html

Exactly which details differ is not listed here.

Using it from Python: the pipeline approach

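import json

from stanfordcorenlp import StanfordCoreNLP

# Option 1: launch CoreNLP locally from an unpacked distribution directory.
# nlp = StanfordCoreNLP(r'core/stanford-corenlp-4.4.0')
# Option 2: connect to the server already running on port 9000.
nlp = StanfordCoreNLP('http://localhost', port=9000)

props = {'annotators': 'tokenize,pos,ner', 'pipelineLanguage': 'en', 'outputFormat': 'json'}

# `row` comes from the surrounding data loop (not shown here);
# row['passages'] holds the raw text to annotate.
passages = json.loads(nlp.annotate(row['passages'], properties=props))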
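The JSON returned by `annotate` contains a `sentences` list, and each sentence carries a `tokens` list whose entries include `word` and `ner` fields. As a rough sketch of how the NER result might be consumed, the hypothetical helper `extract_entities` below merges consecutive tokens that share the same non-`O` tag into one entity. This is a simplification: two adjacent but distinct entities with the same label would be merged into one.

```python
def extract_entities(doc):
    """Group consecutive tokens sharing a non-"O" NER tag into (text, label) pairs."""
    entities = []
    for sentence in doc.get("sentences", []):
        current_words, current_label = [], None
        for token in sentence.get("tokens", []):
            label = token.get("ner", "O")
            if label != "O" and label == current_label:
                # Same entity continues.
                current_words.append(token["word"])
                continue
            if current_words:
                # The tag changed, so close the entity collected so far.
                entities.append((" ".join(current_words), current_label))
            if label != "O":
                current_words, current_label = [token["word"]], label
            else:
                current_words, current_label = [], None
        if current_words:
            # Flush an entity that runs to the end of the sentence.
            entities.append((" ".join(current_words), current_label))
    return entities


# Hand-written sample in the shape of CoreNLP's JSON output (illustrative only).
sample = {
    "sentences": [
        {
            "tokens": [
                {"word": "Barack", "ner": "PERSON"},
                {"word": "Obama", "ner": "PERSON"},
                {"word": "visited", "ner": "O"},
                {"word": "Paris", "ner": "LOCATION"},
                {"word": ".", "ner": "O"},
            ]
        }
    ]
}
print(extract_entities(sample))
```

In the snippet above it would be called as `extract_entities(passages)`.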

You can also call individual methods directly:

text1_toks = " ".join(nlp.word_tokenize(text1_raw))

Original link: https://blog.csdn.net/qq_28612967/article/details/125161410