- scispaCy is a Python package that contains spaCy models for processing biomedical, scientific, or clinical text.
- As of 2022-03-14, the scispaCy project lists 8 installable models: en_core_sci_sm, en_core_sci_md, en_core_sci_scibert, en_core_sci_lg, en_ner_craft_md, en_ner_jnlpba_md, en_ner_bc5cdr_md, en_ner_bionlp13cg_md
- xxx_sm: small pretrained model (13 MB)
- xxx_md: medium pretrained model (44 MB)
- xxx_lg: large pretrained model (742 MB)
- xxx_trf: transformer-based pretrained model (438 MB)
- xxx_scibert: SciBERT is a BERT model trained on scientific text; see SciBERT: A Pretrained Language Model for Scientific Text
Mainly sourced from: NLP with spaCy 3.0
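As a quick illustration of the naming convention above, here is a minimal sketch that reads the last suffix of a model name and looks up the matching note. The `SUFFIX_NOTES` table and the `describe_model` helper are hypothetical names made up for this example, and the sizes are simply the ones quoted in the list, so treat them as approximate.

```python
# Notes keyed by model-name suffix. The sizes are the ones quoted in the
# list above and may not match every release exactly.
SUFFIX_NOTES = {
    "sm": "small pretrained model (13 MB)",
    "md": "medium pretrained model (44 MB)",
    "lg": "large pretrained model (742 MB)",
    "trf": "transformer-based pretrained model (438 MB)",
    "scibert": "pipeline built on SciBERT",
}


def describe_model(model_name: str) -> str:
    """Return a short note for a model based on the text after its last underscore."""
    suffix = model_name.rsplit("_", 1)[-1]
    return SUFFIX_NOTES.get(suffix, "unknown suffix")


for name in ("en_core_sci_sm", "en_ner_craft_md", "en_core_sci_scibert"):
    print(name, "->", describe_model(name))
```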
Model | Description | Install URL |
---|---|---|
en_core_sci_sm | A full spaCy pipeline for biomedical data. | Download |
en_core_sci_md | A full spaCy pipeline for biomedical data with a larger vocabulary and 50k word vectors. | Download |
en_core_sci_scibert | A full spaCy pipeline for biomedical data with a ~785k vocabulary and allenai/scibert-base as the transformer model. | Download |
en_core_sci_lg | A full spaCy pipeline for biomedical data with a larger vocabulary and 600k word vectors. | Download |
Model | Description | Install URL |
---|---|---|
en_ner_craft_md | A spaCy NER model trained on the CRAFT corpus. | Download |
en_ner_jnlpba_md | A spaCy NER model trained on the JNLPBA corpus. | Download |
en_ner_bc5cdr_md | A spaCy NER model trained on the BC5CDR corpus. | Download |
en_ner_bionlp13cg_md | A spaCy NER model trained on the BIONLP13CG corpus. | Download |
Table source: SpaCy models for biomedical text processing
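Installing the models into an Anaconda environment
Find the Anaconda Prompt, right-click it, and choose to run it as administrator.
(Install scispaCy itself the usual way: pip install scispacy. There are many ways to do this and they are easy to look up online; I mostly click through the Anaconda Navigator UI and rarely use the command line!)
- From the base environment: activate environment (replace environment with the name of the environment you want to install into)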
(base) C:\Users\Chain>activate github_39
- Install a model: pip install ModelName (the name of the model you want to install)
(github_39) C:\Users\Chain>pip install en_ner_bionlp13cg_md
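# Or copy the model's URL from the tables above and install from it (try this if pip cannot find the model by name): pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_sm-0.5.0.tar.gz
# Uninstall: pip uninstall en_ner_bionlp13cg_md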
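If you install several models, typing the URL by hand gets tedious. The sketch below builds the install command from a model name and a version. It assumes that every model follows the same URL pattern as the en_core_sci_sm example above and that the release folder and the file name share one version number; `model_url` and `pip_install_command` are hypothetical helper names. Check the result against the Download link in the tables before relying on it.

```python
# Base of the release URL, copied from the en_core_sci_sm example above.
BASE_URL = "https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases"


def model_url(model_name: str, version: str = "0.5.0") -> str:
    """Build a download URL, assuming all models follow the en_core_sci_sm pattern."""
    return f"{BASE_URL}/v{version}/{model_name}-{version}.tar.gz"


def pip_install_command(model_name: str, version: str = "0.5.0") -> str:
    """Return the pip command line that installs the model from its URL."""
    return f"pip install {model_url(model_name, version)}"


print(pip_install_command("en_ner_bionlp13cg_md"))
```

Paste the printed command into the Anaconda Prompt with the target environment activated.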
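- Verify: if nothing raises an error, the installation succeeded
# Option 1
import en_ner_bionlp13cg_md # Recommended. No error means the install succeeded.
# Option 2
import spacy
nlp = spacy.load("en_ner_bionlp13cg_md") # May raise an error
# If this fails inside a real program, the problem may be where or how you are calling it (don't ask; I fell into this pit again!)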
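The two checks above can also be wrapped so that a missing model produces a message instead of a traceback. This is only a sketch: `is_model_installed` and `load_model_or_none` are hypothetical helper names, and it assumes that spaCy reports a missing model with an `OSError`, as spaCy 3 does.

```python
import importlib.util


def is_model_installed(model_name: str) -> bool:
    """Check whether the model package can be found, without importing it."""
    return importlib.util.find_spec(model_name) is not None


def load_model_or_none(model_name: str):
    """Try spacy.load(); return None if spaCy or the model is missing."""
    try:
        import spacy  # imported here so the check above works without spaCy
        return spacy.load(model_name)
    except (ImportError, OSError) as exc:
        # ImportError: spaCy is not installed; OSError: the model was not found.
        print(f"Could not load {model_name!r}: {exc}")
        return None


name = "en_ner_bionlp13cg_md"
print(name, "installed:", is_model_installed(name))
```

When the model is installed, `nlp = load_model_or_none(name)` returns the loaded pipeline, and `nlp("some text").ents` gives the recognized entities.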
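For local installation, see: Installing and using the spaCy English model en_core_web_md on Windows, and resolving Warning: no model found for 'en_core_web_md'
For verification methods, see: spaCy: No module named 'en' || Can't find model 'en'
Copyright notice: this is an original article by qq_45913057, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.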