Industrial-Strength Natural Language Processing ^ _ ^
Reference
Introduction
spaCy is an industrial-strength natural language processing library for Python.
Quick Start
pip install -U spacy
Running python -m spacy download en_core_web_sm may fail because of a network connection error. An alternative is to download the language model package from GitHub and install it offline with pip install <package>.
For example, to install and use “zh_core_web_sm-3.2.0”:
- download zh_core_web_sm-3.2.0.tar.gz from https://github.com/explosion/spacy-models/releases/.
- copy the package to the download folder under the project directory.
- In VS Code, dragging the package from the local directory to the remote server directory is convenient when the package is small.
- If the package is large, scp /path/filename username@servername:/path is more convenient, because transferring a large file through VS Code can cause the connection to drop unexpectedly.
- install offline:
pip install zh_core_web_sm-3.2.0.tar.gz
- Then you can load the language model in your Python script:
nlp = spacy.load("zh_core_web_sm")
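As a minimal sketch of this last step, the snippet below loads the installed package and tokenizes a short sentence. The helper name load_pipeline and the fallback to a blank Chinese pipeline are assumptions added for illustration, so that the script still runs when the package is missing. The blank pipeline only tokenizes and has no trained components.

```python
import spacy


def load_pipeline(name: str, fallback_lang: str):
    """Load an installed pipeline package, or fall back to a blank pipeline.

    Hypothetical helper for illustration; it is not part of spaCy.
    """
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the package cannot be found.
        # A blank pipeline contains only the language's tokenizer.
        return spacy.blank(fallback_lang)


nlp = load_pipeline("zh_core_web_sm", "zh")
doc = nlp("我喜欢自然语言处理")

# Collect the text of each token produced by the pipeline.
tokens = [token.text for token in doc]
print(tokens)
print(nlp.pipe_names)  # empty list when the blank fallback is used
```

If the fallback is used, attributes such as part-of-speech tags and named entities stay empty, which is a quick sign that the offline install did not succeed.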
Features
Support for 64+ languages
64 trained pipelines for 19 languages
Pretrained transformers
Multi-task learning with pretrained transformers like BERT
Pretrained word vectors
State-of-the-art speed
Production-ready training system
Linguistically-motivated tokenization
Components for many NLP tasks
Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking and more.
Custom components
Easily extensible with custom components and attributes.
Custom models
Support for custom models in PyTorch, TensorFlow and other frameworks.
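To make the points about pipeline components and custom attributes more concrete, here is a small sketch that runs without downloading a trained model. It starts from a blank English pipeline, adds spaCy's built-in rule-based sentencizer, and then adds a custom component that stores a value in a custom attribute. The names token_counter and token_count are made up for this example.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Register a custom attribute, later accessed as doc._.token_count.
# The attribute name is illustrative, not something spaCy defines.
if not Doc.has_extension("token_count"):
    Doc.set_extension("token_count", default=0)


@Language.component("token_counter")
def token_counter(doc: Doc) -> Doc:
    """Custom component: record the number of tokens on the Doc."""
    doc._.token_count = len(doc)
    return doc  # a component must return the Doc it received


# A blank pipeline has a tokenizer but no trained components.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")      # built-in rule-based sentence segmentation
nlp.add_pipe("token_counter")    # the custom component registered above

doc = nlp("spaCy is fast. It is also extensible.")
sentences = [sent.text for sent in doc.sents]

print(nlp.pipe_names)
print(sentences)
print(doc._.token_count)
```

Trained pipelines such as en_core_web_sm expose their tagger, parser, and named entity recognizer through the same nlp.pipe_names list, so a custom component can be added to them in the same way.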