Qian Li et al. wrote a review of text classification in 2020 ^_^
Overview
Pipeline Of Text Classification
Method Changes
The biggest difference between shallow learning (which relies on feature engineering) and deep learning is that deep learning extracts features automatically.
Methods
Shallow Learning Models
Pipeline
- Preprocessing: e.g., word segmentation, data cleaning, and data statistics.
- Text representation: expresses the preprocessed text in a form that is easy to compute with, e.g., Bag-of-Words (BOW), N-gram, Term Frequency-Inverse Document Frequency (TF-IDF), word2vec, and GloVe.
- Classification: the represented text is fed into the classifier according to the selected features.
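The first two pipeline steps can be sketched in plain Python. This is only an illustration: the helper names `preprocess` and `tfidf_vectors`, the regex tokenizer, the smoothed IDF variant, and the toy documents are all assumptions made for this example, not something taken from the review.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase and split into alphanumeric tokens (a stand-in for segmentation and cleaning)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of raw strings."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (one common variant, assumed here; others exist).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical toy corpus.
docs = ["The movie was great", "The movie was boring", "Great acting, great plot"]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` is a fixed-length numeric vector that could be handed to any of the classifiers discussed in these notes. Terms that occur in fewer documents receive a larger weight than terms that occur in many.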
The discussion below covers representative classifiers.
PGM-based Methods
Probabilistic graphical models (PGMs) express the conditional dependencies among features as graphs.
- Naive Bayes
- Hidden Markov Model
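As an illustration of the first item, here is a minimal sketch of a multinomial Naive Bayes classifier, which assumes the words are conditionally independent given the class. The class name `NaiveBayes`, the add-one (Laplace) smoothing, and the toy training data are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum of log P(word | class)
            score = math.log(count / n_docs)
            total = sum(self.word_counts[label].values())
            for w in tokens:
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log(
                    (self.word_counts[label][w] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set.
train_docs = [
    ["great", "movie"],
    ["loved", "great", "acting"],
    ["boring", "movie"],
    ["terrible", "boring", "plot"],
]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
```

Calling `model.predict(["great", "acting"])` picks the class with the highest log-posterior. Working in log space avoids numerical underflow when many small probabilities are multiplied.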
KNN (K-Nearest Neighbors)-based Method
SVM (Support Vector Machine)-based Method
DT (Decision Tree)-based Method
Integration-based Method
Integration-based methods aggregate the results of multiple algorithms; examples include RF (Random Forest), XGBoost, and stacking.
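The simplest form of aggregation is plain majority voting, sketched below. This is not how RF, XGBoost, or stacking work internally (they use bagging of trees, gradient boosting, and a learned meta-classifier respectively); it only illustrates the idea of combining several predictions. The names `keyword_rule` and `majority_vote` and the keyword-based base classifiers are hypothetical.

```python
from collections import Counter


def keyword_rule(word, label_if_present, label_otherwise):
    """Build a toy base classifier that checks for a single keyword."""
    def classify(tokens):
        return label_if_present if word in tokens else label_otherwise
    return classify


def majority_vote(classifiers, tokens):
    """Return the label predicted by the most base classifiers."""
    votes = [clf(tokens) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical weak classifiers; an odd count avoids ties with two labels.
classifiers = [
    keyword_rule("great", "pos", "neg"),
    keyword_rule("boring", "neg", "pos"),
    keyword_rule("loved", "pos", "neg"),
]

label = majority_vote(classifiers, ["loved", "it"])
```

Each base classifier is weak on its own, and the vote lets the errors of one be outweighed by the others.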