QIAN Li et al. wrote a review of text classification in 2020 ^_^

Overview

Pipeline Of Text Classification

Method Changes

The biggest difference between shallow learning (feature engineering) and deep learning is that deep learning extracts features automatically.

Methods

Pipeline

Preprocessing: for example, word segmentation, data cleaning, and data statistics.
Text Representation: expresses the preprocessed text in a form that is easy to compute with, for example Bag-of-Words (BOW), N-gram, Term Frequency-Inverse Document Frequency (TF-IDF), word2vec, and GloVe.
Classification: the represented text is fed into the classifier according to the selected features.
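As a rough illustration of the text representation step, here is a minimal TF-IDF sketch in plain Python. The whitespace tokenizer, the helper names (tokenize, tf_idf), the toy documents, and the unsmoothed weighting tf * log(N / df) are all assumptions made for this example. Libraries usually add smoothing and normalization on top of this.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption); real pipelines use proper word segmentation.
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy corpus, invented for illustration only.
docs = ["the cat sat", "the dog barked", "the cat purred"]
vectors = tf_idf(docs)
```

A term that occurs in every document ("the") gets weight zero, while a term unique to one document ("sat") gets the largest weight in that document. These sparse dictionaries are the kind of input a shallow classifier would then consume.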

The following discussion covers representative classifiers.

Probabilistic graphical models (PGMs) express the conditional dependencies among features in graphs.
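Naive Bayes is probably the simplest classifier of this kind: its graph has one class node pointing at every feature, so features are assumed conditionally independent given the class. The sketch below is a minimal multinomial Naive Bayes with add-one smoothing. The function names (train_nb, predict_nb) and the toy training data are assumptions for illustration, not taken from the review.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """samples: list of (tokens, label) pairs. Returns the counts the model needs."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label maximizing log P(label) + sum of log P(token | label)."""
    class_counts, word_counts, vocab = model
    n_samples = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_samples)
        for token in tokens:
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen (token, label) pairs.
            score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented for illustration only.
samples = [
    ("great fun movie".split(), "pos"),
    ("boring slow movie".split(), "neg"),
    ("fun and great".split(), "pos"),
    ("slow and boring".split(), "neg"),
]
model = train_nb(samples)
prediction = predict_nb(model, "great fun".split())
```

With this toy data, "great fun" is scored higher under the "pos" class, because both words were only seen in positive samples.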

Ensemble (integration-based) methods aim to aggregate the results of multiple algorithms, such as RF (Random Forest), XGBoost, and stacking.
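The aggregation idea can be sketched with plain majority voting. Note that this is neither Random Forest, XGBoost, nor stacking themselves, only the simplest way to combine several classifiers. The three rule-based classifiers and the majority_vote helper are hypothetical, made up for this example.

```python
from collections import Counter


def has_positive_word(text):
    # Hypothetical rule: votes "pos" if a known positive word occurs.
    return "pos" if any(w in text.split() for w in ("great", "fun")) else "neg"


def has_negative_word(text):
    # Hypothetical rule: votes "neg" if a known negative word occurs.
    return "neg" if any(w in text.split() for w in ("boring", "slow")) else "pos"


def ends_with_exclamation(text):
    # Hypothetical (weak) rule: treats a trailing "!" as a sign of enthusiasm.
    return "pos" if text.endswith("!") else "neg"


def majority_vote(classifiers, text):
    """Return the label predicted by the most classifiers."""
    votes = [clf(text) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


classifiers = [has_positive_word, has_negative_word, ends_with_exclamation]
result = majority_vote(classifiers, "great but slow")
```

For "great but slow" the votes are pos, neg, neg, so the ensemble answers "neg" even though one member disagrees. Using an odd number of voters avoids ties in the two-class case.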