
Convolutional Neural Network


Motivation

In an MLP, if every input item of a layer is multiplied by its own independent weight, the layer is called a Fully Connected Layer or Dense Layer. However, this design is not appropriate for some tasks, such as:

  • Image Recognition: the same picture under different transformations (e.g. shifting) produces very different dense-layer outputs.
  • Emotion Classification: the emotional polarity of a sentence is usually determined by a few words or phrases, but the positions of these decisive words are not fixed.

Overall, a Fully Connected Layer struggles to capture critical local information.

To solve this problem, one idea is to use a small dense layer to extract local features, such as pixels in a fixed-size window or n-grams in text. This small dense layer is called a Kernel or Filter.

Inputs of different sizes yield local features of different sizes. We can use a Pooling operation to retain the features we want; pooling also solves the problem of inconsistent input sample sizes.

Additionally, we can use multiple kernels to extract multiple groups of features. There are two ways to construct kernels, as sketched below:

  1. Use different initialization parameters.
  2. Extract local features at different scales (i.e. use different kernel sizes).
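
In PyTorch, the two approaches look roughly as follows. This is a minimal sketch; the layer names and channel counts are illustrative:

import torch
from torch.nn import Conv1d

# Way 1: one Conv1d with several output channels; each output channel
# is a separate kernel with its own randomly initialized parameters.
multi_init = Conv1d(in_channels=5, out_channels=4, kernel_size=3)

# Way 2: several Conv1d layers with different kernel sizes, each
# capturing local features at a different scale (e.g. 2-grams vs 3-grams).
bigram_conv = Conv1d(5, 4, kernel_size=2)
trigram_conv = Conv1d(5, 4, kernel_size=3)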

Finally, we can stack many convolution and pooling layers to construct a deeper network. Such networks are called Convolutional Neural Networks.

Structure

Simplest Start

  1. Map every word to a vector.
  2. Use 4 kernels to extract local features.
    • kernel size: N
    • input length: L
    • output length: L - N + 1
  3. Different kernels produce outputs of different lengths. Feed these outputs into a pooling layer to get outputs of the same dimension.
  4. Concatenate these outputs into one feature vector.
  5. Finally, pass the feature vector through a Fully Connected Layer to classify.

A CNN whose kernel slides in a single direction is called a One-Dimensional Convolution, which is suitable for NLP data. When we need to deal with image data, we need a Two-Dimensional Convolution, whose kernel slides both horizontally and vertically, as sketched below.
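
A minimal sketch of a 2D convolution in PyTorch; the channel counts and image size here are assumptions for illustration:

import torch
from torch.nn import Conv2d

# Conv2d(in_channels, out_channels, kernel_size)
# e.g. 3 RGB channels in, 8 feature maps out
conv2d = Conv2d(3, 8, 3)

# input size: (batch, in_channels, height, width)
image = torch.rand(1, 3, 28, 28)

# the kernel slides both horizontally and vertically:
# output size: (batch, out_channels, 28 - 3 + 1, 28 - 3 + 1) = (1, 8, 26, 26)
feature_maps = conv2d(image)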

Code

Convolution

import torch
from torch.nn import Conv1d

# Conv1d(in_channels, out_channels, kernel_size)
conv1 = Conv1d(5, 2, 4)
conv2 = Conv1d(5, 2, 3)

# input size: (batch, in_channels, seq_len)
# In NLP, in_channels is the dimension of the word vectors.
inputs = torch.rand(2, 5, 6)

# output size: (batch, out_channels, seq_len)
# batch: 2; out_channels: 2
outputs1 = conv1(inputs) # seq_len: 3 = 6 - 4 + 1
outputs2 = conv2(inputs) # seq_len: 4 = 6 - 3 + 1
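
We can check the resulting shapes:

print(outputs1.shape)  # torch.Size([2, 2, 3])
print(outputs2.shape)  # torch.Size([2, 2, 4])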

Pooling

from torch.nn import MaxPool1d
pool1 = MaxPool1d(3) # 3 is the seq_len of outputs1
pool2 = MaxPool1d(4)

# outputs_pool1 size: [batch, out_channels, 1]
outputs_pool1 = pool1(outputs1)
outputs_pool2 = pool2(outputs2)

Pooling can also be used as a function. The advantage of this approach is that we don't need to know the pooling window size in advance.

import torch.nn.functional as F

outputs_pool1 = F.max_pool1d(outputs1, kernel_size=outputs1.shape[2])
outputs_pool2 = F.max_pool1d(outputs2, kernel_size=outputs2.shape[2])
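
For example, continuing with conv1 from above, the same code handles a longer input without any change:

# works for any sequence length without hard-coding the window size
longer_inputs = torch.rand(2, 5, 10)
outputs3 = conv1(longer_inputs)  # seq_len: 10 - 4 + 1 = 7
outputs_pool3 = F.max_pool1d(outputs3, kernel_size=outputs3.shape[2])  # [2, 2, 1]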

Squeeze & Concat

We need to concatenate outputs_pool1 and outputs_pool2. Before that, we need to remove the dimension of size 1; this operation is called squeeze.

Squeeze

# if outputs_pool1.shape[2] == 1, then remove dimension 2
outputs_pool_squeeze1 = outputs_pool1.squeeze(dim=2)
outputs_pool_squeeze2 = outputs_pool2.squeeze(dim=2)

Concat

# outputs_pool size: [batch, 2 + 2] = [2, 4]
outputs_pool = torch.cat([outputs_pool_squeeze1, outputs_pool_squeeze2], dim=1)

Fully Connected Layer

from torch.nn import Linear

# 4 input features (2 channels from each pooled output), 2 output classes
linear = Linear(4, 2)
output_linear = linear(outputs_pool)
print(output_linear)
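
Putting all of the steps together, here is a minimal end-to-end sketch as an nn.Module. The vocabulary size, the ReLU nonlinearity, and the class name TextCNN are assumptions for illustration:

import torch
import torch.nn.functional as F
from torch import nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=5, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # step 1: word -> vector
        self.conv1 = nn.Conv1d(embed_dim, 2, kernel_size=4)   # step 2: two kernel sizes
        self.conv2 = nn.Conv1d(embed_dim, 2, kernel_size=3)
        self.linear = nn.Linear(4, num_classes)               # step 5: classification

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> (batch, embed_dim, seq_len)
        embedded = self.embedding(token_ids).transpose(1, 2)
        c1 = F.relu(self.conv1(embedded))  # ReLU added here as an assumption
        c2 = F.relu(self.conv2(embedded))
        # step 3: pool each feature map down to a single value
        p1 = F.max_pool1d(c1, kernel_size=c1.shape[2]).squeeze(dim=2)
        p2 = F.max_pool1d(c2, kernel_size=c2.shape[2]).squeeze(dim=2)
        # step 4: concatenate into one feature vector, then classify
        return self.linear(torch.cat([p1, p2], dim=1))

model = TextCNN()
logits = model(torch.randint(0, 1000, (2, 6)))  # batch of 2 sentences, length 6
print(logits.shape)  # torch.Size([2, 2])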