Hung-yi Lee 2021 Machine Learning Course HW2 Notes

This post is a journal-style log that records my thoughts and code changes as they happened. It may not be especially useful as a reference; it is mainly for my own review later, and I may reorganize it once the project is finished.

2021/08/23 entry:

Today I adjusted the model based on an article I was following: a larger Batch_size, more layers, and L2 regularization to fend off overfitting. Even so, the model started overfitting once accuracy reached about 0.78.
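For later reference, a rough sketch of what these adjustments look like in code; the concrete values are illustrative assumptions, and train_set and model are assumed to already exist as in the HW2 sample code.

import torch
from torch.utils.data import DataLoader

# Illustrative values only; train_set and model are assumed to exist already.
BATCH_SIZE = 512  # larger batch size
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

# weight_decay in AdamW is the L2 regularization mentioned above
optimizer = torch.optim.AdamW(model.parameters(), lr=0.005, weight_decay=0.001)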

2021/08/21 entry:

Training now reliably reaches above 0.75 accuracy on every run, then enters the overfitting stage at around 150 epochs; some other tricks are probably needed to keep improving.

num_epoch = 200
learning_rate = 0.005  # this learning rate was found to converge faster

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.001)
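Regarding the "other tricks" mentioned above: one common option, sketched here rather than something already applied, is to checkpoint the model whenever validation accuracy improves, so the best weights are kept even after overfitting sets in. val_acc and best_acc are assumed to be computed in the usual per-epoch validation pass.

# Hypothetical sketch: keep the best checkpoint seen so far (val_acc computed each epoch).
if val_acc > best_acc:
    best_acc = val_acc
    torch.save(model.state_dict(), 'model_best.ckpt')
    print(f'saving model with val acc {best_acc:.3f}')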

The current model is:

import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.bn0 = nn.BatchNorm1d(429)
        self.layer1 = nn.Linear(429, 1024)
        self.bn1 = nn.BatchNorm1d(1024)
        self.layer2 = nn.Linear(1024, 512)
        self.bn2 = nn.BatchNorm1d(512)
        self.layer3 = nn.Linear(512, 256)
        self.bn3 = nn.BatchNorm1d(256)
        self.layer4 = nn.Linear(256, 128)
        self.bn4 = nn.BatchNorm1d(128)
        self.out = nn.Linear(128, 39)

        self.act_fn = nn.ReLU()
        self.dropout = nn.Dropout(p=0.5)
        self.dropout1 = nn.Dropout(p=0.1)
        self.dropout2 = nn.Dropout(p=0.3)

    def forward(self, x):
        x = self.bn0(x)
        x = self.layer1(x)
        x = self.bn1(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer2(x)
        x = self.bn2(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer3(x)
        x = self.bn3(x)
        x = self.dropout1(x)
        x = self.act_fn(x)

        x = self.layer4(x)
        x = self.bn4(x)
        x = self.dropout2(x)
        x = self.act_fn(x)

        x = self.out(x)

        return x

2021/08/18 entry:

Another day without much progress. I even found that yesterday's 0.75 accuracy was a fluke and could not be reproduced. However, learning-rate decay does make the model converge faster.

learning_rate = 0.01

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5, verbose=True)
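For StepLR to take effect, scheduler.step() has to be called once per epoch; a minimal sketch of where it sits in the training loop (model, criterion, train_loader and optimizer are assumed to exist as before):

for epoch in range(num_epoch):
    model.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # with step_size=10, gamma=0.5 this halves the LR every 10 epochs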

2021/08/17 entry:

Finally some progress today. I raised the Dropout rate to 0.5 to further reduce overfitting, and increased the Batch_size to 256 (in my tests 256 converges faster; 512 speeds up each epoch but the results are not as good), since Batch Normalization only works well when each batch is large enough. I also tried adding an adaptive learning-rate schedule, but apart from slowing training down it had no real effect; Adam already adapts the learning rate by itself, so it is enough on its own.

As for judging when the model has actually fit the data, I found today that you need to watch how the loss changes. Over the past few days I thought the model had reached overfitting, but that was an illusion caused by not training long enough. Today, within 80 epochs, test accuracy reached above 0.75.

Here is an article about fitting for reference.
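A minimal sketch of the kind of per-epoch bookkeeping that makes this call easier (criterion, train_loader and val_loader are assumed to exist, matching the usual HW2 setup):

train_hist, val_hist = [], []
for epoch in range(num_epoch):
    model.train()
    train_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    model.eval()
    val_loss, correct = 0.0, 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            outputs = model(inputs)
            val_loss += criterion(outputs, labels).item()
            correct += (outputs.argmax(dim=1) == labels).sum().item()

    train_hist.append(train_loss / len(train_loader))
    val_hist.append(val_loss / len(val_loader))
    # Train loss falling while val loss rises -> overfitting;
    # both still falling -> keep training longer.
    print(f'epoch {epoch:3d} | train loss {train_hist[-1]:.4f} | '
          f'val loss {val_hist[-1]:.4f} | val acc {correct / len(val_loader.dataset):.3f}')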

BATCH_SIZE = 512
import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Linear(429, 1024)
        self.bn1 = nn.BatchNorm1d(1024)
        self.layer2 = nn.Linear(1024, 512)
        self.bn2 = nn.BatchNorm1d(512)
        self.layer3 = nn.Linear(512, 256)
        self.bn3 = nn.BatchNorm1d(256)
        self.layer4 = nn.Linear(256, 128)
        self.bn4 = nn.BatchNorm1d(128)
        self.out = nn.Linear(128, 39)

        self.act_fn = nn.ReLU()
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        x = self.layer1(x)
        x = self.bn1(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer2(x)
        x = self.bn2(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer3(x)
        x = self.bn3(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer4(x)
        x = self.bn4(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.out(x)

        return x

num_epoch = 1000

2021/08/15 entry:

Tried using weight_decay in Adam to prevent overfitting, but since Dropout is already in use the effect was not significant. In fact the model is no longer overfitting at this point, so the architecture probably needs to change before accuracy can improve further. I raised the number of training epochs to 40 to check the final result: at around 0.743 the loss is only improving slowly, and I suspect we are already close to a critical point.

Next time I will increase the Batch_size and see what happens; the batch_size is still fairly small right now, so the normalization may not be bringing much benefit.

The current model architecture is:

import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Linear(429, 1024)
        self.bn1 = nn.BatchNorm1d(1024)
        self.layer2 = nn.Linear(1024, 512)
        self.bn2 = nn.BatchNorm1d(512)
        self.layer3 = nn.Linear(512, 256)
        self.bn3 = nn.BatchNorm1d(256)
        self.layer4 = nn.Linear(256, 128)
        self.bn4 = nn.BatchNorm1d(128)
        self.out = nn.Linear(128, 39)

        self.act_fn = nn.ReLU()
        self.dropout = nn.Dropout(p=0.3)

    def forward(self, x):
        x = self.layer1(x)
        x = self.bn1(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer2(x)
        x = self.bn2(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer3(x)
        x = self.bn3(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer4(x)
        x = self.bn4(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.out(x)

        return x

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)

2021/08/14 entry:

Ran the baseline model as-is and found that Train Acc was far higher than Val Acc, which I initially judged to be overfitting. Batch Norm on its own only improves accuracy a little, so I added Dropout to every layer to reduce overfitting; within the default 20 epochs this raised accuracy to about 0.737.

The modified model is below:

import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Linear(429, 1024)
        self.bn1 = nn.BatchNorm1d(1024)
        self.layer2 = nn.Linear(1024, 512)
        self.bn2 = nn.BatchNorm1d(512)
        self.layer3 = nn.Linear(512, 128)
        self.bn3 = nn.BatchNorm1d(128)
        self.out = nn.Linear(128, 39)

        self.act_fn = nn.ReLU()
        self.dropout = nn.Dropout(p=0.3)

    def forward(self, x):
        x = self.layer1(x)
        x = self.bn1(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer2(x)
        x = self.bn2(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer3(x)
        x = self.bn3(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.out(x)

        return x
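For completeness, a minimal sketch of how this Classifier is wired up for training; the loss is the standard choice for 39-class classification, and the learning-rate value here is just an assumption.

model = Classifier()
criterion = nn.CrossEntropyLoss()  # 39 phoneme classes
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)  # lr value is illustrative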

Author: Lebenito. Published 2021-08-14, last updated 2022-09-09.