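This post is a running log: it records what I was thinking at the time and the code changes I made. It probably has little reference value for anyone else; it is mainly here so I can review things later. I may reorganize it once the project is finished.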
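2021/08/23 log:
Today I reworked the model following this article: a larger batch size, more layers, and L2 regularization to curb overfitting. Training still ran into overfitting once accuracy reached about 0.78.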
2021/08/21 log:
Training now reliably reaches 0.75 or higher on every run, then starts overfitting at around epoch 150. Going further will probably take some additional tricks.
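```python
num_epoch = 200
learning_rate = 0.005

# `model` is an instance of the Classifier shown in this entry.
# AdamW applies weight decay to the weights directly, decoupled from the gradient.
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.001)
```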
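Since overfitting sets in around epoch 150, one simple option among the "additional tricks" is to remember the epoch with the best validation accuracy and stop once it has not improved for a while. The sketch below is only an illustration of that idea: `BestEpochTracker`, `patience`, and `min_delta` are hypothetical names that do not appear in the original training script, and the accuracy values are made up.

```python
class BestEpochTracker:
    """Tracks the best validation accuracy and signals when to stop.

    Hypothetical helper: the class name, `patience`, and `min_delta`
    are illustrative and not part of the original training script.
    """

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_acc = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def update(self, epoch, val_acc):
        """Record one epoch; return True when training should stop."""
        if val_acc > self.best_acc + self.min_delta:
            self.best_acc = val_acc
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation accuracies, only to exercise the helper.
history = [0.70, 0.74, 0.75, 0.76, 0.755, 0.75, 0.748]
tracker = BestEpochTracker(patience=3)
stopped_at = None
for epoch, acc in enumerate(history):
    if tracker.update(epoch, acc):
        stopped_at = epoch
        break
print(tracker.best_epoch, tracker.best_acc, stopped_at)
```

In a real loop, the point where `best_epoch` changes would also be the natural place to save a checkpoint, so the weights from before the overfitting phase are kept.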
The current model is:
```python
import torch
import torch.nn as nn


class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.bn0 = nn.BatchNorm1d(429)  # normalizes the raw 429-dim input
        self.layer1 = nn.Linear(429, 1024)
        self.bn1 = nn.BatchNorm1d(1024)
        self.layer2 = nn.Linear(1024, 512)
        self.bn2 = nn.BatchNorm1d(512)
        self.layer3 = nn.Linear(512, 256)
        self.bn3 = nn.BatchNorm1d(256)
        self.layer4 = nn.Linear(256, 128)
        self.bn4 = nn.BatchNorm1d(128)
        self.out = nn.Linear(128, 39)  # one logit per class

        self.act_fn = nn.ReLU()
        self.dropout = nn.Dropout(p=0.5)   # used after layer1 and layer2
        self.dropout1 = nn.Dropout(p=0.1)  # used after layer3
        self.dropout2 = nn.Dropout(p=0.3)  # used after layer4

    def forward(self, x):
        x = self.bn0(x)
        x = self.layer1(x)
        x = self.bn1(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer2(x)
        x = self.bn2(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer3(x)
        x = self.bn3(x)
        x = self.dropout1(x)
        x = self.act_fn(x)

        x = self.layer4(x)
        x = self.bn4(x)
        x = self.dropout2(x)
        x = self.act_fn(x)

        x = self.out(x)
        return x
```
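To get a feel for how large this network is, the learnable parameters can be counted by hand from the layer widths. This is a back-of-the-envelope sketch in plain Python rather than anything from the original notebook; the helper names are made up, and it assumes the default settings of `nn.Linear` (with bias) and `nn.BatchNorm1d` (with learnable scale and shift).

```python
# Layer widths of the classifier above: 429 input features, 39 outputs.
sizes = [429, 1024, 512, 256, 128, 39]


def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Learnable scale and shift of one BatchNorm1d layer.

    The running mean and variance are buffers, not learnable
    parameters, so they are not counted here.
    """
    return 2 * n_features


linear_total = sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))
# bn0..bn4 normalize the input and the four hidden layers, not the output.
bn_total = sum(batchnorm_params(n) for n in sizes[:-1])
total = linear_total + bn_total
print(linear_total, bn_total, total)
```

Almost all of the roughly 1.14 million parameters sit in the first two linear layers, which is one reason heavier dropout on those layers is a plausible choice.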
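2021/08/18 log:
Another day without much progress. I even found that the 0.75 accuracy from yesterday's run was a fluke that I could not reproduce. On the other hand, I did find that learning-rate decay makes the model fit faster.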
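```python
learning_rate = 0.01

# Halve the learning rate after every 10 calls to scheduler.step().
# Note: `verbose` is deprecated in recent PyTorch releases.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5, verbose=True)
```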
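The schedule these settings produce can be written in closed form: the base rate multiplied by `gamma` once for every completed block of `step_size` steps. The snippet below is a plain-Python sketch of that formula, assuming `scheduler.step()` is called once per epoch; `step_lr` is an illustrative name, not a PyTorch function.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """Learning rate after `epoch` scheduler steps under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# Defaults match the StepLR settings above: step_size=10, gamma=0.5.
for epoch in (0, 9, 10, 20, 50):
    print(epoch, step_lr(0.01, epoch))
```

With these settings the rate has dropped by a factor of 32 by epoch 50, so most of the fast progress has to happen in the first few dozen epochs.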
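2021/08/17 log:
Finally some progress today. I raised the Dropout probability to 0.5, which further reduced overfitting, and increased the batch size to 256, because Batch Normalization only works well when each batch contains enough samples. (In my tests 256 fit faster; 512 ran faster but the results were not as good.) I also tested whether adaptive learning-rate adjustment was needed by adding a learning-rate adjustment parameter, but it only slowed training down and had no real benefit. Adam already adapts the learning rate on its own, so Adam by itself is enough.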
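The batch-size argument can be illustrated without PyTorch: Batch Normalization estimates a mean (and variance) from each batch, and that estimate gets noisier as the batch shrinks. The sketch below draws synthetic unit-variance Gaussian data and measures how much the per-batch mean fluctuates; the function name, the synthetic data, and the chosen batch sizes are assumptions made for illustration only.

```python
import random
import statistics


def batch_mean_spread(batch_size, n_batches=200, seed=0):
    """Standard deviation of per-batch means for unit-variance Gaussian data."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_batches):
        batch = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        means.append(statistics.fmean(batch))
    return statistics.stdev(means)


spreads = {b: batch_mean_spread(b) for b in (16, 64, 256)}
for b, s in spreads.items():
    print(b, round(s, 3))
```

The spread shrinks roughly as one over the square root of the batch size, so going from 16 to 256 samples cuts the noise in the batch statistics by about a factor of four.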
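As for when the model has actually fit: today I realized this has to be judged from how the loss changes. What I took for overfitting over the past few days was really just an illusion caused by too few training epochs. Today, within 80 epochs, test accuracy went above 0.75.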
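One rough way to read the loss curves is to compare the average of the most recent epochs with the average of the epochs just before them, for both training and validation loss. The helper below is a hedged sketch of that heuristic, not the procedure actually used here: `diagnose`, `window`, and the three labels are invented for illustration, and the loss values are made up.

```python
def diagnose(train_loss, val_loss, window=5):
    """Classify the recent trend of two equally long loss histories."""
    if len(train_loss) != len(val_loss) or len(train_loss) < 2 * window:
        raise ValueError("need two histories of at least 2 * window epochs")

    def recent_change(values):
        # Mean of the last `window` epochs minus the mean of the
        # `window` epochs before them; negative means still falling.
        previous = sum(values[-2 * window:-window]) / window
        latest = sum(values[-window:]) / window
        return latest - previous

    train_delta = recent_change(train_loss)
    val_delta = recent_change(val_loss)
    if train_delta < 0 and val_delta < 0:
        return "still learning"
    if train_delta < 0 and val_delta >= 0:
        return "overfitting"
    return "plateau or unstable"


# Made-up histories: training loss keeps falling in both cases.
train = [1.0 - 0.05 * i for i in range(10)]
val_ok = [t + 0.1 for t in train]
val_bad = [1.0, 0.9, 0.8, 0.75, 0.72, 0.72, 0.74, 0.77, 0.8, 0.85]
print(diagnose(train, val_ok, window=3))
print(diagnose(train, val_bad, window=3))
```

A run that is stopped while both curves are still falling would be labeled "still learning", which matches the mistake described above of confusing too few epochs with overfitting.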
Here is an article on fitting.
```python
import torch
import torch.nn as nn


class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Linear(429, 1024)
        self.bn1 = nn.BatchNorm1d(1024)
        self.layer2 = nn.Linear(1024, 512)
        self.bn2 = nn.BatchNorm1d(512)
        self.layer3 = nn.Linear(512, 256)
        self.bn3 = nn.BatchNorm1d(256)
        self.layer4 = nn.Linear(256, 128)
        self.bn4 = nn.BatchNorm1d(128)
        self.out = nn.Linear(128, 39)  # one logit per class

        self.act_fn = nn.ReLU()
        self.dropout = nn.Dropout(p=0.5)  # shared by all hidden layers

    def forward(self, x):
        x = self.layer1(x)
        x = self.bn1(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer2(x)
        x = self.bn2(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer3(x)
        x = self.bn3(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer4(x)
        x = self.bn4(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.out(x)
        return x
```
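2021/08/15 log:
I tried weight_decay in Adam to prevent overfitting, but since Dropout is already in use the effect was not significant. In fact the model is no longer overfitting at this point, so the architecture probably needs to change before accuracy can improve further. For now I raised training to 40 epochs to see the final result. At around 0.743 accuracy the loss was changing only slowly, so I suspect training is close to a critical point.
Next time I will increase the batch size and see what happens: the batch size is currently small, so normalization may not be having much effect.
The current model structure is: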
```python
import torch
import torch.nn as nn


class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Linear(429, 1024)
        self.bn1 = nn.BatchNorm1d(1024)
        self.layer2 = nn.Linear(1024, 512)
        self.bn2 = nn.BatchNorm1d(512)
        self.layer3 = nn.Linear(512, 256)
        self.bn3 = nn.BatchNorm1d(256)
        self.layer4 = nn.Linear(256, 128)
        self.bn4 = nn.BatchNorm1d(128)
        self.out = nn.Linear(128, 39)  # one logit per class

        self.act_fn = nn.ReLU()
        self.dropout = nn.Dropout(p=0.3)  # shared by all hidden layers

    def forward(self, x):
        x = self.layer1(x)
        x = self.bn1(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer2(x)
        x = self.bn2(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer3(x)
        x = self.bn3(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer4(x)
        x = self.bn4(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.out(x)
        return x
```
```python
# In Adam, weight_decay adds an L2 penalty term to the gradient.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
```
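The 08/21 entry uses `AdamW` instead of `Adam` with `weight_decay`, and the two treat the decay term differently: `Adam` folds it into the gradient, where it is rescaled by the adaptive step, while `AdamW` shrinks the weight directly. The sketch below works through a single first step for one scalar weight in plain Python. It is a simplified illustration under stated assumptions (zero initial moment estimates, default betas and eps, and a contrived zero gradient so that only the decay term acts); `adam_first_step` is a made-up name, not a PyTorch API.

```python
import math


def adam_first_step(param, grad, lr, weight_decay, decoupled,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step from zero moment estimates for a single scalar weight."""
    if decoupled:
        # AdamW: shrink the weight directly, outside the adaptive update.
        param *= 1.0 - lr * weight_decay
    else:
        # Adam with weight_decay: fold the L2 penalty into the gradient.
        grad += weight_decay * param
    m = (1.0 - beta1) * grad
    v = (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1)  # bias correction at step t = 1
    v_hat = v / (1.0 - beta2)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Contrived case: zero loss gradient, so only weight decay moves the weight.
adam_l2 = adam_first_step(1.0, 0.0, lr=0.005, weight_decay=0.001, decoupled=False)
adamw = adam_first_step(1.0, 0.0, lr=0.005, weight_decay=0.001, decoupled=True)
print(adam_l2, adamw)
```

In this toy case the L2 variant moves the weight by almost a full learning-rate step regardless of how small `weight_decay` is, while the decoupled variant shrinks it by `lr * weight_decay`. That may be part of why the same `weight_decay` value behaves differently across the two optimizers.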
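2021/08/14 log:
Running the baseline model as-is, Train Acc came out far higher than Val Acc, which I initially attribute to overfitting. Batch Norm alone improved accuracy only slightly, so I added Dropout to every layer to reduce overfitting, which brought accuracy up to about 0.737 within the default 20 epochs.
The modified model is below.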
```python
import torch
import torch.nn as nn


class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Linear(429, 1024)
        self.bn1 = nn.BatchNorm1d(1024)
        self.layer2 = nn.Linear(1024, 512)
        self.bn2 = nn.BatchNorm1d(512)
        self.layer3 = nn.Linear(512, 128)
        self.bn3 = nn.BatchNorm1d(128)
        self.out = nn.Linear(128, 39)  # one logit per class

        self.act_fn = nn.ReLU()
        self.dropout = nn.Dropout(p=0.3)  # shared by all hidden layers

    def forward(self, x):
        x = self.layer1(x)
        x = self.bn1(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer2(x)
        x = self.bn2(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.layer3(x)
        x = self.bn3(x)
        x = self.dropout(x)
        x = self.act_fn(x)

        x = self.out(x)
        return x
```
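For reference, what the Dropout layers do can be sketched in a few lines of plain Python. During training each activation is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, so the expected value stays the same; at evaluation time the input passes through unchanged. This is a simplified illustration on a list of floats, not PyTorch's implementation, and the function and variable names are made up.

```python
import random


def dropout(values, p, training, rng):
    """Inverted dropout on a list of floats (requires 0 <= p < 1)."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    # Zero each value with probability p; rescale the rest by 1 / keep.
    return [v / keep if rng.random() >= p else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0] * 10000
train_out = dropout(activations, p=0.3, training=True, rng=rng)
eval_out = dropout(activations, p=0.3, training=False, rng=rng)
zero_fraction = train_out.count(0.0) / len(train_out)
mean_train = sum(train_out) / len(train_out)
print(round(zero_fraction, 2), round(mean_train, 2))
```

Because the two modes behave differently, the model needs `model.train()` during training and `model.eval()` during validation; otherwise Val Acc is measured with units still being dropped.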