PyTorch MNIST Classification Task

[TOC]

## MNIST Classification Task

### Goals

tn2>- Basic network construction and training, with a walkthrough of commonly used functions
- The `torch.nn.functional` module
- The `nn.Module` module

## Loading the MNIST Dataset

tn2>Running the code below downloads the dataset automatically.
Alternatively, <a href="" target="_blank">download</a> the contents of the data folder from my GitHub.

```python
%matplotlib inline
```

```python
from pathlib import Path
import requests

DATA_PATH = Path("data")
PATH = DATA_PATH / "mnist"

PATH.mkdir(parents=True, exist_ok=True)

URL = ""
FILENAME = "mnist.pkl.gz"

# Download the archive only if it is not already cached locally
if not (PATH / FILENAME).exists():
    content = requests.get(URL + FILENAME).content
    (PATH / FILENAME).write_bytes(content)
```
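
The download step above follows a "fetch only if missing" pattern. A minimal sketch of that pattern, assuming a hypothetical helper `fetch_if_missing` and a fake fetch function standing in for the `requests.get(...)` call (neither name is from the original post, and no network access is involved):

```python
import tempfile
from pathlib import Path

def fetch_if_missing(target: Path, fetch) -> bool:
    """Write fetch() to target unless the file already exists.

    Returns True when a download happened, False when the cached file was reused.
    `fetch` is a hypothetical zero-argument callable that returns bytes.
    """
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch())
    return True

calls = []

def fake_fetch() -> bytes:
    # Stand-in for requests.get(URL + FILENAME).content; no network access
    calls.append(1)
    return b"fake archive bytes"

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "mnist" / "mnist.pkl.gz"
    first = fetch_if_missing(target, fake_fetch)   # file is missing, so fetch runs
    second = fetch_if_missing(target, fake_fetch)  # file exists, so fetch is skipped
    cached_bytes = target.read_bytes()

print(first, second, len(calls))
```

Passing the fetch function in as an argument keeps the caching logic testable without touching the network.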

```python
import pickle
import gzip

# Open the gzip-compressed pickle and load the training and validation splits
with gzip.open((PATH / FILENAME).as_posix(), "rb") as f:
    print(f)
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")
```

><gzip _io.BufferedReader name='data/mnist/mnist.pkl.gz' 0x7a05301f7400>

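The loading step above reads a pickle through a gzip stream and unpacks a nested tuple. A minimal round-trip sketch of the same idea, assuming a tiny made-up dataset (`toy`) with the same nesting; the values and file name are placeholders, not MNIST data:

```python
import gzip
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in with the same nesting as the tuple unpacked above:
# ((x_train, y_train), (x_valid, y_valid), third_split)
toy = (([[0.0, 1.0]], [5]), ([[1.0, 0.0]], [3]), ([[0.5, 0.5]], [7]))

with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "toy.pkl.gz"

    # Write: pickle the object through a gzip stream
    with gzip.open(toy_path, "wb") as f:
        pickle.dump(toy, f)

    # Read: gzip.open decompresses on the fly, pickle.load rebuilds the object
    with gzip.open(toy_path, "rb") as f:
        ((toy_x_train, toy_y_train), (toy_x_valid, toy_y_valid), _) = pickle.load(f)

print(toy_x_train, toy_y_train)
print(toy_x_valid, toy_y_valid)
```

The `encoding="latin-1"` argument in the original call is what Python 3 needs to unpickle NumPy arrays that were pickled by Python 2, which is presumably how the MNIST archive was produced. It is omitted here because the toy file is written by Python 3.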
```python
x_train.shape
```

>(50000, 784)
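
The shape printed above shows that every sample is stored as one flat row. A minimal sketch of how a flat row and a 2D image relate, assuming the usual row-major flattening and using a synthetic array rather than real MNIST pixels:

```python
import numpy as np

# Synthetic stand-in for one image: 28x28 values (not real data)
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)

# Flatten to the 784-long row layout used by x_train
row = image.reshape(-1)
print(row.shape)

# Reshape back to 2D, which is the form an image viewer expects
restored = row.reshape((28, 28))
print(restored.shape)

# Row-major layout: flat index i corresponds to pixel (i // 28, i % 28)
i = 100
same_pixel = bool(restored[i // 28, i % 28] == row[i])
print(same_pixel)
```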

There are 50,000 samples.<br/>
784 is the number of pixels in each MNIST sample (`28*28*1`).<br/>
The code below reshapes the first image in `x_train` into a 28x28 two-dimensional array and displays it as a grayscale image.

```python
from matplotlib import pyplot
import numpy as np

# imshow renders the array as an image
pyplot.imshow(x_train[0].reshape((28, 28)), cmap="gray")
print(x_train.shape)
```

>(50000, 784)

tn2>Note: the data must be converted to tensors before it can be used for modeling and training.

```python
import torch

# Convert the NumPy arrays to torch.Tensor
x_train, y_train, x_valid, y_valid = map(
    torch.tensor, (x_train, y_train, x_valid, y_valid)
)
n, c = x_train.shape
print(x_train, y_train)
print(x_train.shape)
print(y_train.min(), y_train.max())
```

>tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]]) tensor([5, 0, 4,  ..., 8, 4, 8])
torch.Size([50000, 784])
tensor(0) tensor(9)

## torch.nn.functional: Where Many Layers and Functions Live

`torch.nn.functional` provides many functions that will be used often later on. So when should you use `nn.Module`, and when `nn.functional`? As a rule of thumb, if a component has learnable parameters, `nn.Module` is the better choice; otherwise `nn.functional` is usually simpler.

```python
import torch.nn.functional as F

# Use cross-entropy as the loss function; it is commonly used for multi-class classification.
loss_func = F.cross_entropy

# Inside this function, the input xb is multiplied by weights via matrix multiplication (mm),
# and bias is added. This is a very basic linear (fully connected, or dense) layer.
def model(xb):
    return xb.mm(weights) + bias
```

```python
torch.randn([784, 10], dtype = torch.float, requires_grad = True).shape
```

>torch.Size([784, 10])

Next, a quick test:

```python
bs = 64
# Take the first 64 samples and their labels
xb = x_train[0:bs]  # a mini-batch from x
yb = y_train[0:bs]
# Randomly initialize a 784x10 weight matrix (784 input pixels, 10 classes)
weights = torch.randn([784, 10], dtype = torch.float, requires_grad = True)
# Initialize the bias b to zeros
bias = torch.zeros(10, requires_grad=True)
# Call the cross-entropy loss with the model output and the true labels yb
# to compute the loss on this mini-batch
print(loss_func(model(xb), yb))
```

>tensor(14.8633, grad_fn=`<NllLossBackward0>`)

## Creating a Model to Simplify the Code

tn2>- The class must inherit from `nn.Module` and call the `nn.Module` constructor in its own constructor.
- There is no need to write a backward function; `nn.Module` uses autograd to implement backpropagation automatically.
- A Module's learnable parameters are available as iterators through `named_parameters()` or `parameters()`.

```python
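from torch import nn

# Define a fully connected network class that inherits from nn.Module
class Mnist_NN(nn.Module):
    def __init__(self):
        super().__init__()
        # First hidden layer: 784 input features (pixels), 128 output features
        self.hidden1 = nn.Linear(784, 128)
        # Second hidden layer: 128 input features, 256 output features
        self.hidden2 = nn.Linear(128, 256)
        # Output layer: 10 class scores
        self.out = nn.Linear(256, 10)
        # Dropout: during training, zero each activation with probability 0.5
        self.dropout = nn.Dropout(0.5)

    # Forward pass
    # x: a batch of features
    def forward(self, x):
        x = F.relu(self.hidden1(x))
        x = self.dropout(x)
        x = F.relu(self.hidden2(x))
        x = self.dropout(x)
        x = self.out(x)
        return x
```

tn>Overfitting: the model cannot handle anything that deviates even slightly from what it saw during training.

tn2>Dropout is a technique for preventing neural networks from overfitting.
It is like a team project where, to make sure everyone can solve problems independently, some members are randomly told to rest while the remaining members finish the task.

```python
net = Mnist_NN()
print(net)
```

>Mnist_NN(
  (hidden1): Linear(in_features=784, out_features=128, bias=True)
  (hidden2): Linear(in_features=128, out_features=256, bias=True)
  (out): Linear(in_features=256, out_features=10, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)

tn2>PyTorch has already initialized the weights automatically; we can print the weights and biases of the layers we named.

```python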
for name, parameter in net.named_parameters():
    print(name, parameter, parameter.size())
```

>hidden1.weight Parameter containing:
tensor([[-0.0328, -0.0041, -0.0332,  ...,  0.0327, -0.0128, -0.0318],
        [ 0.0217,  0.0024, -0.0190,  ...,  0.0107,  0.0091,  0.0025],
        [-0.0261, -0.0255,  0.0267,  ...,  0.0205,  0.0028,  0.0088],
        ...,
        [-0.0247,  0.0210,  0.0056,  ...,  0.0177,  0.0318, -0.0213],
        [-0.0099, -0.0090,  0.0282,  ...,  0.0173,  0.0075,  0.0203],
        [ 0.0192, -0.0161,  0.0318,  ...,  0.0018, -0.0307, -0.0248]],
       requires_grad=True) torch.Size([128, 784])
hidden1.bias Parameter containing:
tensor([ 0.0075, -0.0238,  0.0057,  ..., -0.0185, -0.0329, -0.0004],
       requires_grad=True) torch.Size([128])
hidden2.weight Parameter containing:
tensor([[-0.0415,  0.0363, -0.0117,  ..., -0.0717, -0.0825,  0.0568],
        [-0.0881, -0.0485, -0.0498,  ...,  0.0801, -0.0196,  0.0175],
        [ 0.0238, -0.0028, -0.0533,  ..., -0.0325, -0.0531,  0.0269],
        ...,
        [-0.0340,  0.0684,  0.0152,  ...,  0.0156, -0.0137,  0.0137],
        [ 0.0845,  0.0142,  0.0351,  ...,  0.0205,  0.0393, -0.0280],
        [-0.0203,  0.0234, -0.0095,  ..., -0.0314,  0.0578,  0.0760]],
       requires_grad=True) torch.Size([256, 128])
hidden2.bias Parameter containing:
tensor([-0.0750, -0.0227, -0.0355,  ..., -0.0537, -0.0748,  0.0660],
       requires_grad=True) torch.Size([256])
out.weight Parameter containing:
tensor([[-0.0167, -0.0487, -0.0353,  ..., -0.0388,  0.0404,  0.0401],
        [-0.0403, -0.0562,  0.0585,  ..., -0.0038, -0.0478,  0.0184],
        [-0.0057, -0.0612, -0.0607,  ...,  0.0030,  0.0144,  0.0002],
        ...,
        [-0.0290,  0.0233,  0.0219,  ..., -0.0163, -0.0378, -0.0244],
        [ 0.0316, -0.0497,  0.0182,  ..., -0.0062,  0.0361,  0.0335],
        [ 0.0476, -0.0497, -0.0115,  ..., -0.0212, -0.0204,  0.0286]],
       requires_grad=True) torch.Size([10, 256])
out.bias Parameter containing:
tensor([-0.0288,  0.0392, -0.0213, -0.0028, -0.0356,  0.0588,  0.0381,  0.0176,
         0.0029, -0.0034],
       requires_grad=True) torch.Size([10])

## Simplifying with TensorDataset and DataLoader

tn2>Build the training and validation sets.

```python
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

# Wrap x_train and y_train in a TensorDataset
train_ds = TensorDataset(x_train, y_train)
# DataLoader serves train_ds in mini-batches.
# batch_size=bs: 64 samples per batch (other common choices: 128, 256)
# shuffle=True: reshuffle the samples every epoch
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)

valid_ds = TensorDataset(x_valid, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)
```

```python
# Helper that builds both data loaders
def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )
```

tn2>Call `model.train()` before training so that Batch Normalization and Dropout run in training mode.
Call `model.eval()` before testing: Dropout is disabled, and Batch Normalization uses its running statistics instead of per-batch statistics.

We now have the data and the model. Next we define a `fit` function to do the training.

```python
import numpy as np

# steps      number of training passes over train_dl (epochs)
# model      the model to train
# loss_func  loss function
# opt        optimizer
# train_dl   training DataLoader
# valid_dl   validation DataLoader
def fit(steps, model, loss_func, opt, train_dl, valid_dl):
    # Repeat for the requested number of passes
    for step in range(steps):
        # Training mode: w and b are updated
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)

        # Evaluation mode: no parameter updates
        model.eval()
        with torch.no_grad():
            # Each loss_batch call returns a (loss, count) pair;
            # zip(*...) unpacks the list of pairs into two tuples
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]
            )
        # Validation result: weight each batch loss by its batch size, sum to get
        # the total loss, then divide by the total sample count for the average
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)
        print('Current step: ' + str(step), 'validation loss: ' + str(val_loss))
```

```python
from torch import optim

def get_model():
    # Build the model
    model = Mnist_NN()
    # SGD is stochastic gradient descent; model.parameters() hands over all parameters for updating; lr is the learning rate
    return model, optim.SGD(model.parameters(), lr=0.001)
    # return model, optim.Adam(model.parameters(), lr=0.001)
```

```python
def loss_batch(model, loss_func, xb, yb, opt=None):
    # Compute the loss
    # model(xb): predictions
    # yb: true labels
    loss = loss_func(model(xb), yb)

    if opt is not None:
        # Backpropagate
        loss.backward()
        # Update the parameters w and b
        opt.step()
        # Zero the gradients (PyTorch accumulates them otherwise)
        opt.zero_grad()

    # Return the loss and the batch size, since the average is computed later
    return loss.item(), len(xb)
```

## Done in Three Lines!

```python
train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
model, opt = get_model()
fit(25, model, loss_func, opt, train_dl, valid_dl)
```

>Current step: 0 validation loss: 2.2804982192993166
Current step: 1 validation loss: 2.2535984596252443
Current step: 2 validation loss: 2.2155482387542724
Current step: 3 validation loss: 2.1587322315216064
Current step: 4 validation loss: 2.073927662277222
Current step: 5 validation loss: 1.9530213565826415
Current step: 6 validation loss: 1.7918485031127929
Current step: 7 validation loss: 1.5947792680740356
Current step: 8 validation loss: 1.3865230758666993
Current step: 9 validation loss: 1.199518952178955
Current step: 10 validation loss: 1.0501244049072265
Current step: 11 validation loss: 0.936854721069336
Current step: 12 validation loss: 0.8493013159751892
Current step: 13 validation loss: 0.7818769567489624
Current step: 14 validation loss: 0.7263854211807251
Current step: 15 validation loss: 0.680776209449768
Current step: 16 validation loss: 0.6429182374954223
Current step: 17 validation loss: 0.6104381775856018
Current step: 18 validation loss: 0.5826128076553345
Current step: 19 validation loss: 0.5583832846164704
Current step: 20 validation loss: 0.5369707444667816
Current step: 21 validation loss: 0.51700747590065
Current step: 22 validation loss: 0.5001905463218689
Current step: 23 validation loss: 0.4858731382369995
Current step: 24 validation loss: 0.4717775447368622

```python
correct = 0
total = 0
# Loop over the validation batches and collect the predictions
for xb, yb in valid_dl:
    outputs = model(xb)
    _, predicted = torch.max(outputs, 1)  # returns the max values and their indices
    total += yb.size(0)
    correct += (predicted == yb).sum().item()
print('Accuracy: %d %%' % (100 * correct / total))
```

>Accuracy: 87 %

## Testing Adam

tn2>Next, we test with Adam.

```python
from torch import optim

def get_model():
    # Build the model
    model = Mnist_NN()
    # SGD is stochastic gradient descent; model.parameters() hands over all parameters for updating; lr is the learning rate
    # return model, optim.SGD(model.parameters(), lr=0.001)
    return model, optim.Adam(model.parameters(), lr=0.001)
```

tn2>Switching optimizers shows:
SGD: the loss decreases slowly, and 25 epochs of training reach only about 85%-89% accuracy.
Adam: the loss decreases quickly, and 25 epochs of training reach about 97% accuracy.
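
One reason for the gap is that SGD moves each parameter by `lr * gradient`, while Adam rescales the update by a running estimate of the gradient magnitude, so its step size stays near `lr` even when gradients are small. The sketch below illustrates this on a toy problem. It is an assumption-laden illustration, not the MNIST experiment: the objective `0.01 * (w - 1)^2`, the helper name `final_loss`, and the step count are all made up for this example.

```python
import torch
from torch import optim

def final_loss(make_opt, steps=100):
    # Toy objective (an assumption for illustration, not the MNIST model):
    # a small-gradient quadratic, loss = 0.01 * (w - 1)^2, starting from w = 0.
    w = torch.zeros(1, requires_grad=True)
    opt = make_opt([w])
    for _ in range(steps):
        loss = (0.01 * (w - 1.0) ** 2).sum()
        loss.backward()   # compute the gradient
        opt.step()        # update w
        opt.zero_grad()   # clear the accumulated gradient
    return (0.01 * (w - 1.0) ** 2).sum().item()

# Same learning rate as in the post, different optimizers
sgd_loss = final_loss(lambda params: optim.SGD(params, lr=0.001))
adam_loss = final_loss(lambda params: optim.Adam(params, lr=0.001))
print("SGD :", sgd_loss)
print("Adam:", adam_loss)
```

On this toy problem Adam ends with the lower loss after the same number of steps. With large gradients the ranking can flip, so treat this as intuition for the result above rather than a general rule.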