PyTorch: Training an Image Classification Model on Classic Network Architectures
Published: 2023/12/27 22:39

![](https://img.tnblog.net/arcimg/hb/21f086c80c5d4afda1bc1029dadd8f3a.png)

[TOC]

### Data preprocessing

tn2>Data augmentation: the `transforms` module in torchvision provides this out of the box, and it is quite practical.
Data preprocessing: torchvision's `transforms` implements this as well, so we can call it directly.
The `DataLoader` module reads the data in batches.

### Network module setup

tn2>Load a pretrained model. torchvision ships many classic network architectures that are easy to call, and we can continue training from someone else's trained weights. This is what is known as transfer learning.
Note that the task those weights were trained on is not exactly the same as ours, so the final head layer, usually the last fully connected layer, has to be replaced with one that fits our own task.
We can either retrain every layer from scratch or train only the final layer for our task, because the earlier layers perform feature extraction, and that goal is essentially the same across tasks.

![](https://img.tnblog.net/arcimg/hb/9b14ee80a63f4f66a74f105c3984ea39.png)

### Saving and testing the model

tn2>Saving can be selective, for example saving only when the current result on the validation set is the best so far.
Then load the saved model and run an actual test.

![](https://img.tnblog.net/arcimg/hb/ab7cfb75c2a845918917965606d7acaa.png)

```python
import os
import matplotlib.pyplot as plt
%matplotlib inline
# ^ Jupyter magic; remove this line when running as a plain script
import numpy as np
import torch
from torch import nn
import torch.optim as optim
import torchvision
# pip install torchvision
from torchvision import transforms, models, datasets
# https://pytorch.org/docs/stable/torchvision/index.html
import imageio
import time
import warnings
import random
import sys
import copy
import json
from PIL import Image
```

### Reading and preprocessing the data

tn2>Click <a href="https://github.com/AiDaShi/learningpytorch/tree/main/028_034%EF%BC%9A%E5%9B%BE%E5%83%8F%E8%AF%86%E5%88%AB%E6%A0%B8%E5%BF%83%E6%A8%A1%E5%9D%97%E5%AE%9E%E6%88%98%E8%A7%A3%E8%AF%BB/%E5%8D%B7%E7%A7%AF%E7%BD%91%E7%BB%9C%E5%AE%9E%E6%88%98" target="_blank">here</a> to download the dataset, then create the corresponding paths.

```python
data_dir = './flower_data/'
train_dir = os.path.join(data_dir, 'train')
valid_dir = os.path.join(data_dir, 'valid')
```

### Building the data source

tn2>`data_transforms` specifies all of the image preprocessing operations.
`ImageFolder` assumes the files are organized into folders: each folder holds the images of one class, and the folder name is the class name.
The smaller the resized image, the smaller the feature maps that the convolutions produce.

### Data augmentation

tn2>Randomly rotating, cropping, scaling, and otherwise transforming the original images increases the variety of the training data.
Define the transforms for the training and validation sets. Only the training set is randomly augmented; the validation set is just resized, center-cropped, and normalized.

```python
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize([96, 96]),  # resize to 96*96
        transforms.RandomRotation(45),  # random rotation, angle drawn from -45 to 45 degrees
        transforms.CenterCrop(64),  # crop 64*64 from the center
        transforms.RandomHorizontalFlip(p=0.5),  # random horizontal flip with probability p
        transforms.RandomVerticalFlip(p=0.5),  # random vertical flip with probability p
        # arguments: brightness, contrast, saturation, hue
        transforms.ColorJitter(brightness=0.2, contrast=0.1, saturation=0.1, hue=0.1),
        transforms.RandomGrayscale(p=0.025),  # convert to grayscale with probability p; stays 3-channel with R=G=B
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # mean, std
    ]),
    'valid': transforms.Compose([
        transforms.Resize([96, 96]),
        transforms.CenterCrop(64),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
```

tn2>Create the image datasets, the data loaders, and the dataset sizes.

tn>Note that my image directory structure is a bit unusual. It looks like this.

![](https://img.tnblog.net/arcimg/hb/2de8335eae0a432caa119c1b196135fa.png)

```python
batch_size = 8

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'valid']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True) for x in ['train', 'valid']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'valid']}
class_names = image_datasets['train'].classes
```

```python
image_datasets
```

![](https://img.tnblog.net/arcimg/hb/c7635b3a1e9a420f9d890206ab764b6b.png)

```python
dataloaders
```

![](https://img.tnblog.net/arcimg/hb/ddae154f94fe46b9af2d3be57c71a28c.png)

```python
dataset_sizes
```

>{'train': 6552, 'valid': 818}

### Reading the actual names for the labels

tn2>There is a <a href="https://github.com/AiDaShi/learningpytorch/blob/main/028_034%EF%BC%9A%E5%9B%BE%E5%83%8F%E8%AF%86%E5%88%AB%E6%A0%B8%E5%BF%83%E6%A8%A1%E5%9D%97%E5%AE%9E%E6%88%98%E8%A7%A3%E8%AF%BB/%E5%8D%B7%E7%A7%AF%E7%BD%91%E7%BB%9C%E5%AE%9E%E6%88%98/cat_to_name.json" target="_blank">cat_to_name.json</a> file that maps each image folder to its class name. We can inspect it with the following code:

```python
with open('cat_to_name.json', 'r') as f:
    cat_to_name = json.load(f)
```

```python
cat_to_name
```

![](https://img.tnblog.net/arcimg/hb/43bba43b412b454ab79adc6fe285fb15.png)

### Displaying some data

tn>Note: the tensor data has to be converted to NumPy format, and the normalization has to be undone.

```python
dataloaders['valid']
```

>`<torch.utils.data.dataloader.DataLoader at 0x7c6979aef310>`

```python
def im_convert(tensor):
    """
    Convert a tensor into an image that can be displayed.

    Args:
        tensor (torch.Tensor): a PyTorch tensor, usually representing one image.

    Returns:
        numpy.ndarray: the converted image, ready for display.
    """
    # Work on a detached CPU copy so the original tensor is not affected
    image = tensor.to("cpu").clone().detach()
    # Convert to a NumPy array and drop any extra dimension (such as the batch size)
    image = image.numpy().squeeze()
    # Reorder the axes. PyTorch tensors are usually [channels, height, width],
    # while the standard image layout is [height, width, channels].
    image = image.transpose(1, 2, 0)
    # Undo the normalization. The image was normalized per channel by subtracting
    # the mean and dividing by the std, so multiply by the std and add the mean.
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
    # Clip the pixel values to [0, 1] so they are valid for display
    image = image.clip(0, 1)

    return image
```

```python
# Create a figure of 20x12 inches
fig = plt.figure(figsize=(20, 12))
# Number of columns and rows
columns = 4
rows = 2

# Fetch one batch from the data loader. 'dataloaders' is a dict holding the
# DataLoader of each dataset (training set, validation set).
dataiter = iter(dataloaders['valid'])
inputs, classes = next(dataiter)

# Loop over the number of images to display (columns * rows)
for idx in range(columns * rows):
    # Create one subplot per image; 'rows, columns, idx + 1' gives its position
    ax = fig.add_subplot(rows, columns, idx + 1, xticks=[], yticks=[])
    # Set the subplot title. 'class_names' maps a class index to its folder name,
    # and 'cat_to_name' maps that folder name to the flower name.
    ax.set_title(cat_to_name[str(int(class_names[classes[idx]]))])
    # Convert the tensor with 'im_convert' and show the image
    plt.imshow(im_convert(inputs[idx]))
# Show the whole figure
plt.show()
```

![](https://img.tnblog.net/arcimg/hb/8bf80e7ab9244432b79d673f312698e5.png)

### Selecting and loading the model

tn2>Load a model provided by `models` and use its trained weights directly as the initial parameters.
The first run has to download the weights, which can be slow. I will provide a downloaded copy that you can place directly in the corresponding path.

```python
model_name = 'resnet'  # many options: ['resnet', 'alexnet', 'vgg', 'squeezenet', 'densenet', 'inception']
# Whether to reuse the pretrained features as-is (feature extraction)
feature_extract = True
```

tn2>Check whether a GPU is available for training; if not, fall back to the CPU.

```python
# Train on the GPU if one is available
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available!  Training on GPU ...')

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```

>CUDA is available! Training on GPU ...
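
One detail of the data loading is worth a small illustration: `ImageFolder` assigns class indices by sorting the folder names as strings, which is why the display code looks names up through `class_names[...]` instead of using the label index directly. The sketch below uses only the standard library and assumes the class folders are named "1" through "102", as the keys of `cat_to_name.json` suggest; that naming is an assumption, not something read from disk here.

```python
# Assumption: class folders are named "1" .. "102".
# ImageFolder sorts folder names as strings, so the order is lexicographic.
folder_names = [str(i) for i in range(1, 103)]
class_names = sorted(folder_names)  # mirrors image_datasets['train'].classes
class_to_idx = {name: idx for idx, name in enumerate(class_names)}

print(class_names[:6])    # ['1', '10', '100', '101', '102', '11']
print(class_to_idx['2'])  # 14
```

So label index 1 refers to folder "10", not folder "1". Any lookup in `cat_to_name` has to go from index to folder name first.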
tn2>Define a helper that turns off gradient updates for the model's parameters.

```python
# Stop updating the gradients of the given parameters
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            # Freeze every parameter
            param.requires_grad = False
```

```python
model_ft = models.resnet18()
model_ft
```

![](https://img.tnblog.net/arcimg/hb/11a820996eda450b9f8e6b07be7fa04b.png)
![](https://img.tnblog.net/arcimg/hb/9b3182cca0ab46a0970d3227c8c0d6ae.png)

### Following the example from the official PyTorch site

tn2>This is the model initialization function; it can initialize several different models.
Besides creating the model, it also disables gradient updates for the pretrained weights and replaces the last layer.

```python
def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # Pick the model; each architecture is initialized slightly differently.
    # Note: recent torchvision releases prefer the `weights=` argument over `pretrained=`.
    # Only the resnet branch ends in LogSoftmax; pair the other branches with
    # nn.CrossEntropyLoss instead of nn.NLLLoss.
    model_ft = None
    input_size = 0

    if model_name == "resnet":
        # ResNet18
        model_ft = models.resnet18(pretrained=use_pretrained)
        # Freeze the pretrained parameters
        set_parameter_requires_grad(model_ft, feature_extract)
        # Replace the fc layer
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Sequential(nn.Linear(num_ftrs, num_classes),
                                    nn.LogSoftmax(dim=1))
        input_size = 64

    elif model_name == "alexnet":
        # AlexNet
        model_ft = models.alexnet(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "vgg":
        # VGG16
        model_ft = models.vgg16(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "squeezenet":
        # SqueezeNet
        model_ft = models.squeezenet1_0(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1, 1), stride=(1, 1))
        model_ft.num_classes = num_classes
        input_size = 224

    elif model_name == "densenet":
        # DenseNet
        model_ft = models.densenet121(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "inception":
        # Inception v3
        # Be careful, expects (299,299) sized images and has auxiliary output
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxiliary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 299

    else:
        raise ValueError("Invalid model name: {}".format(model_name))

    return model_ft, input_size
```

### Choosing which layers to train

tn2>Initialize the model, move it to the GPU for training, and define the file path used to save checkpoints during training so that progress is not lost. Finally, print the layers that still require gradients.
That is the last layer we replaced.

```python
model_ft, input_size = initialize_model(model_name, 102, feature_extract, use_pretrained=True)

# Compute on the GPU
model_ft = model_ft.to(device)

# Checkpoint file
filename = 'checkpoint.pth'

# Whether to train all layers
params_to_update = model_ft.parameters()
print("Params to learn:")
if feature_extract:
    params_to_update = []
    for name, param in model_ft.named_parameters():
        if param.requires_grad == True:
            params_to_update.append(param)
            print("\t", name)
else:
    for name, param in model_ft.named_parameters():
        if param.requires_grad == True:
            print("\t", name)
```

![](https://img.tnblog.net/arcimg/hb/4c0c3e6b516f46f1a828553bbf6b93d5.png)

```python
model_ft
```

![](https://img.tnblog.net/arcimg/hb/b79b0849313e4edabba33a4c6f16c2da.png)

### Optimizer setup

tn2>Create an Adam optimizer with a learning rate of `1e-2` (`0.01`). The learning rate drops to `1/10` of its value every 7 epochs. Finally, define an `NLLLoss()` loss function.

```python
# Optimizer setup
optimizer_ft = optim.Adam(params_to_update, lr=1e-2)
scheduler = optim.lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)  # decay the learning rate to 1/10 every 7 epochs
# The last layer already applies LogSoftmax(), so nn.CrossEntropyLoss() must not be used here;
# nn.CrossEntropyLoss() is equivalent to LogSoftmax() followed by nn.NLLLoss().
# Loss function
criterion = nn.NLLLoss()
```

### Training module

```python
def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False, filename=filename):
    since = time.time()
    best_acc = 0
    """
    checkpoint = torch.load(filename)
    best_acc = checkpoint['best_acc']
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    model.class_to_idx = checkpoint['mapping']
    """
    model.to(device)

    # Lists that record accuracy, loss, and learning rate during training
    val_acc_history = []    # validation accuracy after each epoch
    train_acc_history = []  # training accuracy after each epoch
    train_losses = []       # training loss of each epoch
    valid_losses = []       # validation loss of each epoch
    LRs = [optimizer.param_groups[0]['lr']]  # start with the initial learning rate

    # Copy the initial weights. Whenever a better model is found during training,
    # this variable is overwritten with that model's state.
    best_model_wts = copy.deepcopy(model.state_dict())

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Training and validation
        for phase in ['train', 'valid']:
            if phase == 'train':
                model.train()  # training mode
            else:
                model.eval()   # evaluation mode

            # Running totals of the loss and the number of correct predictions
            running_loss = 0.0
            running_corrects = 0

            # Go through all of the data
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # Zero the gradients
                optimizer.zero_grad()
                # Compute and update gradients only while training
                with torch.set_grad_enabled(phase == 'train'):
                    if is_inception and phase == 'train':
                        # Special handling for the Inception model
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4 * loss2
                    else:  # resnet takes this branch
                        # Compute the outputs and the loss
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # Backpropagate and optimize in the training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # Accumulate the loss and correct predictions of this batch
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            # Average loss and accuracy for this phase
            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            time_elapsed = time.time() - since
            print('Time elapsed {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # In the validation phase, save the best model and update the history
            if phase == 'valid' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                state = {
                    'state_dict': model.state_dict(),
                    'best_acc': best_acc,
                    'optimizer': optimizer.state_dict(),
                }
                torch.save(state, filename)
            if phase == 'valid':
                val_acc_history.append(epoch_acc)
                valid_losses.append(epoch_loss)
                # StepLR advances once per epoch and takes no loss argument.
                # This uses the module-level `scheduler` defined next to the optimizer.
                scheduler.step()
            if phase == 'train':
                train_acc_history.append(epoch_acc)
                train_losses.append(epoch_loss)

        # Record the learning rate
        print('Optimizer learning rate : {:.7f}'.format(optimizer.param_groups[0]['lr']))
        LRs.append(optimizer.param_groups[0]['lr'])
        print()

    # Print the total training time and the best validation accuracy
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # Use the best epoch as the final model: load the best weights
    model.load_state_dict(best_model_wts)
    # Return the trained model and the history
    return model, val_acc_history, train_acc_history, valid_losses, train_losses, LRs
```

### Start training!

```python
model_ft, val_acc_history, train_acc_history, valid_losses, train_losses, LRs = train_model(model_ft, dataloaders, criterion, optimizer_ft, num_epochs=20, is_inception=(model_name=="inception"))
```

![](https://img.tnblog.net/arcimg/hb/cafd93a43ac84029a3aa0bff7486877f.png)

tn2>By epoch 13, the validation loss had reached `0.3191`, which is already quite low.

### Continue training all layers

```python
# Re-enable gradients for every layer
for param in model_ft.parameters():
    param.requires_grad = True

# Continue training all parameters with a smaller learning rate.
# Pass model_ft.parameters() here: params_to_update only holds the fc layer
# when feature_extract is True.
optimizer = optim.Adam(model_ft.parameters(), lr=1e-4)
# Attach the scheduler to the new optimizer, not to optimizer_ft
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

# Loss function
criterion = nn.NLLLoss()
```

```python
# Load the checkpoint
checkpoint = torch.load(filename)
best_acc = checkpoint['best_acc']
model_ft.load_state_dict(checkpoint['state_dict'])
# The saved optimizer state covers only the fc parameters and the stage-one
# learning rate, so it is not loaded into the new all-layer optimizer.
#optimizer.load_state_dict(checkpoint['optimizer'])
#model_ft.class_to_idx = checkpoint['mapping']
```

```python
model_ft, val_acc_history, train_acc_history, valid_losses, train_losses, LRs = train_model(model_ft, dataloaders, criterion, optimizer, num_epochs=10, is_inception=(model_name=="inception"))
```
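
The `StepLR` schedule used in both training stages (`step_size=7`, `gamma=0.1`) can be checked by hand. The following is a minimal sketch of its closed form, `lr = base_lr * gamma ** (epoch // step_size)`, in plain Python. The helper name `step_lr` is hypothetical and is not part of PyTorch.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate after `epoch` scheduler steps under a StepLR-style decay.

    Hypothetical helper for illustration; it mirrors the closed form
    base_lr * gamma ** (epoch // step_size).
    """
    return base_lr * gamma ** (epoch // step_size)


# Stage one starts at 1e-2: epochs 0-6 keep it, epochs 7-13 use a tenth, and so on.
for epoch in (0, 6, 7, 13, 14):
    print(epoch, step_lr(1e-2, epoch))
```

The schedule depends only on how many times the scheduler has stepped, which is why it should be advanced once per epoch without passing the loss.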
![](https://img.tnblog.net/arcimg/hb/04b526b0fe994ba8961d346e3475abbc.png)

### Testing the network

tn2>Feed in a test image and look at what the network returns:

```python
# Note: `predict` is not defined in this article; the call shows the intended usage.
probs, classes = predict(image_path, model)
print(probs)
print(classes)
# > [ 0.01558163  0.01541934  0.01452626  0.01443549  0.01407339]
# > ['70', '3', '45', '62', '55']
```

tn2>Note that the preprocessing must be the same as in training.

### Loading the trained model

```python
model_ft, input_size = initialize_model(model_name, 102, feature_extract, use_pretrained=True)

# GPU mode
model_ft = model_ft.to(device)

# Name of the saved file
filename = 'seriouscheckpoint.pth'

# Load the model
checkpoint = torch.load(filename)
best_acc = checkpoint['best_acc']
model_ft.load_state_dict(checkpoint['state_dict'])
```

### Preprocessing the test data

tn2>The test data must be processed the same way as during training.
The purpose of the crop is to make sure every input has the same size.
Normalization is also required, using the same mean and std as for the training data. Note that the training data was normalized after being scaled to the 0-1 range, so the test data must be scaled to 0-1 first as well.
One last point: in PyTorch the color channel is the first dimension, unlike many other toolkits, so the array has to be transposed.

```python
def process_image(image_path):
    # Read the test image; convert to RGB so there are always three channels
    img = Image.open(image_path).convert('RGB')
    # Resize. thumbnail() can only shrink, so pick the bound by orientation
    # so that the shorter side ends up at 256.
    # Note: the training transforms above resize to 96 and crop to 64,
    # while this function uses 256 and 224.
    if img.size[0] > img.size[1]:
        img.thumbnail((10000, 256))
    else:
        img.thumbnail((256, 10000))
    # Center crop; PIL's box is (left, upper, right, lower)
    left = (img.width - 224) / 2
    upper = (img.height - 224) / 2
    right = left + 224
    lower = upper + 224
    img = img.crop((left, upper, right, lower))
    # Same preprocessing as in training: scale to 0-1, then normalize
    img = np.array(img) / 255
    mean = np.array([0.485, 0.456, 0.406])  # provided mean
    std = np.array([0.229, 0.224, 0.225])   # provided std
    img = (img - mean) / std

    # The color channel must come first
    img = img.transpose((2, 0, 1))

    return img
```

```python
def imshow(image, ax=None, title=None):
    """Display an image."""
    if ax is None:
        fig, ax = plt.subplots()

    # Move the color channel back to the last axis
    image = np.array(image).transpose((1, 2, 0))

    # Undo the preprocessing
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = std * image + mean
    image = np.clip(image, 0, 1)

    ax.imshow(image)
    ax.set_title(title)

    return ax
```

```python
image_path = 'image_06621.jpg'
img = process_image(image_path)
imshow(img)
```

![](https://img.tnblog.net/arcimg/hb/0a2896089a4b4e6cb8610e51a1b8cc40.png)

```python
img.shape
```

>(3, 224, 224)
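
The crop arithmetic inside `process_image` is easier to follow in isolation. Here is a small sketch in plain Python that computes the same centered box in PIL's `(left, upper, right, lower)` order. The helper name `center_crop_box` and the 341x256 example size are made up for illustration.

```python
def center_crop_box(width, height, size=224):
    """Return a (left, upper, right, lower) box for a centered square crop.

    Hypothetical helper: it repeats the margin arithmetic used in process_image.
    """
    left = (width - size) / 2
    upper = (height - size) / 2
    return (left, upper, left + size, upper + size)


# A landscape image whose shorter side was already reduced to 256
print(center_crop_box(341, 256))  # (58.5, 16.0, 282.5, 240.0)
```

The box is always `size` wide and `size` tall, so every image that passes through this step reaches the network with the same shape.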
```python
# Get one batch of test data
dataiter = iter(dataloaders['valid'])
images, labels = next(dataiter)

model_ft.eval()

if train_on_gpu:
    output = model_ft(images.cuda())
else:
    output = model_ft(images)
```

tn2>`output` holds, for every sample in the batch, a score for each class. Because the last layer is `LogSoftmax`, these scores are log-probabilities.

```python
output.shape
```

>torch.Size([8, 102])

### Taking the most probable class

```python
_, preds_tensor = torch.max(output, 1)

preds = np.squeeze(preds_tensor.numpy()) if not train_on_gpu else np.squeeze(preds_tensor.cpu().numpy())
preds
```

>array([87, 87, 26, 87, 87, 87, 33, 84])

### Displaying the predictions

```python
fig = plt.figure(figsize=(20, 20))
columns = 4
rows = 2

for idx in range(columns * rows):
    ax = fig.add_subplot(rows, columns, idx + 1, xticks=[], yticks=[])
    plt.imshow(im_convert(images[idx]))
    # preds and labels are class indices; map them to folder names through
    # class_names before looking up the flower name in cat_to_name.
    pred_name = cat_to_name[class_names[preds[idx]]]
    true_name = cat_to_name[class_names[labels[idx].item()]]
    # Title format: predicted (actual); green when they match, red otherwise
    ax.set_title("{} ({})".format(pred_name, true_name),
                 color=("green" if pred_name == true_name else "red"))
plt.show()
```

![](https://img.tnblog.net/arcimg/hb/aa5d2fe734de4f47b873645ece68553a.png)
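
To go from one row of `output` to readable results, the log-probabilities can be exponentiated and the class indices mapped through `class_names` and then `cat_to_name`. The sketch below shows that chain in plain Python on toy data. The helper name `top_k`, the three-class setup, and the placeholder flower names are all assumptions made for illustration, not values from the dataset.

```python
import math


def top_k(log_probs, class_names, cat_to_name, k=3):
    """Return the k most likely (name, probability) pairs for one sample.

    Hypothetical helper: log_probs is one row of LogSoftmax output as a list.
    """
    ranked = sorted(range(len(log_probs)), key=lambda i: log_probs[i], reverse=True)[:k]
    # index -> folder name -> display name; exp() turns log-probabilities into probabilities
    return [(cat_to_name[class_names[i]], math.exp(log_probs[i])) for i in ranked]


# Toy data (made up): three classes whose folders sort as "1", "10", "2"
class_names = ["1", "10", "2"]
cat_to_name = {"1": "flower a", "10": "flower b", "2": "flower c"}
log_probs = [math.log(0.2), math.log(0.7), math.log(0.1)]

print(top_k(log_probs, class_names, cat_to_name, k=2))
```

With a real model, one row of `output` moved to the CPU and converted with `.tolist()` could be passed in place of the toy `log_probs`.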