PyTorch Learning Notes (1): Building and training an AlexNet for classification
Published: 2019-04-21

1. About AlexNet

AlexNet structure

The structure in the paper is as follows:

	Input		(224x224x3)

(1)	Conv1		(96x55x55)
(2)	MaxPool1	(96x27x27)

(3)	Conv2		(256x27x27)
(4)	MaxPool2	(256x13x13)

(5)	Conv3		(384x13x13)
(6)	Conv4		(384x13x13)
(7)	Conv5		(256x13x13)
(8)	MaxPool3	(256x6x6)

(9)	FC		(4096)
(10)	FC		(4096)
(11)	FC		(1000)

	Output		(1000)

I intend to use a different data set that has only 3 classes, so layer (11) is changed to have only 3 neurons.
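
The sizes in the table can be checked with the usual output-size formula for convolution and pooling layers, out = floor((in + 2*padding - kernel) / stride) + 1. The sketch below is only a sanity check: the kernel sizes, strides and paddings are assumptions (the table does not list them), chosen so that a 224x224 input reproduces the listed sizes.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding): assumed hyperparameters, not taken from the table
LAYERS = [
    ("Conv1", 11, 4, 2),
    ("MaxPool1", 3, 2, 0),
    ("Conv2", 5, 1, 2),
    ("MaxPool2", 3, 2, 0),
    ("Conv3", 3, 1, 1),
    ("Conv4", 3, 1, 1),
    ("Conv5", 3, 1, 1),
    ("MaxPool3", 3, 2, 0),
]


def trace_sizes(input_size=224):
    """Return the spatial size after each layer for a square input."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = out_size(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


if __name__ == "__main__":
    for name, size in trace_sizes().items():
        print(name, size)
```

Under these assumptions the last pooling layer outputs 256 x 6 x 6 = 9216 values, which is the input width of the first fully connected layer.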

2. Start programming

(0) Versions of PyTorch and torchvision used

pytorch 0.4.1.post2
torchvision 0.1.8, later upgraded to 0.2.1

(1) Check whether Cuda and GPU are available

import torch
USE_GPU = torch.cuda.is_available()
print("USE_GPU = {}".format(USE_GPU))

If this prints True, the GPU can be used for training.
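
Once the check passes, the model and every batch have to be moved to the same device. A minimal sketch of one common way to do this, using torch.device and .to(); the small nn.Linear layer and the random batch are stand-ins, not the code from this note:

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The model and the data must live on the same device.
layer = torch.nn.Linear(4, 3).to(device)   # stand-in for the real model
batch = torch.randn(2, 4).to(device)       # stand-in for a batch of inputs

output = layer(batch)
print(tuple(output.shape), output.device)
```

Written this way, the same script runs unchanged on a machine without a GPU.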

(2) Construction of Network

The network is built from classes in torch.nn.


Question: an image has three RGB channels, i.e. three layers. Why use torch.nn.Conv2d() instead of torch.nn.Conv3d()?

(1) The image does have three RGB channels, and each convolution kernel (filter) is indeed three-dimensional, but Conv2d and Conv3d are not distinguished by the dimensionality of the kernel.
(2) When processing an image, the kernel slides only along the width and height directions, so Conv2d is the corresponding choice.
(3) With Conv2d, PyTorch makes the depth of each kernel equal to the number of channels of the input (3 for RGB); that depth comes from the first argument, in_channels.
(4) Therefore, the kernel_size parameter of torch.nn.Conv2d() only needs the two spatial dimensions of the kernel. The depth of a single kernel is implied by in_channels, and the number of kernels is given by out_channels.

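'''Example: convolve a 224*224 input image with 3 channels into a 96*55*55 feature map.
   The first two arguments, 3 and 96, are the number of input channels and the number of output channels.
   Each filter has depth 3 by default; using 96 such filters gives a feature map of the desired size.'''
# Output size: (224 + 2*2 - 11) // 4 + 1 = 55, which requires stride=4
torch.nn.Conv2d(3, 96, kernel_size=(11, 11), stride=4, padding=2, bias=True)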
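
Putting the layer table from section 1 together, the whole network could be sketched as below. This is a minimal sketch under stated assumptions, not the exact code used in this note. The class name AlexNetSketch is made up. The paddings, the ReLU and Dropout placement, and the omission of local response normalization are assumptions. The first convolution uses stride 4 because that is what turns a 224x224 input into 55x55 with padding 2.

```python
import torch
import torch.nn as nn


class AlexNetSketch(nn.Module):
    """AlexNet-style network following the layer table, with 3 output classes."""

    def __init__(self, num_classes=3):
        super(AlexNetSketch, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),  # 96 x 55 x 55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # 96 x 27 x 27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),           # 256 x 27 x 27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # 256 x 13 x 13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),          # 384 x 13 x 13
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),          # 384 x 13 x 13
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),          # 256 x 13 x 13
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # 256 x 6 x 6
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # 3 neurons instead of 1000
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 256*6*6)
        return self.classifier(x)


model = AlexNetSketch(num_classes=3)
model.eval()  # disable dropout for this shape check
with torch.no_grad():
    logits = model(torch.randn(2, 3, 224, 224))

# Weight layout of Conv2d: (out_channels, in_channels, kernel_h, kernel_w)
print(tuple(model.features[0].weight.shape))
print(tuple(logits.shape))
```

The weight shape of the first layer shows the point made above: each of the 96 kernels has depth 3, even though kernel_size only specifies 11 x 11.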

(3) Image transformation and loading

Image transformation uses torchvision.transforms:

torchvision 0.1.8 does not have transforms.Resize(), only transforms.Scale(), which can only produce square images;
transforms.Resize() is available in torchvision 0.2.1.

from torchvision import transforms
my_trans = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

images are loaded and read using and

To load data in a custom way, you can subclass the class and override the following three methods:

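def __init__(self): ...              # set up file lists, labels and transforms
def __getitem__(self, index): ...    # return one (sample, label) pair
def __len__(self): ...               # return the number of samples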
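
The original text does not name the class. The sketch below assumes it is torch.utils.data.Dataset, with torch.utils.data.DataLoader doing the batching. ToyImageDataset and its random in-memory data are hypothetical; a real dataset would read image files and apply the transform in __getitem__.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyImageDataset(Dataset):
    """Hypothetical dataset that keeps random 'images' and labels in memory."""

    def __init__(self, num_samples=10, num_classes=3):
        self.images = torch.randn(num_samples, 3, 224, 224)
        # Class indices 0, 1, 2, 0, 1, ... stored as int64
        self.labels = torch.arange(num_samples, dtype=torch.long) % num_classes

    def __getitem__(self, index):
        # Return one (sample, label) pair
        return self.images[index], self.labels[index]

    def __len__(self):
        # Number of samples in the dataset
        return len(self.images)


dataset = ToyImageDataset()
loader = DataLoader(dataset, batch_size=4, shuffle=True)

x_batch, y_batch = next(iter(loader))
print(len(dataset), tuple(x_batch.shape), y_batch.dtype)
```

The DataLoader calls __len__ to know how many samples exist and __getitem__ to fetch each one, then stacks the samples into batches.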

(4) Start training

A. PyTorch tensor type conversion

Traceback (most recent call last):
  File "/home/mortimerli/桌面/测试小代码/1_mortimer_alexnet/", line 159, in <module>
    batch_loss = loss_func(y_pred, y_train_batch)  # compute the loss for this batch
  File "/home/mortimerli/anaconda3/envs/python36/lib/python3.6/site-packages/torch/nn/modules/", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mortimerli/anaconda3/envs/python36/lib/python3.6/site-packages/torch/nn/modules/", line 421, in forward
    return F.mse_loss(input, target, reduction=self.reduction)
  File "/home/mortimerli/anaconda3/envs/python36/lib/python3.6/site-packages/torch/nn/", line 1716, in mse_loss
    return _pointwise_loss(lambda a, b: (a - b) ** 2, torch._C._nn.mse_loss, input, target, reduction)
  File "/home/mortimerli/anaconda3/envs/python36/lib/python3.6/site-packages/torch/nn/", line 1674, in _pointwise_loss
    return lambd_optimized(input, target, reduction)
RuntimeError: mse_loss_forward is not implemented for type torch.cuda.LongTensor

Following the article above, I changed the code from:

batch_loss = loss_func(y_pred, y_train_batch)  # compute the loss for this batch

to:

aa = y_pred.float()
bb = y_train_batch.float()
batch_loss = loss_func(aa, bb)  # compute the loss for this batch

With this change the loss can be computed, but then the following problem occurs.

B. Loss function and gradient descent

Traceback (most recent call last):
  File "/home/mortimerli/桌面/测试小代码/1_mortimer_alexnet/", line 162, in <module>
    batch_loss.backward()   # backpropagate once
  File "/home/mortimerli/anaconda3/envs/python36/lib/python3.6/site-packages/torch/", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/mortimerli/anaconda3/envs/python36/lib/python3.6/site-packages/torch/autograd/", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I first suspected that the parameters of the model were not set to require gradients, but the following loop prints True for every parameter, which rules this out:

for para in model.parameters():
    print(para.requires_grad)  # = True

I later discovered that the problem was caused by the code that had fixed the previous problem:

aa = y_pred.float()
bb = y_train_batch.float()
batch_loss = loss_func(aa, bb)  # compute the loss for this batch

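After aa = y_pred.float(), aa holds the same data as y_pred, but its requires_grad attribute is False rather than True; the same goes for bb and y_train_batch. As a result, the requires_grad attribute of the computed batch_loss is also False, so it cannot be backpropagated.

The workaround I used is to add the second-to-last line in the code below, which sets batch_loss.requires_grad to True.

One caution: .float() on a floating-point tensor that is part of the autograd graph normally keeps it in the graph, so a False flag here suggests that y_pred was already cut off from the graph (for example an integer tensor, as the LongTensor error in section A hints). Forcing requires_grad on the loss silences the error, but it does not send gradients back to the model parameters.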

aa = y_pred.float()
bb = y_train_batch.float()
batch_loss = loss_func(aa, bb)  # compute the loss for this batch

batch_loss.requires_grad = True  # force the loss to require gradients
batch_loss.backward()   # backpropagate once

Reference: PyTorch v0.4.0 released, with Windows support and Tensor/Variable merging for the first time

C. training_acc is always 0

D. Loss and accuracy

E. Switch to CrossEntropyLoss
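
For classification with integer class labels, nn.CrossEntropyLoss takes the raw network outputs (one score per class) and the int64 labels directly, so no .float() conversion is involved. A minimal sketch of one training step, with a tiny nn.Linear standing in for the network and random tensors as hypothetical data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(8, 3)                      # stand-in for the real network, 3 classes
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x_train_batch = torch.randn(4, 8)            # hypothetical batch of 4 samples
y_train_batch = torch.tensor([0, 2, 1, 2])   # class indices, dtype int64

y_pred = model(x_train_batch)                # raw scores, shape (4, 3), still in the graph
batch_loss = loss_func(y_pred, y_train_batch)

optimizer.zero_grad()
batch_loss.backward()                        # gradients reach the model parameters
optimizer.step()

# Accuracy is computed separately, from the predicted class indices.
predicted = y_pred.argmax(dim=1)
batch_acc = (predicted == y_train_batch).float().mean().item()
print(batch_loss.item(), batch_acc)
```

Here the loss is computed from tensors that are still attached to the graph, so batch_loss.backward() works without touching requires_grad by hand.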

F. Leaf tensors (leaf nodes)
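
A leaf tensor is one that was not produced by a tracked operation, such as a model parameter or a tensor created directly by the user. The short sketch below, with made-up variable names, shows how is_leaf, grad_fn and requires_grad behave for a leaf, for the result of an operation, for a dtype cast, and for a detached tensor:

```python
import torch

w = torch.ones(3, requires_grad=True)   # created by the user: a leaf
y = w * 2                                # result of an operation: not a leaf
z = y.float()                            # cast of a floating-point tensor: stays in the graph
d = y.detach()                           # cut from the graph: a leaf that needs no gradient

print(w.is_leaf, w.grad_fn is None)      # leaf tensors have no grad_fn
print(y.is_leaf, y.grad_fn is not None)  # non-leaf tensors record how they were made
print(z.requires_grad)
print(d.is_leaf, d.requires_grad)

# Backpropagating through y and z fills in the gradient of the leaf w.
z.sum().backward()
print(w.grad)
```

Gradients are accumulated in the .grad attribute of leaf tensors that require them, which is why the result of backward() shows up on w and not on y or z.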

Pickle AttributeError: Can't get attribute 'Wishart' on <module '__main__' from ''>

(5) Saving and loading the trained model



model = torch.load("model.pth")
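
torch.load("model.pth") on a whole saved model unpickles the model object, so the script that loads it must be able to find the model's class definition; a pickle "Can't get attribute" error is what such a failed lookup looks like. A commonly used alternative is to save only the parameters with state_dict(). A minimal sketch, where the nn.Linear layer stands in for the trained network and the temporary file name is hypothetical:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(8, 3)   # stand-in for the trained network

# Hypothetical location; a real script would use a fixed path such as "model.pth".
path = os.path.join(tempfile.mkdtemp(), "model_state.pth")

# Save only the parameters, not the pickled model object.
torch.save(model.state_dict(), path)

# To load, build the same architecture first, then restore the parameters.
restored = nn.Linear(8, 3)
restored.load_state_dict(torch.load(path))
restored.eval()   # switch to inference mode before evaluating

same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```

With this approach the file holds only tensors, and the class definition lives in the code that rebuilds the network.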