【Pytorch】Torch_NN_Learning

from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.autograd as autograd
from torch.autograd import Variable
import numpy as np
print("Import success.\nTorch Version:{}".format(torch.__version__))
Import success.
Torch Version:0.2.1+a4fc05a

LSTM - Parameters

input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
num_layers – Number of recurrent layers.

bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature)
dropout – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer
bidirectional – If True, becomes a bidirectional RNN. Default: False
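As a minimal sketch of how batch_first changes the tensor layout: the snippet below assumes a recent PyTorch release (0.4 or later), where plain tensors replace Variable, and the sizes are arbitrary illustration values. h0 and c0 are omitted, so they default to zeros.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, input_size, hidden_size, num_layers = 4, 3, 5, 7, 2

# Default layout: (seq_len, batch, feature)
lstm = nn.LSTM(input_size, hidden_size, num_layers)
out, (hn, cn) = lstm(torch.randn(seq_len, batch, input_size))

# batch_first=True layout: (batch, seq_len, feature)
lstm_bf = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
out_bf, (hn_bf, cn_bf) = lstm_bf(torch.randn(batch, seq_len, input_size))

print(out.shape)     # torch.Size([4, 3, 7])
print(out_bf.shape)  # torch.Size([3, 4, 7])
# batch_first only affects the input and output tensors; the states keep
# the (num_layers * num_directions, batch, hidden_size) layout.
print(hn_bf.shape)   # torch.Size([2, 3, 7])
```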

input = Variable(torch.randn(4,3,5)) # (seq_len, batch, input_size)
h0 = Variable(torch.randn(2,3,7)) # (num_layers * num_directions, batch, hidden_size)
c0 = Variable(torch.randn(2,3,7)) # (num_layers * num_directions, batch, hidden_size)
lstm = nn.LSTM(input_size=5, hidden_size=7, num_layers=2)
output, (hn, cn) = lstm(input, (h0, c0))
print(
'\nOutput', output, # (seq_len, batch, hidden_size * num_directions)
'\nh_n ', hn, # (num_layers * num_directions, batch, hidden_size)
'\nc_n ', cn, # (num_layers * num_directions, batch, hidden_size)
)
Output Variable containing:
(0 ,.,.) = 
  0.0798 -0.3524 -0.0318 -0.2497  0.1932 -0.0912  0.0585
  0.0715  0.1636  0.2120  0.2547 -0.3115  0.4829  0.0978
  0.1510 -0.4919 -0.0118  0.1087 -0.2944 -0.0191  0.4525

(1 ,.,.) = 
  0.0460 -0.1349  0.1419 -0.1101  0.0451 -0.1520 -0.0424
  0.0307  0.0099  0.1939  0.1564 -0.2541  0.1459 -0.0038
  0.1022 -0.2297  0.1253  0.0450 -0.1736 -0.0761  0.1345

(2 ,.,.) = 
  0.0382 -0.1091  0.1898 -0.0535 -0.0493 -0.1513 -0.1140
  0.0181 -0.0784  0.1801  0.0833 -0.2366  0.0152 -0.1006
  0.0564 -0.1635  0.1952 -0.0127 -0.1812 -0.0981 -0.0383

(3 ,.,.) = 
  0.0269 -0.1075  0.2085 -0.0134 -0.1187 -0.1357 -0.1500
  0.0225 -0.1104  0.2012  0.0484 -0.2287 -0.0512 -0.1547
  0.0267 -0.1547  0.1928 -0.0397 -0.1941 -0.0921 -0.1526
[torch.FloatTensor of size 4x3x7]

h_n    Variable containing:
(0 ,.,.) = 
  0.1074 -0.0985  0.0460  0.0892  0.0026 -0.0203  0.0786
  0.0937 -0.0193 -0.0566  0.1505  0.0223  0.1503  0.0570
 -0.0040  0.1010 -0.2472 -0.0768 -0.0267  0.3387  0.0787

(1 ,.,.) = 
  0.0269 -0.1075  0.2085 -0.0134 -0.1187 -0.1357 -0.1500
  0.0225 -0.1104  0.2012  0.0484 -0.2287 -0.0512 -0.1547
  0.0267 -0.1547  0.1928 -0.0397 -0.1941 -0.0921 -0.1526
[torch.FloatTensor of size 2x3x7]

c_n    Variable containing:
(0 ,.,.) = 
  0.2615 -0.3211  0.0973  0.2190  0.0056 -0.0338  0.1595
  0.3154 -0.0683 -0.1427  0.2748  0.0584  0.3398  0.1123
 -0.0157  0.3064 -0.3791 -0.1399 -0.0585  0.8162  0.1604

(1 ,.,.) = 
  0.0552 -0.2677  0.4853 -0.0205 -0.2206 -0.2210 -0.2661
  0.0474 -0.2559  0.5212  0.0737 -0.4361 -0.0798 -0.2621
  0.0541 -0.3830  0.5064 -0.0609 -0.3632 -0.1520 -0.2786
[torch.FloatTensor of size 2x3x7]
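In the printout above, the last time step of Output, block (3 ,.,.), is identical to block (1 ,.,.) of h_n: output collects the top layer's hidden state at every step, and h_n stores the final hidden state of each layer. A small check of that relationship, assuming a recent PyTorch release (plain tensors instead of Variable) and zero initial states:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=5, hidden_size=7, num_layers=2)
x = torch.randn(4, 3, 5)        # (seq_len, batch, input_size)
output, (hn, cn) = lstm(x)      # h0 and c0 default to zeros

# The last time step of the output equals the final hidden state
# of the last (top) layer.
print(torch.allclose(output[-1], hn[-1]))  # True
```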

Bi-LSTM

Pass bidirectional=True to nn.LSTM.
num_directions becomes 2, so the first dimension of h0 and c0 and the last dimension of the output both double.
All other parameters are the same as for the LSTM above.

h0 = Variable(torch.randn(2*2,3,7)) # (num_layers * num_directions, batch, hidden_size)
c0 = Variable(torch.randn(2*2,3,7)) # (num_layers * num_directions, batch, hidden_size)
bilstm = nn.LSTM(input_size=5, hidden_size=7, num_layers=2, bidirectional=True)
output, (hn, cn) = bilstm(input, (h0, c0))
print(
'\nOutput', output, # (seq_len, batch, hidden_size * num_directions)
'\nh_n ', hn, # (num_layers * num_directions, batch, hidden_size)
'\nc_n ', cn, # (num_layers * num_directions, batch, hidden_size)
)
Output Variable containing:
(0 ,.,.) = 

Columns 0 to 8 
  -0.0328  0.1329 -0.1598  0.0962 -0.2130  0.2678 -0.0482  0.0515 -0.0290
 -0.4693 -0.1502 -0.0117  0.0687  0.6282 -0.1565  0.7240  0.0644 -0.0035
 -0.1117  0.3106 -0.0701 -0.0226 -0.5075  0.1602 -0.1437  0.0071  0.1246

Columns 9 to 13 
   0.0238  0.0820 -0.1239 -0.0197  0.0165
 -0.2156  0.0808  0.0332  0.1923 -0.0358
 -0.1882 -0.1621  0.0670  0.0959  0.0796

(1 ,.,.) = 

Columns 0 to 8 
  -0.0134  0.0914 -0.1416  0.1550 -0.0034  0.1058 -0.0382  0.0279 -0.0681
 -0.3385 -0.0606 -0.1338 -0.0166  0.3892 -0.1016  0.3074  0.1098 -0.0233
 -0.1066  0.1902 -0.1043  0.0515 -0.2649  0.0779 -0.1027  0.0279  0.0465

Columns 9 to 13 
   0.0358  0.1217 -0.1425 -0.0235  0.0493
 -0.2052  0.0850  0.0339  0.1756 -0.0640
 -0.1521 -0.2092  0.0313  0.0631  0.0642

(2 ,.,.) = 

Columns 0 to 8 
   0.0486  0.0285 -0.1354  0.1621  0.1405  0.0095 -0.0334  0.0275 -0.1737
 -0.2761 -0.0289 -0.1539 -0.0745  0.3068 -0.1381  0.1749  0.1334  0.0352
 -0.0790  0.1042 -0.1282  0.1465 -0.0645 -0.0118 -0.0119  0.0132 -0.0647

Columns 9 to 13 
   0.0710  0.2234 -0.1106 -0.0286  0.1901
 -0.1466  0.1494  0.0337  0.2215 -0.1346
 -0.1716 -0.2517  0.0648  0.0666  0.0637

(3 ,.,.) = 

Columns 0 to 8 
   0.1408 -0.0332 -0.1137  0.1926  0.2436 -0.0803 -0.0155  0.0165 -0.3842
 -0.2130  0.0105 -0.1439 -0.1528  0.2133 -0.1947  0.0306  0.3092  0.1739
 -0.0390  0.0827 -0.1251  0.2153  0.0422 -0.0405 -0.0032 -0.0257 -0.0834

Columns 9 to 13 
   0.2120  0.5565  0.0042 -0.0354  0.4640
 -0.0739  0.2532 -0.0068  0.3069 -0.2176
 -0.1889 -0.3534 -0.0101  0.1968 -0.0465
[torch.FloatTensor of size 4x3x14]

h_n    Variable containing:
(0 ,.,.) = 
 -0.0627  0.2550  0.3279  0.0315 -0.1899 -0.0517 -0.1582
 -0.1523  0.0896  0.2832 -0.0463 -0.0302  0.0724 -0.0680
  0.1372  0.2027  0.2818  0.0424 -0.2472 -0.0850 -0.2549

(1 ,.,.) = 
 -0.1606 -0.1595 -0.1187  0.0032 -0.0036  0.0083 -0.1326
  0.1111 -0.1617  0.1805 -0.0115 -0.1544  0.0465 -0.1717
  0.0162  0.1585  0.2094 -0.1939 -0.0532  0.0590 -0.1576

(2 ,.,.) = 
  0.1408 -0.0332 -0.1137  0.1926  0.2436 -0.0803 -0.0155
 -0.2130  0.0105 -0.1439 -0.1528  0.2133 -0.1947  0.0306
 -0.0390  0.0827 -0.1251  0.2153  0.0422 -0.0405 -0.0032

(3 ,.,.) = 
  0.0515 -0.0290  0.0238  0.0820 -0.1239 -0.0197  0.0165
  0.0644 -0.0035 -0.2156  0.0808  0.0332  0.1923 -0.0358
  0.0071  0.1246 -0.1882 -0.1621  0.0670  0.0959  0.0796
[torch.FloatTensor of size 4x3x7]

c_n    Variable containing:
(0 ,.,.) = 
 -0.1302  0.6549  0.6786  0.3079 -0.5095 -0.1389 -0.4543
 -0.2786  0.1790  0.5202 -0.1162 -0.0607  0.1905 -0.1051
  0.3220  0.5443  0.7510  0.1576 -0.5975 -0.2758 -0.3876

(1 ,.,.) = 
 -0.3360 -0.2363 -0.2426  0.0050 -0.0094  0.0226 -0.4114
  0.2075 -0.2975  0.3651 -0.0201 -0.4975  0.0950 -0.2769
  0.0260  0.2856  0.3388 -0.3938 -0.2133  0.1107 -0.1942

(2 ,.,.) = 
  0.2412 -0.0912 -0.2701  0.4474  0.3928 -0.1122 -0.0346
 -0.5209  0.0326 -0.2324 -0.2867  0.4848 -0.3340  0.0575
 -0.0617  0.2354 -0.2384  0.4614  0.0843 -0.0706 -0.0074

(3 ,.,.) = 
  0.1120 -0.0555  0.0582  0.1377 -0.2276 -0.0279  0.0292
  0.1428 -0.0056 -0.5134  0.1684  0.0564  0.3211 -0.0665
  0.0190  0.2116 -0.4337 -0.3472  0.1136  0.1530  0.1594
[torch.FloatTensor of size 4x3x7]
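The printout above also shows how the two directions are packed: block (2 ,.,.) of h_n matches columns 0 to 6 of the last output step, and block (3 ,.,.) matches columns 7 to 13 of the first output step. The sketch below splits the output accordingly; it assumes a recent PyTorch release and zero initial states, and the names forward_out and backward_out are only illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size = 7
bilstm = nn.LSTM(input_size=5, hidden_size=hidden_size,
                 num_layers=2, bidirectional=True)
x = torch.randn(4, 3, 5)                    # (seq_len, batch, input_size)
output, (hn, cn) = bilstm(x)

# The last dimension of the output concatenates the two directions.
forward_out = output[:, :, :hidden_size]
backward_out = output[:, :, hidden_size:]

# h_n is ordered layer by layer, forward before backward, so the last
# two entries belong to the top layer.
print(torch.allclose(forward_out[-1], hn[-2]))  # forward pass ends at the last step
print(torch.allclose(backward_out[0], hn[-1]))  # backward pass ends at the first step
```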

LSTMCell

Shows how a single LSTM cell works: the loop below feeds one time step at a time and carries (hx, cx) forward to the next step.

rnn = nn.LSTMCell(5, 8)
inp = Variable(torch.randn(4, 3, 5))
hx = Variable(torch.randn(3, 8))
cx = Variable(torch.randn(3, 8))
output = []
for i in range(4):
    hx, cx = rnn(inp[i], (hx, cx))
    output.append(hx)
    print("H_{}'s {}".format(i, hx))
H_0's Variable containing:
 0.0161  0.0820  0.1264  0.5698  0.1555  0.1019 -0.6331  0.4666
-0.0466 -0.3108 -0.3889 -0.0909  0.0535 -0.0389 -0.0661 -0.1533
-0.0957  0.4769 -0.5832  0.2228  0.0177 -0.1251 -0.1822 -0.5942
[torch.FloatTensor of size 3x8]

H_1's Variable containing:
-0.0217  0.0698  0.1196  0.3380  0.1997  0.0103 -0.3124  0.1821
-0.0285 -0.1948 -0.2075  0.0533 -0.1547  0.2410 -0.1686 -0.1814
-0.0989  0.1679 -0.5986  0.3037 -0.0145  0.1371 -0.1285 -0.3058
[torch.FloatTensor of size 3x8]

H_2's Variable containing:
-0.1365  0.1022  0.0917  0.3858  0.1318  0.0406 -0.2220  0.1262
 0.0806 -0.0453 -0.1894  0.1336 -0.0463  0.2549 -0.1588 -0.4024
 0.0784  0.0149 -0.3712  0.2240  0.0573 -0.0053 -0.2268 -0.2647
[torch.FloatTensor of size 3x8]

H_3's Variable containing:
 0.0085  0.0597  0.0405  0.2158  0.0499  0.0835 -0.3906 -0.0203
 0.0866  0.0170 -0.1895  0.1637 -0.1015  0.3329 -0.2589 -0.3394
 0.0964 -0.0598 -0.2832  0.2294  0.0313  0.1325 -0.1107 -0.3527
[torch.FloatTensor of size 3x8]
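To make the single step concrete, the sketch below recomputes one LSTMCell update by hand from the module's own parameters (weight_ih, weight_hh, bias_ih, bias_hh) and compares it with the module's result. It assumes a recent PyTorch release; manual_lstm_cell is a hypothetical helper written for this illustration, not part of the library.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(5, 8)
x = torch.randn(3, 5)   # (batch, input_size)
h = torch.randn(3, 8)   # (batch, hidden_size)
c = torch.randn(3, 8)   # (batch, hidden_size)

def manual_lstm_cell(x, h, c, cell):
    """Recompute one LSTMCell step from the module's parameters."""
    gates = (x @ cell.weight_ih.t() + cell.bias_ih
             + h @ cell.weight_hh.t() + cell.bias_hh)
    # The stacked weights are ordered: input, forget, cell, output gate.
    i, f, g, o = gates.chunk(4, dim=1)
    c_next = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
    h_next = torch.sigmoid(o) * torch.tanh(c_next)
    return h_next, c_next

with torch.no_grad():
    h_ref, c_ref = cell(x, (h, c))
    h_man, c_man = manual_lstm_cell(x, h, c, cell)

print(torch.allclose(h_ref, h_man, atol=1e-5))
print(torch.allclose(c_ref, c_man, atol=1e-5))
```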

Dropout Layers

  • Input: Any. Input can be of any shape
  • Output: Same. Output is of the same shape as input
m = nn.Dropout(p=0.3)
input = Variable(torch.randn(5, 4))
output = m(input)
print(torch.cat((input, output), 1))
# Alpha Dropout is a type of Dropout that maintains the self-normalizing property.
m = nn.AlphaDropout(p=0.3)
input = Variable(torch.randn(5, 4))
output = m(input)
print(torch.cat((input, output), 1))
Variable containing:
 1.3842 -0.1793  1.5022 -1.2981  1.9774 -0.0000  2.1460 -0.0000
-2.2350  0.9031 -0.2413  0.0366 -3.1928  1.2902 -0.0000  0.0522
-0.0988  0.6401  0.8262 -0.4700 -0.0000  0.9144  0.0000 -0.6714
 0.4723  0.0987 -0.1887  1.6187  0.6748  0.0000 -0.2696  2.3124
 1.6831  0.1688  0.8709  1.0857  2.4044  0.0000  0.0000  1.5510
[torch.FloatTensor of size 5x8]

Variable containing:
 0.7550  0.9882 -0.9701 -0.1685  1.1041  1.3049 -1.0595  0.3090
-1.9965  0.8511 -0.0932 -1.7224 -1.2648 -1.0595  0.3738 -1.0288
-0.3788 -2.1000  1.0952 -0.9518  0.1280 -1.3539 -1.0595 -0.3653
-1.4031  0.5019 -0.6636 -0.0361 -1.0595  0.8862 -0.1172  0.4230
-1.3436 -1.0563  1.4147  0.5900 -0.7027 -0.4553  1.6721  0.9620
[torch.FloatTensor of size 5x8]
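In the first printout, every surviving value is the input divided by 0.7 (for example 1.3842 becomes 1.9774): nn.Dropout rescales the kept elements by 1 / (1 - p) during training and does nothing in evaluation mode. A minimal sketch of both modes, assuming a recent PyTorch release and an all-ones input chosen so the scaling is easy to read:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 0.3
m = nn.Dropout(p=p)
x = torch.ones(1000)

# Training mode (the default): each element is zeroed with probability p
# and the survivors are multiplied by 1 / (1 - p).
y_train = m(x)
kept = y_train[y_train != 0]
print(kept.unique())                   # the survivors share one value, 1 / (1 - p)
print((y_train == 0).float().mean())   # fraction dropped, roughly p

# Evaluation mode: dropout is the identity.
m.eval()
y_eval = m(x)
print(torch.equal(y_eval, x))          # True
```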

Padding Layers

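Inputs are laid out as N_Batches x Channels x Height x Width. A padding tuple is given in the order (paddingLeft, paddingRight, paddingTop, paddingBottom); a single int pads all four sides equally.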

  • $ (N, C, H, W) \rightarrow (N, C, H_{out}, W_{out}) $
  • $ H_{out} = H_{in} + paddingTop + paddingBottom $
  • $ W_{out} = W_{in} + paddingLeft + paddingRight $
# Only 4D and 5D padding is supported for now
input = autograd.Variable(torch.randn(1, 2, 3, 4))
# uses the same padding in all boundaries
# m = nn.ZeroPad2d(1)
m = nn.ConstantPad2d(1, 2.3333)
output = m(input)
print(output)
print("\n=======================\n")
# using different paddings
m = nn.ZeroPad2d((1, 1, 2, 2))
# m = nn.ConstantPad2d((1, 1, 2, 2), 2.3333)
output = m(input)
print(output)
Variable containing:
(0 ,0 ,.,.) = 
  2.3333  2.3333  2.3333  2.3333  2.3333  2.3333
  2.3333  2.0598  0.5779  0.7410 -0.2043  2.3333
  2.3333  2.0359 -1.6858  0.4359  0.3211  2.3333
  2.3333  0.3481  0.5727  0.5786 -0.7968  2.3333
  2.3333  2.3333  2.3333  2.3333  2.3333  2.3333

(0 ,1 ,.,.) = 
  2.3333  2.3333  2.3333  2.3333  2.3333  2.3333
  2.3333  1.1789 -0.4450  0.4749  0.2136  2.3333
  2.3333  1.2923  0.8678  1.6216 -0.1105  2.3333
  2.3333  2.1250  0.8989 -0.2381  1.7026  2.3333
  2.3333  2.3333  2.3333  2.3333  2.3333  2.3333
[torch.FloatTensor of size 1x2x5x6]


=======================

Variable containing:
(0 ,0 ,.,.) = 
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  2.0598  0.5779  0.7410 -0.2043  0.0000
  0.0000  2.0359 -1.6858  0.4359  0.3211  0.0000
  0.0000  0.3481  0.5727  0.5786 -0.7968  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000

(0 ,1 ,.,.) = 
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  1.1789 -0.4450  0.4749  0.2136  0.0000
  0.0000  1.2923  0.8678  1.6216 -0.1105  0.0000
  0.0000  2.1250  0.8989 -0.2381  1.7026  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
[torch.FloatTensor of size 1x2x7x6]
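The two output sizes above (1x2x5x6 and 1x2x7x6) follow directly from the shape formulas. As a worked check in plain Python, using a hypothetical helper padded_shape that is not part of PyTorch:

```python
def padded_shape(shape, padding):
    """Return the (N, C, H_out, W_out) shape after 2D padding.

    `padding` follows the (left, right, top, bottom) order used by
    nn.ZeroPad2d and nn.ConstantPad2d; an int pads all four sides equally.
    """
    if isinstance(padding, int):
        padding = (padding,) * 4
    left, right, top, bottom = padding
    n, c, h, w = shape
    return (n, c, h + top + bottom, w + left + right)

print(padded_shape((1, 2, 3, 4), 1))             # (1, 2, 5, 6)
print(padded_shape((1, 2, 3, 4), (1, 1, 2, 2)))  # (1, 2, 7, 6)
```

Note that the tuple lists the width padding first, so (1, 1, 2, 2) adds 2 to the width and 4 to the height.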

Non-linear Activations

  • nn.ReLU(inplace=False)
    • $\mathrm{ReLU}(x) = \max(0, x)$
  • nn.Softmax()
    • $f_i(x) = \exp(x_i) / \sum_j \exp(x_j)$
  • nn.Sigmoid()
    • $f(x) = 1 / (1 + \exp(-x))$
  • nn.Tanh()
    • $f(x) = (\exp(x) - \exp(-x)) / (\exp(x) + \exp(-x))$
  • nn.Threshold(threshold, value, inplace=False)
    • $y = x$ if $x > threshold$, otherwise $y = value$
input = autograd.Variable(torch.randn(2, 3))
print(input)
relu = nn.ReLU()
print('\nReLU ', relu(input))
sm = nn.Softmax() # nn.Sigmoid() and nn.Tanh() are applied the same way
print('\nSoftMax ', sm(input))
Variable containing:
 0.2629 -0.5756 -0.4757
-0.2046 -0.1826  0.5311
[torch.FloatTensor of size 2x3]


ReLU     Variable containing:
 0.2629  0.0000  0.0000
 0.0000  0.0000  0.5311
[torch.FloatTensor of size 2x3]


SoftMax  Variable containing:
 0.5235  0.2264  0.2501
 0.2434  0.2488  0.5079
[torch.FloatTensor of size 2x3]
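The formulas can be checked by hand against the printout above. The sketch below uses only the Python standard library and the first input row as printed (rounded to four decimals), so the results agree with the ReLU and SoftMax rows up to rounding. The function names are illustrative, not PyTorch APIs.

```python
import math

def relu(xs):
    return [max(0.0, x) for x in xs]

def softmax(xs):
    # Subtracting the maximum leaves the result unchanged and avoids overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

row = [0.2629, -0.5756, -0.4757]   # first row of the input printed above
print(relu(row))                   # negative entries are clamped to 0
print(softmax(row))                # close to 0.5235, 0.2264, 0.2501
print(sum(softmax(row)))           # softmax outputs sum to 1
print(sigmoid(0.0), math.tanh(0.0))
```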
import torch
from torch.autograd import Variable
# define the inputs
x_tensor = torch.randn(10, 20)
y_tensor = torch.randn(10, 5)
x = Variable(x_tensor, requires_grad=False)
y = Variable(y_tensor, requires_grad=False)
# define some weights
w = Variable(torch.randn(20, 5), requires_grad=True)
# get variable tensor
print(type(w.data)) # torch.FloatTensor
# get variable gradient
print(w.grad) # None
loss = torch.mean((y - x @ w) ** 2)
# calculate the gradients
loss.backward()
print(w.grad) # some gradients
# manually apply gradients
w.data -= 0.01 * w.grad.data
# manually zero gradients after update
w.grad.data.zero_()
<class 'torch.FloatTensor'>
None
Variable containing:
-1.3514 -0.1052  1.0056  0.2811 -0.3309
 0.5037  0.9949 -1.6392 -0.4351  1.2254
 0.2477 -0.0502  0.4510  0.7238  0.1114
 0.4799  0.3167 -0.6135 -0.4998  0.2620
 1.0254  0.7146 -0.5500  0.3868  0.1841
 0.1149 -0.0351 -0.3343 -0.4571  0.3408
-0.2435 -0.3256 -0.8101 -1.4030  0.4093
 0.8297 -0.1577 -1.8171 -0.7431  1.0062
 0.0229  0.1829 -0.4641 -0.4319  0.2729
-1.2153 -1.2480  0.6714 -0.4719 -0.4976
-0.7302 -0.0150  0.6535  0.0073 -0.0176
 0.1842 -0.8359 -0.1110 -0.3290 -0.2575
 1.0419  1.0069 -2.1212 -1.4792  1.2291
 1.1946  1.1317  0.0296  1.1031  0.2735
 0.4553  0.2371  0.4601  0.9679  0.0660
-1.1472 -0.2064  1.0872 -0.3853 -0.5404
 0.7875  0.4278 -0.1380  0.7322  0.0687
 1.9909  1.2813 -1.6926 -0.0396  1.0175
 0.8311  0.6657 -0.8842 -0.3210  0.5822
-0.1637 -0.3244  0.3780  0.1150 -0.2662
[torch.FloatTensor of size 20x5]







    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0
[torch.FloatTensor of size 20x5]
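The gradient that backward() produces here can be verified against the closed form. Because the loss is a mean over all N = 10 x 5 entries, $\partial loss / \partial w = -\frac{2}{N} x^T (y - xw)$. The sketch below assumes a recent PyTorch release, where tensors carry requires_grad directly and torch.no_grad() replaces the manual .data updates.

```python
import torch

torch.manual_seed(0)
x = torch.randn(10, 20)
y = torch.randn(10, 5)
w = torch.randn(20, 5, requires_grad=True)

loss = torch.mean((y - x @ w) ** 2)
loss.backward()

# Closed-form gradient of the mean squared error with respect to w.
with torch.no_grad():
    analytic = (-2.0 / y.numel()) * (x.t() @ (y - x @ w))
print(torch.allclose(w.grad, analytic, atol=1e-4))

# Manual gradient step and reset, without touching .data
with torch.no_grad():
    w -= 0.01 * w.grad
w.grad.zero_()
print(w.grad.abs().sum())   # zero after the reset
```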

Combine layers with Sequential

# Example of using Sequential
model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU()
)
# Example of using Sequential with OrderedDict
from collections import OrderedDict
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU())
]))
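A short usage sketch for the OrderedDict variant, assuming a recent PyTorch release; the 28x28 single-channel input is an arbitrary choice made for this example.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU()),
]))

x = torch.randn(1, 1, 28, 28)    # (N, C, H, W)
y = model(x)                     # layers are applied in the order given

# Each 5x5 convolution without padding shrinks H and W by 4.
print(y.shape)                   # torch.Size([1, 64, 20, 20])

# Named entries are reachable as attributes, and every entry by index.
print(model.conv1)
print(model[0] is model.conv1)   # True
```

Naming the layers mainly pays off when inspecting or replacing a specific layer later; the forward pass is the same as for the unnamed version.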

How to train a model with GPU

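Example: training a linear regression (LR) model on the GPU. The model and the loss are moved to the GPU once with .cuda(), and every batch and target must be moved as well before the forward pass:

  • batch_cpu = Variable(torch.from_numpy(x[idx])).float()
  • batch = batch_cpu.cuda() # important
  • target_cpu = Variable(torch.from_numpy(y[idx])).float()
  • target = target_cpu.cuda() # important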
import matplotlib.pyplot as plt
# GPU Example from http://blog.csdn.net/wuichuan
x = np.random.randn(1000, 1) * 4
w = np.array([0.5,])
bias = -1.68
y_true = np.dot(x, w) + bias  # noise-free data
y = y_true + np.random.randn(x.shape[0])  # data with added noise
# Goal: recover w and bias by regression on x and y

# Define the regression network
class LinearRegression(nn.Module):
    def __init__(self, input_size, out_size):
        # Initialization
        super(LinearRegression, self).__init__()
        self.x2o = nn.Linear(input_size, out_size)

    def forward(self, x):
        # Forward pass
        return self.x2o(x)

batch_size = 10
model = LinearRegression(1, 1)  # regression model
criterion = nn.MSELoss()  # loss function
print(model.parameters())
# Move the model and the loss to the GPU
model.cuda()
criterion.cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
losses = []
epoches = 101
for i in range(epoches):
    loss = 0
    optimizer.zero_grad()  # clear the gradients from the previous step
    idx = np.random.randint(x.shape[0], size=batch_size)
    batch_cpu = Variable(torch.from_numpy(x[idx])).float()
    batch = batch_cpu.cuda()  # important
    target_cpu = Variable(torch.from_numpy(y[idx])).float()
    target = target_cpu.cuda()  # important
    output = model.forward(batch)
    loss += criterion(output, target)
    loss.backward()
    optimizer.step()
    if i % 10 == 0:
        # loss.data[0] matches Torch 0.2.1; on PyTorch 0.4 or later use loss.item()
        print('Loss at epoch[%s]: %.3f' % (i, loss.data[0]))
    losses.append(loss.data[0])
plt.plot(losses, '-ob')
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
<generator object Module.parameters at 0x000000A21BAEB830>
Loss at epoch[0]: 3.325
Loss at epoch[10]: 2.538
Loss at epoch[20]: 2.621
Loss at epoch[30]: 2.538
Loss at epoch[40]: 2.584
Loss at epoch[50]: 1.451
Loss at epoch[60]: 0.588
Loss at epoch[70]: 3.073
Loss at epoch[80]: 2.032
Loss at epoch[90]: 0.864
Loss at epoch[100]: 0.594
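The example above fails on a machine without CUDA. A device-agnostic sketch of the same idea follows, assuming a recent PyTorch release (torch.device and .to(device) are available from 0.4 on). Two choices here are assumptions made for the illustration rather than part of the original: the data is generated with torch instead of NumPy, and the target is kept as a column of shape (N, 1) so that it matches the model output exactly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Fall back to the CPU when no CUDA device is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Same synthetic relation as above: y = 0.5 * x - 1.68 + noise
x = torch.randn(1000, 1) * 4
y = 0.5 * x - 1.68 + torch.randn(1000, 1)

model = nn.Linear(1, 1).to(device)   # parameters now live on `device`
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

def full_loss():
    """Mean squared error over the whole data set."""
    with torch.no_grad():
        return criterion(model(x.to(device)), y.to(device)).item()

initial_loss = full_loss()
for step in range(500):
    idx = torch.randint(0, x.shape[0], (10,))
    batch = x[idx].to(device)    # inputs must be on the same device as the model
    target = y[idx].to(device)
    optimizer.zero_grad()
    loss = criterion(model(batch), target)
    loss.backward()
    optimizer.step()
final_loss = full_loss()

print(device, initial_loss, final_loss)
```

Because the noise has unit variance, the loss over the full data set should settle near 1 rather than 0.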
