After the first CNN-based architecture (AlexNet) won the ImageNet 2012 competition, every subsequent winning architecture used more layers in a deep neural network to reduce the error rate. This works for a smaller number of layers, but as we keep increasing the number of layers, a common problem in deep learning arises, called the vanishing/exploding gradient: the gradients become 0 or grow too large. Consequently, as we increase the number of layers, the training and test error rates also increase.
In the figure above, we can observe that a 56-layer CNN gives higher error rates on both the training and test datasets than a 20-layer CNN architecture. Further analysis of these error rates led to the conclusion that they are caused by vanishing/exploding gradients.
ResNet, proposed in 2015 by researchers at Microsoft Research, introduced a new architecture called the Residual Network.
Residual Networks (ResNet) – Deep Learning
- 1、殘差網(wǎng)路
- 2、網(wǎng)絡架構(gòu)
- 3、代碼運行
- 4、結(jié)果與總結(jié)
1、殘差網(wǎng)路
To solve the problem of vanishing/exploding gradients, this architecture introduced the concept of the residual block. In this network, we use a technique called a skip connection. A skip connection connects the activations of one layer to a later layer by skipping some layers in between; this forms a residual block. ResNets are built by stacking these residual blocks together.
這個網(wǎng)絡背后的方法不是層學習底層映射,而是允許網(wǎng)絡擬合殘差映射。所以我們不用H(x)初始映射,讓網(wǎng)絡適合。
F(x) := H(x) - x which gives H(x) := F(x) + x.
The advantage of adding this type of skip connection is that if any layer hurts the architecture's performance, it will be skipped by regularization. Thus, very deep neural networks can be trained without the problems caused by vanishing/exploding gradients. The authors of the paper experimented with networks of 100-1000 layers on the CIFAR-10 dataset.
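To make the idea concrete, here is a minimal sketch of such a block in Keras (an illustration assumed for this article, not part of the implementation below); the two convolutions learn F(x), and the identity shortcut adds x back so that the block outputs H(x) = F(x) + x:
from keras.layers import Conv2D, Activation, add
def minimal_residual_block(x, filters=16):
    # F(x): two stacked 3x3 convolutions (assumes x already has `filters` channels)
    f = Conv2D(filters, 3, padding='same', activation='relu')(x)
    f = Conv2D(filters, 3, padding='same')(f)
    # H(x) = F(x) + x: the identity shortcut skips both convolutions
    return Activation('relu')(add([f, x]))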
There is a similar approach called Highway Networks, which also uses skip connections. Similar to LSTMs, those skip connections use parametric gates that determine how much information passes through the skip connection. However, this architecture has not provided better accuracy than the ResNet architecture.
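For comparison, here is a rough sketch of a Highway-style gated skip connection (an assumed illustration; the names highway_block, h and t are hypothetical): a learned transform gate T(x) decides how much of the transform H(x) versus the input x is carried forward, y = H(x)*T(x) + x*(1 - T(x)), whereas ResNet's shortcut is a parameter-free identity:
from keras.layers import Dense, Multiply, Lambda, add
from keras.initializers import Constant
def highway_block(x, units):
    # candidate transform H(x); assumes x already has `units` features
    h = Dense(units, activation='relu')(x)
    # transform gate T(x); a negative bias starts the block close to the identity
    t = Dense(units, activation='sigmoid', bias_initializer=Constant(-1.0))(x)
    carry = Lambda(lambda g: 1.0 - g)(t)  # carry gate C(x) = 1 - T(x)
    return add([Multiply()([h, t]), Multiply()([x, carry])])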
2、網(wǎng)絡架構(gòu)
該網(wǎng)絡采用受VGG-19啟發(fā)的34層平面網(wǎng)絡架構(gòu),并增加了快捷連接。然后,這些快捷連接將架構(gòu)轉(zhuǎn)換為剩余網(wǎng)絡。
3. Running the Code
Using TensorFlow and the Keras API, we can design a ResNet architecture (including residual blocks) from scratch. Below is an implementation of the different ResNet architectures. For this implementation, we use the CIFAR-10 dataset, which contains 60,000 32×32 color images in 10 different classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck). The dataset can be loaded through the keras.datasets API.
Step 1: First, we import the Keras modules and their APIs. These APIs help in building the architecture of the ResNet model.
Code: Importing libraries
# Import Keras modules and its important APIs
import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras.datasets import cifar10
import numpy as np
import os
Step 2: Now, we set the different hyperparameters that are required for the ResNet architecture. We also do some preprocessing on our dataset to prepare it for training.
Code: Setting training hyperparameters
# Setting Training Hyperparameters
batch_size = 32 # original ResNet paper uses batch_size = 128 for training
epochs = 200
data_augmentation = True
num_classes = 10
# Data Preprocessing
subtract_pixel_mean = True
n = 3
# Select ResNet version (1 or 2)
version = 1
# Computed depth from the supplied model parameter n
if version == 1:
    depth = n * 6 + 2
elif version == 2:
    depth = n * 9 + 2
# Model name, depth and version
model_type = 'ResNet%dv%d' % (depth, version)
# Load the CIFAR-10 data.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Input image dimensions.
input_shape = x_train.shape[1:]
# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
# If subtract pixel mean is enabled
if subtract_pixel_mean:
    x_train_mean = np.mean(x_train, axis=0)
    x_train -= x_train_mean
    x_test -= x_train_mean
# Print Training and Test Samples
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print('y_train shape:', y_train.shape)
# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
Step 3: In this step, we set the learning rate according to the epoch number. As the number of epochs increases, the learning rate must be reduced to ensure better learning.
Code: Setting the LR for different numbers of epochs
# Setting the LR for different numbers of epochs
def lr_schedule(epoch):
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr
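As a quick sanity check (an assumed illustration, not part of the training script), the schedule steps the rate down after epochs 80, 120, 160 and 180:
for e in [0, 81, 121, 161, 181]:
    lr_schedule(e)
# prints: 0.001, 0.0001, 1e-05, 1e-06, 5e-07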
Step 4: Define the basic ResNet building block, which can be used to define both the ResNet V1 and V2 architectures.
Code: Basic ResNet building block
# Basic ResNet Building Block
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))
    x = inputs
    if conv_first:
        # Conv-BN-Activation ordering (ResNet v1 style)
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        # BN-Activation-Conv ordering (pre-activation, ResNet v2 style)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x
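A quick smoke test of the building block (hypothetical usage; `probe` is just an illustrative name): applied to a CIFAR-10-shaped input, the default settings give a Conv-BN-ReLU stage that preserves height and width.
probe = Input(shape=(32, 32, 3))
out = resnet_layer(inputs=probe, num_filters=16)
print(out.shape)  # (None, 32, 32, 16): 'same' padding with stride 1 keeps 32x32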
Step 5: Define the ResNet V1 architecture based on the ResNet building block we defined above:
Code: ResNet V1 architecture
def resnet_v1(input_shape, depth, num_classes=10):
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n + 2 (eg 20, 32, 44 in [a])')
    # Start model definition.
    num_filters = 16
    num_res_blocks = int((depth - 2) / 6)

    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs=inputs)
    # Instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stack > 0 and res_block == 0:  # first layer but not first stack
                strides = 2  # downsample
            y = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters,
                             activation=None)
            if stack > 0 and res_block == 0:  # first layer but not first stack
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])
            x = Activation('relu')(x)
        num_filters *= 2

    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model
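As a quick instantiation check (assumed usage), with n = 3 the v1 formula gives depth = 6*3 + 2 = 20, i.e. the ResNet20 configuration from the paper's CIFAR-10 experiments:
model = resnet_v1(input_shape=(32, 32, 3), depth=20)
model.summary()  # roughly 0.27M parameters for ResNet20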
Step 6: Define the ResNet V2 architecture based on the ResNet building block we defined above:
Code: ResNet V2 architecture
# ResNet V2 architecture
def resnet_v2(input_shape, depth, num_classes=10):
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n + 2 (eg 56 or 110 in [b])')
    # Start model definition.
    num_filters_in = 16
    num_res_blocks = int((depth - 2) / 9)

    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True)

    # Instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'relu'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                if res_block == 0:  # first layer and first stage
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                if res_block == 0:  # first layer but not first stage
                    strides = 2  # downsample

            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])
        num_filters_in = num_filters_out

    # Add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model
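Likewise (assumed usage), with n = 3 version 2 gives depth = 9*3 + 2 = 29:
model = resnet_v2(input_shape=(32, 32, 3), depth=29)
model.summary()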
Step 7: The following code is used to train and test the ResNet v1 and v2 architectures we defined above:
Code: Main function
# Main function
if version == 2:
    model = resnet_v2(input_shape=input_shape, depth=depth)
else:
    model = resnet_v1(input_shape=input_shape, depth=depth)

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=lr_schedule(0)),
              metrics=['accuracy'])
model.summary()
print(model_type)

# Prepare model saving directory.
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'cifar10_%s_model.{epoch:03d}.h5' % model_type
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
filepath = os.path.join(save_dir, model_name)

# Prepare callbacks for model saving and for learning rate adjustment.
checkpoint = ModelCheckpoint(filepath=filepath,
                             monitor='val_accuracy',  # 'val_acc' on older Keras
                             verbose=1,
                             save_best_only=True)
lr_scheduler = LearningRateScheduler(lr_schedule)
lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)
callbacks = [checkpoint, lr_reducer, lr_scheduler]

# Run training, with or without data augmentation.
if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True,
              callbacks=callbacks)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        # set input mean to 0 over the dataset
        featurewise_center=False,
        # set each sample mean to 0
        samplewise_center=False,
        # divide inputs by std of dataset
        featurewise_std_normalization=False,
        # divide each input by its std
        samplewise_std_normalization=False,
        # apply ZCA whitening
        zca_whitening=False,
        # epsilon for ZCA whitening
        zca_epsilon=1e-06,
        # randomly rotate images in the range (deg 0 to 180)
        rotation_range=0,
        # randomly shift images horizontally
        width_shift_range=0.1,
        # randomly shift images vertically
        height_shift_range=0.1,
        # set range for random shear
        shear_range=0.,
        # set range for random zoom
        zoom_range=0.,
        # set range for random channel shifts
        channel_shift_range=0.,
        # set mode for filling points outside the input boundaries
        fill_mode='nearest',
        # value used for fill_mode = "constant"
        cval=0.,
        # randomly flip images horizontally
        horizontal_flip=True,
        # randomly flip images vertically
        vertical_flip=False,
        # set rescaling factor (applied before any other transformation)
        rescale=None,
        # set function that will be applied on each input
        preprocessing_function=None,
        # image data format, either "channels_first" or "channels_last"
        data_format=None,
        # fraction of images reserved for validation (strictly between 0 and 1)
        validation_split=0.0)

    # Compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)

    # Fit the model on the batches generated by datagen.flow().
    # (model.fit accepts generators in recent Keras; older versions used fit_generator.)
    model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
              validation_data=(x_test, y_test),
              epochs=epochs, verbose=1, workers=4,
              callbacks=callbacks)

# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
4. Results and Summary
On the ImageNet dataset, the authors used a 152-layer ResNet, 8 times deeper than VGG19 yet with fewer parameters. An ensemble of these ResNets produced an error rate of only 3.57% on the ImageNet test set, a result that won the ILSVRC 2015 competition. On the COCO object detection dataset, it also produced a 28% relative improvement, owing to its deep representations.
- The above results show that shortcut connections are able to solve the problem caused by increasing the layers: when the number of layers is increased from 18 to 34, the error rate on the ImageNet validation set decreases, unlike for the plain network.
- On the ImageNet test set, ResNet's top-5 error rate of 3.57% was the lowest, and the ResNet architecture therefore took first place in the 2015 ImageNet classification challenge.