Authors: Jiaming Guo, Guangxiang Bin, Liangrui Peng
Description: A deep CNN similar to GoogLeNet is used, with fewer layers of inception structures for computational efficiency. For image pre-processing, each image is resized so that its shorter edge is 224 pixels while preserving the original aspect ratio. Average pooling is used to transform the spatial dimensions of the feature map to a fixed size before the final fully connected layer. In the training process, the batch size for each iteration is set to 1; the mean of the gradients over a preset number of iterations (e.g. 32) is calculated and used to update the network weights.
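
Illustration: the three mechanisms described above (shorter-edge resizing, average pooling to a fixed size, and averaging gradients over several batch-size-1 iterations) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the rounding rule for the longer edge is assumed, pooling is shown for a single channel reduced to one value, and a one-parameter model y = w * x with plain gradient descent stands in for the network and its optimizer, which the description does not specify.

```python
def resize_shorter_edge(width, height, target=224):
    """Return (new_width, new_height) with the shorter edge scaled to `target`.

    Rounding the longer edge to the nearest integer is an assumption.
    """
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def global_average_pool(feature_map):
    """Reduce one channel (a list of rows, any H x W) to a single value.

    Because the output size does not depend on H or W, inputs of
    different shapes can feed the same fully connected layer.
    """
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def train_accumulated(samples, weight, lr=0.1, accum_steps=32):
    """Fit y = weight * x one sample at a time (batch size 1).

    Gradients are summed over `accum_steps` iterations and their mean is
    applied in a single update. Samples left over after the last full
    group are not applied in this sketch.
    """
    grad_sum = 0.0
    updates = 0
    for i, (x, y) in enumerate(samples, start=1):
        # Gradient of the squared error 0.5 * (w*x - y)**2 with respect to w.
        grad_sum += (weight * x - y) * x
        if i % accum_steps == 0:
            weight -= lr * grad_sum / accum_steps  # apply the mean gradient
            grad_sum = 0.0
            updates += 1
    return weight, updates


if __name__ == "__main__":
    print(resize_shorter_edge(640, 480))
    print(global_average_pool([[1.0, 2.0], [3.0, 4.0]]))
    print(train_accumulated([(1.0, 2.0)] * 64, weight=0.0, lr=0.5))
```

Processing one image per iteration lets each input keep its own aspect ratio and size, since no two images need to share a tensor shape, while averaging over 32 iterations gives an update comparable to that of a batch of 32.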