# What is mini-batch learning

When we implemented the linear perceptron algorithm, we used gradient descent to minimize the cost function, and every weight update used all of the training data. This variant is called batch gradient descent. Once the training set grows to several million or even hundreds of millions of records, batch gradient descent becomes impractical: every single weight update must evaluate every record, which is computationally wasteful. In that case we can switch to stochastic gradient descent, or to mini-batch gradient descent (e.g., updating on 100 randomly selected records at a time).
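To make the cost of batch updates concrete, here is a minimal sketch of batch gradient descent for a linear model with a squared-error cost; the function name and the tiny data set are illustrative assumptions, not code from the original article:

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.05, n_iter=500):
    """Batch gradient descent for a linear model with squared-error cost.
    Note that EVERY weight update evaluates the full training set."""
    w = np.zeros(1 + X.shape[1])          # w[0] is the bias term
    for _ in range(n_iter):
        output = X.dot(w[1:]) + w[0]      # predictions for ALL samples
        errors = y - output
        w[1:] += eta * X.T.dot(errors)    # one update touches every record
        w[0] += eta * errors.sum()
    return w

# Fit y = 2x + 1 on a toy data set
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
w = batch_gradient_descent(X, y)
```

With millions of records, the `X.dot(...)` and `X.T.dot(...)` lines inside the loop are exactly the per-update full-data passes the text describes as wasteful.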

Stochastic gradient descent is also referred to as iterative or online gradient descent. The main difference between stochastic gradient descent and batch gradient descent is the weight-update strategy: batch gradient descent computes each weight update from the entire training set, whereas stochastic gradient descent computes each weight update from a single training sample.

To get accurate results from stochastic gradient descent, it is important to present the training data in random order: the training set should be shuffled before each epoch to prevent the updates from falling into a cycle. We can also adapt the learning rate η of stochastic gradient descent to the iteration number t, e.g., η = c1 / (t + c2), where c1 and c2 are constants. Note that the solution found by stochastic gradient descent is not necessarily the global optimum, but it tends toward it; using an adaptive learning rate brings it even closer to the global optimum.

Python implementation of the stochastic gradient descent algorithm:
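The original listing is not reproduced here; the following is a minimal sketch of what such an implementation might look like for the same linear model, including the per-epoch shuffling and the adaptive learning rate η = c1 / (t + c2) described above (function name, constants, and data are illustrative assumptions):

```python
import numpy as np

def sgd(X, y, c1=5.0, c2=50.0, n_epochs=200, seed=1):
    """Stochastic gradient descent: one sample per weight update,
    with an adaptive learning rate eta = c1 / (t + c2)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(1 + X.shape[1])          # w[0] is the bias term
    t = 0
    for _ in range(n_epochs):
        idx = rng.permutation(len(y))     # shuffle each epoch to avoid cycles
        for i in idx:
            eta = c1 / (t + c2)           # learning rate decays with iterations
            error = y[i] - (X[i].dot(w[1:]) + w[0])
            w[1:] += eta * error * X[i]   # update from a SINGLE sample
            w[0] += eta * error
            t += 1
    return w

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
w = sgd(X, y)
```

Because each update costs only one sample's worth of computation, the weights change far more often per pass over the data than in batch gradient descent.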

### 3. Online learning with stochastic gradient descent

We can also apply stochastic gradient descent to an online learning system. An online learning system updates the model in real time as new data arrives, without discarding the weights it has already learned. This approach is typical for massive data sets and web systems, and it also reduces data-storage costs: once training on a sample is finished, we can save the model and discard the data.

Python implementation:
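As a hedged sketch of the online-learning idea, the class below (its name and `partial_fit` method are assumptions for illustration, loosely following the convention used by libraries such as scikit-learn) performs one stochastic update per freshly arrived sample and never needs past data again:

```python
import numpy as np

class OnlineLinearModel:
    """Online learning: update weights as each new sample arrives,
    without retraining on previously seen (and discarded) data."""

    def __init__(self, n_features, eta=0.05):
        self.w = np.zeros(1 + n_features)  # w[0] is the bias term
        self.eta = eta

    def partial_fit(self, x, target):
        # a single gradient step on one freshly arrived sample
        error = target - self.predict(x)
        self.w[1:] += self.eta * error * x
        self.w[0] += self.eta * error

    def predict(self, x):
        return x.dot(self.w[1:]) + self.w[0]

# Simulate a data stream: each sample is used once, then discarded
model = OnlineLinearModel(n_features=1)
rng = np.random.default_rng(0)
for _ in range(2000):
    x = rng.uniform(0.0, 3.0, size=1)
    model.partial_fit(x, 2 * x[0] + 1)    # true relation: y = 2x + 1
```

Only the current weights need to be stored between updates, which is the storage saving the text refers to.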

### 4. Mini-batch learning

The mini-batch learning algorithm sits between stochastic gradient descent and batch gradient descent. In each training step, it randomly selects a small batch of, say, 50 or 100 records from the training set; this method is widely used in practice. Compared to batch gradient descent, the weights are updated more frequently, so convergence is faster. Compared to stochastic gradient descent, vectorized operations can replace the per-sample for loop, improving the algorithm's performance.
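The trade-off can be sketched as follows for the same linear model; the function name, batch size, and data are illustrative assumptions. Each update uses a small random batch, and the batch is processed with vectorized NumPy operations rather than a per-sample loop:

```python
import numpy as np

def minibatch_gd(X, y, eta=0.02, batch_size=2, n_epochs=300, seed=0):
    """Mini-batch gradient descent: each weight update uses a small
    random batch, computed with vectorized operations."""
    rng = np.random.default_rng(seed)
    w = np.zeros(1 + X.shape[1])           # w[0] is the bias term
    n = len(y)
    for _ in range(n_epochs):
        idx = rng.permutation(n)           # shuffle, then slice into batches
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            errors = yb - (Xb.dot(w[1:]) + w[0])
            w[1:] += eta * Xb.T.dot(errors)  # vectorized over the batch
            w[0] += eta * errors.sum()
    return w

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
w = minibatch_gd(X, y)
```

With a realistic batch size such as 50 or 100, the `Xb.T.dot(errors)` line does in one vectorized call what stochastic gradient descent would do in 50 or 100 loop iterations.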