Why Use Batches
Momentum
Adaptive Learning Rate
Optimization for Deep Learning