What is Batch Size?
Batch size is a machine learning hyperparameter that determines how many training examples a model processes before each update of its weights. It governs the rhythm of training: how often the model corrects itself and how much data each correction is based on.
In practice, batch sizes typically range from 16 to 512, depending largely on how much memory the hardware has. Smaller batches often generalize better, while larger batches make better use of GPU parallelism and can speed up training. In deep learning, batch size can shift final model accuracy by several percentage points in some settings, which makes it an important optimization hyperparameter.
One of the most widely cited experimental findings comes from the 2017 paper by Keskar et al. (Northwestern University and Intel, presented at ICLR 2017). Across several network architectures and datasets, it showed that large-batch training consistently produced models with a generalization gap of up to about 5 percentage points compared to small-batch equivalents, and attributed the gap to large-batch methods converging to sharp minima of the training loss. This finding reshaped how practitioners think about batch size selection and prompted a wave of follow-up research.
How Does Batch Size Work?
Batch size determines how data flows through the training process:
- Data Split: The full dataset is divided into smaller groups (batches)
- Forward Pass: Each batch is passed through the model to generate predictions
- Loss Calculation: The error is calculated for that batch
- Backward Pass: Gradients are computed using backpropagation
- Weight Update: Model weights are updated after each batch
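The five steps above can be sketched as a plain-Python training loop. This is a minimal illustration under stated assumptions, not a framework implementation: the model is a hypothetical one-feature linear model `y ≈ w*x + b`, the loss is mean squared error, the data are 100 made-up noiseless points on `y = 3x + 1`, and `make_batches` and `train` are illustrative names. Because the model has a single layer, the backward pass is written out by hand; this is what backpropagation would compute here, and libraries such as PyTorch or JAX automate it for deeper networks.

```python
import random

def make_batches(data, batch_size, rng):
    """Data split: shuffle a copy of the dataset and cut it into consecutive batches."""
    shuffled = data[:]  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

def train(data, batch_size, epochs=50, lr=0.05, seed=0):
    """Mini-batch gradient descent on a hypothetical 1-D linear model y ≈ w*x + b."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    losses = []  # one entry per weight update
    for _ in range(epochs):
        for batch in make_batches(data, batch_size, rng):
            n = len(batch)  # the last batch may be smaller than batch_size
            # Forward pass: predictions for every example in the batch
            preds = [w * x + b for x, _ in batch]
            # Loss calculation: mean squared error over this batch only
            errors = [p - y for p, (_, y) in zip(preds, batch)]
            losses.append(sum(e * e for e in errors) / n)
            # Backward pass: analytic gradients of the batch MSE w.r.t. w and b
            grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, batch)) / n
            grad_b = 2 * sum(errors) / n
            # Weight update: exactly one gradient step per batch
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b, losses

# Hypothetical toy data: 100 noiseless points on the line y = 3x + 1
data = [(i / 50 - 1, 3 * (i / 50 - 1) + 1) for i in range(100)]
w, b, losses = train(data, batch_size=10)
print(f"w={w:.3f}, b={b:.3f}, updates={len(losses)}")
```

With 100 examples and `batch_size=10`, each of the 50 epochs makes 10 updates, so `losses` ends up with 500 entries, one per weight update.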
For example, if a dataset has 1,000 samples and the batch size is 100, the model will update weights 10 times per epoch.
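The arithmetic in this example generalizes to any dataset size. The small sketch below (the function name `updates_per_epoch` is hypothetical) also covers the case where the dataset size is not a multiple of the batch size: the final, smaller batch either still produces an update or is discarded, which is the choice PyTorch's `DataLoader` exposes through its `drop_last` option.

```python
def updates_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of weight updates in one full pass over the data."""
    if batch_size <= 0 or num_samples < 0:
        raise ValueError("batch_size must be positive and num_samples non-negative")
    if drop_last:
        # The incomplete final batch is discarded, so only full batches count
        return num_samples // batch_size
    # Integer ceiling division: a smaller final batch still triggers one update
    return (num_samples + batch_size - 1) // batch_size

print(updates_per_epoch(1000, 100))                  # 10
print(updates_per_epoch(1000, 128))                  # 8 (7 full batches + 1 batch of 104)
print(updates_per_epoch(1000, 128, drop_last=True))  # 7
```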

Why is Batch Size Important?
Batch size plays a crucial role in model training efficiency and accuracy.
Key effects:
- Impacts training speed and computational efficiency
- Affects memory usage (larger batches require more RAM or GPU memory)
- Influences model generalization and convergence
- Helps balance stability and noise in gradient updates
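The stability-versus-noise tradeoff can be made concrete by measuring how much mini-batch gradients scatter around the full-dataset gradient. The sketch below assumes a hypothetical one-parameter model `y ≈ w*x`, synthetic data `y = 3x + noise`, and evaluates the gradient at the arbitrary point `w = 0`; the helper names are illustrative. Averaging B per-example gradients shrinks the standard deviation of the estimate roughly like 1/√B, so a batch of 16 should scatter about a quarter as much as a single example.

```python
import random
import statistics

def per_example_grads(w, data):
    """Gradient of (w*x - y)^2 with respect to w, one value per example."""
    return [2 * (w * x - y) * x for x, y in data]

def minibatch_grad_spread(grads, batch_size, trials=1000, seed=0):
    """Standard deviation of the mini-batch gradient estimate across random batches."""
    rng = random.Random(seed)
    # Each trial draws one batch (without replacement) and averages its gradients
    estimates = [statistics.fmean(rng.sample(grads, batch_size)) for _ in range(trials)]
    return statistics.pstdev(estimates)

# Hypothetical synthetic dataset: 10,000 points with Gaussian label noise
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(10_000)]
data = [(x, 3 * x + rng.gauss(0, 1)) for x in xs]

grads = per_example_grads(0.0, data)
spreads = {bs: minibatch_grad_spread(grads, bs) for bs in (1, 16, 128, 1024)}
for bs, spread in spreads.items():
    print(f"batch size {bs:5d}: gradient std ~ {spread:.3f}")
```

Less scatter means steadier steps, but each step costs B forward and backward passes, so small batches trade stability for more frequent, cheaper updates.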
Types of Batch Size
Batch size is not binary; it spans a spectrum, and each region has its own name, tradeoffs, and use cases.
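- Small Batch Size (e.g., 1–32): Provides noisy but frequent updates, often improving generalization; a batch size of exactly 1 is classic stochastic gradient descent (SGD)
- Medium Batch Size (e.g., 32–256): Balanced approach and the typical mini-batch setting used in practice
- Large Batch Size (e.g., 256+): Fewer updates per epoch and faster training on parallel hardware, but requires more memory and risks poorer generalization; using the entire dataset as a single batch is full-batch gradient descent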
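The points on this spectrum differ only in the `batch_size` passed to the same training loop. As a rough sketch, the code below trains a hypothetical one-feature linear model on 100 made-up noiseless points for a fixed budget of three epochs with one shared learning rate, using batch sizes 1 (SGD), 32 (mini-batch), and 100 (full batch), and reports the number of updates and the final mean squared error on the whole dataset. It deliberately ignores two things that matter in practice: large-batch runs usually raise the learning rate, and their speed advantage comes from GPU parallelism, which pure Python does not model.

```python
import random

def run(data, batch_size, epochs=3, lr=0.05, seed=0):
    """Train a hypothetical model y ≈ w*x + b; return (updates, final full-data MSE)."""
    rng = random.Random(seed)
    w = b = 0.0
    updates = 0
    for _ in range(epochs):
        order = data[:]  # reshuffle a copy every epoch
        rng.shuffle(order)
        for i in range(0, len(order), batch_size):
            batch = order[i:i + batch_size]
            n = len(batch)
            # Errors are computed once so w and b are updated from the same gradient
            errs = [(w * x + b) - y for x, y in batch]
            w -= lr * 2 * sum(e * x for e, (x, _) in zip(errs, batch)) / n
            b -= lr * 2 * sum(errs) / n
            updates += 1
    mse = sum((w * x + b - y) ** 2 for x, y in data) / len(data)
    return updates, mse

# Hypothetical toy data: 100 noiseless points on the line y = 3x + 1
data = [(i / 50 - 1, 3 * (i / 50 - 1) + 1) for i in range(100)]
for name, bs in [("stochastic (B=1)", 1), ("mini-batch (B=32)", 32), ("full batch (B=N)", len(data))]:
    updates, mse = run(data, bs)
    print(f"{name:20s} updates={updates:4d} mse={mse:.4f}")
```

Under these assumptions the smaller batches reach a lower loss within the same epoch budget simply because they take many more steps; with a tuned learning rate and real hardware, the comparison can look quite different.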