Data, model, loss, and optimizer. Four ingredients, one optimization problem.
Stochastic Methods in Machine Learning: AGH
Learning starts from examples $(x, y)$. No data means no signal.
| ID | Feature A | Feature B | Feature C | Label |
|---|---|---|---|---|
| 001 | 0.72 | 12 | 3.6 | 1 |
| 002 | 0.11 | 5 | 1.8 | 0 |
| 003 | 0.64 | 9 | 2.9 | 1 |
| 004 | 0.21 | 4 | 1.6 | 0 |
| 005 | 0.83 | 11 | 3.3 | 1 |
| ... | ... | ... | ... | ... |
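In code, a table like the one above becomes a feature matrix `X` and a label vector `y`. The arrays below simply transcribe the illustrative rows; a real dataset would be loaded from disk instead:

```python
import numpy as np

# The example table as arrays: rows are examples, columns are features.
X = np.array([
    [0.72, 12, 3.6],
    [0.11,  5, 1.8],
    [0.64,  9, 2.9],
    [0.21,  4, 1.6],
    [0.83, 11, 3.3],
])
y = np.array([1, 0, 1, 0, 1])  # one label per row

print(X.shape, y.shape)
```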
The model defines a parameterized mapping from input to prediction.
Neural networks stack linear layers and nonlinear activations to represent complex patterns.
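As a minimal sketch of that stacking, here is a two-layer network in NumPy; the layer sizes (3 inputs, 4 hidden units, 1 output) and the sigmoid output are illustrative choices, not fixed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Parameters of two linear layers (illustrative sizes: 3 -> 4 -> 1).
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(x):
    h = relu(x @ W1 + b1)                # linear layer + nonlinearity
    logit = h @ W2 + b2                  # final linear layer
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> probability

p = forward(np.array([0.72, 12.0, 3.6]))
print(p)
```

Without the nonlinearity, the two linear layers would collapse into one linear map; the activation is what lets the stack represent non-linear patterns.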
The loss converts prediction quality into a single scalar objective that gradient-based optimization can minimize.
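For the binary labels in the table above, a natural choice of scalar is binary cross-entropy (one assumed loss among many):

```python
import numpy as np

def bce_loss(p, y, eps=1e-12):
    # Binary cross-entropy: averages per-example log losses into one scalar.
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1])
good = bce_loss(np.array([0.9, 0.1, 0.8]), y)  # confident and correct
bad = bce_loss(np.array([0.4, 0.6, 0.3]), y)   # hesitant or wrong
print(good, bad)
```

Better predictions produce a smaller scalar, so pushing the loss down with gradients pushes prediction quality up.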
The optimizer controls how parameters move across the loss landscape.
SGD takes noisy gradient steps. Adam adapts per-parameter step sizes using moment estimates of the gradient.
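The two update rules can be sketched side by side; hyperparameter values below are common defaults, and the quadratic test function is just a stand-in for a loss:

```python
import numpy as np

def sgd_step(theta, grad, lr=0.1):
    # Plain SGD: one step against the (possibly noisy) gradient.
    return theta - lr * grad

def adam_step(theta, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps moving averages of the gradient (m) and its square (v),
    # then rescales the step per parameter.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(theta) = theta**2, whose gradient is 2*theta.
theta_sgd = 5.0
for _ in range(50):
    theta_sgd = sgd_step(theta_sgd, 2 * theta_sgd)

theta_adam = np.array([5.0])
m, v = np.zeros(1), np.zeros(1)
for t in range(1, 101):
    theta_adam, m, v = adam_step(theta_adam, 2 * theta_adam, m, v, t)

print(theta_sgd, theta_adam)
```

Note how Adam's effective step size depends on the gradient history through `m` and `v`, while SGD's depends only on the current gradient.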
Training means finding parameters that minimize the loss, which is the same as maximizing how well the model fits the data.
Data + Model + Loss + Optimizer
$$\theta^*=\arg\min_\theta L(\theta;\mathcal{D})$$
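All four ingredients meet in one training loop. The sketch below solves the arg-min above for a logistic-regression model on the toy table, with binary cross-entropy as the loss and full-batch gradient descent as the optimizer (all choices illustrative):

```python
import numpy as np

# Data: the toy table from above.
X = np.array([[0.72, 12, 3.6], [0.11, 5, 1.8], [0.64, 9, 2.9],
              [0.21, 4, 1.6], [0.83, 11, 3.3]], dtype=float)
y = np.array([1., 0., 1., 0., 1.])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features

# Model: logistic regression, theta = (w, b).
w, b = np.zeros(3), 0.0
lr = 0.5

for epoch in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # model prediction
    grad_logit = (p - y) / len(y)           # dLoss/dlogit for cross-entropy
    w -= lr * X.T @ grad_logit              # optimizer: gradient step
    b -= lr * grad_logit.sum()

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
acc = (p.round() == y).mean()
print(acc)  # training accuracy on the five rows
```

Swapping any one ingredient (a neural network for the model, Adam for the optimizer, another dataset) changes the loop's pieces but not its shape.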