What Do We Need For ML?

Data, model, loss, and optimizer. Four ingredients, one optimization problem.

Stochastic Methods in Machine Learning: AGH

Data

Raw Material

Learning starts from examples $(x, y)$. No data means no signal.

Core Workflow

  • Collect and clean samples
  • Train/validation/test split
  • Feature engineering / normalization
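The split step above can be sketched in a few lines. This is a minimal illustration (the helper name `train_val_test_split` and the fractions are assumptions, not a specific library API):

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle indices once, then slice into three disjoint subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

X = np.arange(20).reshape(10, 2).astype(float)  # toy data: 10 samples, 2 features
y = np.arange(10)
train, val, test = train_val_test_split(X, y)
```

Shuffling before slicing matters: without it, any ordering in the raw data (by class, by time) leaks into the split.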

Example Dataset

ID    Feature A   Feature B   Feature C   Label
001        0.72          12         3.6       1
002        0.11           5         1.8       0
003        0.64           9         2.9       1
004        0.21           4         1.6       0
005        0.83          11         3.3       1
...         ...         ...         ...     ...
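The features above live on very different scales (Feature A near 1, Feature B near 10), which is exactly what the normalization step fixes. A sketch of z-score normalization on those five rows:

```python
import numpy as np

# Rows 001-005 from the example table: [Feature A, Feature B, Feature C]
X = np.array([
    [0.72, 12, 3.6],
    [0.11,  5, 1.8],
    [0.64,  9, 2.9],
    [0.21,  4, 1.6],
    [0.83, 11, 3.3],
])

# Z-score normalization: zero mean, unit variance per feature column.
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mu) / sigma
```

In practice `mu` and `sigma` are computed on the training split only and reused on validation/test data.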

Model

Function Family

The model defines a parameterized mapping from input to prediction.

$$\hat y = f_\theta(x)$$

Example

Neural networks stack linear layers and nonlinear activations to represent complex patterns.

[Figure: neural network diagram — input x, hidden layer, output; parameters W, b.]
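The stack of linear layers and nonlinear activations can be written out directly. A minimal one-hidden-layer sketch (the layer sizes and random initialization are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters theta = (W1, b1, W2, b2): 3 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def f_theta(x):
    """Prediction y_hat = f_theta(x): linear map, ReLU, linear map."""
    h = np.maximum(0.0, x @ W1 + b1)  # the nonlinearity is what makes stacking useful
    return h @ W2 + b2

x = np.array([[0.72, 12.0, 3.6]])  # one sample from the example table
y_hat = f_theta(x)
```

Without the `np.maximum` step, the two linear layers would collapse into a single linear map.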

Loss Function

Objective

The loss function converts prediction quality into a single scalar objective that gradients can optimize.

$$\text{MSE}=\frac{1}{N}\sum_{i=1}^N(y_i-\hat y_i)^2$$
$$\text{MAE}=\frac{1}{N}\sum_{i=1}^N|y_i-\hat y_i|$$
$$\text{Cross-Entropy}=-\frac{1}{N}\sum_{i=1}^N\left[y_i\log(\hat y_i)+(1-y_i)\log(1-\hat y_i)\right]$$
$$\text{Hinge}=\frac{1}{N}\sum_{i=1}^N\max(0,1-y_if_\theta(x_i))$$
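The four losses above translate almost one-to-one into code. A sketch (the clipping constant `eps` is an assumption added to keep `log` finite):

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def binary_cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0) at the boundaries
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def hinge(y_pm1, scores):
    # labels in {-1, +1}, scores are raw model outputs f_theta(x)
    return np.mean(np.maximum(0.0, 1 - y_pm1 * scores))
```

Note the label conventions differ: cross-entropy expects labels in {0, 1} and probabilities, hinge expects labels in {-1, +1} and unbounded scores.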

Training Method (Optimizer)

Update Rule

The optimizer controls how the parameters move over the loss landscape.

$$\theta_{t+1}=\theta_t-\eta\nabla_\theta L(\theta_t)$$
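The update rule is easiest to see on a 1-D toy problem. A sketch minimizing $L(\theta) = (\theta - 3)^2$, whose gradient is $2(\theta - 3)$ (the problem and step size are illustrative choices):

```python
# Gradient descent on L(theta) = (theta - 3)^2, minimized at theta = 3.
def grad(theta):
    return 2 * (theta - 3)

theta, eta = 0.0, 0.1  # initial parameter and learning rate
for _ in range(100):
    theta = theta - eta * grad(theta)  # theta_{t+1} = theta_t - eta * grad
```

Each step shrinks the error by the factor $(1 - 2\eta)$, so `theta` converges geometrically toward 3; a learning rate above 1 would make the iterates diverge instead.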

Intuition

SGD takes noisy gradient steps. Adam adapts step sizes with moment estimates.

[Figure: same loss surface, two optimizer trajectories — SGD vs. an Adam-like method.]
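The contrast between the two trajectories can be reproduced on a badly conditioned quadratic bowl, where plain gradient steps must use a tiny rate for the steep axis while Adam rescales each coordinate. A sketch (step counts and hyperparameters are illustrative; the Adam update follows the standard moment-estimate form):

```python
import numpy as np

# L(theta) = 0.5 * (theta_1^2 + 100 * theta_2^2): a badly conditioned bowl.
def grad(theta):
    return np.array([1.0, 100.0]) * theta

def sgd(theta, steps=500, eta=0.009):
    for _ in range(steps):
        theta = theta - eta * grad(theta)   # one global step size for both axes
    return theta

def adam(theta, steps=500, eta=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m, v = np.zeros_like(theta), np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = b1 * m + (1 - b1) * g           # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g * g       # second-moment (scale) estimate
        m_hat = m / (1 - b1 ** t)           # bias correction
        v_hat = v / (1 - b2 ** t)
        theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta

start = np.array([2.0, 2.0])
```

SGD's step size is capped by the steepest direction, so it crawls along the flat axis; Adam's per-coordinate scaling lets it move at a similar pace in both.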

Training = Optimization

Key Idea

Training means finding the parameters that minimize the loss — equivalently, that maximize the model's fit to the data.

ML Training

Data + Model + Loss + Optimizer = Optimization Problem

$$\theta^*=\arg\min_\theta L(\theta;\mathcal{D})$$
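All four ingredients meet in one loop. A sketch solving $\arg\min_\theta L(\theta;\mathcal{D})$ on the example table with a logistic-regression model, cross-entropy loss, and gradient descent (model choice, learning rate, and step count are illustrative assumptions):

```python
import numpy as np

# Data: rows 001-005 of the example table, labels in {0, 1}.
X = np.array([[0.72, 12, 3.6], [0.11, 5, 1.8], [0.64, 9, 2.9],
              [0.21, 4, 1.6], [0.83, 11, 3.3]], dtype=float)
y = np.array([1, 0, 1, 0, 1], dtype=float)
X = (X - X.mean(axis=0)) / X.std(axis=0)   # normalize features

# Model: logistic regression, theta = (w, b).
w, b = np.zeros(3), 0.0

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Optimizer: gradient descent on the cross-entropy loss.
eta = 0.5
for _ in range(200):
    p = sigmoid(X @ w + b)
    g = p - y                              # dL/dz for sigmoid + cross-entropy
    w -= eta * X.T @ g / len(y)
    b -= eta * g.mean()

preds = (sigmoid(X @ w + b) > 0.5).astype(int)
```

Swapping any single ingredient — a deeper model, a hinge loss, an Adam-style optimizer — changes the trajectory but not the structure of the problem.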

Equivalent Views

  • Minimize prediction error
  • Maximize quality/fit under a chosen metric
  • Search parameter space for the best generalizing solution