Pydantic And Hydra#
This post is heavily inspired by Pydra - Pydantic and Hydra for configuration management of model training experiments, which discusses combining Pydantic and Hydra for configuration management.
Hydra#
We won’t go into the details of what Hydra is, as the documentation covers it very well. Instead, we will show a working example of how to use Hydra for configuration management.
YAML Driven Configuration#
First, we need to define the configuration files in YAML format. Each block below is a separate file under the configs directory; based on the defaults list in config.yaml and Hydra's config group layout, the files are configs/model/base.yaml, configs/optimizer/base.yaml, configs/stores/base.yaml, configs/transform/base.yaml, configs/datamodule/base.yaml, and finally the primary configs/config.yaml.
model_name: "resnet18"
pretrained: True
in_chans: 3
num_classes: ${train.num_classes}
global_pool: "avg"
optimizer_name: "AdamW"
optimizer_params:
  lr: 0.0003 # bs: 32 -> lr = 3e-4
  betas: [0.9, 0.999]
  amsgrad: False
  weight_decay: 0.000001
  eps: 0.00000001
project_name: ${train.project_name}
unique_id: ${now:%Y%m%d_%H%M%S} # in sync with hydra output dir
logs_dir: !!python/object/apply:pathlib.PosixPath ["./logs"]
model_artifacts_dir: !!python/object/apply:pathlib.PosixPath ["./artifacts"]
image_size: 256
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
data_dir: !!python/object/apply:pathlib.PosixPath ["./data"]
batch_size: 32
num_workers: 0
shuffle: true
defaults:
  - _self_ # You typically want _self_ somewhere after the schema (base_config)
  - model: base
  - datamodule: base
  - transform: base
  - stores: base
  - optimizer: base

train:
  num_classes: 10
  device: "cpu"
  project_name: "cifar10"
  debug: true
  seed: 1992
  num_epochs: 3

hydra:
  run:
    dir: "${stores.model_artifacts_dir}/${train.project_name}/${stores.unique_id}" # in sync with stores
Then we define an entrypoint main.py to run, along with a simple train function in train.py.
import logging

import hydra
from hydra.core.hydra_config import HydraConfig
from omegaconf import DictConfig, OmegaConf

from omnixamples.software_engineering.config_management.train import train

LOGGER = logging.getLogger(__name__)


@hydra.main(version_base=None, config_path="configs", config_name="config")
def run(config: DictConfig) -> None:
    """Run the main function."""
    LOGGER.info("Type of config is: %s", type(config))
    LOGGER.info("Merged Yaml:\n%s", OmegaConf.to_yaml(config))
    LOGGER.info(HydraConfig.get().job.name)
    train(config)


if __name__ == "__main__":
    run()
from typing import Any

import torch
import torchvision  # type: ignore[import-untyped]
from rich.pretty import pprint
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms  # type: ignore[import-untyped]
from tqdm import tqdm


def train(config: Any) -> None:
    """Run the training pipeline, however, the code below can be further
    modularized into functions for better readability and maintainability."""
    pprint(config)
    torch.manual_seed(config.train.seed)

    transform = transforms.Compose(
        [
            transforms.Resize((config.transform.image_size, config.transform.image_size)),
            transforms.ToTensor(),
            transforms.Normalize(mean=config.transform.mean, std=config.transform.std),
        ]
    )
    dataset = datasets.CIFAR10(root=config.datamodule.data_dir, train=True, transform=transform, download=True)
    dataloader = DataLoader(
        dataset,
        batch_size=config.datamodule.batch_size,
        num_workers=config.datamodule.num_workers,
        shuffle=config.datamodule.shuffle,
    )

    model = getattr(torchvision.models, config.model.model_name)(pretrained=config.model.pretrained)
    num_features = model.fc.in_features
    model.fc = nn.Linear(num_features, config.model.num_classes)
    model = model.to(config.train.device)

    optimizer = getattr(torch.optim, config.optimizer.optimizer_name)(
        model.parameters(), **config.optimizer.optimizer_params
    )

    for epoch in range(config.train.num_epochs):
        model.train()
        with tqdm(dataloader, desc=f"Epoch [{epoch+1}/{config.train.num_epochs}]", unit="batch") as tepoch:
            for images, labels in tepoch:
                images = images.to(config.train.device)
                labels = labels.to(config.train.device)

                optimizer.zero_grad()
                logits = model(images)
                loss = nn.CrossEntropyLoss()(logits, labels)
                loss.backward()
                optimizer.step()

                tepoch.set_postfix(loss=loss.item())

    model.eval()
    torch.save(model.state_dict(), config.stores.model_artifacts_dir / "model.pth")
The @hydra.main decorator initializes the Hydra application. We specify config_path to tell Hydra where to look for the base configuration files, and config_name to tell Hydra which file is the main controller of the hierarchy; in this case, it is config.yaml. Running the main.py file will create an artifacts folder hosting useful information from Hydra. Note that artifacts is a configurable name; the default is actually outputs. Let’s see what we have inside this folder!
Running tree -a artifacts/ will show the following:
artifacts
└── cifar10
└── 20240509_155331
├── .hydra
│ ├── config.yaml
│ ├── hydra.yaml
│ └── overrides.yaml
└── main.log
4 directories, 4 files
And the contents are as follows. First, the composed .hydra/config.yaml:
train:
  num_classes: 10
  device: cpu
  project_name: cifar10
  debug: true
  seed: 1992
  num_epochs: 3
model:
  model_name: resnet18
  pretrained: true
  in_chans: 3
  num_classes: ${train.num_classes}
  global_pool: avg
datamodule:
  data_dir: !!python/object/apply:pathlib.PosixPath
  - data
  batch_size: 32
  num_workers: 0
  shuffle: true
transform:
  image_size: 256
  mean:
  - 0.485
  - 0.456
  - 0.406
  std:
  - 0.229
  - 0.224
  - 0.225
stores:
  project_name: ${train.project_name}
  unique_id: ${now:%Y%m%d_%H%M%S}
  logs_dir: !!python/object/apply:pathlib.PosixPath
  - logs
  model_artifacts_dir: !!python/object/apply:pathlib.PosixPath
  - artifacts
optimizer:
  optimizer_name: AdamW
  optimizer_params:
    lr: 0.0003
    betas:
    - 0.9
    - 0.999
    amsgrad: false
    weight_decay: 1.0e-06
    eps: 1.0e-08
Next, .hydra/hydra.yaml, which records Hydra's own configuration for the run:
hydra:
  run:
    dir: ${stores.model_artifacts_dir}/${train.project_name}/${stores.unique_id}
  sweep:
    dir: multirun/${now:%Y-%m-%d}/${now:%H-%M-%S}
    subdir: ${hydra.job.num}
  launcher:
    _target_: hydra._internal.core_plugins.basic_launcher.BasicLauncher
  sweeper:
    _target_: hydra._internal.core_plugins.basic_sweeper.BasicSweeper
    max_batch_size: null
    params: null
  help:
    app_name: ${hydra.job.name}
    header: '${hydra.help.app_name} is powered by Hydra.
      '
    footer: 'Powered by Hydra (https://hydra.cc)
      Use --hydra-help to view Hydra specific help
      '
    template: '${hydra.help.header}
      == Configuration groups ==
      Compose your configuration from those groups (group=option)
      $APP_CONFIG_GROUPS
      == Config ==
      Override anything in the config (foo.bar=value)
      $CONFIG
      ${hydra.help.footer}
      '
  hydra_help:
    template: 'Hydra (${hydra.runtime.version})
      See https://hydra.cc for more info.
      == Flags ==
      $FLAGS_HELP
      == Configuration groups ==
      Compose your configuration from those groups (For example, append hydra/job_logging=disabled
      to command line)
      $HYDRA_CONFIG_GROUPS
      Use ''--cfg hydra'' to Show the Hydra config.
      '
    hydra_help: ???
  hydra_logging:
    version: 1
    formatters:
      simple:
        format: '[%(asctime)s][HYDRA] %(message)s'
    handlers:
      console:
        class: logging.StreamHandler
        formatter: simple
        stream: ext://sys.stdout
    root:
      level: INFO
      handlers:
      - console
    loggers:
      logging_example:
        level: DEBUG
    disable_existing_loggers: false
  job_logging:
    version: 1
    formatters:
      simple:
        format: '[%(asctime)s][%(name)s][%(levelname)s] - %(message)s'
    handlers:
      console:
        class: logging.StreamHandler
        formatter: simple
        stream: ext://sys.stdout
      file:
        class: logging.FileHandler
        formatter: simple
        filename: ${hydra.runtime.output_dir}/${hydra.job.name}.log
    root:
      level: INFO
      handlers:
      - console
      - file
    disable_existing_loggers: false
  env: {}
  mode: RUN
  searchpath: []
  callbacks: {}
  output_subdir: .hydra
  overrides:
    hydra:
    - hydra.mode=RUN
    task: []
  job:
    name: main
    chdir: null
    override_dirname: ''
    id: ???
    num: ???
    config_name: config
    env_set: {}
    env_copy: []
    config:
      override_dirname:
        kv_sep: '='
        item_sep: ','
        exclude_keys: []
  runtime:
    version: 1.3.2
    version_base: '1.3'
    cwd: /Users/gaohn/gaohn/omniverse
    config_sources:
    - path: hydra.conf
      schema: pkg
      provider: hydra
    - path: /Users/gaohn/gaohn/omniverse/omnixamples/software_engineering/config_management/hydra/configs
      schema: file
      provider: main
    - path: ''
      schema: structured
      provider: schema
    output_dir: /Users/gaohn/gaohn/omniverse/artifacts/cifar10/20240509_155331
    choices:
      optimizer: base
      stores: base
      transform: base
      datamodule: base
      model: base
      hydra/env: default
      hydra/callbacks: null
      hydra/job_logging: default
      hydra/hydra_logging: default
      hydra/hydra_help: default
      hydra/help: default
      hydra/sweeper: basic
      hydra/launcher: basic
      hydra/output: default
  verbose: false
Then .hydra/overrides.yaml, which is empty since we passed no overrides:
[]
Finally, the job log main.log:
[2024-05-09 15:53:31,713][__main__][INFO] - Type of config is: <class 'omegaconf.dictconfig.DictConfig'>
[2024-05-09 15:53:31,715][__main__][INFO] - Merged Yaml:
train:
  num_classes: 10
  device: cpu
  project_name: cifar10
  debug: true
  seed: 1992
  num_epochs: 3
model:
  model_name: resnet18
  pretrained: true
  in_chans: 3
  num_classes: ${train.num_classes}
  global_pool: avg
datamodule:
  data_dir: !!python/object/apply:pathlib.PosixPath
  - data
  batch_size: 32
  num_workers: 0
  shuffle: true
transform:
  image_size: 256
  mean:
  - 0.485
  - 0.456
  - 0.406
  std:
  - 0.229
  - 0.224
  - 0.225
stores:
  project_name: ${train.project_name}
  unique_id: ${now:%Y%m%d_%H%M%S}
  logs_dir: !!python/object/apply:pathlib.PosixPath
  - logs
  model_artifacts_dir: !!python/object/apply:pathlib.PosixPath
  - artifacts
optimizer:
  optimizer_name: AdamW
  optimizer_params:
    lr: 0.0003
    betas:
    - 0.9
    - 0.999
    amsgrad: false
    weight_decay: 1.0e-06
    eps: 1.0e-08
[2024-05-09 15:53:31,715][__main__][INFO] - main
One highlight is that we can override the configuration with command-line arguments, which comes in handy once you have many config files and only want to change a value or two:
python main.py train.num_epochs=5
and now in overrides.yaml you will see:
- train.num_epochs=5
Pros#
1. Dot (chain) notation access. Once you load the configuration, you can access values using dot notation, because config is loaded as OmegaConf's DictConfig object, which implements MutableMapping. This is more readable and less error-prone than config["train"]["num_classes"], and it lets you manipulate the object directly; plain dictionaries are also harder to type check.

2. Command-line overrides. For example, you can override model_name from the model group with python main.py model.model_name=resnet50, and model_name will change at runtime. This is very useful when you want to run multiple experiments with different configurations: you won't need a separate config file per experiment just to change a single value or two.

3. Persistence. A natural follow-up question: if the configuration is so easy to override, how do I make sure the configuration is saved somewhere? Versioning the configs is just as important as versioning your code in machine learning; usually I would dump these configs to a registry or store them in code. The highlight here is that Hydra also saves the final composed configuration to an output folder. By default, it is stored in outputs/YYYY-MM-DD/HH-MM-SS/.hydra/config.yaml, so you can reproduce an exact run by pointing Hydra back at the saved directory:
python main.py --config-path outputs/2022-01-12/12-00-00/.hydra --config-name config
Here YYYY-MM-DD/HH-MM-SS acts as a unique run_id for each run. You can also change it in config.yaml:
hydra:
  run:
    dir: "${stores.model_artifacts_dir}/${train.project_name}/${stores.unique_id}" # in sync with stores

4. Overriding Hydra's own default settings, such as the job and run settings. You can either do it via config.yaml or manually create a folder called hydra and put the specifics inside; see the Hydra documentation for an example.

5. Multi-run. Hydra allows you to launch multiple runs from a single command, which is very useful when you want to run multiple experiments with different configurations:
python main.py model.model_name=resnet18,resnet50 --multirun
This launches two runs, one with resnet18 and the other with resnet50 (sequentially with the default basic launcher; launcher plugins such as joblib can run them in parallel). We see its true power when doing hyperparameter search, sweeping over multiple values for a single parameter. A multirun folder will be created, containing one subdirectory per run indexed by an integer: resnet18 is indexed by 0 and resnet50 by 1. It is also worth noting that since Hydra was created at Facebook (Meta), it integrates well with PyTorch, including distributed training.

6. Interpolation. This is a highlight feature of Hydra. For example, num_classes is defined under the train schema, but the same value is also needed in the model schema (the number of output neurons), and could be needed in the datamodule schema (not shown here) to build a class mapping. Repeatedly defining the same parameter across config schemas is prone to mistakes; it is like hardcoding the same value NUM_CLASSES in different Python scripts. Instead, we keep a single source of truth: define num_classes once in the train schema and inject it everywhere with ${train.num_classes} (see the sketch after this list).
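As a minimal, standalone sketch of interpolation and dot access (plain OmegaConf, not tied to the repo above):
from omegaconf import OmegaConf

# num_classes lives only under train; model.num_classes points at it.
cfg = OmegaConf.create(
    {
        "train": {"num_classes": 10},
        "model": {"num_classes": "${train.num_classes}"},
    }
)
assert cfg.model.num_classes == 10  # dot access; interpolation resolved on access
cfg.train.num_classes = 20  # change the single source of truth...
assert cfg.model.num_classes == 20  # ...and every interpolation follows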
Structured Config#
Structured Config is what Hydra calls it when your config is complex enough to warrant object representations. For example, our config.yaml is currently just a YAML representation, but under the hood you can think of it as composed of model, optimizer, stores and other schemas. For each of these objects, you can define its own schema, which is very useful when you want to validate the configuration. In Hydra, you define the schema using dataclasses.
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class TransformConfig:
    image_size: int
    mean: List[float]
    std: List[float]


@dataclass
class ModelConfig:
    model_name: str
    pretrained: bool
    # NOTE: dataclass metadata is inert; nothing enforces these constraints.
    in_chans: int = field(metadata={"ge": 1})  # in_channels must be greater than or equal to 1
    num_classes: int = field(metadata={"ge": 1})  # num_classes must be greater than or equal to 1
    global_pool: str


@dataclass
class StoresConfig:
    project_name: str
    unique_id: str
    logs_dir: Path
    model_artifacts_dir: Path


@dataclass
class TrainConfig:
    device: str
    project_name: str
    debug: bool
    seed: int
    num_epochs: int
    num_classes: int = 3


@dataclass
class OptimizerConfig:
    optimizer_name: str
    optimizer_params: Dict[str, Any]


@dataclass
class DataConfig:
    data_dir: Path
    batch_size: int
    num_workers: int
    shuffle: bool = True


@dataclass
class Config:
    model: ModelConfig
    transform: TransformConfig
    datamodule: DataConfig
    optimizer: OptimizerConfig
    stores: StoresConfig
    train: TrainConfig

    @classmethod
    def from_dict(cls, config_dict: Dict[str, Any]) -> Config:
        return cls(
            model=ModelConfig(**config_dict["model"]),
            transform=TransformConfig(**config_dict["transform"]),
            datamodule=DataConfig(**config_dict["datamodule"]),
            optimizer=OptimizerConfig(**config_dict["optimizer"]),
            stores=StoresConfig(**config_dict["stores"]),
            train=TrainConfig(**config_dict["train"]),
        )
import logging

import hydra
from hydra.core.config_store import ConfigStore
from hydra.core.hydra_config import HydraConfig
from omegaconf import OmegaConf

from omnixamples.software_engineering.config_management.hydra.configs.base import (
    Config,
    DataConfig,
    ModelConfig,
    OptimizerConfig,
    StoresConfig,
    TrainConfig,
    TransformConfig,
)
from omnixamples.software_engineering.config_management.train import train

LOGGER = logging.getLogger(__name__)

cs = ConfigStore.instance()
cs.store(name="base_config", node=Config)
cs.store(name="model", node=ModelConfig)
cs.store(name="optimizer", node=OptimizerConfig)
cs.store(name="stores", node=StoresConfig)
cs.store(name="train", node=TrainConfig)
cs.store(name="transform", node=TransformConfig)
cs.store(name="datamodule", node=DataConfig)


@hydra.main(version_base=None, config_path="configs", config_name="config")
def run(config: Config) -> None:
    """Run the main function."""
    LOGGER.info("Type of config is: %s", type(config))
    LOGGER.info("Merged Yaml:\n%s", OmegaConf.to_yaml(config))
    LOGGER.info(HydraConfig.get().job.name)

    config_obj = OmegaConf.to_object(config)
    LOGGER.info("Type of config is: %s", type(config_obj))

    train(config)


if __name__ == "__main__":
    run()
More details are in Hydra's structured config documentation, where you can find how groups can be used to indicate inheritance.
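As a sketch of that pattern (assuming Hydra's documented convention, not code from the repo), you register the schema under the group and let the YAML inherit it via its own defaults list:
from hydra.core.config_store import ConfigStore

cs = ConfigStore.instance()
# Register the schema under the "model" group; yaml files in that group can
# now inherit it, and Hydra will validate them against ModelConfig.
cs.store(group="model", name="base_model", node=ModelConfig)
and in configs/model/base.yaml:
defaults:
  - base_model  # inherit the ModelConfig schema
  - _self_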
The dependency injection here is merely a change of the type of the config argument, from DictConfig to Config. The rest of the code remains the same, and the command-line arguments are still the same.
Having a dataclass representation also offers more flexibility, from manipulating the object to type hints. But validation remains a problem, as the __post_init__ method is not well supported when working with Hydra. However, you can decouple the dataclass from Hydra's dependency injection: load the config through Hydra and instantiate the dataclass yourself, without ConfigStore (sketched below). This idea works even better with Pydantic, since it offers validation and serialization. As we shall see later, Pydantic is a better version of dataclass, with features like pre/post validation, type checking, constraints and better serialization. Just watch Jason Liu's Pydantic Is All You Need to get a sense of how good Pydantic is. In fact, with the rise of LLMs, most big libraries built around them leverage Pydantic.
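A minimal sketch of that decoupling (an assumption on my part, not code from the repo): Hydra still composes the DictConfig, and we build the typed Config dataclass from above ourselves.
import hydra
from omegaconf import DictConfig, OmegaConf

from omnixamples.software_engineering.config_management.hydra.configs.base import Config
from omnixamples.software_engineering.config_management.train import train


@hydra.main(version_base=None, config_path="configs", config_name="config")
def run(config: DictConfig) -> None:
    # Resolve interpolations and convert to a plain dict, then build the
    # dataclass directly -- no ConfigStore registration involved.
    config_dict = OmegaConf.to_container(config, resolve=True)
    typed_config = Config.from_dict(config_dict)
    train(typed_config)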
Cons#
1. Serializing/deserializing canonical or complex Python objects is not well supported. In earlier versions, objects like pathlib.Path were not supported.

2. Structured config is limited to dataclass. This means you cannot create your own custom abstraction; for example, you cannot create a Model class without invoking the dataclass decorator and still have it interact with Hydra.

3. No type checking. This is a big problem: you can define num_classes as an int, but if the user passes in the str "10" instead of 10, Hydra will not complain. You have to do the type checking yourself, sprinkled all over the application/business-logic code.

4. No validation support. If your global pooling methods support only avg and max, you have to do the validation yourself, again all over the place.

5. Interpolation is really good, but the inability to do simple manipulation on top of it causes a lot of complaints. If you define a learning rate as lr: 0.001 and want to multiply it by 10 in another config file, you cannot write 10 * ${lr}. (A workaround with a custom OmegaConf resolver is sketched below.)
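For completeness, a hedged sketch of the resolver workaround (plain OmegaConf 2.1+, registered before the config is composed):
from omegaconf import OmegaConf

# Register a tiny arithmetic resolver; ${mul:10,${lr}} then computes 10 * lr.
OmegaConf.register_new_resolver("mul", lambda x, y: x * y)

cfg = OmegaConf.create({"lr": 0.001, "scaled_lr": "${mul:10,${lr}}"})
assert abs(cfg.scaled_lr - 0.01) < 1e-12  # resolved on access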
Instantiating#
You can also instantiate objects directly from config with Hydra, using hydra.utils.instantiate together with a _target_ key that names the class to construct.
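A minimal sketch (the optimizer config here is illustrative, not from the repo's configs):
import torch
from hydra.utils import instantiate
from omegaconf import OmegaConf

optimizer_config = OmegaConf.create(
    {
        "_target_": "torch.optim.AdamW",  # class to construct
        "lr": 3e-4,  # remaining keys become constructor kwargs
        "weight_decay": 1e-6,
    }
)
model = torch.nn.Linear(10, 2)
# Extra kwargs can be supplied at call time, e.g. the model parameters.
optimizer = instantiate(optimizer_config, params=model.parameters())
assert isinstance(optimizer, torch.optim.AdamW)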
Composition Order#
By default, Hydra 1.1 appends _self_ to the end of the Defaults List. This behavior is new in Hydra 1.1 and different from previous Hydra versions. As such, Hydra 1.1 issues a warning if _self_ is not specified in the primary config, asking you to add _self_ and thus indicate the desired composition order. To address the warning while maintaining the new behavior, append _self_ to the end of the Defaults List. Note that in some cases it may instead be desirable to add _self_ directly after the schema and before other Defaults List elements. See Composition Order in the Hydra documentation for more information.
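As an illustrative sketch (not from the repo): entries later in the Defaults List win when keys collide, so the position of _self_ decides whether the primary config or the groups take precedence.
# variant A: this file's values override the model group
defaults:
  - model: base
  - _self_

# variant B: the model group overrides this file's values
defaults:
  - _self_
  - model: base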
Pydantic#
Pydantic is all you need, and it solves all the aforementioned problems. We still leverage YAML-based configuration with easy command-line overrides, but this time we also pass the composed configuration from Hydra to Pydantic for validation and serialization at runtime.
Pydantic Schema#
from __future__ import annotations

from pathlib import Path
from typing import Any, Dict, List, Type

from pydantic import BaseModel, Field, field_validator
from typing_extensions import Annotated


class TransformConfig(BaseModel):
    image_size: int
    mean: List[float]
    std: List[float]


class ModelConfig(BaseModel):
    model_name: str
    pretrained: bool
    in_chans: Annotated[int, Field(strict=True, ge=1)]  # in_channels must be greater than or equal to 1
    num_classes: Annotated[int, Field(strict=True, ge=1)]  # num_classes must be greater than or equal to 1
    global_pool: str

    @field_validator("global_pool")
    @classmethod
    def validate_global_pool(cls: Type[ModelConfig], global_pool: str) -> str:
        """Validates global_pool is in ["avg", "max"]."""
        if global_pool not in ["avg", "max"]:
            raise ValueError("global_pool must be avg or max")
        return global_pool

    class Config:
        protected_namespaces = ()


class StoresConfig(BaseModel):
    project_name: str
    unique_id: str
    logs_dir: Path
    model_artifacts_dir: Path

    class Config:
        protected_namespaces = ()


class TrainConfig(BaseModel):
    device: str
    project_name: str
    debug: bool
    seed: int
    num_epochs: int
    num_classes: int = 3


class OptimizerConfig(BaseModel):
    optimizer_name: str
    optimizer_params: Dict[str, Any]


class DataConfig(BaseModel):
    data_dir: Path
    batch_size: int
    num_workers: int
    shuffle: bool = True


class Config(BaseModel):
    model: ModelConfig
    transform: TransformConfig
    datamodule: DataConfig
    optimizer: OptimizerConfig
    stores: StoresConfig
    train: TrainConfig

    @classmethod
    def from_dict(cls, config_dict: Dict[str, Any]) -> Config:
        """Creates Config object from a dictionary."""
        return cls(**config_dict)
Pros#
1. Able to serialize and deserialize objects to and from dict, JSON, YAML, and other formats. For example, using the ModelConfig defined above, the following code round-trips a plain dict through the Pydantic model and back:

model_config_dict = {
    "model_name": "resnet18",
    "pretrained": True,
    "in_chans": 3,
    "num_classes": 1000,
    "global_pool": "avg",
}
model = ModelConfig(**model_config_dict)
assert model.model_dump() == model_config_dict

2. Validation of data types and values. For a large and complex configuration, you either validate the sanity of the config at the config level, or check at the code level (i.e. sprinkled throughout your codebase). Pydantic lets you do it at the config level:

- Constrained types:

model = ModelConfig(
    model_name="resnet18",
    pretrained=True,
    in_chans=0,
    num_classes=2,
    global_pool="avg",
)

This raises an error because in_chans is less than 1. Pydantic offers a wide range of constrained types out of the box; if those are not enough, custom validators can handle your specific needs.

- Custom validators:

model = ModelConfig(
    model_name="resnet18",
    pretrained=True,
    in_chans=3,
    num_classes=2,
    global_pool="average",
)

This raises an error because global_pool is not avg or max. We implemented this custom check in the validate_global_pool method, decorated with @field_validator("global_pool").
There are many other good things like built-in type checking and coercion (a small sketch below). In the next section, we see how to combine Hydra and Pydantic.
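For example, a hedged sketch of lax coercion with the TransformConfig defined above (Pydantic v2 behavior; fields without strict=True coerce compatible inputs):
transform = TransformConfig(image_size="256", mean=[0.485], std=[0.229])
assert transform.image_size == 256  # "256" coerced from str to int in lax mode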
Pydra#
The provided code shows a way to merge Hydra and Pydantic in a machine learning training pipeline, using Hydra for hierarchical configuration and command-line interface, and Pydantic for data validation and type checking.
A Hydra-based application entry point is created using the @hydra.main() decorator, which uses Hydra to manage configuration files and command-line arguments. Hydra's config_path and config_name are specified to tell Hydra where to find the configuration files.
The hydra_to_pydantic helper takes Hydra's DictConfig and converts it to a Pydantic Config object.
import logging
from typing import Any, Dict

import hydra
from hydra.core.hydra_config import HydraConfig
from omegaconf import DictConfig, OmegaConf
from rich.pretty import pprint

from omnixamples.software_engineering.config_management.pydantic.config import Config
from omnixamples.software_engineering.config_management.train import train

LOGGER = logging.getLogger(__name__)


def hydra_to_pydantic(config: DictConfig) -> Config:
    """Converts Hydra config to Pydantic config."""
    # to_object resolves interpolations and returns a plain python container
    config_dict: Dict[str, Any] = OmegaConf.to_object(config)  # type: ignore[assignment]
    return Config(**config_dict)


@hydra.main(version_base=None, config_path="../hydra/configs", config_name="config")
def run(config: DictConfig) -> None:
    """Run the main function."""
    LOGGER.info("Type of config is: %s", type(config))
    LOGGER.info("Merged Yaml:\n%s", OmegaConf.to_yaml(config))
    LOGGER.info(HydraConfig.get().job.name)

    config_pydantic = hydra_to_pydantic(config)
    pprint(config_pydantic)
    train(config_pydantic)


if __name__ == "__main__":
    run()
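With this wiring, a bad override now fails fast at startup instead of deep inside training. For example (a hypothetical run, based on the validator defined earlier), python main.py model.global_pool=average raises a Pydantic ValidationError carrying our message, global_pool must be avg or max, before any data is downloaded or a single epoch runs.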