Trying PyTorch Ignite

Introduction

The day before yesterday, I ran into a problem: my training procedure on Google Colab was pretty slow, and barely any GPU RAM was being used. On my MacBook Pro, however, it was pretty fast and worked perfectly with MPS. At that time, I was using PyTorch Lightning, which had made my life pretty easy. In PyTorch Lightning, we create instances of LightningModule to train the model, and for loading data we can optionally use a LightningDataModule. I was using both, and it was great. But when I ran into the problem on Google Colab, debugging turned out to be a little hard for me. I prefer my code to be completely modular, with the ability to be separated into different parts with ease.

So yesterday, I gave PyTorch Ignite a shot. I started with the quick start, and it was super easy to move my training procedure from PyTorch Lightning to PyTorch Ignite. To be honest, I have a really great feeling about PyTorch Ignite, and I strongly recommend that you give it a shot. Here is the link to the quick start again: quick start
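For context, here is a minimal sketch of the kind of Lightning setup I was coming from (the model and all names here are illustrative, not my actual code):

import torch
from torch import nn
import lightning as L  # on older installs: import pytorch_lightning as L

class LitClassifier(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
        self.criterion = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        x, y = batch
        return self.criterion(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)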

My setup

First, I defined my trainer and evaluators as shown below:

from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import Accuracy, Loss

# Define trainer (model, optimizer, criterion, and device are defined earlier)
trainer = create_supervised_trainer(
    model=model,
    optimizer=optimizer,
    loss_fn=criterion,
    device=device,
)

# Define evaluators
val_metrics = {
    "accuracy": Accuracy(),
    "loss": Loss(criterion),
}

train_evaluator = create_supervised_evaluator(
    model=model,
    metrics=val_metrics,
    device=device,
)
val_evaluator = create_supervised_evaluator(
    model=model,
    metrics=val_metrics,
    device=device,
)
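As far as I understand, create_supervised_trainer is essentially a convenience wrapper: it builds an Engine around a training step roughly like the sketch below (assuming batches of (x, y) tensors; model, optimizer, criterion, and device are the same objects as above):

from ignite.engine import Engine

def train_step(engine, batch):
    # Roughly what create_supervised_trainer does on each iteration
    model.train()
    optimizer.zero_grad()
    x, y = batch[0].to(device), batch[1].to(device)
    y_pred = model(x)
    loss = criterion(y_pred, y)
    loss.backward()
    optimizer.step()
    return loss.item()

# trainer = Engine(train_step)  # equivalent hand-rolled trainer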

Then, because I like to see the progress, I attached a ProgressBar to each of them.

from ignite.contrib.handlers import ProgressBar  # on newer versions: from ignite.handlers import ProgressBar

# Add progress bar to trainer
pbar = ProgressBar()
pbar.attach(trainer)

# Add progress bar to evaluators
pbar_1 = ProgressBar()
pbar_1.attach(train_evaluator)

pbar_2 = ProgressBar()
pbar_2.attach(val_evaluator)
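If you also want the running loss displayed in the trainer's bar, ProgressBar.attach accepts an output_transform; instead of the plain pbar.attach(trainer) above, you could write something like this sketch (the "loss" key is just a display label):

pbar.attach(trainer, output_transform=lambda output: {"loss": output})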

To run the evaluators after each epoch ends, I used Events.EPOCH_COMPLETED and registered the handlers below on the trainer.

# Add logging of training results to trainer
@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(engine):
    train_evaluator.run(train_loader)
    metrics = train_evaluator.state.metrics
    print(
        f"Training Results - Epoch[{engine.state.epoch}] Avg accuracy: {metrics['accuracy']:.2f} Avg loss: {metrics['loss']:.2f}"
    )


# Add logging of validating results to trainer
@trainer.on(Events.EPOCH_COMPLETED)
def log_validation_results(engine):
    val_evaluator.run(val_loader)
    metrics = val_evaluator.state.metrics
    print(
        f"Validation Results - Epoch[{engine.state.epoch}] Avg accuracy: {metrics['accuracy']:.2f} Avg loss: {metrics['loss']:.2f}"
    )
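The decorator form is just a shortcut; the same registration can be done explicitly with add_event_handler, which I find handy when the handler functions live in another module. This is equivalent to the decorators above, not in addition to them:

trainer.add_event_handler(Events.EPOCH_COMPLETED, log_training_results)
trainer.add_event_handler(Events.EPOCH_COMPLETED, log_validation_results)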

For the score function (used by the checkpointing and early stopping handlers below), I added the code below, which returns the accuracy:

def score_function(engine):
    return engine.state.metrics["accuracy"]
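One thing to keep in mind: the handlers below treat a higher score as better, so if you wanted to monitor the validation loss instead, you would return its negative; a minimal sketch:

def loss_score_function(engine):
    # Ignite maximizes the score, so negate the loss
    return -engine.state.metrics["loss"]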

To always have the best model saved, I added a ModelCheckpoint handler to val_evaluator.

from ignite.handlers import ModelCheckpoint
from ignite.contrib.handlers import global_step_from_engine  # also in ignite.handlers on newer versions

# Add model checkpointing to val evaluator
model_checkpoint = ModelCheckpoint(
    f"checkpoints/{cfg.model.name}_{run_counter}",
    n_saved=1,
    filename_prefix="best",
    score_function=score_function,
    score_name="accuracy",
    global_step_transform=global_step_from_engine(trainer),
)

val_evaluator.add_event_handler(
    Events.COMPLETED,
    model_checkpoint,
    {"model": model},
)
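After training, the handler remembers the best file it saved, so the best weights can be restored with something like the sketch below (assuming the same model architecture):

import torch

# last_checkpoint holds the path of the most recently saved (best) file
model.load_state_dict(torch.load(model_checkpoint.last_checkpoint))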

Then, for early stopping, I added the code below.

from ignite.handlers import EarlyStopping

# Add early stopping
early_stopping = EarlyStopping(
    patience=cfg.patience,
    score_function=score_function,
    trainer=trainer,
    min_delta=cfg.min_delta,
)

val_evaluator.add_event_handler(
    Events.COMPLETED,
    early_stopping,
)

And to have TensorBoard logging, I added the code below:

from ignite.contrib.handlers import TensorboardLogger  # on newer versions: from ignite.handlers import TensorboardLogger

# Add tensorboard logger
tb_logger = TensorboardLogger(log_dir=f"tb-logger/{cfg.model.name}_{run_counter}")

tb_logger.attach_output_handler(
    trainer,
    event_name=Events.ITERATION_COMPLETED(every=100),
    tag="training",
    output_transform=lambda loss: {"batch_loss": loss},
)

for tag, evaluator in [
    ("training", train_evaluator),
    ("validation", val_evaluator),
]:
    tb_logger.attach_output_handler(
        evaluator,
        event_name=Events.EPOCH_COMPLETED,
        tag=tag,
        metric_names="all",
        global_step_transform=global_step_from_engine(trainer),
    )

The only thing left to do is to run the trainer engine with the code below.

trainer.run(train_loader, max_epochs=cfg.max_epochs)
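One small thing worth doing after the run finishes is closing the TensorBoard logger so any pending events are flushed to disk:

# Close the tensorboard logger after training
tb_logger.close()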

And it works perfectly.

The cfg in the code above is the config object that I am using; I use Hydra to manage my configs.
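For reference, here is a minimal sketch of how a script like this might receive cfg from Hydra (the config path and layout are illustrative, not my actual setup):

import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # cfg.model.name, cfg.patience, cfg.min_delta, and cfg.max_epochs
    # are the fields used by the training code above
    ...

if __name__ == "__main__":
    main()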

Final thoughts

PyTorch Ignite is a great package for training our models. It is modular, which I really like, and it is closer to the way I think. I strongly recommend that you check it out. Here is the link to the official site: https://docs.pytorch.org/ignite/index.html
