Execution Algorithms

LBANN’s drivers support several different execution algorithms. In particular, LBANN supports a basic (batched) inference algorithm as well as a variety of algorithms for training neural networks. The execution algorithms are implemented in C++, and their parameters (or “hyperparameters”) are exposed to users via the Python Front-End (PFE).

Batched Inference

This algorithm is not yet documented.

Training Algorithms

A training algorithm (C++: lbann::training_algorithm, Python: lbann.TrainingAlgorithm) is the method for optimizing a model’s trainable parameters (weights). At the C++ level, a training algorithm takes as input an initial model description (future: a collection of model descriptions), a data source, and some stopping criteria. Once a training algorithm has reached its prescribed stopping criteria, it is defined to be “trained” and the updated model description (future: collection of model descriptions) is returned to the user.

At the PFE level, the model (future: models) and data source are not yet properly associated with the training algorithm; fixing this issue is work in progress. Instead, the training algorithm is associated with the trainer object. The model (future: models) and data source components are managed separately (C++: lbann::model and lbann::data_coordinator; Python: lbann.Model and lbann.DataReader, respectively) and properly associated with the trainer’s training algorithm object in the C++ runtime.

An example description of a training algorithm is shown below.

SGD = lbann.BatchedIterativeOptimizer   # Just an alias
trainer = lbann.Trainer(training_algo=SGD("my sgd", epoch_count=20))

The first positional argument to every training algorithm is a user-defined name. This will be useful for identifying this algorithm in log messages, especially in the case of complex composite algorithms that might use multiple or nested instances of the same algorithm. Remaining (keyword) arguments are generally algorithm-dependent and users should consult the help() messages or the API documentation for the specific algorithms they wish to use.

Python Front-end API Documentation

lbann.TrainingAlgorithm interface

class TrainingAlgorithm

The lbann.TrainingAlgorithm is the base class of all training algorithms used in the Python Front-end.

__init__(name: str)

Construct a training algorithm.

Parameters

name (string) – A user-defined name to identify this object in logs.

export_proto()

Get a protobuf representation of this object.

Return type

AlgoProto.TrainingAlgorithm()

do_export_proto()

Get a protobuf representation of this object.

Important

Must be implemented in derived classes.

Raises

NotImplementedError