LBANN  0.103.0
Livermore Big Artificial Neural Network Toolkit
lbann::TrainingAlgorithm Class Reference (abstract)

Base class for LBANN training_algorithms.

#include <training_algorithm.hpp>


Public Member Functions

Lifecycle Management
 TrainingAlgorithm (std::string name)
 Constructor.
 
virtual ~TrainingAlgorithm ()=default
 
Queries
virtual std::string get_type () const =0
 A string identifying the type of the object.
 
std::string const & get_name () const noexcept
 A user-defined string identifying the algorithm object.
 
Execution interfaces
virtual void apply (ExecutionContext &context, model &model, data_coordinator &dc, execution_mode mode)=0
 Apply the algorithm to the given model.
 
void apply (model &model, data_coordinator &dc)
 Apply the algorithm to the given model.
 
void setup_models (std::vector< observer_ptr< model >> const &models, size_t max_mini_batch_size, const std::vector< El::Grid *> &grids)
 Set up a collection of models.
 
std::unique_ptr< ExecutionContext > get_new_execution_context () const
 Get a default-initialized execution context that fits this training algorithm.
 

Protected Member Functions

virtual ExecutionContext * do_get_new_execution_context () const =0
 Covariant return-friendly implementation of get_new_execution_context().
 
In-hierarchy Lifecycle Management
 TrainingAlgorithm (const TrainingAlgorithm &other)=delete
 
TrainingAlgorithm & operator= (const TrainingAlgorithm &other)=delete
 
 TrainingAlgorithm (TrainingAlgorithm &&other)=default
 
TrainingAlgorithm & operator= (TrainingAlgorithm &&other)=default
 

Private Attributes

std::string m_name
 The user-defined name of the algorithm.
 

Detailed Description

Base class for LBANN training_algorithms.

A "training algorithm" is defined as a method for modifying one or more models, where "model" is defined in the LBANN sense (that is, a model object typically consists of a machine learning model plus a "sub-DAG" for computing a training-specific objective function). At this time, we only have support for training a single model unit, though some ad hoc methods exist for training multi-model scenarios such as GANs.

Logically, the inputs to a training algorithm are a model architecture (encapsulated in a model object) and a data source, and the output is a trained model (or, a set of parameters that define the action of the model). Here, "trained" means that the training algorithm has evolved the parameters until user-specified stopping criteria have been met; it does not necessarily imply that any underlying optimization method has converged (or even exists) or that such convergence is even well-defined.

A key capability is that training algorithms should be composable. This allows metaheuristic algorithms to simply be implemented as training algorithms constructed from one or more "inner" training algorithms.
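The composability described above can be illustrated with a minimal sketch. Everything in it is hypothetical and deliberately simplified: `ToyModel`, `ToyAlgorithm`, `FixedSteps`, and `Repeat` are stand-ins invented for this example and are not LBANN types, and the single-argument `apply` omits the execution context and data coordinator that the real interface takes.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a model: one trainable value plus a step counter.
struct ToyModel {
  double weight = 0.0;
  std::size_t steps = 0;
};

// Simplified analogue of the training algorithm interface (not the LBANN API).
class ToyAlgorithm {
public:
  explicit ToyAlgorithm(std::string name) : m_name{std::move(name)} {}
  virtual ~ToyAlgorithm() = default;
  std::string const& get_name() const noexcept { return m_name; }
  virtual std::string get_type() const = 0;
  virtual void apply(ToyModel& m) = 0;

private:
  std::string m_name;
};

// An "inner" algorithm: a fixed number of steps, each halving the distance
// between the weight and a target. The stopping criterion is the step count,
// not convergence.
class FixedSteps final : public ToyAlgorithm {
public:
  FixedSteps(std::string name, std::size_t num_steps, double target)
    : ToyAlgorithm{std::move(name)}, m_num_steps{num_steps}, m_target{target} {}
  std::string get_type() const override { return "fixed_steps"; }
  void apply(ToyModel& m) override {
    for (std::size_t i = 0; i < m_num_steps; ++i) {
      m.weight += 0.5 * (m_target - m.weight);
      ++m.steps;
    }
  }

private:
  std::size_t m_num_steps;
  double m_target;
};

// A "meta" algorithm built from an inner one. Because it is itself a
// ToyAlgorithm, it can be nested inside another Repeat.
class Repeat final : public ToyAlgorithm {
public:
  Repeat(std::string name, std::unique_ptr<ToyAlgorithm> inner, std::size_t rounds)
    : ToyAlgorithm{std::move(name)}, m_inner{std::move(inner)}, m_rounds{rounds} {}
  std::string get_type() const override { return "repeat(" + m_inner->get_type() + ")"; }
  void apply(ToyModel& m) override {
    for (std::size_t r = 0; r < m_rounds; ++r)
      m_inner->apply(m);
  }

private:
  std::unique_ptr<ToyAlgorithm> m_inner;
  std::size_t m_rounds;
};

// Usage: three rounds of a two-step inner algorithm, six steps in total.
inline ToyModel run_composed_example() {
  ToyModel m;
  Repeat outer{"outer", std::make_unique<FixedSteps>("inner", 2, 1.0), 3};
  outer.apply(m);
  return m;
}
```

The outer algorithm only depends on the abstract interface of the inner one, which is the property that lets a metaheuristic wrap any concrete algorithm.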

Todo:
One component that we need to address yet is the issue of logically encapsulating multiple models, as either inputs or outputs to a training algorithm. Specifically, consider the LTFB "meta-learning" method. Rather than producing the single best model, a user might be interested in the K best models. In this case, tournament-based evolution will begin with a single model (per trainer) but could output several models. Similarly, one might begin with an arbitrary collection of models that are evolved until a single best model emerges. This draws in other issues to be addressed elsewhere in LBANN such as "How do we export models?" Currently, this is done by writing to files on disk via callbacks. However, one might imagine "in-core" interaction between training and inference, perhaps in an online learning scenario, in which repeatedly writing to and reading from disk is not sufficient.

Definition at line 86 of file training_algorithm.hpp.

Constructor & Destructor Documentation

◆ TrainingAlgorithm() [1/3]

lbann::TrainingAlgorithm::TrainingAlgorithm ( std::string  name)

Constructor.

Parameters
[in]  name  The user-defined name of the algorithm.

◆ ~TrainingAlgorithm()

virtual lbann::TrainingAlgorithm::~TrainingAlgorithm ( )
virtual default

◆ TrainingAlgorithm() [2/3]

lbann::TrainingAlgorithm::TrainingAlgorithm ( const TrainingAlgorithm &  other)
protected delete

◆ TrainingAlgorithm() [3/3]

lbann::TrainingAlgorithm::TrainingAlgorithm ( TrainingAlgorithm &&  other)
protected default

Member Function Documentation

◆ apply() [1/2]

virtual void lbann::TrainingAlgorithm::apply ( ExecutionContext &  context,
model &  model,
data_coordinator &  dc,
execution_mode  mode 
)
pure virtual

Apply the algorithm to the given model.

Parameters
[in,out]  context  The persistent state tracked by the model.
[in,out]  model  A model architecture with trainable weights. On exit, the weights will have been updated according to the algorithm.
[in,out]  dc  The data source for this round of training.
[in]  mode  The execution mode. Considered superfluous; will be removed.

Implemented in lbann::KFAC, lbann::LTFB, and lbann::SGDTrainingAlgorithm.

◆ apply() [2/2]

void lbann::TrainingAlgorithm::apply ( model &  model,
data_coordinator &  dc 
)
inline

Apply the algorithm to the given model.

Parameters
[in,out]  model  A model architecture with trainable weights. On exit, the weights will have been updated according to the algorithm.
[in,out]  dc  The data source for this round of training.

Definition at line 129 of file training_algorithm.hpp.
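The body of this inline overload is not reproduced on this page. As a hedged illustration only, the sketch below shows one plausible shape for such a convenience overload: it requests a fresh context and forwards to the four-argument overload in training mode. That forwarding behaviour is an assumption, and `Mode`, `Context`, `Model`, `DataSource`, `Algorithm`, and `OneStep` are hypothetical stand-ins, not the LBANN types.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the LBANN types named in the signatures.
enum class Mode { training, validation };
struct Context {
  virtual ~Context() = default;
  int steps = 0;
};
struct Model {
  std::vector<std::string> log;
};
struct DataSource {};

class Algorithm {
public:
  virtual ~Algorithm() = default;

  // Full interface: the caller supplies the context and the mode.
  virtual void apply(Context& ctx, Model& m, DataSource& dc, Mode mode) = 0;

  // Convenience overload. ASSUMPTION: it creates a default context and
  // forwards in training mode; the real implementation may differ.
  void apply(Model& m, DataSource& dc) {
    auto ctx = get_new_context();
    apply(*ctx, m, dc, Mode::training);
  }

  std::unique_ptr<Context> get_new_context() const {
    return std::unique_ptr<Context>{do_get_new_context()};
  }

protected:
  virtual Context* do_get_new_context() const = 0;
};

class OneStep final : public Algorithm {
public:
  // Overriding one overload hides the others by name; re-expose the
  // two-argument version so callers can still use it on the derived type.
  using Algorithm::apply;

  void apply(Context& ctx, Model& m, DataSource&, Mode mode) override {
    ++ctx.steps;
    m.log.push_back(mode == Mode::training ? "training" : "validation");
  }

protected:
  Context* do_get_new_context() const override { return new Context{}; }
};

// Usage: call the two-argument overload and return what the model recorded.
inline std::vector<std::string> run_two_arg_apply() {
  Model m;
  DataSource dc;
  OneStep alg;
  alg.apply(m, dc);
  return m.log;
}
```

The `using Algorithm::apply;` declaration is a general C++ point rather than anything specific to LBANN: without it, the override in the derived class hides the base-class overload set during name lookup.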

◆ do_get_new_execution_context()

virtual ExecutionContext* lbann::TrainingAlgorithm::do_get_new_execution_context ( ) const
protected pure virtual

Covariant return-friendly implementation of get_new_execution_context().

Implemented in lbann::KFAC, lbann::SGDTrainingAlgorithm, and lbann::LTFB.

◆ get_name()

std::string const& lbann::TrainingAlgorithm::get_name ( ) const
noexcept

A user-defined string identifying the algorithm object.

◆ get_new_execution_context()

std::unique_ptr<ExecutionContext> lbann::TrainingAlgorithm::get_new_execution_context ( ) const
inline

Get a default-initialized execution context that fits this training algorithm.

This method gets a clean, default-initialized execution context suitable for the training algorithm being used. The concrete type is guaranteed to match the concrete type required by the training algorithm.

Note
This method participates in the "covariant-smart-pointer-return" pattern implemented by the Cloneable interface, for example. See do_get_new_execution_context().
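A minimal sketch of the "covariant-smart-pointer-return" pattern mentioned in the note follows. Smart pointers cannot be covariant return types in C++, so the virtual hook returns a raw pointer (which can be covariant) and a non-virtual public function wraps it in `std::unique_ptr`. The names `Context`, `SgdContext`, `Algorithm`, and `SgdAlgorithm` are hypothetical and chosen for illustration; this is not the LBANN implementation.

```cpp
#include <memory>
#include <string>

// Hypothetical context hierarchy.
struct Context {
  virtual ~Context() = default;
  virtual std::string kind() const { return "base"; }
};
struct SgdContext final : Context {
  std::string kind() const override { return "sgd"; }
};

class Algorithm {
public:
  virtual ~Algorithm() = default;

  // Public, non-virtual entry point: takes ownership of the raw pointer
  // produced by the virtual hook.
  std::unique_ptr<Context> get_new_context() const {
    return std::unique_ptr<Context>{do_get_new_context()};
  }

protected:
  // Virtual hook returning a raw owning pointer so that overrides may
  // narrow the return type.
  virtual Context* do_get_new_context() const = 0;
};

class SgdAlgorithm final : public Algorithm {
public:
  // Hides the base version and exposes the concrete type to callers that
  // hold a SgdAlgorithm directly.
  std::unique_ptr<SgdContext> get_new_context() const {
    return std::unique_ptr<SgdContext>{do_get_new_context()};
  }

protected:
  // Covariant override: SgdContext* instead of Context*.
  SgdContext* do_get_new_context() const override { return new SgdContext{}; }
};

// Usage: through a base reference, the dynamic type of the returned
// context still matches the concrete algorithm.
inline std::string context_kind_via_base() {
  SgdAlgorithm alg;
  Algorithm const& base = alg;
  return base.get_new_context()->kind();
}
```

Callers that only know the base class still receive a context of the matching concrete type, while callers that know the derived class get a correctly typed `std::unique_ptr` without a cast.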

Definition at line 157 of file training_algorithm.hpp.

◆ get_type()

virtual std::string lbann::TrainingAlgorithm::get_type ( ) const
pure virtual

A string identifying the type of the object.

Implemented in lbann::KFAC, lbann::LTFB, and lbann::SGDTrainingAlgorithm.

◆ operator=() [1/2]

TrainingAlgorithm& lbann::TrainingAlgorithm::operator= ( const TrainingAlgorithm &  other)
protected delete

◆ operator=() [2/2]

TrainingAlgorithm& lbann::TrainingAlgorithm::operator= ( TrainingAlgorithm &&  other)
protected default

◆ setup_models()

void lbann::TrainingAlgorithm::setup_models ( std::vector< observer_ptr< model >> const &  models,
size_t  max_mini_batch_size,
const std::vector< El::Grid *> &  grids 
)

Set up a collection of models.

Parameters
[in]  models  The collection of models to be set up.
[in]  max_mini_batch_size  The largest minibatch size accepted by any model.
[in]  grids  Process grids for distributed tensors.

Member Data Documentation

◆ m_name

std::string lbann::TrainingAlgorithm::m_name
private

The user-defined name of the algorithm.

Definition at line 179 of file training_algorithm.hpp.


The documentation for this class was generated from the following file: