LBANN 0.103.0
Livermore Big Artificial Neural Network Toolkit
lbann::callback::ltfb Class Reference

Tournament training.

#include <ltfb.hpp>

Public Member Functions

 ltfb (El::Int batch_interval, std::string metric_name, std::unique_ptr< LTFBCommunicationAlgorithm > comm_algo, bool exchange_hyperparameters=false)
 Construct the LTFB callback.
 
 ltfb (const ltfb &other)
 
ltfb & operator= (const ltfb &other)
 
ltfb * copy () const override
 
std::string name () const override
 Return this callback's name.
 
void on_train_begin (model *m) override
 Called at the beginning of training.
 
void on_batch_begin (model *m) override
 Called at the beginning of a (mini-)batch.
 
- Public Member Functions inherited from lbann::callback_base
 callback_base (int batch_interval=1)
 Initialize a callback with an optional batch interval.
 
 callback_base (const callback_base &)=default
 
virtual ~callback_base ()=default
 
virtual void setup (trainer *t)
 Called once to set up the callback on the trainer.
 
virtual void setup (model *m)
 Called once to set up the callback on the model (after all layers are set up).
 
virtual void on_setup_end (model *m)
 Called at the end of setup.
 
virtual void on_train_end (model *m)
 Called at the end of training.
 
virtual void on_phase_end (model *m)
 Called at the end of every phase (multiple epochs) during layer-wise model training.
 
virtual void on_epoch_begin (model *m)
 Called at the beginning of each epoch.
 
virtual void on_epoch_end (model *m)
 Called immediately after the end of each epoch.
 
virtual void on_batch_end (model *m)
 Called immediately after the end of a (mini-)batch.
 
virtual void on_test_begin (model *m)
 Called at the beginning of testing.
 
virtual void on_test_end (model *m)
 Called immediately after the end of testing.
 
virtual void on_validation_begin (model *m)
 Called at the beginning of validation.
 
virtual void on_validation_end (model *m)
 Called immediately after the end of validation.
 
virtual void on_forward_prop_begin (model *m)
 Called when a model begins forward propagation.
 
virtual void on_forward_prop_begin (model *m, Layer *l)
 Called when a layer begins forward propagation.
 
virtual void on_forward_prop_end (model *m)
 Called when a model ends forward propagation.
 
virtual void on_forward_prop_end (model *m, Layer *l)
 Called when a layer ends forward propagation.
 
virtual void on_backward_prop_begin (model *m)
 Called when a model begins backward propagation.
 
virtual void on_backward_prop_begin (model *m, Layer *l)
 Called when a layer begins backward propagation.
 
virtual void on_backward_prop_end (model *m)
 Called when a model ends backward propagation.
 
virtual void on_backward_prop_end (model *m, Layer *l)
 Called when a layer ends backward propagation.
 
virtual void on_optimize_begin (model *m)
 Called when a model begins optimization.
 
virtual void on_optimize_begin (model *m, weights *w)
 Called when a weights object begins optimization.
 
virtual void on_optimize_end (model *m)
 Called when a model ends optimization.
 
virtual void on_optimize_end (model *m, weights *w)
 Called when a weights object ends optimization.
 
virtual void on_batch_evaluate_begin (model *m)
 Called at the beginning of a (mini-)batch evaluation (validation / testing).
 
virtual void on_batch_evaluate_end (model *m)
 Called at the end of a (mini-)batch evaluation (validation / testing).
 
virtual void on_evaluate_forward_prop_begin (model *m)
 Called when a model begins forward propagation for evaluation (validation / testing).
 
virtual void on_evaluate_forward_prop_begin (model *m, Layer *l)
 Called when a layer begins forward propagation for evaluation (validation / testing).
 
virtual void on_evaluate_forward_prop_end (model *m)
 Called when a model ends forward propagation for evaluation (validation / testing).
 
virtual void on_evaluate_forward_prop_end (model *m, Layer *l)
 Called when a layer ends forward propagation for evaluation (validation / testing).
 
int get_batch_interval () const
 Return the batch interval.
 
virtual description get_description () const
 Human-readable description.
 
template<class Archive >
void serialize (Archive &ar)
 Store state to archive for checkpoint and restart.
 
void write_proto (lbann_data::Callback &proto) const
 Write a protobuf description of the callback.
 

Private Member Functions

void write_specific_proto (lbann_data::Callback &proto) const final
 

Private Attributes

std::string m_metric_name
 Metric for tournament evaluation.
 
std::unique_ptr< LTFBCommunicationAlgorithm > comm_algo_
 Communication algorithm for exchanging models.
 
bool m_low_score_wins
 Whether low-scoring or high-scoring models survive a tournament.
 

Additional Inherited Members

- Protected Member Functions inherited from lbann::callback_base
std::string get_multi_trainer_path (const model &m, const std::string &root_dir)
 Build a standard directory hierarchy including trainer ID.
 
std::string get_multi_trainer_ec_model_path (const model &m, const std::string &root_dir)
 Build a standard directory hierarchy including trainer, execution context, and model information (in that order).
 
std::string get_multi_trainer_model_path (const model &m, const std::string &root_dir)
 Build a standard directory hierarchy including trainer and model information (in that order).
 
callback_base & operator= (const callback_base &)=default
 Copy-assignment operator.
 
- Protected Attributes inherited from lbann::callback_base
int m_batch_interval
 Batch methods should run once every this many steps.
 

Detailed Description

Tournament training.

This is intended to support research into the LTFB algorithm. An outline:

  • Divide the computational resources into multiple "trainers" that can operate in parallel.
  • Set up a model on each trainer and begin training independently.
  • Periodically launch tournaments to select "good" models. More specifically, trainers partner up and exchange their models. Each trainer evaluates a metric for its local and partner models, using its validation data set. The model with the better score is retained and the other one is discarded.
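To make the tournament step concrete, here is a single-process C++ sketch of one round. It is not LBANN code: `Trainer`, `partner_wins`, `tournament_round`, and `run_example` are hypothetical names, the fixed pairing (0,1), (2,3), ... is an assumption, and the inter-trainer model exchange that `LTFBCommunicationAlgorithm` performs is replaced by reading a shared snapshot of model IDs. The `low_score_wins` flag plays the role described for `m_low_score_wins`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for a trainer: it holds one model (an integer ID
// here) and can score any model on its own validation data.
struct Trainer {
  int model_id;
  std::function<double(int)> score;  // model ID -> validation metric
};

// True if `candidate` strictly beats `incumbent` (ties keep the incumbent).
// low_score_wins mirrors m_low_score_wins: true for a loss, false for accuracy.
bool partner_wins(double incumbent, double candidate, bool low_score_wins) {
  return low_score_wins ? candidate < incumbent : candidate > incumbent;
}

// One round with an assumed fixed pairing (0,1), (2,3), ...; with an odd
// number of trainers, the last one sits the round out.
void tournament_round(std::vector<Trainer>& trainers, bool low_score_wins) {
  // Snapshot pre-round models so both partners compare the same pair.
  std::vector<int> before;
  for (const Trainer& t : trainers) {
    assert(t.score && "every trainer needs a scoring function");
    before.push_back(t.model_id);
  }
  for (std::size_t i = 0; i + 1 < trainers.size(); i += 2) {
    for (std::size_t self : {i, i + 1}) {
      const std::size_t partner = (self == i) ? i + 1 : i;
      Trainer& t = trainers[self];
      // Each trainer judges both models with its *own* validation data.
      const double local = t.score(before[self]);
      const double remote = t.score(before[partner]);
      if (partner_wins(local, remote, low_score_wins)) {
        t.model_id = before[partner];  // keep the partner's model
      }
    }
  }
}

// Four trainers that all see the same made-up validation losses.
std::vector<int> run_example() {
  const std::vector<double> loss = {0.9, 0.4, 0.7, 0.8};  // lower is better
  std::vector<Trainer> trainers;
  for (int id = 0; id < 4; ++id) {
    trainers.push_back({id, [loss](int m) { return loss[static_cast<std::size_t>(m)]; }});
  }
  tournament_round(trainers, /*low_score_wins=*/true);
  std::vector<int> kept;
  for (const Trainer& t : trainers) kept.push_back(t.model_id);
  return kept;  // {1, 1, 2, 2}
}
```

Because each trainer scores both candidates on its own validation data, two partners can in general disagree and keep different models; in `run_example` they agree only because every trainer sees the same losses.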

There are many algorithmic variations to be explored:

  • How is data divvied up amongst the trainers? Is it strictly partitioned, partially shared, or completely replicated?
  • What model components are exchanged? Just the trainable weights, or a subset of the weights? Hyperparameters?
  • Can this be used to explore model architectures?
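The first question can be illustrated with its two extremes. The C++ sketch below uses hypothetical helpers `strict_partition` and `replicate` to assign sample indices to trainers either disjointly (round-robin, an assumption chosen for brevity) or by giving every trainer the whole dataset. It does not describe how LBANN's data readers actually distribute samples, and the partially shared case is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shards = std::vector<std::vector<std::size_t>>;

// Strict partition: every sample index belongs to exactly one trainer.
// Round-robin assignment is an illustrative assumption.
Shards strict_partition(std::size_t num_samples, std::size_t num_trainers) {
  assert(num_trainers > 0);
  Shards shards(num_trainers);
  for (std::size_t s = 0; s < num_samples; ++s) {
    shards[s % num_trainers].push_back(s);
  }
  return shards;
}

// Complete replication: every trainer sees the whole dataset.
Shards replicate(std::size_t num_samples, std::size_t num_trainers) {
  std::vector<std::size_t> all(num_samples);
  for (std::size_t s = 0; s < num_samples; ++s) all[s] = s;
  return Shards(num_trainers, all);
}
```

With 10 samples and 3 trainers, `strict_partition` gives trainer 0 the indices 0, 3, 6, 9 and trainers 1 and 2 three indices each, while `replicate` gives every trainer all 10.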

Definition at line 62 of file callbacks/ltfb.hpp.

Constructor & Destructor Documentation

◆ ltfb() [1/2]

lbann::callback::ltfb::ltfb ( El::Int  batch_interval,
std::string  metric_name,
std::unique_ptr< LTFBCommunicationAlgorithm >  comm_algo,
bool  exchange_hyperparameters = false 
)

Construct the LTFB callback.

Parameters
batch_interval: Number of training mini-batch steps between tournaments.
metric_name: Metric for tournament evaluation.
comm_algo: Inter-trainer communication scheme.
exchange_hyperparameters: Whether to exchange hyperparameters with model information.
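The `batch_interval` parameter sets how many training mini-batch steps separate tournaments. A minimal C++ sketch of that gating follows, assuming a hook that receives the step count directly (the real `on_batch_begin` receives a `model *`). `tournament_scheduler` is a hypothetical class, it uses `std::size_t` where the real constructor takes `El::Int`, and skipping step 0 is an assumption rather than documented LBANN behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical scheduler: launches a tournament once every
// `batch_interval` mini-batch steps.
class tournament_scheduler {
public:
  tournament_scheduler(std::size_t batch_interval, std::function<void()> run_tournament)
      : m_batch_interval(batch_interval), m_run_tournament(std::move(run_tournament)) {
    assert(batch_interval > 0);
  }

  // Analogous in spirit to ltfb::on_batch_begin, but takes the step count.
  // Assumption: no tournament at step 0, before any training has happened.
  void on_batch_begin(std::size_t step) {
    if (step > 0 && step % m_batch_interval == 0) {
      m_run_tournament();
    }
  }

private:
  std::size_t m_batch_interval;
  std::function<void()> m_run_tournament;
};
```

With `batch_interval = 5`, a 21-step run (steps 0 through 20) would trigger tournaments at steps 5, 10, 15, and 20.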

◆ ltfb() [2/2]

lbann::callback::ltfb::ltfb ( const ltfb & other)

Member Function Documentation

◆ copy()

ltfb* lbann::callback::ltfb::copy ( ) const
inline override virtual

Implements lbann::callback_base.

Definition at line 79 of file callbacks/ltfb.hpp.

◆ name()

std::string lbann::callback::ltfb::name ( ) const
inline override virtual

Return this callback's name.

Implements lbann::callback_base.

Definition at line 80 of file callbacks/ltfb.hpp.

◆ on_batch_begin()

void lbann::callback::ltfb::on_batch_begin ( model * m)
override virtual

Called at the beginning of a (mini-)batch.

Reimplemented from lbann::callback_base.

◆ on_train_begin()

void lbann::callback::ltfb::on_train_begin ( model * m)
override virtual

Called at the beginning of training.

Reimplemented from lbann::callback_base.

◆ operator=()

ltfb& lbann::callback::ltfb::operator= ( const ltfb & other)

◆ write_specific_proto()

void lbann::callback::ltfb::write_specific_proto ( lbann_data::Callback &  proto) const
final private virtual

Add callback-specific data to the prototext.

Implements lbann::callback_base.

Member Data Documentation

◆ comm_algo_

std::unique_ptr<LTFBCommunicationAlgorithm> lbann::callback::ltfb::comm_algo_
private

Communication algorithm for exchanging models.

Definition at line 93 of file callbacks/ltfb.hpp.

◆ m_low_score_wins

bool lbann::callback::ltfb::m_low_score_wins
private

Whether low-scoring or high-scoring models survive a tournament.

Definition at line 97 of file callbacks/ltfb.hpp.

◆ m_metric_name

std::string lbann::callback::ltfb::m_metric_name
private

Metric for tournament evaluation.

Definition at line 90 of file callbacks/ltfb.hpp.


The documentation for this class was generated from the following file: