Tournament training. More...

#include <ltfb.hpp>

Public Member Functions
	ltfb (El::Int batch_interval, std::string metric_name, std::unique_ptr< LTFBCommunicationAlgorithm > comm_algo, bool exchange_hyperparameters=false)
	Construct the LTFB callback. More...

	ltfb (const ltfb &other)

ltfb &	operator= (const ltfb &other)

ltfb *	copy () const override

std::string	name () const override
	Return this callback's name. More...

void	on_train_begin (model *m) override
	Called at the beginning of training. More...

void	on_batch_begin (model *m) override
	Called at the beginning of a (mini-)batch. More...

Public Member Functions inherited from lbann::callback_base
	callback_base (int batch_interval=1)
	Initialize a callback with an optional batch interval. More...

	callback_base (const callback_base &)=default

virtual	~callback_base ()=default

virtual void	setup (trainer *t)
	Called once to set up the callback on the trainer. More...

virtual void	setup (model *m)
	Called once to set up the callback on the model (after all layers are set up). More...

virtual void	on_setup_end (model *m)
	Called at the end of setup. More...

virtual void	on_train_end (model *m)
	Called at the end of training. More...

virtual void	on_phase_end (model *m)
	Called at the end of every phase (multiple epochs) in a layer-wise model training. More...

virtual void	on_epoch_begin (model *m)
	Called at the beginning of each epoch. More...

virtual void	on_epoch_end (model *m)
	Called immediately after the end of each epoch. More...

virtual void	on_batch_end (model *m)
	Called immediately after the end of a (mini-)batch. More...

virtual void	on_test_begin (model *m)
	Called at the beginning of testing. More...

virtual void	on_test_end (model *m)
	Called immediately after the end of testing. More...

virtual void	on_validation_begin (model *m)
	Called at the beginning of validation. More...

virtual void	on_validation_end (model *m)
	Called immediately after the end of validation. More...

virtual void	on_forward_prop_begin (model *m)
	Called when a model begins forward propagation. More...

virtual void	on_forward_prop_begin (model *m, Layer *l)
	Called when a layer begins forward propagation. More...

virtual void	on_forward_prop_end (model *m)
	Called when a model ends forward propagation. More...

virtual void	on_forward_prop_end (model *m, Layer *l)
	Called when a layer ends forward propagation. More...

virtual void	on_backward_prop_begin (model *m)
	Called when a model begins backward propagation. More...

virtual void	on_backward_prop_begin (model *m, Layer *l)
	Called when a layer begins backward propagation. More...

virtual void	on_backward_prop_end (model *m)
	Called when a model ends backward propagation. More...

virtual void	on_backward_prop_end (model *m, Layer *l)
	Called when a layer ends backward propagation. More...

virtual void	on_optimize_begin (model *m)
	Called when a model begins optimization. More...

virtual void	on_optimize_begin (model *m, weights *w)
	Called when weights begin optimization. More...

virtual void	on_optimize_end (model *m)
	Called when a model ends optimization. More...

virtual void	on_optimize_end (model *m, weights *w)
	Called when weights end optimization. More...

virtual void	on_batch_evaluate_begin (model *m)
	Called at the beginning of a (mini-)batch evaluation (validation / testing). More...

virtual void	on_batch_evaluate_end (model *m)
	Called at the end of a (mini-)batch evaluation (validation / testing). More...

virtual void	on_evaluate_forward_prop_begin (model *m)
	Called when a model begins forward propagation for evaluation (validation / testing). More...

virtual void	on_evaluate_forward_prop_begin (model *m, Layer *l)
	Called when a layer begins forward propagation for evaluation (validation / testing). More...

virtual void	on_evaluate_forward_prop_end (model *m)
	Called when a model ends forward propagation for evaluation (validation / testing). More...

virtual void	on_evaluate_forward_prop_end (model *m, Layer *l)
	Called when a layer ends forward propagation for evaluation (validation / testing). More...

int	get_batch_interval () const
	Return the batch interval. More...

virtual description	get_description () const
	Human-readable description. More...

template<class Archive >
void	serialize (Archive &ar)
	Store state to archive for checkpoint and restart. More...

void	write_proto (lbann_data::Callback &proto) const
	Write a protobuf description of the callback. More...

Private Member Functions
void	write_specific_proto (lbann_data::Callback &proto) const final

Private Attributes
std::string	m_metric_name
	Metric for tournament evaluation. More...

std::unique_ptr< LTFBCommunicationAlgorithm >	comm_algo_
	Communication algorithm for exchanging models. More...

bool	m_low_score_wins
	Whether low-scoring or high-scoring models survive a tournament. More...

Additional Inherited Members
Protected Member Functions inherited from lbann::callback_base
std::string	get_multi_trainer_path (const model &m, const std::string &root_dir)
	Build a standard directory hierarchy including trainer ID. More...

std::string	get_multi_trainer_ec_model_path (const model &m, const std::string &root_dir)
	Build a standard directory hierarchy including trainer, execution context, and model information (in that order). More...

std::string	get_multi_trainer_model_path (const model &m, const std::string &root_dir)
	Build a standard directory hierarchy including trainer and model information (in that order). More...

callback_base &	operator= (const callback_base &)=default
	Copy-assignment operator. More...

Protected Attributes inherited from lbann::callback_base
int	m_batch_interval
	Batch methods should run once every this many steps. More...

Detailed Description

Tournament training.

This is intended to support research into the LTFB algorithm. An outline:

Divide the computational resources into multiple "trainers" that can operate in parallel.
Set up a model on each trainer and begin training independently.
Periodically launch tournaments to select "good" models. More specifically, trainers partner up and exchange their models. Each trainer evaluates a metric for its local and partner models, using its validation data set. The model with the better score is retained and the other one is discarded.
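The outline above can be sketched as a toy, single-process simulation. This is not LBANN's implementation: `ToyTrainer`, `local_wins`, and `run_tournament` are hypothetical names, the model is reduced to a single number, partners are fixed neighbouring pairs, and ties keep the local model. All of these are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a trainer: the "model" is a single number and
// the metric is computed by a trainer-local validation function.
struct ToyTrainer {
  double model;                            // stand-in for the model weights
  std::function<double(double)> validate;  // score on this trainer's validation data
};

// Decide whether the local model survives, given the metric direction.
// Assumption: ties keep the local model.
inline bool local_wins(double local, double partner, bool low_score_wins) {
  return low_score_wins ? (local <= partner) : (local >= partner);
}

// Run one tournament round over fixed pairs (0,1), (2,3), ...
// Each trainer scores its own and its partner's model on its own
// validation data and keeps whichever scores better.
inline void run_tournament(std::vector<ToyTrainer>& trainers,
                           bool low_score_wins) {
  assert(trainers.size() % 2 == 0 && "sketch assumes an even trainer count");
  for (std::size_t i = 0; i + 1 < trainers.size(); i += 2) {
    ToyTrainer& a = trainers[i];
    ToyTrainer& b = trainers[i + 1];
    // Snapshot both models so the exchange is symmetric.
    const double model_a = a.model;
    const double model_b = b.model;
    if (!local_wins(a.validate(model_a), a.validate(model_b), low_score_wins)) {
      a.model = model_b;  // partner's model replaces the local one
    }
    if (!local_wins(b.validate(model_b), b.validate(model_a), low_score_wins)) {
      b.model = model_a;
    }
  }
}
```

For example, with two trainers holding models 1.0 and 2.5 and a squared-error metric centred on 3.0, a low-score-wins round leaves both trainers holding 2.5.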

There are many algorithmic variations to be explored:

How is data divvied up amongst the trainers? Is it strictly partitioned, partially shared, or completely replicated?
What model components are exchanged? Just the trainable weights, or a subset of the weights? Hyperparameters?
Can this be used to explore model architectures?

Definition at line 62 of file callbacks/ltfb.hpp.

Constructor & Destructor Documentation

◆ ltfb() [1/2]

lbann::callback::ltfb::ltfb ( El::Int batch_interval, std::string metric_name, std::unique_ptr< LTFBCommunicationAlgorithm > comm_algo, bool exchange_hyperparameters = false )

Construct the LTFB callback.

Parameters

batch_interval	Number of training mini-batch steps between tournaments.
metric_name	Metric for tournament evaluation.
comm_algo	Inter-trainer communication scheme.
exchange_hyperparameters	Whether to exchange hyperparameters with model information.
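
One way to read batch_interval is as a modulus on the training step counter. The helper below is a minimal sketch under that reading; `tournament_due` is a hypothetical name, and skipping step 0 is an assumption rather than documented behaviour.

```cpp
#include <cstdint>

// Hypothetical helper: true when a tournament would be due at `step`.
// Assumes tournaments fire on every positive multiple of `batch_interval`
// and that step 0 is skipped because no training has happened yet.
inline bool tournament_due(std::int64_t step, std::int64_t batch_interval) {
  if (batch_interval <= 0) {
    return false;  // guard against an invalid interval
  }
  return step > 0 && step % batch_interval == 0;
}
```

Under these assumptions, a batch_interval of 10 would trigger tournaments at steps 10, 20, 30, and so on.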

◆ ltfb() [2/2]

lbann::callback::ltfb::ltfb ( const ltfb & other )

Member Function Documentation

◆ copy()

ltfb* lbann::callback::ltfb::copy ( ) const

inline override virtual

Implements lbann::callback_base.

Definition at line 79 of file callbacks/ltfb.hpp.

◆ name()

std::string lbann::callback::ltfb::name ( ) const

inline override virtual

Return this callback's name.

Implements lbann::callback_base.

Definition at line 80 of file callbacks/ltfb.hpp.

◆ on_batch_begin()

void lbann::callback::ltfb::on_batch_begin ( model * m )

override virtual

Called at the beginning of a (mini-)batch.

Reimplemented from lbann::callback_base.

◆ on_train_begin()

void lbann::callback::ltfb::on_train_begin ( model * m )

override virtual

Called at the beginning of training.

Reimplemented from lbann::callback_base.

◆ operator=()

ltfb& lbann::callback::ltfb::operator= ( const ltfb & other )

◆ write_specific_proto()

void lbann::callback::ltfb::write_specific_proto ( lbann_data::Callback & proto ) const

final private virtual

Add callback-specific data to the prototext.

Implements lbann::callback_base.

Member Data Documentation

◆ comm_algo_

std::unique_ptr<LTFBCommunicationAlgorithm> lbann::callback::ltfb::comm_algo_

private

Communication algorithm for exchanging models.

Definition at line 93 of file callbacks/ltfb.hpp.

◆ m_low_score_wins

bool lbann::callback::ltfb::m_low_score_wins

private

Whether low-scoring or high-scoring models survive a tournament.

Definition at line 97 of file callbacks/ltfb.hpp.

◆ m_metric_name

std::string lbann::callback::ltfb::m_metric_name

private

Metric for tournament evaluation.

Definition at line 90 of file callbacks/ltfb.hpp.

The documentation for this class was generated from the following file:

callbacks/ltfb.hpp