An implementation of the LTFB training algorithm.

#include <ltfb.hpp>


Public Types
using	TermCriteriaType = ltfb::LTFBTerminationCriteria

using	ExeContextType = ltfb::LTFBExecutionContext

Public Member Functions
Life-cycle management
	LTFB (std::string name, std::unique_ptr< TrainingAlgorithm > local_training_algorithm, std::unique_ptr< ltfb::MetaLearningStrategy > meta_learning_strategy, ltfb::LTFBTerminationCriteria stopping_criteria, bool suppress_timer)
	Construct LTFB from its component pieces.

	~LTFB () noexcept=default

	LTFB (LTFB const &other)=delete

LTFB &	operator= (LTFB const &)=delete

	LTFB (LTFB &&)=default

LTFB &	operator= (LTFB &&)=default


std::string	get_type () const final
	Queries.

Apply interface
void	apply (ExecutionContext &context, model &m, data_coordinator &dc, execution_mode mode) final
	Apply the training algorithm to refine model weights.

Public Member Functions inherited from lbann::TrainingAlgorithm
	TrainingAlgorithm (std::string name)
	Constructor.

virtual	~TrainingAlgorithm ()=default

std::string const &	get_name () const noexcept
	A user-defined string identifying the algorithm object.

void	apply (model &model, data_coordinator &dc)
	Apply the algorithm to the given model.

void	setup_models (std::vector< observer_ptr< model >> const &models, size_t max_mini_batch_size, const std::vector< El::Grid *> &grids)
	Set up a collection of models.

std::unique_ptr< ExecutionContext >	get_new_execution_context () const
	Get a default-initialized execution context that fits this training algorithm.

Protected Member Functions
ltfb::LTFBExecutionContext *	do_get_new_execution_context () const final
	Covariant return-friendly implementation of `get_new_execution_context()`.

Protected Member Functions inherited from lbann::TrainingAlgorithm
	TrainingAlgorithm (const TrainingAlgorithm &other)=delete

TrainingAlgorithm &	operator= (const TrainingAlgorithm &other)=delete

	TrainingAlgorithm (TrainingAlgorithm &&other)=default

TrainingAlgorithm &	operator= (TrainingAlgorithm &&other)=default

Private Attributes
std::unique_ptr< TrainingAlgorithm >	m_local_algo
	The training algorithm for trainer-local training.

std::unique_ptr< ltfb::MetaLearningStrategy >	m_meta_learning_strategy
	The strategy for postprocessing local training outputs.

ltfb::LTFBTerminationCriteria	m_termination_criteria
	The LTFB stopping criteria.

bool	m_suppress_timer = false
	Suppress timer output.

Detailed Description

An implementation of the LTFB training algorithm.

This is an example of a "meta-learning" training algorithm in which multiple models are trained in parallel – one model per trainer participating in the lbann_comm object. Following local training, some postprocessing strategy is applied to further optimize the solution. In the case of "classical LTFB", the trainers in the communicator are paired off randomly and "compete" in tournaments. The winner of each tournament is returned from the postprocessing to either undergo further local training or to be returned from the training algorithm.
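The "classical LTFB" tournament described above can be pictured with a small, self-contained sketch. It is not LBANN code: `Candidate`, `random_pairs`, and `tournament_round` are hypothetical names, the model is reduced to a single score, and "higher is better" is an assumed metric. With an odd number of trainers, this sketch simply lets one trainer sit out the round.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for a trained model; only its score matters here.
struct Candidate {
  int origin;    // trainer that originally produced this model
  double score;  // assumed metric: higher is better
};

// Pair trainers off at random. With an odd count, one trainer sits out.
std::vector<std::pair<std::size_t, std::size_t>>
random_pairs(std::size_t num_trainers, std::mt19937& rng)
{
  std::vector<std::size_t> order(num_trainers);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::shuffle(order.begin(), order.end(), rng);

  std::vector<std::pair<std::size_t, std::size_t>> pairs;
  for (std::size_t i = 0; i + 1 < order.size(); i += 2)
    pairs.emplace_back(order[i], order[i + 1]);
  return pairs;
}

// One tournament round: both members of each pair keep the better model.
void tournament_round(std::vector<Candidate>& models, std::mt19937& rng)
{
  assert(!models.empty());
  for (auto const& [a, b] : random_pairs(models.size(), rng)) {
    Candidate const winner =
      (models[a].score >= models[b].score) ? models[a] : models[b];
    models[a] = winner;
    models[b] = winner;
  }
}
```

After a round, no trainer holds a model with a lower score than the one it entered with, which matches the expectation that postprocessing does not make a model worse.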

The key point is that every round of local training is followed by this postprocessing. The output of the postprocessing is therefore expected to be "at least as good" (by some relevant metric) as the model that went in. If, say, you want to "randomize" your model in some way, then do some training, and then apply some other step, this class can certainly serve as a useful guide, but it is not likely to be the out-of-the-box solution.
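The alternation between local training and postprocessing can be sketched as a simple loop. This is an illustration under stated assumptions, not the body of `LTFB::apply()`: `ToyModel`, `run_meta_learning`, and the fixed round count used as a stopping rule are all hypothetical, and the real termination criteria live in `ltfb::LTFBTerminationCriteria`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a model; these are not LBANN types.
struct ToyModel {
  double score = 0.0;  // assumed metric: higher is better
};

using Step = std::function<void(ToyModel&)>;

// Alternate local training with postprocessing for a fixed number of rounds.
// Returns the number of postprocessing steps that were run.
int run_meta_learning(ToyModel& m,
                      Step const& local_train,
                      Step const& postprocess,
                      int max_rounds)
{
  assert(max_rounds >= 0);
  int steps = 0;
  for (int round = 0; round < max_rounds; ++round) {
    local_train(m);
    double const before = m.score;
    postprocess(m);
    ++steps;
    // Documented expectation: postprocessing should not make the model worse.
    assert(m.score >= before);
  }
  return steps;
}
```

A strategy that cannot meet the "at least as good" check in the loop is the kind of workflow for which this class is described as a guide rather than a ready-made solution.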

Definition at line 66 of file execution_algorithms/ltfb.hpp.

Member Typedef Documentation

◆ ExeContextType

using lbann::LTFB::ExeContextType = ltfb::LTFBExecutionContext

Definition at line 70 of file execution_algorithms/ltfb.hpp.

◆ TermCriteriaType

using lbann::LTFB::TermCriteriaType = ltfb::LTFBTerminationCriteria

Definition at line 69 of file execution_algorithms/ltfb.hpp.

Constructor & Destructor Documentation

◆ LTFB() [1/3]

lbann::LTFB::LTFB	(	std::string	name,
		std::unique_ptr< TrainingAlgorithm >	local_training_algorithm,
		std::unique_ptr< ltfb::MetaLearningStrategy >	meta_learning_strategy,
		ltfb::LTFBTerminationCriteria	stopping_criteria,
		bool	suppress_timer
	)

inline

Construct LTFB from its component pieces.

Parameters

[in]	name	A string identifying this instance of LTFB.
[in]	local_training_algorithm	The training algorithm to be used for (trainer-)local training.
[in]	meta_learning_strategy	The postprocessing algorithm.
[in]	stopping_criteria	When to stop the training algorithm.
[in]	suppress_timer	Whether to suppress timer output.

Definition at line 83 of file execution_algorithms/ltfb.hpp.

◆ ~LTFB()

lbann::LTFB::~LTFB ( )

default noexcept

◆ LTFB() [2/3]

lbann::LTFB::LTFB ( LTFB const & other )

delete

◆ LTFB() [3/3]

lbann::LTFB::LTFB ( LTFB && )

default

Member Function Documentation

◆ apply()

void lbann::LTFB::apply	(	ExecutionContext &	context,
		model &	m,
		data_coordinator &	dc,
		execution_mode	mode
	)

final virtual

Apply the training algorithm to refine model weights.

Parameters

[in,out]	context	The persistent execution context for this algorithm.
[in,out]	m	The model to be trained.
[in,out]	dc	The data source for training.
[in]	mode	Completely superfluous.

Implements lbann::TrainingAlgorithm.

◆ do_get_new_execution_context()

ltfb::LTFBExecutionContext* lbann::LTFB::do_get_new_execution_context ( ) const

inline final protected virtual

Covariant return-friendly implementation of get_new_execution_context().

Implements lbann::TrainingAlgorithm.

Definition at line 123 of file execution_algorithms/ltfb.hpp.
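The "covariant return-friendly" wording refers to a common C++ idiom: a public, non-virtual function returns an owning pointer to the base type, while a protected virtual hook returns a raw pointer whose type derived classes may narrow. The sketch below illustrates the idiom with hypothetical names (`Context`, `DerivedContext`, `Algorithm`, `DerivedAlgorithm`); it mirrors the signatures listed on this page but should not be read as LBANN's actual implementation.

```cpp
#include <cassert>
#include <memory>

// Hypothetical context hierarchy; these are not LBANN types.
struct Context {
  virtual ~Context() = default;
};
struct DerivedContext : Context {
  int step = 0;
};

class Algorithm {
public:
  virtual ~Algorithm() = default;

  // Public and non-virtual: always returns an owning pointer to the base type.
  std::unique_ptr<Context> get_new_context() const
  {
    return std::unique_ptr<Context>(do_get_new_context());
  }

protected:
  // Virtual hook returning a raw pointer, so overrides may narrow the type.
  // std::unique_ptr<Derived> is not covariant with std::unique_ptr<Base>.
  virtual Context* do_get_new_context() const = 0;
};

class DerivedAlgorithm final : public Algorithm {
protected:
  // Covariant return: DerivedContext* legally overrides Context*.
  DerivedContext* do_get_new_context() const final
  {
    return new DerivedContext();
  }
};
```

The raw pointer never escapes to callers: the public wrapper takes ownership immediately, so the `new` in the hook is always paired with the `unique_ptr` destructor.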

◆ get_type()

std::string lbann::LTFB::get_type ( ) const

inline final virtual

Queries.

Implements lbann::TrainingAlgorithm.

Definition at line 103 of file execution_algorithms/ltfb.hpp.

◆ operator=() [1/2]

LTFB& lbann::LTFB::operator= ( LTFB const & )

delete

◆ operator=() [2/2]

LTFB& lbann::LTFB::operator= ( LTFB && )

default
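The deleted copy operations and defaulted move operations make the class move-only, which is the natural choice for a type that owns its components through `std::unique_ptr`. A minimal sketch of the same pattern, using the hypothetical names `Part` and `Owner` rather than any LBANN type:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical component owned by the class below.
struct Part {
  std::string label;
};

class Owner {
public:
  explicit Owner(std::unique_ptr<Part> part) : m_part(std::move(part)) {}

  // Copying is disabled: the owned component has a single owner.
  Owner(Owner const&) = delete;
  Owner& operator=(Owner const&) = delete;

  // Moving transfers ownership of the component.
  Owner(Owner&&) = default;
  Owner& operator=(Owner&&) = default;

  bool has_part() const noexcept { return m_part != nullptr; }

private:
  std::unique_ptr<Part> m_part;
};

// Compile-time checks of the move-only property (C++17).
static_assert(!std::is_copy_constructible_v<Owner>);
static_assert(!std::is_copy_assignable_v<Owner>);
static_assert(std::is_move_constructible_v<Owner>);
static_assert(std::is_move_assignable_v<Owner>);
```

After a move, the source object no longer owns its component, so a moved-from instance should only be destroyed or assigned to.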

Member Data Documentation

◆ m_local_algo

std::unique_ptr<TrainingAlgorithm> lbann::LTFB::m_local_algo

private

The training algorithm for trainer-local training.

Definition at line 130 of file execution_algorithms/ltfb.hpp.

◆ m_meta_learning_strategy

std::unique_ptr<ltfb::MetaLearningStrategy> lbann::LTFB::m_meta_learning_strategy

private

The strategy for postprocessing local training outputs.

Definition at line 133 of file execution_algorithms/ltfb.hpp.

◆ m_suppress_timer

bool lbann::LTFB::m_suppress_timer = false

private

Suppress timer output.

Deprecated: This is a temporary way to disable timer output. This will be more configurable in the future.

Definition at line 143 of file execution_algorithms/ltfb.hpp.

◆ m_termination_criteria

ltfb::LTFBTerminationCriteria lbann::LTFB::m_termination_criteria

private

The LTFB stopping criteria.

Definition at line 136 of file execution_algorithms/ltfb.hpp.

The documentation for this class was generated from the following file:

execution_algorithms/ltfb.hpp
