Abstract base class for gradient-based optimization algorithms.

#include <optimizer.hpp>

Classes
class	GradientHelper
	Manage gradient information.

class	GradientHelperImpl

Public Member Functions
virtual std::string	get_type () const =0
	Human-readable type name.

virtual description	get_description () const
	Human-readable description.

virtual double	get_learning_rate () const =0

virtual void	set_learning_rate (double)=0

virtual void	write_proto (lbann_data::Optimizer &proto) const =0
	Add optimizer data to prototext.

	optimizer (const optimizer &other)
	Copy construct/copy assign.

optimizer &	operator= (const optimizer &other)

optimizer_gradient_status	get_gradient_status () const
	Return the current gradient status.

void	set_gradient_status (const optimizer_gradient_status status)

std::unordered_set< const void * > &	get_gradient_sources ()

void	set_comm (lbann_comm &comm)

void	set_step_time (EvalType time)

void	inc_step_time (EvalType time)

virtual std::tuple< El::Int, El::Int, El::DistData >	get_matrix_info () const =0

template<typename TensorDataType >
void	accumulate_all_gradient_contributions (El::AbstractDistMatrix< TensorDataType > &gradient)

void	start_gradient_allreduce ()
	Launch non-blocking allreduce on the gradient, if needed.

void	finish_gradient_allreduce ()
	Synchronize non-blocking allreduce on the gradient, if needed.

Constructors and Destructor
	optimizer ()

virtual	~optimizer ()=default

Gradient update management
virtual void	setup (weights *w)=0

template<typename TensorDataType >
void	add_to_gradient (El::AbstractDistMatrix< TensorDataType > const &contrib, TensorDataType scale=1.f, bool allreduce_needed=false)
	Add to the objective function gradient w.r.t. the weights.

void	clear_gradient ()
	Zero out the objective function gradient w.r.t. the weights.

El::Int	get_num_gradient_sources () const
	Number of objects that are expected to contribute to the gradient.

void	add_gradient_source (const void *source)
	Register a gradient source.

void	remove_gradient_source (const void *source)
	Unregister a gradient source.

virtual void	step ()=0
	Perform optimization step.

template<typename TensorDataType >
El::AbstractDistMatrix< TensorDataType > &	get_gradient_buffer (TensorDataType &buf_scale, TensorDataType &in_scale, bool allreduce_needed=false)
	Get the gradient buffer.


lbann_comm &	get_comm ()
	Access LBANN communicator.

const lbann_comm &	get_comm () const
	Access LBANN communicator.


EvalType	get_step_time () const
	Time spent in optimization step.

virtual void	reset_counters ()
	Reset stats counters.

Checkpointing
template<class Archive >
void	serialize (Archive &ar)
	Store state to archive for checkpoint and restart.

Public Member Functions inherited from lbann::Cloneable< HasAbstractFunction< optimizer > >
std::unique_ptr< HasAbstractFunction< optimizer > >	clone () const
	Return an exception-safe, memory-safe copy of this object.

Private Types
using	gradient_manager_type = GradientHelper
	Map from data types to gradient contributions.

using	gradient_manager_ptr = std::unique_ptr< gradient_manager_type >

Private Attributes
lbann_comm *	m_comm
	LBANN communicator.

std::unordered_set< const void * >	m_gradient_sources
	Sources of gradient contributions.

optimizer_gradient_status	m_gradient_status
	Status of values in objective function gradient.

EvalType	m_step_time = 0
	Time spent in optimization step.

std::unordered_map< std::type_index, gradient_manager_ptr >	gradients_

Detailed Description

Abstract base class for gradient-based optimization algorithms.

Uses a variant of stochastic gradient descent to optimize the values in a weights instance. The weight values are iteratively adjusted to minimize an objective function. Each optimization step requires the gradient of the objective function w.r.t. the weights.

Definition at line 85 of file optimizer.hpp.
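
As a rough illustration of the interface described above, the following sketch shows how a concrete optimizer might override the pure virtual functions. It is a self-contained toy and not LBANN code: `toy_optimizer` and `toy_sgd` are hypothetical names, the weights and gradient are plain `std::vector<double>` rather than a weights instance or Elemental matrices, `step` takes explicit arguments (the documented `step()` takes none), and the plain SGD update rule is an assumption chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the abstract interface; not the LBANN class.
class toy_optimizer {
public:
  virtual ~toy_optimizer() = default;
  virtual std::string get_type() const = 0;
  virtual double get_learning_rate() const = 0;
  virtual void set_learning_rate(double lr) = 0;
  // Apply one update to `values` given the objective function gradient.
  virtual void step(std::vector<double>& values,
                    const std::vector<double>& gradient) = 0;
};

// Plain stochastic gradient descent: w <- w - lr * g.
class toy_sgd : public toy_optimizer {
public:
  explicit toy_sgd(double lr) : m_learning_rate(lr) {}
  std::string get_type() const override { return "toy SGD"; }
  double get_learning_rate() const override { return m_learning_rate; }
  void set_learning_rate(double lr) override { m_learning_rate = lr; }
  void step(std::vector<double>& values,
            const std::vector<double>& gradient) override {
    // The gradient must have one entry per weight value.
    assert(values.size() == gradient.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
      values[i] -= m_learning_rate * gradient[i];
    }
  }

private:
  double m_learning_rate;
};
```

A caller would construct `toy_sgd` with a learning rate and call `step` once per mini-batch after the gradient has been fully accumulated.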

Member Typedef Documentation

◆ gradient_manager_ptr

using lbann::optimizer::gradient_manager_ptr = std::unique_ptr<gradient_manager_type>

private

Definition at line 329 of file optimizer.hpp.

◆ gradient_manager_type

using lbann::optimizer::gradient_manager_type = GradientHelper

private

Map from data types to gradient contributions.

Todo: Refactor this out. It's a hack.

Definition at line 328 of file optimizer.hpp.

Constructor & Destructor Documentation

◆ optimizer() [1/2]

lbann::optimizer::optimizer ( )

◆ ~optimizer()

virtual lbann::optimizer::~optimizer ( )

virtual default

◆ optimizer() [2/2]

lbann::optimizer::optimizer ( const optimizer & other )

Copy construct/copy assign.

Member Function Documentation

◆ accumulate_all_gradient_contributions()

template<typename TensorDataType >

void lbann::optimizer::accumulate_all_gradient_contributions ( El::AbstractDistMatrix< TensorDataType > & gradient )

Definition at line 112 of file optimizer_impl.hpp.

◆ add_gradient_source()

void lbann::optimizer::add_gradient_source ( const void * source )

Register a gradient source.

Any object that uses the weights and influences the objective function is expected to contribute to the objective function gradient. These objects should register themselves during forward prop.

◆ add_to_gradient()

template<typename TensorDataType >

void lbann::optimizer::add_to_gradient	(	El::AbstractDistMatrix< TensorDataType > const &	contrib,
		TensorDataType	scale = `1.f`,
		bool	allreduce_needed = `false`
	)

Add to the objective function gradient w.r.t. the weights.

Parameters

contrib	Contribution to gradient.
scale	Scaling factor for gradient contribution.
allreduce_needed	Whether the gradient contribution requires an allreduce over its redundant communicator. If false, duplicated data (over the redundant communicator) is assumed to be identical. If true, an allreduce is performed lazily when the gradient is accessed.

Definition at line 36 of file optimizer_impl.hpp.

◆ clear_gradient()

void lbann::optimizer::clear_gradient ( )

Zero out the objective function gradient w.r.t. the weights.

◆ finish_gradient_allreduce()

void lbann::optimizer::finish_gradient_allreduce ( )

Synchronize non-blocking allreduce on the gradient, if needed.

Does nothing if an allreduce isn't needed. Throws an exception if an allreduce is needed but hasn't been started.

◆ get_comm() [1/2]

lbann_comm& lbann::optimizer::get_comm ( )

inline

Access LBANN communicator.

Definition at line 187 of file optimizer.hpp.

◆ get_comm() [2/2]

const lbann_comm& lbann::optimizer::get_comm ( ) const

inline

Access LBANN communicator.

Definition at line 190 of file optimizer.hpp.

◆ get_description()

virtual description lbann::optimizer::get_description ( ) const

virtual

Human-readable description.

◆ get_gradient_buffer()

template<typename TensorDataType >

El::AbstractDistMatrix< TensorDataType > & lbann::optimizer::get_gradient_buffer	(	TensorDataType &	buf_scale,
		TensorDataType &	in_scale,
		bool	allreduce_needed = `false`
	)

Get the gradient buffer.

This provides access to the underlying gradient buffer, which may be directly summed into. This buffer should be considered ephemeral and not stored. The caller must also ensure the buffer has an appropriate distribution. buf_scale provides the caller with a scale factor that must be applied to the gradient buffer before writing to it, and in_scale provides a scaling factor that must be applied to the user's data. Essentially, this enables computations of the form

    gradient = buf_scale*gradient + in_scale*new_gradient

This is an expert-mode function and is intended to help eliminate copies and facilitate kernel fusion.

Parameters

buf_scale	A scale factor provided to the caller to scale the returned buffer by.
in_scale	A scale factor provided to the caller to scale their gradient contributions by.
allreduce_needed	Whether this gradient contribution will need to be allreduced.

Definition at line 49 of file optimizer_impl.hpp.
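
The fused update described above can be sketched with standard containers. This is a hedged illustration and not the LBANN implementation: `toy_gradient_owner`, `accumulate`, and `peek` are hypothetical names, a `std::vector<float>` stands in for the distributed matrix, the `allreduce_needed` flag is omitted, and the choice of `buf_scale = 0` for a cleared buffer (so stale values are overwritten) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an optimizer that owns a gradient buffer.
class toy_gradient_owner {
public:
  explicit toy_gradient_owner(std::size_t size) : m_gradient(size, 0.0f) {}

  // Returns the buffer and reports the two scale factors to the caller.
  // Assumption: a cleared buffer may hold stale values, so buf_scale is 0
  // for the first contribution after a clear and 1 for later ones.
  std::vector<float>& get_gradient_buffer(float& buf_scale, float& in_scale) {
    buf_scale = m_cleared ? 0.0f : 1.0f;
    in_scale = 1.0f;
    m_cleared = false;
    return m_gradient;
  }

  // Mark the buffer as cleared without touching its contents.
  void clear_gradient() { m_cleared = true; }

  // Read-only view, used here only to inspect the result.
  const std::vector<float>& peek() const { return m_gradient; }

private:
  std::vector<float> m_gradient;
  bool m_cleared = true;
};

// Caller-side fused update:
//   gradient = buf_scale*gradient + in_scale*new_gradient
inline void accumulate(toy_gradient_owner& opt,
                       const std::vector<float>& new_gradient) {
  float buf_scale = 0.0f;
  float in_scale = 0.0f;
  // The reference is used immediately and not stored.
  std::vector<float>& gradient = opt.get_gradient_buffer(buf_scale, in_scale);
  assert(gradient.size() == new_gradient.size());
  for (std::size_t i = 0; i < gradient.size(); ++i) {
    gradient[i] = buf_scale * gradient[i] + in_scale * new_gradient[i];
  }
}
```

Handing the scale factors to the caller lets the scaling and the addition happen in a single pass over the buffer, which is the copy elimination and kernel fusion the description refers to.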

◆ get_gradient_sources()

std::unordered_set<const void*>& lbann::optimizer::get_gradient_sources ( )

inline

Definition at line 271 of file optimizer.hpp.

◆ get_gradient_status()

optimizer_gradient_status lbann::optimizer::get_gradient_status ( ) const

inline

Return the current gradient status.

Definition at line 263 of file optimizer.hpp.

◆ get_learning_rate()

virtual double lbann::optimizer::get_learning_rate ( ) const

pure virtual

◆ get_matrix_info()

virtual std::tuple<El::Int, El::Int, El::DistData> lbann::optimizer::get_matrix_info ( ) const

pure virtual

◆ get_num_gradient_sources()

El::Int lbann::optimizer::get_num_gradient_sources ( ) const

Number of objects that are expected to contribute to the gradient.

◆ get_step_time()

EvalType lbann::optimizer::get_step_time ( ) const

inline

Time spent in optimization step.

Definition at line 197 of file optimizer.hpp.

◆ get_type()

virtual std::string lbann::optimizer::get_type ( ) const

pure virtual

Human-readable type name.

◆ inc_step_time()

void lbann::optimizer::inc_step_time ( EvalType time )

inline

Definition at line 279 of file optimizer.hpp.

◆ operator=()

optimizer& lbann::optimizer::operator= ( const optimizer & other )

◆ remove_gradient_source()

void lbann::optimizer::remove_gradient_source ( const void * source )

Unregister a gradient source.

When an object adds its contribution to the objective function gradient during back prop, it should unregister itself. If there are no more gradient sources remaining, a non-blocking allreduce will be launched on the gradient, if needed.
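
The register/unregister protocol can be sketched as follows. This is a minimal stand-in under stated assumptions, not the LBANN implementation: `toy_source_tracker` and `allreduce_launched` are hypothetical names, and the actual non-blocking allreduce is replaced by a boolean flag.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in that tracks gradient sources as described above.
class toy_source_tracker {
public:
  // Called by an object when it uses the weights during forward prop.
  // Registering the same pointer twice has no additional effect.
  void add_gradient_source(const void* source) { m_sources.insert(source); }

  // Called by an object after it has added its gradient contribution
  // during back prop.
  void remove_gradient_source(const void* source) {
    m_sources.erase(source);
    if (m_sources.empty()) {
      // A real optimizer would launch the non-blocking allreduce here,
      // if one is needed; the toy only records that it would happen.
      m_allreduce_launched = true;
    }
  }

  std::size_t get_num_gradient_sources() const { return m_sources.size(); }
  bool allreduce_launched() const { return m_allreduce_launched; }

private:
  std::unordered_set<const void*> m_sources;
  bool m_allreduce_launched = false;
};
```

Keying the set on `const void*` lets any kind of object (a layer, an objective function term) register itself by address without sharing a common base class.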

◆ reset_counters()

virtual void lbann::optimizer::reset_counters ( )

inline virtual

Reset stats counters.

Definition at line 200 of file optimizer.hpp.

◆ serialize()

template<class Archive >

void lbann::optimizer::serialize ( Archive & ar )

Store state to archive for checkpoint and restart.

◆ set_comm()

void lbann::optimizer::set_comm ( lbann_comm & comm )

inline

Definition at line 275 of file optimizer.hpp.

◆ set_gradient_status()

void lbann::optimizer::set_gradient_status ( const optimizer_gradient_status status )

inline

Definition at line 267 of file optimizer.hpp.

◆ set_learning_rate()

virtual void lbann::optimizer::set_learning_rate ( double )

pure virtual

◆ set_step_time()

void lbann::optimizer::set_step_time ( EvalType time )

inline

Definition at line 277 of file optimizer.hpp.

◆ setup()

virtual void lbann::optimizer::setup ( weights * w )

pure virtual

◆ start_gradient_allreduce()

void lbann::optimizer::start_gradient_allreduce ( )

Launch non-blocking allreduce on the gradient, if needed.

Does nothing if an allreduce is not needed or has already been started.
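
The documented behavior of start_gradient_allreduce() and finish_gradient_allreduce() can be read as a small state machine. The sketch below is an assumption-laden toy, not LBANN code: `toy_allreduce_tracker`, `add_contribution`, and the status names other than `cleared` are hypothetical, and the communication itself is omitted.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical status values; only `cleared` appears on this page.
enum class toy_gradient_status { cleared, ready, sync_needed, sync_started };

class toy_allreduce_tracker {
public:
  // Record a contribution; mark a sync as pending if one is required.
  void add_contribution(bool allreduce_needed) {
    if (allreduce_needed) {
      m_status = toy_gradient_status::sync_needed;
    } else if (m_status == toy_gradient_status::cleared) {
      m_status = toy_gradient_status::ready;
    }
  }

  // Does nothing if an allreduce is not needed or has already been started.
  void start_gradient_allreduce() {
    if (m_status == toy_gradient_status::sync_needed) {
      // A real implementation would launch the non-blocking allreduce here.
      m_status = toy_gradient_status::sync_started;
    }
  }

  // Does nothing if an allreduce isn't needed; throws if one is needed
  // but hasn't been started.
  void finish_gradient_allreduce() {
    if (m_status == toy_gradient_status::sync_needed) {
      throw std::logic_error("allreduce needed but not started");
    }
    if (m_status == toy_gradient_status::sync_started) {
      // A real implementation would wait for the allreduce to complete here.
      m_status = toy_gradient_status::ready;
    }
  }

  toy_gradient_status get_gradient_status() const { return m_status; }

private:
  toy_gradient_status m_status = toy_gradient_status::cleared;
};
```

Splitting the operation into a start and a finish call lets communication overlap with other work between the two calls.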

◆ step()

virtual void lbann::optimizer::step ( )

pure virtual

Perform optimization step.

◆ write_proto()

virtual void lbann::optimizer::write_proto ( lbann_data::Optimizer & proto ) const

pure virtual

Add optimizer data to prototext.

Member Data Documentation

◆ gradients_

std::unordered_map<std::type_index, gradient_manager_ptr> lbann::optimizer::gradients_

private

Definition at line 330 of file optimizer.hpp.

◆ m_comm

lbann_comm* lbann::optimizer::m_comm

private

LBANN communicator.

Definition at line 304 of file optimizer.hpp.

◆ m_gradient_sources

std::unordered_set<const void*> lbann::optimizer::m_gradient_sources

private

Sources of gradient contributions.

This set contains pointers to objects (e.g. layers and objective function terms) that contribute to the objective function gradient. Objects should register themselves as they use the weights during forward prop and unregister themselves as they add their gradient contributions. Once this set is empty, it is safe to launch a non-blocking allreduce on the gradient, if needed.

Definition at line 316 of file optimizer.hpp.

◆ m_gradient_status

optimizer_gradient_status lbann::optimizer::m_gradient_status

private

Initial value: = optimizer_gradient_status::cleared

Status of values in objective function gradient.

Definition at line 319 of file optimizer.hpp.

◆ m_step_time

EvalType lbann::optimizer::m_step_time = 0

private

Time spent in optimization step.

Definition at line 323 of file optimizer.hpp.
