Miscellaneous Layers
Layer | Description
---|---
Argmax | Get index of maximum-value tensor entry
Argmin | Get index of minimum-value tensor entry
ChannelwiseMean | Mean values across channel dimension
ChannelwiseSoftmax | Softmax across channel dimension
Covariance | Covariance between entries of two tensors
DistEmbedding | Embedding layer with distributed weights
External | Create layer from an external library
MiniBatchIndex | Position of data sample within mini-batch
MiniBatchSize | Size of current mini-batch
OneHot | Convert index to a one-hot vector
RowwiseWeightsNorms | L2 norm of each row of a weights matrix
UniformHash | Apply a hash function to get uniformly distributed values
Variance | Variance of tensor entries
Argmax
The Argmax
layer gets the index of the maximum-value tensor
entry.
Expects a 1D input tensor. If multiple entries have the same maximum value, outputs the index of the first one.
Arguments: None
Argmin
The Argmin
layer gets the index of the minimum-value tensor
entry.
Expects a 1D input tensor. If multiple entries have the same minimum value, outputs the index of the first one.
Arguments: None
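For both layers, the tie-breaking rule (first index wins) matches NumPy's argmax/argmin, as in this small reference sketch (not LBANN code):

```python
import numpy as np

x = np.array([3.0, 7.0, 7.0, 1.0])

# Both return the index of the *first* occurrence on ties.
print(np.argmax(x))  # 1 (first of the two 7.0 entries)
print(np.argmin(x))  # 3
```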
ChannelwiseMean
The ChannelwiseMean
layer computes mean values across
channel dimensions.
The input tensor is sliced along the first tensor dimension (the “channel” dimension for image data in CHW format) and the mean value is computed for each slice.
Arguments: None
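To illustrate the slicing described above, here is a small NumPy sketch (not LBANN code) of the per-channel mean for a CHW-format input:

```python
import numpy as np

# CHW-format input: 3 channels of 4x5 "images"
x = np.random.rand(3, 4, 5)

# One slice per entry along the first (channel) dimension; one mean per slice
channel_means = x.reshape(x.shape[0], -1).mean(axis=1)
print(channel_means.shape)  # (3,)
```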
ChannelwiseSoftmax
The ChannelwiseSoftmax
layer applies the Softmax function
across channel dimensions.
The input tensor is sliced along the first tensor dimension (the “channel” dimension for image data in CHW format) and the softmax function is computed for each slice.
Arguments: None
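Similarly, a NumPy sketch of the per-channel softmax; the max-subtraction is a standard numerical-stability trick and is an assumption about the implementation, not something this page specifies:

```python
import numpy as np

x = np.random.rand(3, 4, 5)          # CHW-format input
flat = x.reshape(x.shape[0], -1)     # one slice per channel

shifted = flat - flat.max(axis=1, keepdims=True)  # stabilize the exponentials
exp = np.exp(shifted)
softmax = exp / exp.sum(axis=1, keepdims=True)    # normalize within each channel

y = softmax.reshape(x.shape)
print(y.reshape(3, -1).sum(axis=1))  # each channel sums to 1
```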
Covariance
The Covariance
layer computes the covariance between entries
of two tensors.
Arguments:
- biased (bool): Use biased estimator, i.e. sample covariance
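As a reference for the biased flag, a NumPy sketch of the two estimators over the flattened entries of two equally-sized tensors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
n = x.size

centered = (x - x.mean()) * (y - y.mean())
cov_biased = centered.sum() / n          # biased=True: divide by n
cov_unbiased = centered.sum() / (n - 1)  # biased=False: divide by n-1
```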
DistEmbedding
The DistEmbedding
layer is the embedding layer with
distributed weights.
This is similar to the embedding layer, which takes integer indices and returns embedding vectors from a lookup table. However, the embedding vectors are distributed between processes and one-sided inter-process communication is performed with OpenSHMEM (on CPU) or NVSHMEM (on GPU).
The main benefit of this model-parallel approach is to handle cases where the embedding vectors don’t fit on one process. It should also have better scaling properties when the mini-batch size is very large.
To take advantage of sparse gradients, the distributed embedding layer provides the option to bypass the optimizer (which currently only supports dense gradients) and perform sparse SGD directly on the embedding weights. If enabled, SGD occurs during the layer's "update" phase (i.e. in the virtual update_compute function). Otherwise, the layer converts sparse gradients to a dense tensor and passes it into the usual optimizer. This is a hack and will be deprecated once the optimizer class supports sparse gradients.
Warning
This is experimental.
Todo
Sparse SGD with optimizer class
Arguments:
- num_embeddings (int64): Size of dictionary of embeddings.
- embedding_dim (int64): Size of embedding vectors.
- sparse_sgd (bool): Perform sparse SGD during backprop. Bypasses optimizer class.
- learning_rate (double): SGD learning rate.
- barrier_in_forward_prop (bool): Perform a blocking barrier at the beginning of forward prop. This layer performs synchronization with non-blocking barriers to ensure the correctness of asynchronous communication. However, gradient checking changes the embedding values without performing any synchronization. The quickest fix is to do a blocking barrier at the beginning of forward prop to make sure that all the embeddings are ready to be accessed.
Todo
Think of a way to avoid this synchronization.
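To make the sparse-SGD option concrete, here is a single-process NumPy sketch of what the layer computes logically (table lookup in forward prop, per-row SGD update in backprop); the distributed storage and OpenSHMEM/NVSHMEM communication are omitted:

```python
import numpy as np

num_embeddings, embedding_dim, learning_rate = 10, 4, 0.1
table = np.random.randn(num_embeddings, embedding_dim)  # the embedding weights

indices = np.array([2, 7, 2])      # integer inputs for one sample
vectors = table[indices]           # forward prop: lookup, shape (3, 4)

grad_out = np.ones_like(vectors)   # gradient w.r.t. the output vectors
# Sparse SGD: only the rows that were looked up receive an update
# (np.add.at accumulates correctly over repeated indices).
np.add.at(table, indices, -learning_rate * grad_out)
```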
External
The External
layer creates a layer from an external
library.
An external layer can be created by compiling an LBANN layer object in a separate shared library (such as an .so file), along with a setup function that creates it. This layer accepts a file path and a layer name (so more than one can exist in a library), and will invoke the library dynamically to create the layer. The layer in the external library can be set with an arbitrary number of inputs, outputs, and weights.
Compiling a layer only needs to include the LBANN headers and link against liblbann.so. An extern "C" function named setup_<LAYER NAME> must exist for LBANN to be able to create the layer.
Warning
Make sure you link the library with the version of LBANN you plan to run it with.
Note
An example layer can be found in src/layers/unit_test/example_layer.cpp
.
Arguments:
- filename (string): Library file name or path.
- layer_name (string): Layer name for setup function.
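A hypothetical usage sketch with the Python front-end is shown below; it assumes the front-end exposes this layer as lbann.External with the filename and layer_name fields documented above, and the library and layer names are purely illustrative:

```python
import lbann

# Hypothetical sketch: class and field names are taken from this page and
# may differ in practice.
parent = lbann.Input(data_field='samples')
custom = lbann.External(parent,
                        filename='libmy_custom_layer.so',  # shared library to load
                        layer_name='my_custom_layer')      # LBANN calls setup_my_custom_layer
```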
MiniBatchIndex
The MiniBatchIndex layer is the position of a data sample within a mini-batch.
LBANN does implicit mini-batching and data samples are usually processed independently. This layer is helpful if some mini-batch samples need to be processed differently from others.
Arguments: None
MiniBatchSize
The MiniBatchSize layer is the size of the current mini-batch.
Arguments: None
OneHot
The OneHot
layer converts an index to a one-hot vector.
Expects a scalar input tensor and outputs a 1D tensor. The input is interpreted as an index, and output entries are one if they correspond to that index and zero otherwise. Out-of-range indices are ignored.
Arguments:
- size (int64): Size of one-hot vector
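A NumPy sketch of the described behavior; the all-zero output for an out-of-range index is an assumption based on "out-of-range indices are ignored":

```python
import numpy as np

def one_hot(index, size):
    out = np.zeros(size)
    if 0 <= index < size:    # out-of-range indices are ignored
        out[int(index)] = 1.0
    return out

print(one_hot(2, 5))  # [0. 0. 1. 0. 0.]
print(one_hot(7, 5))  # [0. 0. 0. 0. 0.]
```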
RowwiseWeightsNorms
The RowwiseWeightsNorms
layer is the L2 norm of each row of
a weights matrix.
Warning
This layer is experimental and finicky. It is intended for use with the matrix weights from a fully-connected layer, and other use-cases may have strange behavior.
Given a weights object, this layer computes the L2 norm for each row of the underlying matrix. Note that the internal matrix may have different dimensions than the logical weight dimensions.
This layer expects to have one weights object. During setup, that weights object should be initialized by another layer before this layer’s setup phase. Setting a “hint layer” may be necessary to enforce this ordering.
Arguments: None
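A NumPy sketch of the row-wise norm computation on a weights matrix:

```python
import numpy as np

weights = np.random.randn(6, 3)              # matrix weights, e.g. from a fully-connected layer
row_norms = np.linalg.norm(weights, axis=1)  # L2 norm of each row
print(row_norms.shape)                       # (6,)
```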
UniformHash
The UniformHash
layer applies a hash function to get
uniformly distributed values.
Each input entry is hashed with MD5 and scaled to [0,1).
Warning
Currently only supported on GPU.
Arguments: None
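A Python sketch of the described transformation (MD5 each entry, then scale to [0,1)); the exact byte interpretation of the digest is not specified on this page, so the details below are illustrative only:

```python
import hashlib
import numpy as np

def uniform_hash(x):
    out = np.empty_like(x, dtype=np.float64)
    for i, v in enumerate(np.ravel(x)):
        digest = hashlib.md5(np.float64(v).tobytes()).digest()
        # Interpret the first 8 digest bytes as an integer and scale to [0, 1).
        out.flat[i] = int.from_bytes(digest[:8], 'little') / 2**64
    return out

print(uniform_hash(np.array([0.5, 1.5, 0.5])))  # equal inputs hash to equal outputs
```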
Variance
The Variance
layer computes the variance of tensor entries.
Arguments:
- biased (bool): Use biased estimator, i.e. sample variance
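As with Covariance, a NumPy sketch of the biased vs. unbiased estimators over the tensor entries:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

var_biased = x.var(ddof=0)    # biased=True: divide by n
var_unbiased = x.var(ddof=1)  # biased=False: divide by n-1
```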