Miscellaneous Layers
Layer | Description
---|---
Argmax | Get index of maximum-value tensor entry
Argmin | Get index of minimum-value tensor entry
ChannelwiseMean | Mean values across channel dimension
ChannelwiseSoftmax | Softmax across channel dimension
Covariance | Covariance between entries of two tensors
DistEmbedding | Embedding layer with distributed weights
External | Create layer from an external library
MiniBatchIndex | Position of data sample within mini-batch
MiniBatchSize | Size of current mini-batch
OneHot | Convert index to a one-hot vector
RowwiseWeightsNorms | L2 norm of each row of a weights matrix
UniformHash | Apply a hash function to get uniformly distributed values
Variance | Variance of tensor entries
Argmax
The Argmax
layer gets the index of the maximum-value tensor
entry.
Expects a 1D input tensor. If multiple entries have the same maximum value, outputs the index of the first one.
Arguments: None
Argmin
The Argmin
layer gets the index of the minimum-value tensor
entry.
Expects a 1D input tensor. If multiple entries have the same minimum value, outputs the index of the first one.
Arguments: None
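For both layers, the tie-breaking rule (first index wins) matches NumPy's argmax/argmin, as in this small reference sketch (not LBANN code):

```python
import numpy as np

x = np.array([3.0, 7.0, 7.0, 1.0])

# Both return the index of the *first* occurrence on ties.
print(np.argmax(x))  # 1 (first of the two 7.0 entries)
print(np.argmin(x))  # 3
```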
ChannelwiseMean
The ChannelwiseMean
layer computes mean values across
channel dimensions.
The input tensor is sliced along the first tensor dimension (the “channel” dimension for image data in CHW format) and the mean value is computed for each slice.
Arguments: None
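To illustrate the slicing described above, here is a small NumPy sketch (not LBANN code) of the per-channel mean for a CHW-format input:

```python
import numpy as np

# CHW-format input: 3 channels of 4x5 "images"
x = np.random.rand(3, 4, 5)

# One slice per entry along the first (channel) dimension; one mean per slice
channel_means = x.reshape(x.shape[0], -1).mean(axis=1)
print(channel_means.shape)  # (3,)
```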
ChannelwiseSoftmax
The ChannelwiseSoftmax
layer applies the Softmax function
across channel dimensions.
The input tensor is sliced along the first tensor dimension (the “channel” dimension for image data in CHW format) and the softmax function is computed for each slice.
Arguments: None
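Similarly, a NumPy sketch of the per-channel softmax; the max-subtraction is a standard numerical-stability trick and is an assumption about the implementation, not something this page specifies:

```python
import numpy as np

x = np.random.rand(3, 4, 5)          # CHW-format input
flat = x.reshape(x.shape[0], -1)     # one slice per channel

shifted = flat - flat.max(axis=1, keepdims=True)  # stabilize the exponentials
exp = np.exp(shifted)
softmax = exp / exp.sum(axis=1, keepdims=True)    # normalize within each channel

y = softmax.reshape(x.shape)
print(y.reshape(3, -1).sum(axis=1))  # each channel sums to 1
```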
Covariance
The Covariance
layer computes the covariance between entries
of two tensors.
Arguments:
- biased (bool): Use biased estimator, i.e. sample covariance
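As a reference for the biased flag, a NumPy sketch of the two estimators over the flattened entries of two equally-sized tensors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
n = x.size

centered = (x - x.mean()) * (y - y.mean())
cov_biased = centered.sum() / n          # biased=True: divide by n
cov_unbiased = centered.sum() / (n - 1)  # biased=False: divide by n-1
```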
DistEmbedding
The DistEmbedding
layer is the embedding layer with
distributed weights.
This is similar to the embedding layer, which takes integer indices and returns embedding vectors from a lookup table. However, the embedding vectors are distributed between processes and one-sided inter-process communication is performed with OpenSHMEM (on CPU) or NVSHMEM (on GPU).
The main benefit of this model-parallel approach is to handle cases where the embedding vectors don’t fit on one process. It should also have better scaling properties when the mini-batch size is very large.
To take advantage of sparse gradients, the distributed embedding layer provides the option to bypass the optimizer (which currently only supports dense gradients) and perform sparse SGD directly on the embedding weights. If enabled, SGD occurs during the layer's "update" phase (i.e. in the virtual update_compute function). Otherwise, the layer converts sparse gradients to a dense tensor and passes it into the usual optimizer. This is a hack and will be deprecated once the optimizer class supports sparse gradients.
Warning
This is experimental.
Todo
Sparse SGD with optimizer class
Arguments:
- num_embeddings (int64): Size of dictionary of embeddings.
- embedding_dim (int64): Size of embedding vectors.
- sparse_sgd (bool): Perform sparse SGD during backprop. Bypasses optimizer class.
- learning_rate (double): SGD learning rate.
- barrier_in_forward_prop (bool): Perform a blocking barrier at the beginning of forward prop. This layer performs synchronization with non-blocking barriers to ensure the correctness of asynchronous communication. However, gradient checking changes the embedding values without performing any synchronization. The quickest fix is to do a blocking barrier at the beginning of forward prop to make sure that all the embeddings are ready to be accessed.
Todo
Think of a way to avoid this synchronization.
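To make the sparse-SGD option concrete, here is a single-process NumPy sketch of what the layer computes logically (table lookup in forward prop, per-row SGD update in backprop); the distributed storage and OpenSHMEM/NVSHMEM communication are omitted:

```python
import numpy as np

num_embeddings, embedding_dim, learning_rate = 10, 4, 0.1
table = np.random.randn(num_embeddings, embedding_dim)  # the embedding weights

indices = np.array([2, 7, 2])      # integer inputs for one sample
vectors = table[indices]           # forward prop: lookup, shape (3, 4)

grad_out = np.ones_like(vectors)   # gradient w.r.t. the output vectors
# Sparse SGD: only the rows that were looked up receive an update
# (np.add.at accumulates correctly over repeated indices).
np.add.at(table, indices, -learning_rate * grad_out)
```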
External
The External
layer creates a layer from an external
library.
An external layer can be created by compiling an LBANN layer object in a separate shared library (such as an .so file), along with a setup function that creates it. This layer accepts a file path and a layer name (so more than one can exist in a library), and will invoke the library dynamically to create the layer. The layer in the external library can be set with an arbitrary number of inputs, outputs, and weights.
Compiling a layer only needs to include the LBANN headers and link against liblbann.so. An extern "C" function named setup_<LAYER NAME> must exist for LBANN to be able to create the layer.
Warning
Make sure you link the library with the version of LBANN you plan to run it with.
Note
An example layer can be found in src/layers/unit_test/example_layer.cpp
.
Arguments:
- filename (string): Library file name or path.
- layer_name (string): Layer name for setup function.
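A hypothetical usage sketch with the Python front-end is shown below; it assumes the front-end exposes this layer as lbann.External with the filename and layer_name fields documented above, and the library and layer names are purely illustrative:

```python
import lbann

# Hypothetical sketch: class and field names are taken from this page and
# may differ in practice.
parent = lbann.Input(data_field='samples')
custom = lbann.External(parent,
                        filename='libmy_custom_layer.so',  # shared library to load
                        layer_name='my_custom_layer')      # LBANN calls setup_my_custom_layer
```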
MiniBatchIndex
The MiniBatchIndex layer is the position of a data sample within a mini-batch.
LBANN does implicit mini-batching and data samples are usually processed independently. This layer is helpful if some mini-batch samples need to be processed differently from others.
Arguments: None
MiniBatchSize
The MiniBatchSize layer is the size of the current mini-batch.
Arguments: None
OneHot
The OneHot
layer converts an index to a one-hot vector.
Expects a scalar input tensor and outputs a 1D tensor. The input is interpreted as an index, and output entries are one if they correspond to that index and zero otherwise. Out-of-range indices are ignored.
Arguments:
- size (int64): Size of one-hot vector
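A NumPy sketch of the described behavior; the all-zero output for an out-of-range index is an assumption based on "out-of-range indices are ignored":

```python
import numpy as np

def one_hot(index, size):
    out = np.zeros(size)
    if 0 <= index < size:    # out-of-range indices are ignored
        out[int(index)] = 1.0
    return out

print(one_hot(2, 5))  # [0. 0. 1. 0. 0.]
print(one_hot(7, 5))  # [0. 0. 0. 0. 0.]
```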
RowwiseWeightsNorms
The RowwiseWeightsNorms
layer is the L2 norm of each row of
a weights matrix.
Warning
This layer is experimental and finicky. It is intended for use with the matrix weights from a fully-connected layer, and other use-cases may have strange behavior.
Given a weights object, this layer computes the L2 norm for each row of the underlying matrix. Note that the internal matrix may have different dimensions than the logical weight dimensions.
This layer expects to have one weights object. During setup, that weights object should be initialized by another layer before this layer’s setup phase. Setting a “hint layer” may be necessary to enforce this ordering.
Arguments: None
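A NumPy sketch of the row-wise norm computation on a weights matrix:

```python
import numpy as np

weights = np.random.randn(6, 3)              # matrix weights, e.g. from a fully-connected layer
row_norms = np.linalg.norm(weights, axis=1)  # L2 norm of each row
print(row_norms.shape)                       # (6,)
```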
UniformHash
The UniformHash
layer applies a hash function to get
uniformly distributed values.
Each input entry is hashed with MD5 and scaled to [0,1).
Warning
Currently only supported on GPU.
Arguments: None
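A Python sketch of the described transformation (MD5 each entry, then scale to [0,1)); the exact byte interpretation of the digest is not specified on this page, so the details below are illustrative only:

```python
import hashlib
import numpy as np

def uniform_hash(x):
    out = np.empty_like(x, dtype=np.float64)
    for i, v in enumerate(np.ravel(x)):
        digest = hashlib.md5(np.float64(v).tobytes()).digest()
        # Interpret the first 8 digest bytes as an integer and scale to [0, 1).
        out.flat[i] = int.from_bytes(digest[:8], 'little') / 2**64
    return out

print(uniform_hash(np.array([0.5, 1.5, 0.5])))  # equal inputs hash to equal outputs
```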
Variance
The Variance
layer computes the variance of tensor entries.
Arguments:
- biased (bool): Use biased estimator, i.e. sample variance
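As with Covariance, a NumPy sketch of the biased vs. unbiased estimators over the tensor entries:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

var_biased = x.var(ddof=0)    # biased=True: divide by n
var_unbiased = x.var(ddof=1)  # biased=False: divide by n-1
```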