Data Ingestion

Getting data into LBANN requires using one of the predefined data readers or writing a customized data reader tailored to the data in question. Currently, data readers serve two roles:

1. Define how to ingest data from storage (at rest) and place it into an LBANN-compatible format.
2. Understand the structure of a well-defined (“named”) data set such as MNIST or ImageNet-1K (ILSVRC).

LBANN is evolving to separate these two roles into distinct objects, but that work is still in progress. As a result, some “legacy” data readers combine both roles, while some newer data readers focus on the first role (ingestion) and use a sample list to help with the second (understanding the structure of the data set).
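To make the split between the two roles concrete, here is a minimal, self-contained Python sketch. It is not LBANN's API: the names `SampleList`, `Reader`, `iterate`, and `load_csv_row`, along with the in-memory `_storage` dictionary standing in for data at rest, are all hypothetical and exist only for illustration. The sketch assumes that a sample list is simply an ordered collection of sample identifiers and that a reader knows only how to load one sample given its identifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical illustration only; none of these names come from LBANN.


@dataclass(frozen=True)
class SampleList:
    """Second role: records which samples make up the data set."""
    sample_ids: Sequence[str]

    def __len__(self) -> int:
        return len(self.sample_ids)


class Reader:
    """First role: turns one stored sample into an in-memory list of floats."""

    def __init__(self, load_fn: Callable[[str], List[float]]) -> None:
        self._load_fn = load_fn

    def fetch(self, sample_id: str) -> List[float]:
        return self._load_fn(sample_id)


def iterate(reader: Reader, sample_list: SampleList) -> Iterator[Tuple[str, List[float]]]:
    """Walk the sample list and ask the reader to load each sample in order."""
    for sample_id in sample_list.sample_ids:
        yield sample_id, reader.fetch(sample_id)


# In-memory stand-in for data at rest (assumption: one CSV row per sample).
_storage = {"a": "1,2,3", "b": "4,5,6"}


def load_csv_row(sample_id: str) -> List[float]:
    """Parse the stored comma-separated row for one sample into floats."""
    return [float(value) for value in _storage[sample_id].split(",")]


reader = Reader(load_csv_row)
samples = SampleList(["a", "b"])
batch = list(iterate(reader, samples))
```

In this arrangement, changing which samples belong to the data set only requires a different `SampleList`, while supporting a new storage format only requires a different loading function.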

At this time, LBANN can only ingest static data sets; work on streaming data is in progress.

Legacy Data Readers

The legacy data readers include the MNIST, ImageNet, and CIFAR10 data readers.

“New” Data Readers

The new-format data readers include the Python, SMILES, and HDF5 readers.

Two of these readers (SMILES and HDF5) support the use of sample lists.