Each dataset is identified with a domain-dependent
dataset format.
In general, a dataset consists of multiple
shards; for example, for some
tasks, the training, development, and test sets are standardized; these
would constitute three different shards. For supervised learning, typically
two shards are used: train and test. One can also upload a dataset with
just one shard called
raw, which will be automatically split into training and test.