Each dataset is associated with a domain-dependent dataset format.
In general, a dataset consists of multiple shards. For example, some
tasks have standardized training, development, and test sets, which
would constitute three different shards. For supervised learning, typically
two shards are used: train and test. One can also upload a dataset with
just one shard called raw, which will be automatically split into train and test shards.
When a dataset is uploaded, it is processed by a dataset processor,
which validates the dataset according to the expected format,
computes statistics on the dataset, and splits the data if necessary.
Runs can be executed on a dataset only after it has been processed.
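
To make the validation and statistics steps concrete, here is a minimal sketch of what a dataset processor might do for one shard. The actual formats are domain-dependent and are not specified here, so the line format below (a label followed by numeric features), the function name process_shard, and the reported statistics are all assumptions chosen for illustration.

```python
from collections import Counter


def process_shard(lines):
    """Validate one shard and return summary statistics.

    Assumed (hypothetical) format: one example per line,
    "<label> <feature> <feature> ...", with numeric features.
    """
    label_counts = Counter()
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(
                f"line {line_number}: expected a label and at least one feature"
            )
        label, features = fields[0], fields[1:]
        for value in features:
            try:
                float(value)
            except ValueError:
                raise ValueError(
                    f"line {line_number}: feature {value!r} is not a number"
                ) from None
        label_counts[label] += 1
    return {
        "num_examples": sum(label_counts.values()),
        "label_counts": dict(label_counts),
    }


stats = process_shard(["1 0.5 2.0", "0 1.0 1.0", "1 3.0 4.0"])
print(stats["num_examples"], stats["label_counts"])
```

A shard that fails validation raises an error instead of producing statistics, which mirrors the rule that runs are possible only on a processed dataset.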
To have the split done automatically, create one shard called raw. Upon upload, the dataset processor will split
this shard into a train shard and a test shard containing 70% and 30% of the examples,
respectively.
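
The 70%/30% split can be sketched as follows. This is only an illustration: whether the processor shuffles the examples before splitting, and how it rounds when the count is not a multiple of ten, are not stated here, so the shuffling, the fixed seed, and the name split_raw are assumptions.

```python
import random


def split_raw(examples, train_fraction=0.7, seed=0):
    """Split the examples of a raw shard into (train, test) lists."""
    shuffled = list(examples)
    # Assumption: shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    # Assumption: round to the nearest whole example.
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


train, test = split_raw(range(100))
print(len(train), len(test))  # 70 30
```

Every example lands in exactly one of the two shards, so the train and test shards together contain the same examples as the raw shard.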
By default, your dataset can be downloaded by anyone.
If there are licensing restrictions,
you can check the restricted access box for the dataset to prohibit downloads.
Note, however, that others can still use your data to test their algorithms.