Data8 Administration & Batch Data Cleansing API

Manage your Data8 account, submit data to batch data cleansing jobs, and retrieve the results.

Jobs are submitted to workflows built to your specifications by the Data8 Production Team, who will document the data each workflow expects as input and produces as output.

If you do not already have a workflow available to submit jobs to, please get in touch with your account manager to discuss your requirements.

All requests must be authenticated using an Authorization: Bearer header, with the bearer token being obtained from the Data8 OAuth token server at https://auth.data-8.co.uk/connect/token.
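As a sketch, the token request body can be built as below. The client_credentials grant and the credential field names are assumptions based on standard OAuth 2.0; confirm the exact parameters for your account with Data8.

```python
from urllib.parse import urlencode

# The Data8 OAuth token server endpoint (from the text above).
TOKEN_URL = "https://auth.data-8.co.uk/connect/token"

def build_token_request_body(client_id: str, client_secret: str) -> str:
    """Build the form-encoded body for the token request.

    NOTE: the client_credentials grant and these field names are an
    assumption based on standard OAuth 2.0, not taken from Data8 docs.
    """
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })
```

POST this body to TOKEN_URL with Content-Type application/x-www-form-urlencoded, then send the returned access token as Authorization: Bearer <token> on every API request.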

GET

/Dataset

Retrieves a list of datasets

Gets a paged list of datasets. Up to 100 datasets will be included in each page, and datasets will be sorted alphabetically. Only the basic details of each dataset are included. Use the GET /Dataset/{name} endpoint to retrieve the full details of a dataset. The URL of that endpoint is included in the Location property of the returned objects.
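Since each page holds at most 100 datasets, the number of GET /Dataset requests needed to list everything can be computed up front; a minimal sketch:

```python
PAGE_SIZE = 100  # maximum number of datasets returned per page

def pages_needed(total_datasets: int) -> int:
    """Number of pages (and so GET /Dataset requests) needed to list
    every dataset. The first page is number 1, so at least one request
    is always made."""
    return max(1, -(-total_datasets // PAGE_SIZE))  # ceiling division
```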

Parameters

The number of the page to retrieve. The first page is number 1.

Indicates whether input datasets should be included in the result

Indicates whether output datasets should be included in the result

Responses

A list of DatasetRef objects that list the basic details of each matching dataset

Some validation error occurred

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}
POST

/Dataset

Creates a new dataset

Each dataset must have a unique name. The name must start with a letter a-z and may contain only the characters a-z, 0-9, and _.

Datasets are either input or output datasets. Only input datasets can be created using this endpoint. Output datasets are created automatically as required when jobs are started.

Each input dataset must have its columns specified when it is created. Each column has a name, following the same naming requirements as above for the dataset, and a type.
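The naming rule for datasets and columns maps directly onto a regular expression; a small validator, assuming the rule is exactly as stated (a lowercase letter first, then only lowercase letters, digits, or underscores):

```python
import re

# Must start with a-z; may then contain only a-z, 0-9, or _.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")

def is_valid_name(name: str) -> bool:
    """Check a dataset or column name against the documented rule."""
    return NAME_PATTERN.fullmatch(name) is not None
```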

When the dataset is created it is marked as incomplete. Add data to the dataset using the PATCH /Dataset/{name}/data endpoint. Once all data has been added, use the PUT /Dataset/{name} endpoint to mark the dataset as complete before using it as input to a job.

The full details of the dataset can be retrieved at any time using the GET endpoint.
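The create → upload → complete lifecycle described above involves JSON request bodies at each step. A sketch of building the first and last of these (the column type codes mirror the example below; their exact meanings are defined by your workflow's documentation):

```python
import json

def create_dataset_body(name: str, columns: dict) -> str:
    """Body for POST /Dataset: a name plus a column-name -> type-code map."""
    return json.dumps({"name": name, "columns": columns})

def complete_dataset_body() -> str:
    """Body for PUT /Dataset/{name}, marking the dataset complete so it
    can be used as input to a job."""
    return json.dumps({"completed": True})
```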

Parameters
No parameters.
Request Body

The details of the dataset to create

Example Value Schema
{
	"name": "sample_dataset_1",
	"columns": {
	  "firstname": 0,
	  "lastname": 0,
	  "priority": 1
	}
}
Responses

The dataset has been created

Example Value Schema
{
	"name": "sample_dataset_1",
	"columns": {
	  "firstname": 0,
	  "lastname": 0,
	  "priority": 1
	},
	"input": true,
	"recordCount": 0,
	"completed": true
}

Some validation error occurred in the dataset details

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}

Another dataset with the same name already exists

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
GET

/Dataset/{name}

Gets the details of a named dataset

Parameters

The name of the dataset to get the details of

Responses

No dataset could be found with the supplied name

The request has succeeded and the details of the requested dataset have been returned

Example Value Schema
{
	"name": "sample_dataset_1",
	"columns": {
	  "firstname": 0,
	  "lastname": 0,
	  "priority": 1
	},
	"input": true,
	"recordCount": 0,
	"completed": true
}
PUT

/Dataset/{name}

Updates a dataset

Use this endpoint to mark an input dataset as complete. Once it is complete it can be used as input to a job.

Only an incomplete input dataset can be updated. If the dataset is an output dataset, or if it has already been completed, a 400 error will be returned.

Parameters

The name of the dataset to update

Request Body

The properties of the dataset to update

Example Value Schema
{
	"recordCount": 0,
	"completed": true
}
Responses

The dataset has been updated successfully

A validation error has occurred

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}

The requested dataset does not exist

DELETE

/Dataset/{name}

Deletes a dataset

Use this endpoint to permanently delete a dataset. All the data in the dataset will be removed, and the dataset will no longer appear in the results of the GET /Dataset endpoint.

Some basic metadata about the dataset is retained, including its name, when it was created and deleted, and by whom. Because this metadata is kept, the name cannot be reused for any future datasets.

Parameters

The name of the dataset to delete

Responses

The dataset has been deleted successfully

The requested dataset does not exist

PATCH

/Dataset/{name}/data

Uploads records to an input dataset

The dataset given by the name parameter must be an incomplete input dataset. If the dataset has already been completed, or if it is an output dataset, a 400 error will be produced.

Multiple uploads can be made to the same dataset. Each block of records must be identified by a unique block number. If the same block number is used for multiple uploads to the same dataset, the previously uploaded records will be overwritten.

Depending on the job the records will be submitted to, the order of the records may be important. Records are processed in ascending block number order, then in the order in which they were submitted within each block.

Split large uploads into multiple blocks. Up to 1,000 records can be supplied in each block.
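The block rules above (ascending block numbers, at most 1,000 records each) suggest a simple splitter; a sketch, numbering blocks from 0 to match the example body below:

```python
MAX_BLOCK_SIZE = 1000  # maximum records allowed per uploaded block

def split_into_blocks(records, block_size=MAX_BLOCK_SIZE):
    """Yield (blockNumber, records) pairs for PATCH /Dataset/{name}/data.

    Block numbers ascend with record position, so the original record
    order is preserved when the blocks are processed."""
    for i in range(0, len(records), block_size):
        yield i // block_size, records[i:i + block_size]
```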

Parameters

The name of the dataset to upload the records to

Request Body

The details of the records to upload

Example Value Schema
{
	"blockNumber": 0,
	"records": []
}
Responses

The data has been uploaded to the dataset correctly

A validation error in the record block has occurred

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}

The requested dataset does not exist

GET

/Dataset/{name}/data

Downloads data from an output dataset.

The dataset given by the name parameter must be a completed output dataset. If the dataset is an input dataset, or if it is incomplete, a 400 error is returned.

Up to 1,000 records can be downloaded at a time. Use the GET /Dataset/{name} endpoint to retrieve the total number of records, and make multiple requests to get the records in blocks of up to 1,000.

The starting point of each block of records is given by the start parameter. The first record in a dataset is number 1.
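Those paging rules (a 1-based start, at most 1,000 records per request) can be turned into a helper that yields the (start, count) pair for each download request, with the total record count taken from GET /Dataset/{name}; a minimal sketch:

```python
MAX_DOWNLOAD = 1000  # maximum records returned per request

def download_ranges(record_count, page_size=MAX_DOWNLOAD):
    """Yield (start, count) pairs covering every record in the dataset.

    The first record is number 1, so start values run 1, 1001, 2001, ..."""
    start = 1
    while start <= record_count:
        count = min(page_size, record_count - start + 1)
        yield start, count
        start += count
```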

Parameters

The name of the dataset to download the records from

The number of the first record to download. The first record in a dataset is number 1

The total number of records to download

Responses

The requested dataset does not exist

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}

The request has succeeded and the requested records have been returned

A validation error has occurred

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}