Data8 Administration & Batch Data Cleansing API

Manage your Data8 account, submit data to batch data cleansing jobs and retrieve the results

Jobs are submitted to workflows that the Data8 Production Team builds to your specifications. The team will also document the data that must be provided to each workflow and the data that it generates.

If you do not already have a workflow available to submit jobs to, please get in touch with your account manager to discuss your requirements.

All requests must be authenticated using an Authorization: Bearer header, with the bearer token being obtained from the Data8 OAuth token server at https://auth.data-8.co.uk/connect/token.
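As a minimal sketch of the authentication step, the Python below builds (without sending) a token request and the header to attach to later calls. It assumes the token server accepts the standard OAuth 2.0 client credentials grant with form-encoded client_id and client_secret fields, which this page does not confirm. Check the grant type and any required scope with Data8 before relying on it.

```python
import urllib.parse
import urllib.request

TOKEN_URL = "https://auth.data-8.co.uk/connect/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a request for a bearer token.

    Assumption: the server accepts the OAuth 2.0 client credentials grant
    with the credentials supplied as form fields.
    """
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def bearer_headers(access_token: str) -> dict[str, str]:
    """Return the header that every request to this API must carry."""
    return {"Authorization": f"Bearer {access_token}"}
```

The request can be sent with urllib.request.urlopen. In a standard OAuth 2.0 response the token is in the access_token field of the returned JSON.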

GET

/ApiKey

Gets a list of the API keys defined for this account

Parameters
No parameters.
Responses

A list of the API keys defined for this account

POST

/ApiKey

Creates a new API key

Parameters
No parameters.
Request Body

The details of the key to create

Example Value Schema
{
	"key": "string",
	"description": "string",
	"username": "string",
	"expiresOn": "1970-01-01T00:00:00.0000000Z",
	"maxRequestsPerIPPerDay": 0,
	"isClientKey": true
}
Responses

The details of the newly created key

Example Value Schema
{
	"key": "string",
	"description": "string",
	"username": "string",
	"expiresOn": "1970-01-01T00:00:00.0000000Z",
	"maxRequestsPerIPPerDay": 0,
	"allowedDomains": [],
	"allowedIPs": [],
	"allowedServices": []
}
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
GET

/ApiKey/{key}

Gets the details of an API key

Parameters

The unique identifier for the API key

Responses

The full details of the requested key

Example Value Schema
{
	"key": "string",
	"description": "string",
	"username": "string",
	"expiresOn": "1970-01-01T00:00:00.0000000Z",
	"maxRequestsPerIPPerDay": 0,
	"allowedDomains": [],
	"allowedIPs": [],
	"allowedServices": []
}
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
PUT

/ApiKey/{key}

Updates an API key

Parameters

The unique identifier for the API key to update

Request Body

The updated details of the API key

Example Value Schema
{
	"key": "string",
	"description": "string",
	"username": "string",
	"expiresOn": "1970-01-01T00:00:00.0000000Z",
	"maxRequestsPerIPPerDay": 0,
	"isClientKey": true
}
Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
DELETE

/ApiKey/{key}

Deletes an API key

Parameters

The unique identifier of the API key to delete

Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
GET

/ApiKey/{key}/allowedDomains

Gets the domain names that an API key can be used from

Parameters

The unique identifier of the API key to get the list of allowed domain names for

Responses

A list of the domain names that the API key can be used in client-side code from

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
PUT

/ApiKey/{key}/allowedDomains/{domain}

Adds a domain to the list that an API key can be used from in client-side code

Parameters

The unique identifier of the API key to add an allowed domain to

The domain name that the API key can be used from

Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
DELETE

/ApiKey/{key}/allowedDomains/{domain}

Removes a domain from the list that an API key can be used from in client-side code

Parameters

The unique identifier of the API key to remove an allowed domain from

The domain name that the API key can no longer be used from

Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
GET

/ApiKey/{key}/allowedIPs

Gets the client IP addresses that an API key can be used from

Parameters

The unique identifier of the API key to get the list of allowed IP addresses for

Responses

A list of the client IP addresses that the API key can be used from

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
POST

/ApiKey/{key}/allowedIPs

Adds an IP address to the list that an API key can be used from

This method will attempt to parse the supplied network as an IP address range, so it should be presented as a CIDR string, e.g. 123.45.6.0/24

If this is used in a key that will be used from client-side code, all clients must connect from an allowed IP address. In most cases this list will be left blank for client-side keys to allow users to connect from any IP address, but would be tightly restricted for server-side keys where all requests are made from a small range of known IP addresses.

Parameters

The unique identifier of the API key to add an allowed IP address to

Request Body

The subnet that the API key can be used from

Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
PUT

/ApiKey/{key}/allowedIPs/{network}/{prefix}

Adds an IP address to the list that an API key can be used from

If prefix is left blank, this method will attempt to parse network as an IP address range. In this case it should be presented as a CIDR string, e.g. 123.45.6.0/24

If this is used in a key that will be used from client-side code, all clients must connect from an allowed IP address. In most cases this list will be left blank for client-side keys to allow users to connect from any IP address, but would be tightly restricted for server-side keys where all requests are made from a small range of known IP addresses.
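To illustrate how a CIDR string relates to the network and prefix path segments, here is a small sketch using Python's standard ipaddress module. The helper name allowed_ip_path is hypothetical, and the sketch only builds the request path. It does not call the API.

```python
import ipaddress
from urllib.parse import quote


def allowed_ip_path(key: str, cidr: str) -> str:
    """Return the path for PUT /ApiKey/{key}/allowedIPs/{network}/{prefix}.

    Raises ValueError if cidr is not a valid network, for example when
    host bits are set (123.45.6.7/24).
    """
    net = ipaddress.ip_network(cidr, strict=True)
    # network_address is the subnet; prefixlen is the number of network bits.
    return f"/ApiKey/{quote(key, safe='')}/allowedIPs/{net.network_address}/{net.prefixlen}"
```

Validating the range locally before sending it avoids a round trip for input that is malformed.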

Parameters

The unique identifier of the API key to add an allowed IP address to

The subnet that the API key can be used from

The number of bits of the network that identifies the allowed subnet

Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
DELETE

/ApiKey/{key}/allowedIPs/{network}/{prefix}

Removes an IP address from the list that an API key can be used from

Parameters

The unique identifier of the API key to remove an allowed IP address from

The subnet that the API key can no longer be used from

The number of bits of the network that identifies the allowed subnet

Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
GET

/ApiKey/{key}/allowedServices

Gets the services that an API key can be used to access

Parameters

The unique identifier of the API key to get the list of allowed services for

Responses

A list of the service names that the API key can be used to access

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
PUT

/ApiKey/{key}/allowedServices/{service}

Adds a service to the list that an API key can be used to access

Parameters

The unique identifier of the API key to add an allowed service to

The name of the service that the API key can be used to access

Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
DELETE

/ApiKey/{key}/allowedServices/{service}

Removes a service from the list that an API key can be used to access

Parameters

The unique identifier of the API key to remove an allowed service from

The name of the service that the API key can no longer be used to access

Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
GET

/ClientCredential

Retrieves the existing list of client credentials available for this user

The list of associated secrets is not included in the response from this endpoint, and will always be null.

Parameters
No parameters.
Responses

A list of available client credentials

POST

/ClientCredential

Creates a new client credential

The ClientId field of the credential will be auto-generated by this API and should not be supplied in the input. The generated ClientId will be included in the output.

Secrets should not be supplied in the input.

Parameters
No parameters.
Request Body

The details of the client credential to create

Example Value Schema
{
	"clientId": "string",
	"description": "string",
	"secrets": []
}
Responses

The details of the created client credential

Example Value Schema
{
	"clientId": "string",
	"description": "string",
	"secrets": []
}
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
GET

/ClientCredential/{id}

Gets the details of a client credential

Parameters

The unique identifier of the client credential to get the details of

Responses

The details of the requested client credential

Example Value Schema
{
	"clientId": "string",
	"description": "string",
	"secrets": []
}
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
PUT

/ClientCredential/{id}

Updates the details of a client credential

Parameters

The unique identifier of the client credential to update the details of

Request Body

The information to change on the client credential

Example Value Schema
{
	"clientId": "string",
	"description": "string",
	"secrets": []
}
Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
DELETE

/ClientCredential/{id}

Deletes a client credential

Parameters

The unique identifier of the client credential to delete

Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
GET

/ClientCredential/{id}/secrets

Gets the list of client secrets associated with a client credential

Parameters

The unique identifier of the client credential to get the list of client secrets for

Responses

A list of client secrets associated with a client credential

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
POST

/ClientCredential/{id}/secrets

Creates a new client secret for a client credential

The secret value will be auto-generated by this API and included in the return value. It will not be available again, so please note it as soon as it is returned.

Parameters

The unique identifier of the client credential to create a new client secret for

Request Body

The details of the client secret to create

Example Value Schema
{
	"clientSecretId": 0,
	"expires": "1970-01-01T00:00:00.0000000Z",
	"description": "string"
}
Responses

The details of the client secret that has been created

Example Value Schema
{
	"clientSecretId": 0,
	"expires": "1970-01-01T00:00:00.0000000Z",
	"description": "string",
	"generatedSecret": "string"
}
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
GET

/ClientCredential/{id}/secrets/{secretId}

Gets the details of a client secret

Parameters

The unique identifier of the client credential to get the client secret for

The unique identifier of the client secret to get the details for

Responses

The details of the requested client secret

Example Value Schema
{
	"clientSecretId": 0,
	"expires": "1970-01-01T00:00:00.0000000Z",
	"description": "string"
}
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
PUT

/ClientCredential/{id}/secrets/{secretId}

Updates the details of a client secret

Parameters

The unique identifier of the client credential to update the client secret for

The unique identifier of the client secret to update the details for

Request Body

The details of the client secret to update

Example Value Schema
{
	"clientSecretId": 0,
	"expires": "1970-01-01T00:00:00.0000000Z",
	"description": "string"
}
Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
DELETE

/ClientCredential/{id}/secrets/{secretId}

Deletes a client secret

Parameters

The unique identifier of the client credential to delete the client secret from

The unique identifier of the client secret to delete

Responses
Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
GET

/Dataset

Retrieves a list of datasets

Gets a paged list of datasets. Up to 100 datasets will be included in each page, and datasets will be sorted alphabetically. Only the basic details of each dataset are included. Use the GET /Dataset/{name} endpoint to retrieve the full details of a dataset. The URL of that endpoint is included in the Location property of the returned objects.
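The paging described above can be sketched as a generator that requests pages 1, 2, 3 and so on. The fetch_page callable is hypothetical: it stands in for whatever code performs the authenticated GET /Dataset call and returns the decoded list for one page. The sketch assumes that a page holding fewer than 100 items is the last one, which this page does not state explicitly.

```python
from typing import Callable, Iterator

PAGE_SIZE = 100  # stated above: up to 100 datasets per page


def iter_all(fetch_page: Callable[[int], list[dict]]) -> Iterator[dict]:
    """Yield every item by walking the pages in order, starting at page 1.

    Assumption: a page with fewer than PAGE_SIZE items is the last page.
    """
    page = 1
    while True:
        items = fetch_page(page)
        yield from items
        if len(items) < PAGE_SIZE:
            return
        page += 1
```

If the total is an exact multiple of 100, the loop makes one extra request that returns an empty page and then stops.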

Parameters

The number of the page to retrieve. The first page is number 1.

Indicates whether input datasets should be included in the result

Indicates whether output datasets should be included in the result

Responses

A list of DatasetRef objects that list the basic details of each matching dataset

Some validation error occurred

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}
POST

/Dataset

Creates a new dataset

Each dataset must have a unique name. The name must start with a character a-z and can only include the characters a-z, 0-9 or _.

Datasets are either input or output datasets. Only input datasets can be created using this endpoint. Output datasets are created automatically as required when jobs are started.

Each input dataset must have its columns specified when it is created. Each column has a name, following the same naming requirements as above for the dataset, and a type.
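The naming rule for datasets and columns can be checked locally before a request is sent. The sketch below is one possible reading of the rule: it treats a-z as lower case only and applies no length limit, since none is stated here. The function names are hypothetical.

```python
import re

# Starts with a-z, then only a-z, 0-9 or _ (as described above).
# Assumption: upper-case letters are not allowed.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Return True if name satisfies the dataset/column naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def invalid_names(dataset_name: str, columns: dict[str, int]) -> list[str]:
    """Return the dataset and column names that break the naming rule."""
    return [n for n in [dataset_name, *columns] if not is_valid_name(n)]
```

For example, invalid_names("sample_dataset_1", {"firstname": 0, "Last Name": 0}) reports only "Last Name".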

When the dataset is created it is marked as incomplete. Add data to the dataset using the PATCH /Dataset/{name}/data endpoint. Once all data has been added, use the PUT /Dataset/{name} endpoint to mark the dataset as complete before using it as input to a job.

The full details of the dataset can be retrieved at any time using the GET endpoint.
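The create, upload and complete sequence described above can be laid out as an ordered list of calls. This sketch only builds the method, path and JSON body for each step and sends nothing. Sending a PUT body that contains only completed is an assumption, and the helper name is hypothetical.

```python
import json
from urllib.parse import quote


def dataset_upload_plan(name: str, columns: dict[str, int], blocks: list[dict]) -> list[tuple[str, str, str]]:
    """Return the (method, path, JSON body) calls in the order to make them."""
    path = f"/Dataset/{quote(name, safe='')}"
    # 1. Create the (incomplete) input dataset with its columns.
    plan = [("POST", "/Dataset", json.dumps({"name": name, "columns": columns}))]
    # 2. Upload each block of records.
    for block in blocks:
        plan.append(("PATCH", f"{path}/data", json.dumps(block)))
    # 3. Mark the dataset as complete so that a job can use it.
    #    Assumption: "completed" alone is an acceptable request body.
    plan.append(("PUT", path, json.dumps({"completed": True})))
    return plan
```

Each block is passed through as-is, because the record format depends on the dataset's columns and is not shown on this page.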

Parameters
No parameters.
Request Body

The details of the dataset to create

Example Value Schema
{
	"name": "sample_dataset_1",
	"columns": {
	  "firstname": 0,
	  "lastname": 0,
	  "priority": 1
	}
}
Responses

The dataset has been created

Example Value Schema
{
	"name": "sample_dataset_1",
	"columns": {
	  "firstname": 0,
	  "lastname": 0,
	  "priority": 1
	},
	"input": true,
	"recordCount": 0,
	"completed": true
}

Some validation error occurred in the dataset details

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}

Another dataset with the same name already exists

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}
GET

/Dataset/{name}

Gets the details of a named dataset

Parameters

The name of the dataset to get the details of

Responses

No dataset could be found with the supplied name

The request has succeeded and the details of the requested dataset have been returned

Example Value Schema
{
	"name": "sample_dataset_1",
	"columns": {
	  "firstname": 0,
	  "lastname": 0,
	  "priority": 1
	},
	"input": true,
	"recordCount": 0,
	"completed": true
}
PUT

/Dataset/{name}

Updates a dataset

Use this endpoint to mark an input dataset as complete. Once it is complete it can be used as input to a job.

Only an incomplete input dataset can be updated. If the dataset is an output dataset, or if it has already been completed, a 400 error will be returned.

Parameters

The name of the dataset to update

Request Body

The properties of the dataset to update

Example Value Schema
{
	"recordCount": 0,
	"completed": true
}
Responses

The dataset has been updated successfully

A validation error has occurred

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}

The requested dataset does not exist

DELETE

/Dataset/{name}

Deletes a dataset

Use this endpoint to permanently delete a dataset. All the data in the dataset will be removed, and the dataset will no longer appear in the results of the GET /Dataset endpoint.

Some basic metadata of the dataset will be retained, including the name, when it was created and deleted and by whom. Because this metadata is kept the name cannot be reused for any future datasets.

Parameters

The name of the dataset to delete

Responses

The dataset has been deleted successfully

The requested dataset does not exist

PATCH

/Dataset/{name}/data

Uploads records to an input dataset

The dataset given by the name parameter must be an incomplete input dataset. If the dataset has already been completed, or if it is an output dataset, a 400 error will be produced.

Multiple uploads can be made to the same dataset. Each block of records must be identified by a unique block number. If the same block number is used for multiple uploads to the same dataset, the previously uploaded records will be overwritten.

Depending on the job the records will be submitted to, the order of the records may be important. Records will be processed in ascending block number order, then the order in which they were submitted within that block.

Split large uploads into multiple blocks. Up to 1,000 records can be supplied in each block.
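Splitting an upload into blocks can be sketched as follows. Numbering the blocks from 0 is an assumption: the text above only requires that block numbers are unique and that they ascend in the order the records should be processed. The function name is hypothetical.

```python
from typing import Iterator, Sequence

MAX_RECORDS_PER_BLOCK = 1000  # stated above


def make_blocks(records: Sequence, first_block_number: int = 0) -> Iterator[dict]:
    """Yield request bodies for PATCH /Dataset/{name}/data, in upload order."""
    for offset in range(0, len(records), MAX_RECORDS_PER_BLOCK):
        yield {
            # Ascending block numbers preserve the original record order.
            "blockNumber": first_block_number + offset // MAX_RECORDS_PER_BLOCK,
            "records": list(records[offset:offset + MAX_RECORDS_PER_BLOCK]),
        }
```

Because re-sending a block number overwrites that block, a failed upload can be retried with the same body without creating duplicates.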

Parameters

The name of the dataset to upload the records to

Request Body

The details of the records to upload

Example Value Schema
{
	"blockNumber": 0,
	"records": []
}
Responses

The data has been uploaded to the dataset correctly

A validation error in the record block has occurred

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}

The requested dataset does not exist

GET

/Dataset/{name}/data

Downloads data from an output dataset

The dataset given by the name parameter must be a completed output dataset. If the dataset is an input dataset, or if it is incomplete, a 400 error is returned.

Up to 1,000 records can be downloaded at a time. Use the GET /Dataset/{name} endpoint to retrieve the total number of records, and make multiple requests to get the records in blocks of up to 1,000.

The starting point of each block of records is given by the start parameter. The first record in a dataset is number 1.
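The arithmetic for downloading a whole dataset in blocks can be sketched as below. Given the record count reported by GET /Dataset/{name}, it returns the start value and the number of records to ask for in each request. The name count for the second value is hypothetical, since the parameter's actual name is not shown here.

```python
def download_ranges(record_count: int, block_size: int = 1000) -> list[tuple[int, int]]:
    """Return (start, count) pairs covering records 1..record_count."""
    if not 1 <= block_size <= 1000:
        raise ValueError("block_size must be between 1 and 1000")
    return [
        # Record numbering starts at 1; the final block may be shorter.
        (start, min(block_size, record_count - start + 1))
        for start in range(1, record_count + 1, block_size)
    ]
```

A dataset of 2,500 records therefore needs three requests, starting at records 1, 1001 and 2001.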

Parameters

The name of the dataset to download the records from

The number of the first record to download. The first record in a dataset is number 1

The total number of records to download

Responses

The requested dataset does not exist

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string"
}

The request has succeeded and the requested records have been returned

A validation error has occurred

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}
GET

/Job

Retrieves a list of jobs that have already been submitted

Use this endpoint to retrieve a list of jobs that have previously been submitted.

The jobs are split into pages of 100, with the most recent jobs first. Use the page parameter to move through later pages, or the workflow, startDate and endDate parameters to refine the list.
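As a sketch of how the filters might be combined, the helper below builds a request path from the page, workflow, startDate and endDate parameters named above. It assumes that they are passed as query-string parameters and that dates are accepted in ISO 8601 form. Neither detail is confirmed on this page, and the function name is hypothetical.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode


def job_list_path(
    page: int = 1,
    workflow: Optional[str] = None,
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
) -> str:
    """Build the path and query string for GET /Job, omitting unused filters."""
    params: dict[str, object] = {"page": page}
    if workflow is not None:
        params["workflow"] = workflow
    if start_date is not None:
        params["startDate"] = start_date.isoformat()  # assumption: ISO 8601
    if end_date is not None:
        params["endDate"] = end_date.isoformat()
    return "/Job?" + urlencode(params)
```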

Only the basic details of each job are included in the list. Use the GET /Job/{name} endpoint to get the full details of a particular job.

Parameters

The number of the page to retrieve. The first page is 1

The name of the workflow to filter the jobs by

The earliest date of the job to filter by

The latest date of the job to filter by

Responses

The list of jobs has been retrieved successfully

Some validation error occurred

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}
POST

/Job

Starts a new job

The details to be passed to this endpoint will vary depending on how your workflow has been configured. Full details of what each workflow is expecting in terms of input files, datasets and parameters will be agreed with you by the Data8 Production Team.

Once the job has been submitted it can be monitored by polling the GET /Job/{name} endpoint.

Each job must have a unique name. The job name cannot contain characters which are invalid in file names such as /, :, *, ?, ", <, >, |.
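A job name can be screened for the characters listed above before it is submitted. The list is introduced with "such as", so it may not be exhaustive. This sketch checks only the eight characters that are named, and the function name is hypothetical.

```python
# The characters named above; the server may reject others as well.
INVALID_JOB_NAME_CHARS = frozenset('/:*?"<>|')


def invalid_chars_in_job_name(name: str) -> list[str]:
    """Return the distinct listed characters that appear in name, sorted."""
    return sorted({ch for ch in name if ch in INVALID_JOB_NAME_CHARS})
```

An empty result means that none of the listed characters are present. It does not guarantee that the name is unique or that the server will accept it.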

Parameters
No parameters.
Request Body

The details of the job to create

Example Value Schema
{
	"name": "Contact Deceased Check 3124",
	"workflowName": "ContactDeceasedCheck",
	"inputFilename": "/ToData8/contact.csv",
	"inputDatasets": {
	  "Contacts": "my_contact_dataset_1",
	  "Accounts": "my_account_dataset_312"
	},
	"parameters": {
	  "MaxContactAge": "5"
	}
}
Responses

The job has been started successfully

A validation error has occurred

Example Value Schema
{
	"type": "string",
	"title": "string",
	"status": 0,
	"detail": "string",
	"instance": "string",
	"errors": {}
}
GET

/Job/{name}

Retrieves the full details of a job

Use this endpoint to get the full details of a job. In particular you can poll this to track the progress of a job and wait until it has completed before attempting to retrieve the results.
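Polling can be sketched as a loop with a fixed interval and a cap on the number of attempts. The fetch_job callable is hypothetical: it stands in for the authenticated GET /Job/{name} call and returns the decoded job. The meaning of the numeric status field is not documented here, so this sketch treats a non-null completedAt as the sign that the job has finished. That is an assumption to verify against your workflow's documentation.

```python
import time
from typing import Callable


def completed_at_is_set(job: dict) -> bool:
    # Assumption: completedAt stays null or absent until the job finishes.
    return job.get("completedAt") is not None


def wait_for_job(
    fetch_job: Callable[[], dict],
    is_finished: Callable[[dict], bool] = completed_at_is_set,
    interval_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until is_finished(job) is true, then return the job details."""
    for attempt in range(max_polls):
        job = fetch_job()
        if is_finished(job):
            return job
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # wait before asking again
    raise TimeoutError("job did not finish within the polling limit")
```

The sleep function is injectable so that the loop can be exercised in tests without real delays.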

Parameters

The name of the job to retrieve

Responses

The details of the job have been retrieved successfully

Example Value Schema
{
	"name": "string",
	"workflowName": "string",
	"submittedAt": "1970-01-01T00:00:00.0000000Z",
	"completedAt": "1970-01-01T00:00:00.0000000Z",
	"status": 0,
	"percentComplete": 0,
	"estimatedFinishTime": "1970-01-01T00:00:00.0000000Z",
	"statistics": {},
	"inputDatasets": {},
	"outputDatasets": {},
	"parameters": {},
	"inputFilename": "string",
	"outputFilenames": {}
}

The requested job name does not exist
