How to Use Your Data Cleaning APIs

Structure

This article assumes you have already created and filled out a taxonomy.

Your Data Cleaning API, along with your Taxonomy API and Entity API, is created automatically when you first create a taxonomy. The Data Cleaning API is used specifically for cleaning source data, by either parsing or replacing target entities. It can also be accessed through the Classr for Sheets Google add-on.

By default, your API is private and not visible to others. It is secured using modern best practices and hosted on AWS (Amazon Web Services), which supports high uptime and reliability.

First, you will notice that a unique group of endpoints (GET calls) is created for each taxonomy level. The example used throughout this article is a College Football Team API that has two entities: “University” and “Team Name”.

 

Cleaning Method

There are two ways for you to clean your data using the API:

  • Replace: this replaces entities in the source content and returns the full, cleaned source content
  • Parse: this identifies and “parses” entities in the source content and returns only the parsed, clean entities
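To make the difference between the two methods concrete, here is a toy, local illustration in Python. It is not how the Classr service works internally; the alias table and both function names are hypothetical and exist only to show what each method hands back.

```python
import re

# Hypothetical alias table: a variant found in source text -> its clean entity name.
ALIASES = {"Duke": "Duke University"}


def replace_entities(text, aliases):
    """Return the full text with each alias swapped for its clean entity name."""
    for alias, clean in aliases.items():
        text = re.sub(rf"\b{re.escape(alias)}\b", clean, text)
    return text


def parse_entities(text, aliases):
    """Return only the clean entity names whose alias appears in the text."""
    return [
        clean
        for alias, clean in aliases.items()
        if re.search(rf"\b{re.escape(alias)}\b", text)
    ]


source = "Duke football takes place in North Carolina."
cleaned = replace_entities(source, ALIASES)  # whole sentence, entity cleaned
found = parse_entities(source, ALIASES)      # just the entities
```

Replace gives you the whole sentence back with the entity cleaned, while Parse gives you only the list of entities it found.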

 

API URL Structure

The call structure for the Data Cleaning API has two formats, one for Replace and one for Parse.

An example of a call:

https://api.classr.io/v1/CfKR7tULOb/entity/taxonomylevelname/replace

CfKR7tULOb: this is the automatically generated hash code, unique to your account.

“taxonomylevelname”: this represents the taxonomy level’s name, transformed to lowercase with spaces removed. For example, “Team Name” becomes “teamname”.

“replace”: this identifies the data cleaning method. To run a parse instead, use “parse” in its place. Super easy!
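The URL structure described here can be sketched as a small Python helper. The function name and its parameters are made up for illustration, and it assumes that lowercasing and removing whitespace is the whole transformation applied to the level name (how other characters are handled is not covered here).

```python
def build_cleaning_url(account_hash, level_name, method):
    """Assemble a Data Cleaning API URL from its three variable parts."""
    if method not in ("replace", "parse"):
        raise ValueError("method must be 'replace' or 'parse'")
    # Assumption: the level name is only lowercased and stripped of whitespace.
    slug = "".join(level_name.split()).lower()
    return f"https://api.classr.io/v1/{account_hash}/entity/{slug}/{method}"


# "Team Name" is one of the example entities; the hash code is the sample one.
url = build_cleaning_url("CfKR7tULOb", "Team Name", "parse")
```

Swapping the last argument between "replace" and "parse" is the only change needed to switch cleaning methods.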

 

Response Body Format

The response bodies for the Replace and Parse processes are very similar.

First, the Replace call response. In the JSON body of your call, supply the data to be cleaned. Since we’re using a College Football API, let’s take the source text “Duke football takes place in North Carolina.” and pass it through the Replace data cleaning process, which returns the following:

{
  "dataCleanType": "replace",
  "dataResults": "Duke University football takes place in North Carolina.",
  "details": [
    {
      "university": "Duke University",
      "universityId": 55,
      "state": "North Carolina",
      "stateId": 14,
      "conference": "ACC",
      "conferenceId": 2
    }
  ]
}

Note the format of the response body:

  • dataCleanType: this is the cleaning method used, either “replace” or “parse”
  • dataResults: this is the cleaned source content. Note that this field is unique to the Replace process.
  • details: this is the taxonomy information for the entity

 

Next, let’s review the Parse call response. In the JSON body of your call, supply the data to be cleaned and pass it through the Parse data cleaning process, which returns the following:

{
  "dataCleanType": "parse",
  "details": [
    {
      "university": "Duke University",
      "universityId": 55,
      "state": "North Carolina",
      "stateId": 14,
      "conference": "ACC",
      "conferenceId": 2
    }
  ]
}

Note the format of the response body:

  • dataCleanType: this is the cleaning method used, either “replace” or “parse”
  • details: this is the taxonomy information for the entity
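Because the two response bodies differ only in the dataResults field, one small Python helper can read either. This is a minimal sketch: the function name and the keys of the dictionary it returns are hypothetical, and only the three response fields described in this article are assumed to exist.

```python
import json


def summarize_response(body):
    """Normalize a Replace or Parse response body into one shape."""
    payload = json.loads(body)
    return {
        "method": payload["dataCleanType"],
        # dataResults is only present for Replace; None signals a Parse response.
        "cleaned_text": payload.get("dataResults"),
        "entities": payload.get("details", []),
    }


# Sample Parse body, trimmed from the College Football example.
parse_body = (
    '{"dataCleanType": "parse", '
    '"details": [{"university": "Duke University", "universityId": 55}]}'
)
summary = summarize_response(parse_body)
universities = [entity["university"] for entity in summary["entities"]]
```

Checking whether cleaned_text is None is a simple way to tell a Parse response from a Replace response without reading dataCleanType first.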

 

When is my Data Cleaning API created?
The moment you finish your taxonomy process, the API is automatically created.

How long will it take for API calls to show up in the usage count on my home dashboard?
Around a minute.

 
