FAQ – TopBraid Tagger and AutoClassifier

Business FAQ

Discover how Tagger with AutoClassifier can improve your information management.

How can I get started with AutoClassifier?

You can start by importing the unstructured content you want to classify into EDG, selecting a vocabulary whose concepts will be used to classify this content, and providing training data for AutoClassifier to learn from (either by manually tagging some resources, or by importing existing training data where available).

You can learn more by reading introductory blog posts about AutoClassifier here and here, or with an informative webinar featuring a demonstration.

Customers can also utilize our Jumpstart Program as a way to get started. For more information about this program and to discuss evaluation licenses, contact us at sales@topquadrant.com.

What professional services are available for AutoClassifier?

Every project and organization has different needs. TopQuadrant provides various flexible services related to unstructured content management goals, including:

  • Content and tag set lifecycle management;
  • Classification strategy definition;
  • Modeling services for vocabularies that act as AutoClassifier’s sources of classes;
  • Integration of the automated classification into a larger workflow.

Technical FAQ

Get answers to common technical questions.

What formats and unstructured data sources are supported by AutoClassifier?

AutoClassifier and Tagger are built on the TopBraid EDG platform, and can make use of data, text and documents in any format that can be imported into TopBraid EDG. This includes any format that can be converted to RDF, and data sources such as REST APIs. TopQuadrant offers TopBraid Composer as a powerful development environment for such conversions and integrations, and offers training and professional services to support the process. Examples of formats that have been successfully used are XML, PDF, Microsoft.

What APIs are available?

The main AutoClassifier functionalities can be accessed through SPARQL functions in a SPARQL Endpoint or specific RESTful web services described in the product documentation.

How does authentication work with AutoClassifier?

AutoClassifier internally uses a web service packaged as a separate web application. This web application can be deployed either on the same application server as EDG, or a different one. The application server’s security features may be used to lock down access to the service. EDG can be configured to use a username and password when communicating with the service.
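As a rough illustration of a client sending a username and password to a locked-down web service, the sketch below builds a request that carries HTTP Basic credentials. The use of HTTP Basic authentication, the URL, and the credentials are all assumptions made for illustration; the actual mechanism depends on how your application server is configured.

```python
import base64
import urllib.request


def build_authenticated_request(url, username, password):
    """Build (but do not send) a request carrying HTTP Basic credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical service URL and credentials, for illustration only.
req = build_authenticated_request(
    "https://example.org/classifier-service/", "edg", "s3cret"
)
```

The request is only constructed here, not sent, so the sketch can be run without a deployed service.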

What operating system requirements apply to AutoClassifier?

AutoClassifier uses Maui Server for its machine learning capabilities. Maui Server can run in any environment that supports EDG.

Using TopBraid EDG AutoClassifier

Learn the ins and outs of AutoClassifier.

How can I evaluate the relevance of tags proposed by AutoClassifier?

Every tag proposed by AutoClassifier has a confidence value expressed as a percentage. Low-confidence tags can be discarded automatically, and proposed tags can always be reviewed manually.
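The idea of discarding low-confidence tags automatically can be sketched as a simple threshold filter. The `ProposedTag` structure, its field names, and the 50% threshold are hypothetical and chosen only for illustration; they are not AutoClassifier's actual data model or default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedTag:
    concept: str       # label of the proposed concept (hypothetical field)
    confidence: float  # confidence as a percentage, from 0 to 100


def split_by_confidence(tags, threshold=50.0):
    """Return (kept, discarded): tags at or above the threshold, and tags below it."""
    kept = [t for t in tags if t.confidence >= threshold]
    discarded = [t for t in tags if t.confidence < threshold]
    return kept, discarded


# Made-up proposals for a single document.
proposals = [
    ProposedTag("Renewable energy", 92.0),
    ProposedTag("Solar power", 61.5),
    ProposedTag("Agriculture", 12.0),
]
kept, discarded = split_by_confidence(proposals, threshold=50.0)
```

Tags in `kept` could still be passed to a manual review step; the threshold only removes the least likely proposals.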

We want to automate classification of a repository of ___ documents (e.g. PDF or DOCX). How can we run AutoClassifier on it?

One approach is to import these documents into an RDF content graph that can afterwards be managed in EDG infrastructure. Another approach is to integrate the functionality into an existing content repository via the AutoClassifier API.

How large should training datasets be?

The size of a training dataset doesn’t have to grow in proportion to the size of the document corpus you want to classify. Quality matters more: training documents should be tagged with concepts whose labels actually occur in the document. You also don’t need an exhaustive set of training documents covering every available concept before each concept can be assigned by the auto-classification process. More on training datasets can be found here.

What languages are supported for the unstructured content?

AutoClassifier relies on an algorithm that requires a language-specific stemmer and stopword lists. AutoClassifier currently includes these for English, French, German and Spanish, and we may add more if demand exists. The rest of the tool is language-agnostic. The quality of results should be fairly independent of the language.
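To show why a stemmer and a stopword list are language-specific, here is a toy sketch of both steps for English. The stopword list and the suffix rules are deliberately tiny and hypothetical; they stand in for the real components and are not the ones AutoClassifier uses.

```python
import re

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "are"}


def naive_stem(word):
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, tokenize, drop stopwords, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


document_terms = normalize("Indexes of the documents")
concept_terms = normalize("document index")
```

After normalization, the document phrase and the concept label share the same stems, which is what lets a label match text that uses a different word form. Both the stopwords and the suffixes are English-only, so another language needs its own lists and rules.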

How long does AutoClassifier need to process a document set?

This depends on many parameters, such as:

  • the amount of relevant textual content in each document,
  • the number of documents,
  • the number of available concepts,
  • the hardware EDG runs on.

For example, in one application, auto-classifying a set of 1,000 short documents against a thesaurus of 26,000 entries took 90 seconds on a laptop. Training AutoClassifier for this application with 100 training documents took less than 10 seconds.
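For a back-of-the-envelope estimate, the figures quoted above (1000 documents in 90 seconds) can be turned into a throughput and scaled to other corpus sizes. The sketch assumes that time grows linearly with the number of documents and that document length, vocabulary size, and hardware stay comparable; these are assumptions, not measured behaviour.

```python
def estimate_seconds(num_documents, reference_documents=1000, reference_seconds=90.0):
    """Scale the reference timing linearly with document count (an assumption)."""
    return num_documents * reference_seconds / reference_documents


# Throughput implied by the quoted example, in documents per second.
throughput = 1000 / 90.0

# Rough estimate for a hypothetical corpus of 50,000 similar documents.
estimated = estimate_seconds(50_000)
```

Under these assumptions the quoted example works out to roughly 11 documents per second, so a 50,000-document corpus would take on the order of 75 minutes. Treat this as an order-of-magnitude guide and measure on your own content.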

Ready to get started?

Get in touch today to learn how to improve semantic data governance for your enterprise.