initiate User Jobs, which execute in the background. Each Job can be executed locally or can use cloud computing services, providing a scalable solution. There are 4 job types:
Performs ETL (extract, transform, load) tasks to bring external data onto the server, then parse and convert it. May split text into smaller units (phrase, sentence, etc.).
Inputs: An external data source and one Data Source Definition (DSD) entry, which defines load parameters.
Outputs: Extracted Text items, which store data fields for each document.
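As a rough illustration of the load step described above, the sketch below reads a CSV source according to a small definition record and optionally splits each document into sentences. The `DataSourceDefinition` fields, the `load_documents` function, and the regex-based sentence splitter are hypothetical stand-ins chosen for this example, not ACRE's actual schema or code.

```python
import csv
import io
import re
from dataclasses import dataclass


@dataclass
class DataSourceDefinition:
    """Hypothetical DSD entry: which columns to read and how to split."""
    id_field: str
    text_field: str
    split_sentences: bool = False


def split_into_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def load_documents(source, dsd):
    """Extract rows from a CSV file object and return extracted text items."""
    items = []
    for row in csv.DictReader(source):
        text = row[dsd.text_field].strip()
        units = split_into_sentences(text) if dsd.split_sentences else [text]
        items.append({"id": row[dsd.id_field], "text": text, "units": units})
    return items


# Example: an in-memory CSV stands in for the external data source.
raw = io.StringIO(
    "doc_id,body\n"
    "1,The pump failed. It was replaced.\n"
    "2,No issues found.\n"
)
dsd = DataSourceDefinition(id_field="doc_id", text_field="body", split_sentences=True)
extracted = load_documents(raw, dsd)
```

Each returned item keeps the document's data fields together with its smaller text units, which is one plausible shape for an Extracted Text item.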
ACRE uses the Natural Language Toolkit (NLTK) as well as customized code. NLP functions include:
+ Stemming - Converts words to their stem (root) form.
+ Stop List - List of words to be eliminated from analysis.
+ Go List - List of words to be exclusively used in analysis.
+ Part of Speech (POS) Tagging - Marks words with part of speech (e.g., noun, verb, adverb, adjective).
+ Named Entity Identification - Marks named entities.
+ Min/Max Counts - Minimum / maximum word counts to be used in analysis
+ List of alternate words/abbreviations/spellings equivalent to a given word.
+ Bi-Grams - Includes 2-word phrases in default analysis.
Inputs: Extracted data and an NLP Profile, which specifies the set of NLP functions required.
Outputs: A Term Vector for each document, containing a list of entries, each with a term (possibly modified by stemming), count (number of occurrences), weight (configurable), and tags (POS / NE).
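To make the profile-to-term-vector flow concrete, here is a minimal standard-library sketch. The `NLPProfile` fields, the order in which the functions are applied, the toy `crude_stem` suffix stripper (standing in for a real stemmer such as one from NLTK), and the use of relative frequency as the weight are all assumptions made for illustration. POS and named-entity tags are omitted.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NLPProfile:
    """Hypothetical profile: which NLP functions to apply."""
    stop_list: set = field(default_factory=set)
    go_list: set = field(default_factory=set)  # empty means "no restriction"
    bigrams: bool = False
    min_count: int = 1
    max_count: int = 10**9


def crude_stem(word):
    # Toy suffix stripper, a stand-in for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_vector(text, profile):
    """Return a list of {term, count, weight} entries for one document."""
    words = re.findall(r"[a-z]+", text.lower())
    words = [w for w in words if w not in profile.stop_list]
    if profile.go_list:
        words = [w for w in words if w in profile.go_list]
    terms = [crude_stem(w) for w in words]
    if profile.bigrams:
        # Add adjacent 2-word phrases alongside the single terms.
        terms += [f"{a} {b}" for a, b in zip(terms, terms[1:])]
    counts = Counter(terms)
    kept = {t: c for t, c in counts.items()
            if profile.min_count <= c <= profile.max_count}
    total = sum(kept.values())
    # Weight here is relative frequency; the real weighting is configurable.
    return [{"term": t, "count": c, "weight": c / total}
            for t, c in sorted(kept.items())]


profile = NLPProfile(stop_list={"the", "was", "and"})
vector = term_vector("The pump failed and the pumps were leaking", profile)
```

In this example "pump" and "pumps" collapse into a single term with a count of 2, which is the effect stemming is meant to have on the term vector.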
Trains a Machine Learning model by updating its stored Trained Word Cloud (TWC).
Inputs: A label model; a set of term vectors for training documents, with a reference label for each.
Outputs: The model's updated Trained Word Cloud.
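The text does not say how the Trained Word Cloud is represented or updated, so the following is only one possible reading: the TWC is assumed to be a per-label table of accumulated term weights, and training adds each document's term weights under its reference label. `LabelModel`, `twc`, and `train` are hypothetical names.

```python
from collections import defaultdict


class LabelModel:
    """Hypothetical label model holding a Trained Word Cloud (TWC)."""

    def __init__(self):
        # TWC: label -> term -> accumulated weight (an assumed representation).
        self.twc = defaultdict(lambda: defaultdict(float))

    def train(self, term_vectors, labels):
        """Update the TWC from term vectors and their reference labels."""
        if len(term_vectors) != len(labels):
            raise ValueError("each training document needs one reference label")
        for vector, label in zip(term_vectors, labels):
            for entry in vector:
                self.twc[label][entry["term"]] += entry["weight"]


model = LabelModel()
model.train(
    [
        [{"term": "pump", "weight": 0.5}, {"term": "leak", "weight": 0.5}],
        [{"term": "invoice", "weight": 1.0}],
    ],
    ["maintenance", "billing"],
)
# A second training run updates the stored TWC rather than replacing it.
model.train([[{"term": "pump", "weight": 1.0}]], ["maintenance"])
```

Because the update is additive, repeated training jobs refine the same stored word cloud, which matches the idea of a model that is trained by updating its TWC.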
Executes a label model.
Inputs: One label model and one document set of extracted text or term vectors.
Outputs: (a) one or more labels assigned to each document; (b) summary label counts calculated for each node of the Category Tree (CT); (c) term frequency tables and word clouds available for each document and for every node in the CT.
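Outputs (a) and (b) can be sketched as follows, under stated assumptions: each document receives the single label whose word cloud overlaps its term vector most (the system described here may assign several labels), and the Category Tree is given as a child-to-parent mapping so that counts roll up to ancestor nodes. The scoring rule, `assign_label`, `rollup_counts`, and the sample tree are hypothetical.

```python
from collections import Counter


def assign_label(vector, twc):
    """Pick the label whose word cloud best overlaps the term vector."""
    scores = {
        label: sum(cloud.get(e["term"], 0.0) * e["weight"] for e in vector)
        for label, cloud in twc.items()
    }
    return max(scores, key=scores.get)


def rollup_counts(labels, parent_of):
    """Count labels per node, adding each document to all ancestor nodes."""
    counts = Counter()
    for label in labels:
        node = label
        while node is not None:
            counts[node] += 1
            node = parent_of.get(node)
    return counts


# Assumed inputs: a small TWC and a two-level Category Tree.
twc = {"pumps": {"pump": 1.5, "leak": 0.5}, "billing": {"invoice": 1.0}}
parent_of = {"pumps": "equipment", "billing": "finance",
             "equipment": None, "finance": None}
docs = [
    [{"term": "pump", "weight": 1.0}],
    [{"term": "leak", "weight": 0.5}, {"term": "pump", "weight": 0.5}],
    [{"term": "invoice", "weight": 1.0}],
]
labels = [assign_label(v, twc) for v in docs]
summary = rollup_counts(labels, parent_of)
```

Rolling counts up through `parent_of` gives every node of the tree a summary count, so a parent category reflects all documents labeled anywhere beneath it.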