Category Trees

The Category Tree (CT) specifies the possible label values for all model types except Pattern Models, which require only a pattern string. The Category Tree is a hierarchical set of label nodes: the top node (the root) contains the model name, and label values are assigned from the leaf nodes. A simple example is the category tree for a basic Sentiment label model.
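
The root-and-leaf structure described above can be sketched in a few lines of Python. This is only an illustration, not ACRE's internal representation: the Node class and its leaves() method are hypothetical names, and the tree is assumed to contain just the 'Positive' and 'Negative' leaves mentioned in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical category tree node (illustration only, not an ACRE type)."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the names of all leaf nodes at or below this node."""
        if not self.children:
            return [self.name]
        result: List[str] = []
        for child in self.children:
            result.extend(child.leaves())
        return result


# The root carries the model name; label values come from the leaves.
# Assumption: the basic tree has only these two leaves.
sentiment = Node("Sentiment", [Node("Positive"), Node("Negative")])
print(sentiment.leaves())
```

Here the root node 'Sentiment' is never itself a label value; only the leaf names returned by leaves() are candidates for assignment.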

When this model is executed on a set of documents, each document will be assigned one or more label values, as shown below. Depending on model design, some documents may be assigned a 'No label' value.

Sentiment is an example of a single-value model, which assigns only one label to each document. For example, it would make no sense to assign both 'Positive' and 'Negative' labels to a document just because it contains both positive and negative words (though some other sentiment labeling systems do exactly that!). Instead, the model must choose the best label among multiple matches. ACRE Rule models use rule weights to make this decision, choosing the label with the highest summed weight over all rule matches. By default, this will choose the label with the most matches, but this behavior can be changed if the modeler modifies the default rule weights. For ML models, the label with the highest Confidence level is chosen.
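
The weight-summing decision for Rule models can be sketched as follows. This is a minimal illustration under stated assumptions, not ACRE's implementation: the function name choose_label, the (label, weight) input format, a default weight of 1.0 for every rule, and the tie-breaking behavior (first label seen wins) are all hypothetical.

```python
from collections import defaultdict
from typing import List, Optional, Tuple


def choose_label(rule_matches: List[Tuple[str, Optional[float]]],
                 default_weight: float = 1.0) -> Optional[str]:
    """Pick the label with the highest summed weight over all rule matches.

    Each match is (label, weight); a weight of None means the rule keeps
    the default weight (assumed here to be 1.0 for every rule).
    Returns None when nothing matched.
    """
    totals = defaultdict(float)
    for label, weight in rule_matches:
        totals[label] += default_weight if weight is None else weight
    if not totals:
        return None
    # Ties go to the label that matched first (an assumption of this sketch).
    return max(totals, key=totals.get)


# With default weights, the label with the most matches wins.
by_count = choose_label([("Positive", None), ("Negative", None), ("Negative", None)])

# A modified rule weight can override the match count.
by_weight = choose_label([("Positive", 5.0), ("Negative", None), ("Negative", None)])

print(by_count, by_weight)
```

In the first call 'Negative' wins with two matches against one; in the second, the single 'Positive' rule carries a weight of 5.0 and outweighs the two default-weight 'Negative' matches.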

On the other hand, for some models, assigning multiple labels is the right thing to do. The figure above shows a tree for categorizing restaurant survey responses based on whether words related to Food, Service or Drinks are mentioned in the response. In this case, a survey response that discusses both food and service should be assigned both labels. The modeler sets the option Multi_Values=YES to enable this multi-labeling. With this option set, Rule models assign the labels of all matched rules, and ML models assign all labels with Confidence > Threshold (where Threshold is another configurable model parameter).
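
For ML models, the difference between single-value and multi-value assignment can be sketched as below. The function assign_labels, the confidence dictionary, and the threshold value of 0.5 are hypothetical and chosen only for illustration; the multi_values flag stands in for the Multi_Values=YES option.

```python
from typing import Dict, List


def assign_labels(confidences: Dict[str, float],
                  multi_values: bool,
                  threshold: float = 0.5) -> List[str]:
    """Return the labels assigned to one document.

    multi_values=True  -> every label with confidence > threshold
    multi_values=False -> only the single highest-confidence label
    """
    if multi_values:
        return [label for label, conf in confidences.items() if conf > threshold]
    if not confidences:
        return []
    return [max(confidences, key=confidences.get)]


# Hypothetical confidences for one restaurant survey response.
response = {"Food": 0.8, "Service": 0.6, "Drinks": 0.1}

multi = assign_labels(response, multi_values=True)
single = assign_labels(response, multi_values=False)
print(multi, single)
```

With multi-labeling enabled, the response receives both 'Food' and 'Service'; without it, only the highest-confidence label 'Food' is kept.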

Category trees can have any number of levels.  This figure shows an extended Sentiment tree that splits the 'Negative' node into three sub-nodes.  A document that would have been labeled 'Negative' in the basic model will now be labeled 'Threat' if it contains threatening or violent words, 'Response Required' if it contains demands or high emotional content, and 'Other Negative' otherwise.  

This extended Sentiment tree would provide added value to an organization that uses it to monitor customer feedback and wants to respond quickly when appropriate. 

Organizations with existing hierarchical taxonomies, such as organizational charts or directory trees, can import these structures as Category Trees and create ACRE models that map incoming documents onto them.  Mapping documents onto an organizational chart containing e-mail addresses, for example, allows the associated model to automatically send categorized documents by e-mail to the selected individual.

Increasing the number of nodes in the category tree also increases the set of analytic results available after model execution.   Each tree node represents a different set of labeled documents (i.e., the documents assigned at or below that node), and all model performance results, including label tables, summary tables, word frequency tables and word clouds, can be generated separately for each node in the tree.
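
The idea that each node stands for the documents assigned at or below it can be sketched as follows, using the extended Sentiment tree described earlier. The TREE dictionary, the function names, and the document identifiers are hypothetical; this is an illustration of the roll-up, not how ACRE stores or reports results.

```python
from typing import Dict, List

# Extended Sentiment tree as a parent -> children mapping (illustrative only).
TREE: Dict[str, List[str]] = {
    "Sentiment": ["Positive", "Negative"],
    "Negative": ["Threat", "Response Required", "Other Negative"],
}


def labels_at_or_below(node: str) -> List[str]:
    """Return the node itself plus every node beneath it in the tree."""
    result = [node]
    for child in TREE.get(node, []):
        result.extend(labels_at_or_below(child))
    return result


def documents_for_node(node: str, assignments: Dict[str, str]) -> List[str]:
    """Return the sorted ids of documents labeled at or below the given node."""
    wanted = set(labels_at_or_below(node))
    return sorted(doc for doc, label in assignments.items() if label in wanted)


# Hypothetical single-value model output: document id -> leaf label.
assignments = {
    "doc1": "Positive",
    "doc2": "Threat",
    "doc3": "Other Negative",
    "doc4": "Response Required",
}

negative_docs = documents_for_node("Negative", assignments)
threat_docs = documents_for_node("Threat", assignments)
print(negative_docs, threat_docs)
```

The 'Negative' node collects the three documents labeled with any of its sub-nodes, while the 'Threat' node contains only doc2, so a result table or word cloud could be built from either set.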