Machine Learning Tasks
A Machine Learning Task takes any Term Vector column as input and assigns one or more Tags to each term vector item. It uses an ML Model object, which must have previously been trained using a Train Agent.
See help: Machine Learning Overview, ML Model, Train Agent
When the ML Task is executed, it calculates a Confidence value for each Tag based on the similarity between the input Term Vector and the Trained Word Cloud for that Tag, as stored in the ML Model. Multiple ML Parameters associated with the ML Model specify how the Confidence calculation is done. ML Tasks can generate up to 3 columns:
· Tag (text): the Tags selected for the input Term Vector
· Description (text): (optional) a Description of the Tag
· Confidence (double): the Confidence value calculated for the Tag, based on the similarity between the input Term Vector and the Tag's Trained Word Cloud
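The similarity-based Confidence calculation described above can be sketched as follows. This is only an illustration: the actual calculation is controlled by the ML Parameters of the ML Model, which are not described here. The sketch assumes that Term Vectors and Trained Word Clouds can both be represented as term-to-weight mappings and that similarity is measured by cosine similarity. The function names cosine_confidence and score_tags are hypothetical.

```python
import math


def cosine_confidence(term_vector, word_cloud):
    """Cosine similarity between two sparse term -> weight mappings.

    Assumption: cosine similarity stands in for the product's real
    Confidence calculation, which depends on the ML Parameters.
    """
    shared_terms = set(term_vector) & set(word_cloud)
    dot = sum(term_vector[t] * word_cloud[t] for t in shared_terms)
    norm_tv = math.sqrt(sum(w * w for w in term_vector.values()))
    norm_wc = math.sqrt(sum(w * w for w in word_cloud.values()))
    if norm_tv == 0 or norm_wc == 0:
        return 0.0  # an empty vector has no similarity to anything
    return dot / (norm_tv * norm_wc)


def score_tags(term_vector, model):
    """Return {tag: confidence} for every Tag in the model.

    `model` is a hypothetical stand-in for a trained ML Model:
    a mapping of tag -> Trained Word Cloud (term -> weight).
    """
    return {
        tag: cosine_confidence(term_vector, cloud)
        for tag, cloud in model.items()
    }
```

A Tag whose Trained Word Cloud shares no terms with the input Term Vector gets Confidence 0 under this sketch.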
No Tags are displayed if Confidence = 0 for all Tags. Otherwise, the number of Tags displayed is determined by the Tag Display Options, which include:
· Multi-Value (boolean): if False, only the single Tag with the highest Confidence is displayed.
· Threshold (double): if set, only Tags whose Confidence exceeds the Threshold are displayed.
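The effect of these Tag Display Options can be sketched as a small selection function. This is a minimal illustration, not the product's implementation. It assumes that Tags with Confidence 0 are never displayed individually and that the Threshold is applied before the Multi-Value cut. The function name select_tags and its parameters are hypothetical.

```python
def select_tags(confidences, multi_value=False, threshold=None):
    """Pick the Tags to display from a {tag: confidence} mapping.

    Returns a list of (tag, confidence) pairs, highest Confidence first.
    """
    # Assumption: a Tag with Confidence 0 is never displayed.
    candidates = [(tag, c) for tag, c in confidences.items() if c > 0]

    # Threshold: keep only Tags whose Confidence strictly exceeds it.
    if threshold is not None:
        candidates = [(tag, c) for tag, c in candidates if c > threshold]

    candidates.sort(key=lambda pair: pair[1], reverse=True)

    # Multi-Value off: keep only the single highest-Confidence Tag.
    if not multi_value:
        candidates = candidates[:1]
    return candidates
```

For example, with Confidences of 0.8, 0.5 and 0.0 for three Tags, the default settings return only the first Tag, while Multi-Value with no Threshold returns the first two.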
In the figure above, ML Task “LOINC” has been executed on a Term Vector column (not shown) generated from input column “Exam Description” to select a LOINC Code Tag. Results are shown in columns LOINC[tag], LOINC[desc], and LOINC[conf].
To create a Machine Learning Task, the prerequisite objects are:
· Template – the Template to which the new Task will be added.
· Input Column – a Term Vector column in the Template.
· ML Model – after training, this stores a Trained Word Cloud for each Tag. (ML Model help, Tagset help)
Results from the ML Task can be viewed in Data Views or by selecting Tasks or Columns on the Datasets page and then View or Summary.
Creating a Machine Learning Task
Click Pipelines > Tasks, then Create Task, and select Type “ML”. Then fill in the following:
· Name – Enter a Name for the ML Task and a Description, if needed.
· Template – Choose an existing Template.
· ML Model – Choose an existing ML Model that has previously been trained.
· Term Vector Column – Choose the input Term Vector column in the Template.
· Change Results Displayed – click to view Tag display options:
o Threshold (double): if set, only Tags whose Confidence exceeds the Threshold are displayed.
o Multi-Value (boolean): if unchecked, only the single Tag with the highest Confidence is displayed.
o View Nulls (boolean): if unchecked, results where Confidence = 0 are left out.
· Pre Search – if checked, Confidence is calculated only for Tags whose Trained Word Cloud (TWC) contains at least 1 term from the input Term Vector. This can greatly speed up evaluation for large Tagsets.
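The idea behind Pre Search can be sketched as a candidate filter that runs before any Confidence is calculated. The sketch below assumes an inverted index from term to Tags, which is one common way to find such candidates quickly. How the product implements Pre Search is not stated here. The names build_term_index and pre_search are hypothetical, as is the representation of the ML Model as a tag-to-word-cloud mapping.

```python
from collections import defaultdict


def build_term_index(model):
    """Build an inverted index: term -> set of Tags whose word cloud has it.

    `model` is a hypothetical mapping of tag -> word cloud (term -> weight).
    """
    index = defaultdict(set)
    for tag, cloud in model.items():
        for term in cloud:
            index[term].add(tag)
    return index


def pre_search(term_vector, index):
    """Return the Tags that share at least 1 term with the Term Vector.

    Only these Tags would need a Confidence calculation; every other
    Tag has no term in common with the input.
    """
    candidates = set()
    for term in term_vector:
        # .get avoids adding empty entries to the index for unseen terms
        candidates |= index.get(term, set())
    return candidates
```

The index is built once per model, so each input row costs only one lookup per term rather than one comparison per Tag, which is where the saving for large Tagsets would come from.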
See Task Help for other Task parameters.