Named Entity Recognition (NER) is the automated process of detecting the names of persons, locations, organizations, and other entity types in a text corpus. The model learns to recognize these names from a labeled training corpus.
Although detecting person, location, or organization names is the original and primary objective, NER is not restricted to this use case. What “name”, “document”, “sentence”, or “word” means in a given context is just a matter of imagination.
Corpus
coming soon
Plugins
NERBuilder
@Plugin(type = SparkSink.PLUGIN_TYPE)
@Name("NERBuilder")
@Description("A building stage for an Apache Spark-NLP based NER (CRF) model.")
public class NERBuilder extends TextSink {
...
}
Parameters
Model Name | The unique name of the NER (CRF) model. |
Embedding Name | The unique name of the trained Word2Vec embedding model. |
Corpus Field | The name of the field in the input schema that contains the labeled tokens. |
Model Configuration | |
Minimum Epochs | Minimum number of epochs to train. Default is 10. |
Maximum Epochs | Maximum number of epochs to train. Default is 1000. |
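The two epoch parameters can be read as a small configuration object. The following is a minimal sketch, not the plugin's actual code: the class name `NerTrainingConfig` and the validation rules (non-negative minimum, minimum not larger than maximum) are assumptions made for illustration. Only the defaults of 10 and 1000 come from the parameter list above.

```java
/** Hypothetical holder for the epoch settings; not part of the plugin itself. */
final class NerTrainingConfig {
    // Defaults as documented in the parameter list.
    static final int DEFAULT_MIN_EPOCHS = 10;
    static final int DEFAULT_MAX_EPOCHS = 1000;

    final int minEpochs;
    final int maxEpochs;

    /** A null argument means "not configured" and falls back to the default. */
    NerTrainingConfig(Integer minEpochs, Integer maxEpochs) {
        this.minEpochs = (minEpochs == null) ? DEFAULT_MIN_EPOCHS : minEpochs;
        this.maxEpochs = (maxEpochs == null) ? DEFAULT_MAX_EPOCHS : maxEpochs;

        // Assumed sanity checks; the real plugin may validate differently.
        if (this.minEpochs < 0) {
            throw new IllegalArgumentException("Minimum epochs must not be negative.");
        }
        if (this.minEpochs > this.maxEpochs) {
            throw new IllegalArgumentException(
                "Minimum epochs must not exceed maximum epochs.");
        }
    }
}
```

With this sketch, `new NerTrainingConfig(null, null)` yields the documented defaults, while `new NerTrainingConfig(20, 10)` is rejected.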
NERTagger
@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("NERTagger")
@Description("A tagging stage that leverages a trained Word2Vec model and NER (CRF) model "
+ "to map an input text field onto an output token & entities field.")
public class NERTagger extends TextCompute {
...
}
Parameters
Model Name | The unique name of the NER (CRF) model. |
Embedding Name | The unique name of the trained Word2Vec embedding model. |
Text Field | The name of the field in the input schema that contains the text document. |
Token Field | The name of the field in the output schema that contains the extracted tokens. |
Entities Field | The name of the field in the output schema that contains the extracted entities. |
Normalization | Indicates whether token normalization is applied. Normalization restricts the characters of a token to [A-Za-z0-9-]. Supported values are 'true' and 'false'. Default is 'true'. |
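The effect of the Normalization parameter can be illustrated with a few lines of plain Java. This is a sketch under stated assumptions, not the plugin's implementation: the class `TokenNormalizer` is hypothetical, and it assumes that disallowed characters are removed from a token and that tokens left empty are dropped. Only the allowed character set [A-Za-z0-9-] is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating token normalization. */
final class TokenNormalizer {
    // Matches every character outside the allowed set [A-Za-z0-9-].
    private static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9-]");

    /** Removes all disallowed characters from a single token. */
    static String normalize(String token) {
        return DISALLOWED.matcher(token).replaceAll("");
    }

    /** Normalizes each token and drops tokens that become empty (assumption). */
    static List<String> normalizeAll(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            String cleaned = normalize(token);
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }
}
```

Under these assumptions, the token `Berlin,` becomes `Berlin`, `e-mail!` becomes `e-mail`, and a token consisting only of punctuation is removed.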
NERRelation
@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("NERRelation")
@Description("A transformation stage that leverages the result of an NER tagging stage and "
+ "extracts relations between named entities from their co-occurrence.")
public class NERRelation extends TextCompute {
...
}
Parameters
Token Field | The name of the field in the input schema that contains the extracted tokens. |
Entities Field | The name of the field in the input schema that contains the extracted named entity tags. |
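How relations can be derived from co-occurrence is sketched below in plain Java. This is an illustration only, not the NERRelation source. It assumes that tokens and entity tags arrive as parallel lists per document, that the tag `O` marks a non-entity token, and that every unordered pair of distinct entities in the same document counts as one co-occurrence. The class `CooccurrenceSketch` and the tag values `PER` and `ORG` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of co-occurrence based relation extraction. */
final class CooccurrenceSketch {

    /** Collects the distinct "token/tag" entities of one document. */
    static List<String> entities(List<String> tokens, List<String> tags) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("Tokens and tags must align.");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            // Assumption: "O" marks tokens that are not part of an entity.
            if (!"O".equals(tags.get(i))) {
                String entity = tokens.get(i) + "/" + tags.get(i);
                if (!result.contains(entity)) {
                    result.add(entity);
                }
            }
        }
        return result;
    }

    /** Counts in how many documents each unordered entity pair co-occurs. */
    static Map<String, Integer> countPairs(List<List<String>> tokenRows,
                                           List<List<String>> tagRows) {
        if (tokenRows.size() != tagRows.size()) {
            throw new IllegalArgumentException("Row counts must match.");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (int row = 0; row < tokenRows.size(); row++) {
            List<String> found = entities(tokenRows.get(row), tagRows.get(row));
            for (int i = 0; i < found.size(); i++) {
                for (int j = i + 1; j < found.size(); j++) {
                    String a = found.get(i);
                    String b = found.get(j);
                    // Order the pair so that (a, b) and (b, a) share one key.
                    String key = a.compareTo(b) <= 0 ? a + " <-> " + b : b + " <-> " + a;
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

The pair key is ordered lexicographically so that the direction in which two entities appear in a document does not matter. A real implementation would likely also merge multi-token entities and may restrict co-occurrence to a sentence window.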