Named Entity Recognition (NER) is the automated process of detecting the names of persons, locations, organizations (and more) in a text corpus. The model learns these names from a labeled training corpus.

Although detecting person, location or organization names is the initial and main objective, NER is not restricted to this use case. It is just a matter of imagination what “name”, “document”, “sentence” or “word” means in a given context.

Corpus

coming soon

Plugins

NERBuilder

@Plugin(type = SparkSink.PLUGIN_TYPE)
@Name("NERBuilder")
@Description("A building stage for an Apache Spark-NLP based NER (CRF) model.")
public class NERBuilder extends TextSink {

    ...

}

Parameters

Model Name: The unique name of the NER (CRF) model.
Embedding Name: The unique name of the trained Word2Vec embedding model.
Corpus Field: The name of the field in the input schema that contains the labeled tokens.

Model Configuration

Minimum Epochs: Minimum number of epochs to train. Default is 10.
Maximum Epochs: Maximum number of epochs to train. Default is 1000.

NERTagger

@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("NERTagger")
@Description("A tagging stage that leverages a trained Word2Vec model and NER (CRF) model "
  + "to map an input text field onto an output token & entities field.")
public class NERTagger extends TextCompute {

    ...

}

Parameters

Model Name: The unique name of the NER (CRF) model.
Embedding Name: The unique name of the trained Word2Vec embedding model.
Text Field: The name of the field in the input schema that contains the text document.
Token Field: The name of the field in the output schema that contains the extracted tokens.
Entities Field: The name of the field in the output schema that contains the extracted entities.
Normalization: Whether token normalization is applied. Normalization restricts the characters of a token to [A-Za-z0-9-]. Supported values are 'true' and 'false'. Default is 'true'.
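
To illustrate what restricting a token to [A-Za-z0-9-] can look like, here is a minimal sketch in plain Java. It is not taken from the plugin source: the class name TokenNormalizerSketch and its methods are hypothetical, and it assumes that disallowed characters are removed from the token and that tokens left empty afterwards are dropped. The plugin may handle these cases differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the plugin.
final class TokenNormalizerSketch {

    // Matches every character outside the allowed set [A-Za-z0-9-].
    private static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9-]");

    // Assumption: disallowed characters are deleted rather than replaced.
    static String normalize(String token) {
        return DISALLOWED.matcher(token).replaceAll("");
    }

    // Assumption: tokens that become empty after normalization are dropped.
    static List<String> normalizeAll(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            String cleaned = normalize(token);
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }
}
```

Under these assumptions, a token such as "Berlin," becomes "Berlin", while a hyphenated token such as "state-of-the-art" is kept unchanged because the hyphen is in the allowed set.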

NERRelation

@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("NERRelation")
@Description("A transformation stage that leverages the result of an NER tagging stage and "
  + "extracts relations between named entities from their co-occurring.")
public class NERRelation extends TextCompute {

    ...

}

Parameters

Token Field: The name of the field in the input schema that contains the extracted tokens.
Entities Field: The name of the field in the input schema that contains the extracted named entity tags.
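
The idea of deriving relations from co-occurring entities can be sketched as follows. This is a simplified illustration in plain Java, not the plugin's implementation: the class CooccurrenceSketch is hypothetical, the tags are assumed to follow an IOB-style scheme ("B-PER", "I-PER", "O"), and every pair of entity mentions found in the same token sequence is assumed to count as one relation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the plugin.
final class CooccurrenceSketch {

    // Groups consecutive tagged tokens of the same entity type into one
    // mention. Each result entry is {mention text, entity type}.
    static List<String[]> mentions(List<String> tokens, List<String> tags) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("tokens and tags must align");
        }
        List<String[]> result = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        String currentType = null;
        for (int i = 0; i < tokens.size(); i++) {
            String tag = tags.get(i);
            // Assumption: "O" marks a non-entity token; otherwise the type
            // follows an optional "B-" or "I-" prefix.
            String type = "O".equals(tag) ? null : tag.substring(tag.indexOf('-') + 1);
            boolean startsNew = type != null
                && (tag.startsWith("B-") || !type.equals(currentType));
            // Close the open mention when it ends or a new one begins.
            if (currentType != null && (type == null || startsNew)) {
                result.add(new String[] {text.toString(), currentType});
                text.setLength(0);
                currentType = null;
            }
            if (type != null) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(tokens.get(i));
                currentType = type;
            }
        }
        if (currentType != null) {
            result.add(new String[] {text.toString(), currentType});
        }
        return result;
    }

    // Assumption: every unordered pair of mentions in the same token
    // sequence is reported as one co-occurrence relation.
    static List<String> relations(List<String> tokens, List<String> tags) {
        List<String[]> found = mentions(tokens, tags);
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < found.size(); i++) {
            for (int j = i + 1; j < found.size(); j++) {
                pairs.add(found.get(i)[0] + " (" + found.get(i)[1] + ") <-> "
                    + found.get(j)[0] + " (" + found.get(j)[1] + ")");
            }
        }
        return pairs;
    }
}
```

For the tokens "Angela", "Merkel", "visited", "Paris" with the tags "B-PER", "I-PER", "O", "B-LOC", this sketch yields the single relation "Angela Merkel (PER) <-> Paris (LOC)". A real stage would likely limit co-occurrence to a sentence or window and aggregate counts across documents; the sketch leaves that out.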