Text Matching

Plugins

DateMatcher

This plugin recognizes the following kinds of date and time expressions and transforms them into a user-defined output format:

1978-01-28
1984/04/02
1/02/1980
2/28/79
The 31st of April in the year 2008
Fri, 21 Nov 1997
Jan 21, ‘97
Sun, Nov 21
jan 1st
next thursday
last wednesday
today
tomorrow
yesterday
next week
next month
next year
day after
the day before
0600h
06:00 hours
6pm
5:30 a.m.
at 5
12:59
23:59
1988/11/23 6pm
next week at 7.30
5 am tomorrow

@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("DateMatcher")
@Description("A transformation stage that reads different forms of date and time expressions "
	+ "and converts them to a provided date format. This stage transforms each text document "
	+ "into a list of sentences where each detected date and time expression is replaced by "
	+ "the provided format. As an alternative, the list of detected date and time expressions "
	+ "is returned.")
public class DateMatcher extends TextCompute {

    ...

}

Parameters

Text Field	The name of the field in the input schema that contains the text document.
Date Field	The name of the field in the output schema that contains the text matches.
Date Format	The expected output date format. Default is 'yyyy/MM/dd'.
Output Option	An option to determine how to format the output of the date matcher. Supported values are 'extract' and 'replace'. Default is 'replace'.
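
The difference between the 'replace' and 'extract' output options can be illustrated with a minimal sketch in plain Java. This is not the plugin's implementation: the class name DateOutputSketch and its methods are hypothetical, and the sketch only recognizes dates of the form 1978-01-28, whereas the plugin handles the full list of expressions above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the plugin.
public class DateOutputSketch {

    // Assumption: only ISO-like dates (yyyy-MM-dd) are detected in this sketch.
    private static final Pattern ISO_DATE =
        Pattern.compile("\\b(\\d{4})-(\\d{2})-(\\d{2})\\b");

    // Converts the current match into the requested output format.
    // LocalDate.of throws DateTimeException for impossible dates such as 2008-13-45.
    private static String reformat(Matcher m, DateTimeFormatter fmt) {
        LocalDate date = LocalDate.of(
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)));
        return date.format(fmt);
    }

    // 'replace': returns the text with each detected date rewritten in place.
    public static String replace(String text, String dateFormat) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(dateFormat);
        Matcher m = ISO_DATE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(reformat(m, fmt)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // 'extract': returns only the detected dates, each in the output format.
    public static List<String> extract(String text, String dateFormat) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(dateFormat);
        Matcher m = ISO_DATE.matcher(text);
        List<String> dates = new ArrayList<>();
        while (m.find()) {
            dates.add(reformat(m, fmt));
        }
        return dates;
    }

    public static void main(String[] args) {
        String text = "Born on 1978-01-28, moved on 1984-04-02.";
        System.out.println(replace(text, "yyyy/MM/dd"));
        System.out.println(extract(text, "yyyy/MM/dd"));
    }
}
```

With the default format 'yyyy/MM/dd', 'replace' keeps the surrounding sentence and rewrites the dates, while 'extract' discards the sentence and keeps only the formatted dates.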

PhraseMatcher

@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("PhraseMatcher")
@Description("A transformation stage that leverages the Spark NLP Text Matcher to detect provided "
	+ "phrases in the input text document.")
public class PhraseMatcher extends TextCompute {

		...

}

Parameters

Text Field	The name of the field in the input schema that contains the text document.
Phrase Field	The name of the field in the output schema that contains the text matches.
Phrases	A delimiter-separated list of text phrases.
Phrase Delimiter	The delimiter used to separate the different text phrases.
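
How the Phrases and Phrase Delimiter parameters relate can be sketched in plain Java. This is an illustration under stated assumptions, not the plugin's code and not the Spark NLP Text Matcher: the class PhraseListSketch and its methods are hypothetical, and matching here is a simple case-sensitive substring check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the plugin.
public class PhraseListSketch {

    // Splits the delimiter-separated phrase list into individual phrases.
    // Pattern.quote treats the delimiter literally, so "|" or "." are safe to use.
    public static List<String> parsePhrases(String phrases, String delimiter) {
        List<String> result = new ArrayList<>();
        for (String part : phrases.split(Pattern.quote(delimiter))) {
            String phrase = part.trim();
            if (!phrase.isEmpty()) {
                result.add(phrase);
            }
        }
        return result;
    }

    // Assumption: a phrase "matches" when it occurs verbatim in the text.
    public static List<String> findPhrases(String text, List<String> phrases) {
        List<String> found = new ArrayList<>();
        for (String phrase : phrases) {
            if (text.contains(phrase)) {
                found.add(phrase);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> phrases = parsePhrases("machine learning| deep learning", "|");
        System.out.println(findPhrases("I study deep learning.", phrases));
    }
}
```

The delimiter should be a character sequence that does not occur inside any phrase; otherwise a phrase would be split in two.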

RegexMatcher

@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("RegexMatcher")
@Description("A transformation stage that leverages the Spark NLP Regex Matcher to detect provided "
		+ "Regex rules in the input text document.")
public class RegexMatcher extends TextCompute {

		...

}

Parameters

Text Field	The name of the field in the input schema that contains the text document.
Regex Field	The name of the field in the output schema that contains the text matches.
Regex Rules	A delimiter-separated list of Regex rules.
Rule Delimiter	The delimiter used to separate the different Regex rules.
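
The interplay of the Regex Rules and Rule Delimiter parameters can be sketched with java.util.regex alone. This is a hedged illustration, not the plugin's code and not the Spark NLP Regex Matcher: the class RegexRulesSketch and its methods are hypothetical, and each rule is assumed to be a plain Java regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the plugin.
public class RegexRulesSketch {

    // Splits the delimiter-separated rule list and compiles each rule.
    // The delimiter is quoted so it is treated literally while splitting.
    public static List<Pattern> parseRules(String rules, String delimiter) {
        List<Pattern> compiled = new ArrayList<>();
        for (String rule : rules.split(Pattern.quote(delimiter))) {
            if (!rule.isEmpty()) {
                compiled.add(Pattern.compile(rule));
            }
        }
        return compiled;
    }

    // Collects every match of every rule, grouped by rule in the order given.
    public static List<String> findMatches(String text, List<Pattern> rules) {
        List<String> matches = new ArrayList<>();
        for (Pattern rule : rules) {
            Matcher m = rule.matcher(text);
            while (m.find()) {
                matches.add(m.group());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Pattern> rules = parseRules("\\d{4}-\\d{2}-\\d{2};[A-Z]{3}", ";");
        System.out.println(findMatches("Flight LHR on 1978-01-28", rules));
    }
}
```

Because the rule list is split on the delimiter before compilation, the delimiter should be chosen so that it never appears inside a rule; a comma, for example, would break a quantifier such as {2,4}.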
