Plugins

DateMatcher

This plugin recognizes date and time expressions such as the following and converts them into a user-defined output format:

  • 1978-01-28
  • 1984/04/02
  • 1/02/1980
  • 2/28/79
  • The 31st of April in the year 2008
  • Fri, 21 Nov 1997
  • Jan 21, ‘97
  • Sun, Nov 21
  • jan 1st
  • next thursday
  • last wednesday
  • today
  • tomorrow
  • yesterday
  • next week
  • next month
  • next year
  • day after
  • the day before
  • 0600h
  • 06:00 hours
  • 6pm
  • 5:30 a.m.
  • at 5
  • 12:59
  • 23:59
  • 1988/11/23 6pm
  • next week at 7.30
  • 5 am tomorrow

@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("DateMatcher")
@Description("A transformation stage that reads different forms of date and time expressions "
	+ "and converts them to a provided date format. This stage transforms each text document "
	+ "into a list of sentences where each detected date and time expression is replaced by "
	+ "the provided format. As an alternative, the list of detected date and time expressions "
	+ "is returned.")
public class DateMatcher extends TextCompute {

    ...

}

Parameters

  • Text Field: The name of the field in the input schema that contains the text document.
  • Date Field: The name of the field in the output schema that contains the text matches.
  • Date Format: The expected output date format. Default is 'yyyy/MM/dd'.
  • Output Option: An option to determine how to format the output of the date matcher. Supported values are 'extract' and 'replace'. Default is 'replace'.
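
The difference between the two Output Option values can be illustrated with a minimal Java sketch. It is not the plugin's implementation: the class DateOutputSketch and its methods are hypothetical, only ISO-style dates such as 1978-01-28 are recognized, and it assumes that extracted dates are rendered in the configured Date Format.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; the real plugin delegates to Spark NLP.
class DateOutputSketch {

    // Assumption: only ISO-style dates (yyyy-MM-dd) are detected in this sketch.
    private static final Pattern ISO_DATE =
        Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");

    // 'extract': return the detected dates, rendered in the requested format.
    static List<String> extract(String text, String dateFormat) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(dateFormat);
        List<String> dates = new ArrayList<>();
        Matcher m = ISO_DATE.matcher(text);
        while (m.find()) {
            // LocalDate.parse throws DateTimeParseException for impossible dates.
            dates.add(LocalDate.parse(m.group()).format(formatter));
        }
        return dates;
    }

    // 'replace': return the text with each detected date rewritten in the requested format.
    static String replace(String text, String dateFormat) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(dateFormat);
        Matcher m = ISO_DATE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String formatted = LocalDate.parse(m.group()).format(formatter);
            m.appendReplacement(out, Matcher.quoteReplacement(formatted));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Born on 1978-01-28.";
        System.out.println(replace(text, "yyyy/MM/dd")); // Born on 1978/01/28.
        System.out.println(extract(text, "yyyy/MM/dd")); // [1978/01/28]
    }
}
```

With 'replace' the surrounding sentence is preserved and only the date expression changes; with 'extract' only the detected dates are kept.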

PhraseMatcher

@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("PhraseMatcher")
@Description("A transformation stage that leverages the Spark NLP Text Matcher to detect provided "
	+ "phrases in the input text document.")
public class PhraseMatcher extends TextCompute {

		...

}

Parameters

  • Text Field: The name of the field in the input schema that contains the text document.
  • Phrase Field: The name of the field in the output schema that contains the text matches.
  • Phrases: A delimiter-separated list of text phrases.
  • Phrase Delimiter: The delimiter used to separate the different text phrases.
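
How the Phrases and Phrase Delimiter parameters work together can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the plugin's code: the class PhraseListSketch and its methods are hypothetical, and a naive substring search stands in for the Spark NLP Text Matcher, which operates on tokenized text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; names are not taken from the plugin.
class PhraseListSketch {

    // Split the 'Phrases' value on the literal delimiter and drop empty entries.
    static List<String> parsePhrases(String phrases, String delimiter) {
        List<String> result = new ArrayList<>();
        // Pattern.quote treats the delimiter literally, so "|" is not read as regex alternation.
        for (String phrase : phrases.split(Pattern.quote(delimiter))) {
            String trimmed = phrase.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Assumption: case-sensitive substring search stands in for the real matcher.
    static List<String> findPhrases(String text, List<String> phrases) {
        List<String> matches = new ArrayList<>();
        for (String phrase : phrases) {
            if (text.contains(phrase)) {
                matches.add(phrase);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> phrases = parsePhrases("heart attack|high blood pressure", "|");
        System.out.println(findPhrases("The patient had a heart attack last year.", phrases));
    }
}
```

The delimiter should be a character sequence that does not occur inside any phrase, otherwise a phrase would be split in two.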

RegexMatcher

@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("RegexMatcher")
@Description("A transformation stage that leverages the Spark NLP Regex Matcher to detect provided "
		+ "Regex rules in the input text document.")
public class RegexMatcher extends TextCompute {

		...

}

Parameters

  • Text Field: The name of the field in the input schema that contains the text document.
  • Regex Field: The name of the field in the output schema that contains the text matches.
  • Regex Rules: A delimiter-separated list of Regex rules.
  • Rule Delimiter: The delimiter used to separate the different Regex rules.
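
The interplay of Regex Rules and Rule Delimiter can be sketched with java.util.regex. This is a minimal sketch under stated assumptions, not the plugin's implementation: the class RegexRuleSketch, its methods, and the sample rules are hypothetical, and the real stage relies on the Spark NLP Regex Matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; names are not taken from the plugin.
class RegexRuleSketch {

    // Split the 'Regex Rules' value on the literal delimiter and compile each rule.
    static List<Pattern> parseRules(String rules, String delimiter) {
        List<Pattern> compiled = new ArrayList<>();
        for (String rule : rules.split(Pattern.quote(delimiter))) {
            if (!rule.isEmpty()) {
                // Pattern.compile throws PatternSyntaxException for an invalid rule.
                compiled.add(Pattern.compile(rule));
            }
        }
        return compiled;
    }

    // Collect every match of every rule, rule by rule in the order given.
    static List<String> findMatches(String text, List<Pattern> rules) {
        List<String> matches = new ArrayList<>();
        for (Pattern rule : rules) {
            Matcher m = rule.matcher(text);
            while (m.find()) {
                matches.add(m.group());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Two sample rules separated by ";": an ISO-style date and a ticket id.
        List<Pattern> rules = parseRules("\\d{4}-\\d{2}-\\d{2};[A-Z]{3}-\\d+", ";");
        System.out.println(findMatches("Ticket ABC-123 opened on 2020-05-17.", rules));
    }
}
```

Because the rules are split on the delimiter before compilation, the delimiter must not appear inside any rule. A character such as "|" that is common in regular expressions is therefore a poor choice here.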