Plugin

Spark SQL supports structured data processing and offers a standard, unified way to query data. Any dataset, regardless of its origin and format, can be explored with SQL queries once it has been transformed into an Apache Spark Dataset.

ApplySQL

Integrating Spark SQL with CDAP pipeline technology makes SQL queries available for exploring both historical datasets and real-time event streams.

@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("ApplySQL")
@Description("A transformation stage that uses an Apache Spark SQL based data exploration engine "
    + "to aggregate, group and filter pipeline data records.")
public class ApplySQL extends SparkCompute<StructuredRecord, StructuredRecord> {

    ...

}

Parameters

SQL Statement: An Apache Spark SQL compliant statement that filters, groups or aggregates the data records published by the previous pipeline stage.
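
To make the effect of such a statement concrete, the sketch below mirrors in plain Java (no Spark or CDAP dependency) what a filter, group and aggregate query computes. It is not the plugin's implementation. The table name "input", the column names "sensor" and "reading", and the class SqlSemanticsSketch are all hypothetical and chosen only for illustration; the name under which the plugin exposes incoming records is not stated here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration of what a statement such as
//   SELECT sensor, AVG(reading) FROM input WHERE reading >= 0 GROUP BY sensor
// computes. Table and column names are assumptions, not plugin defaults.
class SqlSemanticsSketch {

    // Hypothetical record layout standing in for a pipeline data record.
    static final class Reading {
        final String sensor;
        final double reading;

        Reading(String sensor, double reading) {
            this.sensor = sensor;
            this.reading = reading;
        }
    }

    // Equivalent of: WHERE reading >= 0, GROUP BY sensor, AVG(reading).
    // A TreeMap keeps the groups sorted by sensor name for stable output.
    static Map<String, Double> averageBySensor(List<Reading> records) {
        return records.stream()
            .filter(r -> r.reading >= 0)
            .collect(Collectors.groupingBy(
                r -> r.sensor,
                TreeMap::new,
                Collectors.averagingDouble(r -> r.reading)));
    }

    public static void main(String[] args) {
        List<Reading> records = List.of(
            new Reading("a", 1.0),
            new Reading("a", 3.0),
            new Reading("b", -5.0),   // dropped by the filter
            new Reading("b", 4.0));
        System.out.println(averageBySensor(records)); // prints {a=2.0, b=4.0}
    }
}
```

In the plugin itself, the same result would be expressed declaratively through the SQL Statement parameter and evaluated by Spark SQL rather than by hand-written Java.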

Integration