Objective
Google CDAP offers many connector plugins to integrate with data stores, streaming platforms and cloud applications.
PredictiveWorks. complements CDAP’s built-in data connectors with purpose-built plugins, designed to support Customer, User & Entity Behavior Analytics (CUEBA). The main focus is on (but not limited to) low latency data stores and real-time streaming platforms:
Data Streaming
Eclipse Ditto
Eclipse Ditto
Eclipse Ditto is an IoT technology that implements the "digital twin" pattern. It is an abstraction layer that unifies access to millions of physical devices. PredictiveWorks. connects to Eclipse Ditto via its web socket interface and exposes twin events and live messages as an Apache Spark real-time stream. Eclipse Ditto is integrated into IoT platforms such as Aloxy IIOT Hub and Bosch IoT Suite. Use Cases(Industrial) Internet-of-Things |
@Plugin(type = StreamingSource.PLUGIN_TYPE)
@Name("DittoSource")
@Description("An Eclipse Ditto IoT streaming source that supports "
+ "real-time websocket events from a digital twin service.")
public class DittoStreamSource extends StreamingSource<StructuredRecord>{
...
}
Eclipse Paho
Eclipse Paho
Eclipse Paho is an open-source MQTT 3.1.1 and MQTT-SN client. PredictiveWorks. exposes MQTT broker events as an Apache Spark real-time stream. The current implementation of this plugin supports JSON schema inference for MQTT streams (default), and explicitly transforms uplink messages from The Things Network server. Use Cases(Industrial) Internet-of-Things |
@Plugin(type = StreamingSource.PLUGIN_TYPE)
@Name("MQTTSource")
@Description("An MQTT streaming source that listens to an MQTT 3.1.1 broker "
+ "and subscribes to a given topic.")
public class MqttSource extends StreamingSource<StructuredRecord> {
...
}
HiveMQ
HiveMQ
HiveMQ is an Enterprise-ready MQTT Broker that fully supports MQTT 3.x and MQTT 5. PredictiveWorks. exposes HiveMQ broker events as Apache Spark real-time stream. The current implementation also supports publishing of device-specific anomalies, forecasts, predictions and other behavior analytics events and thereby seamlessly integrates machine intelligence into an (Industrial) IoT environment. Use Cases(Industrial) Internet-of-Things |
@Plugin(type = StreamingSource.PLUGIN_TYPE)
@Name("HiveMQSource")
@Description("An MQTT streaming source that listens to a HiveMQ MQTT broker "
+ "and subscribes to a given topic.")
public class HiveMQSource extends StreamingSource<StructuredRecord> {
...
}
@Plugin(type = BatchSink.PLUGIN_TYPE)
@Name("HiveMQSink")
@Description("Batch sink plugin to send messages to a HiveMQ MQTT server.")
public class HiveMQSink extends BatchSink<StructuredRecord, Void, Void> {
...
}
Kafka
Apache Kafka
This Kafka connector is an easy-to-use alternative for Google CDAP's plugin. PredictiveWorks. connector comes with automated schema inference. Users no longer have to explicitly specify the data format of a real-time event stream. Use CasesCyber Defense, Internet-of-Things and more |
@Plugin(type = StreamingSource.PLUGIN_TYPE)
@Name("KafkaSource")
@Description("A real-time connector plugin to consume events from Apache Kafka topics "
+ "and transform them into structured pipeline records.")
public class KafkaSource extends StreamingSource<StructuredRecord> {
...
}
Kolide
Kolide Fleet
Kolide Fleet is an open-source Osquery Manager and extends Osquery's capabilities from a single machine to the entire fleet of endpoints. Osquery results sent to Kolide are currently forwarded to Google PubSub. PredictiveWorks. exposes osquery-based Google PubSub events as Apache Spark real-time stream. Use CasesCyber Defense |
@Plugin(type = StreamingSource.PLUGIN_TYPE)
@Name("KolideSource")
@Description("A Kolide streaming source to read osquery messages from Google PubSub.")
public class KolideSource extends StreamingSource<StructuredRecord> {
...
}
PubSub
Google PubSub
PubSub is Google's real-time messaging and streaming platform. PredictiveWorks. exposes Google PubSub events as Apache Spark real-time stream. Use CasesCyber Defense, Internet-of-Things and more |
@Plugin(type = StreamingSource.PLUGIN_TYPE)
@Name("PubSubSource")
@Description("A PubSub streaming source to read messages from Google PubSub.")
public class PubSubSource extends StreamingSource<StructuredRecord> {
...
}
Data Stores
PredictiveWorks. connector plugins were built with a strong focus on low latency data stores for Cyber Defense and Internet-of-Things use cases.
Aerospike
Aerospike
A Next-Generation NoSQL Data Platform for low latency solutions at extreme scale, with predictable performance at any scale and unmatched uptime and reliability. Use CasesAdvertising, Cyber Defense, Internet-of-Things |
@Plugin(type = BatchSource.PLUGIN_TYPE)
@Name("AerospikeSource")
@Description("A batch connector plugin to read data records from Aerospike namespaces and "
+ "sets and transform them into structured pipeline records.")
public class AerospikeSource extends BatchSource<AerospikeKey, AerospikeRecord, StructuredRecord> {
...
}
@Plugin(type = BatchSink.PLUGIN_TYPE)
@Name("AerospikeSink")
@Description("A batch connector plugin to write structured pipelines records to "
+ "Aerospike namespaces and sets.")
public class AerospikeSink extends BatchSink<StructuredRecord, AerospikeKey, AerospikeRecord> {
...
}
Crate
Crate DB
Crate is a distributed SQL database at IoT-scale built on top of a NoSQL foundation. It combines the familiarity of SQL with the scalability and data flexibility of NoSQL. Use CasesCyber Defense, Internet-of-Things |
@Plugin(type = BatchSource.PLUGIN_TYPE)
@Name("CrateSource")
@Description("A batch connector plugin to read data rows from Crate tables "
+ "and transform them into structured pipeline records.")
public class CrateSource extends BatchSource<NullWritable, CrateWritable, StructuredRecord> {
...
}
@Plugin(type = BatchSink.PLUGIN_TYPE)
@Name("CrateSink")
@Description("A batch connector plugin to write structured pipelines records to "
+ "Crate tables.")
public class CrateSink extends BatchSink<StructuredRecord, NullWritable, CrateWritable> {
...
}
Druid
Apache Druid
Apache Druid is a real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets. Druid leverages the columnar storage paradigm. Use CasesAdvertising, Manufacturing, Marketing, Network Telemetry |
@Plugin(type = BatchSource.PLUGIN_TYPE)
@Name("DruidSource")
@Description("A batch connector plugin to read data from Druid datasources "
+ "and transform them into structured pipeline records.")
public class DruidSource extends BatchSource<NullWritable, DruidWritable, StructuredRecord> {
...
}
@Plugin(type = BatchSink.PLUGIN_TYPE)
@Name("CrateSink")
@Description("A batch connector plugin to write structured pipelines records to "
+ "Druid datasources.")
public class DruidSink extends BatchSink<StructuredRecord, NullWritable, DruidWritable> {
...
}
Elastic
Elasticsearch
This Elastic connector is an easy-to-use alternative for Google CDAP's plugin. PredictiveWorks. connector comes with automated schema inference. Users no longer have to explicitly specify the data format of Elastic documents. Use CasesCyber Defense, Internet-of-Things and more |
@Plugin(type = BatchSource.PLUGIN_TYPE)
@Name("ElasticSource")
@Description("A batch connector plugin to read documents from Elastic indices and mappings "
+ "and transform them into structured pipeline records.")
public class ElasticSource extends BatchSource<NullWritable, ElasticWritable, StructuredRecord> {
...
}
InfluxDB
InfluxDB
InfluxDB is an open-source time series database. It is optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics. Use CasesInternet-of-Things |
@Plugin(type = BatchSink.PLUGIN_TYPE)
@Name("InfluxSink")
@Description("A batch connector to write data records to an InfluxDB time series database. "
+ "As structured data record is divided into numeric and other fields. Numeric field "
+ "values are transformed into Doubles and persisted as a measurement field. String "
+ "field values are persisted as tags, while all other fields are ignored.")
public class InfluxSink extends BatchSink<StructuredRecord, InfluxPointWritable, NullWritable> {
...
}
Ignite
Apache Ignite
Apache Ignite is an in-memory computing platform for transactional, analytical, and streaming workloads delivering in-memory speeds at petabyte scale. Use CasesCyber Defense, Internet-of-Things |
@Plugin(type = BatchSource.PLUGIN_TYPE)
@Name("IgniteSource")
@Description("A batch connector plugin to read data from Apache Ignite caches as BinaryObjects "
+ "and transform them into structured pipeline records.")
public class IgniteSource extends BatchSource<NullWritable, IgniteWritable, StructuredRecord> {
...
}
@Plugin(type = BatchSink.PLUGIN_TYPE)
@Name("IgniteSink")
@Description("A batch connector plugin to write structured pipelines records as BinaryObjects to "
+ "Apache Ignite caches.")
public class IgniteSink extends BatchSink<StructuredRecord, NullWritable, IgniteWritable> {
...
}
SAP HANA
SAP HANA
SAP in-memory business data computing platform. Use CasesGeneral purpose |
@Plugin(type = BatchSource.PLUGIN_TYPE)
@Name("HANASource")
@Description("A batch connector plugin to read data from SAP HANA database "
+ "and transform them into structured pipeline records.")
public class HANASource extends BatchSource<NullWritable, HanaWritable, StructuredRecord> {
...
}
@Plugin(type = BatchSink.PLUGIN_TYPE)
@Name("HANASink")
@Description("A batch connector plugin to write structured pipelines records to "
+ "SAP HANA database.")
public class HANASink extends BatchSink<StructuredRecord, NullWritable, HanaWritable> {
...
}
Data Warehouses
Panoply
Panoply Data Warehouse
Panoply is a data management platform that integrates with 80+ data sources. Its data warehouse provides access to data from individual e-commerce, marketing, payment and sales platforms. Use CasesAccounting, Advertising, E-Commerce, Marketing and Sales. |
@Plugin(type = BatchSource.PLUGIN_TYPE))
@Name("PanoplySource")
@Description("A batch source to read structured records from a Panoply data warehouse.")
public class PanoplySource extends RedshiftSource {
...
}
@Plugin(type = BatchSin.PLUGIN_TYPE)
@Name("PanoplySink")
@Description("A batch sink to write structured records to a Panoply data warehouse. "
+ "It is not recommended to use this sink connector for (very) large datasets. "
+ "In this case leverage the S3 sink connector and the S3-to-Redshift action.")
public class PanoplySink extends RedshiftSink {
...
}
Redshift
Amazon Redshift
Redshift is Amazon's popular and cloud data warehouse. Use CasesGeneral purpose |
@Plugin(type = BatchSource.PLUGIN_TYPE)
@Name("RedshiftSource")
@Description("A batch connector plugin to read data from Amazon Redshift data warehouse "
+ "and transform them into structured pipeline records.")
public class RedshiftSource extends BatchSource<NullWritable, RedshiftWritable, StructuredRecord> {
...
}
@Plugin(type = BatchSink.PLUGIN_TYPE)
@Name("RedshiftSink")
@Description("A batch connector plugin to write structured pipelines records to "
+ "Amazon Redshift data warehouse.")
public class RedshiftSink extends BatchSink<StructuredRecord, NullWritable, RedshiftWritable> {
...
}
Snowflake
Snowflake
Snowflake is a general purpose cloud data platform and warehouse. Use CasesGeneral purpose |
@Plugin(type = BatchSource.PLUGIN_TYPE)
@Name("SnowflakeSource")
@Description("A batch source to read structured records from a Snowflake database.")
public class SnowflakeSource extends JdbcSource {
...
}
@Plugin(type = BatchSink.PLUGIN_TYPE)
@Name("SnowflakeSink")
@Description("A batch sink to write structured records to a Snowflake database.")
public class SnowflakeSink extends JdbcSink<SnowflakeWritable> {
...
}
Human Intelligence
DataSift
DataSift
DataSift connects to a real-time feed of human data, uncovers insights with sophisticated data augmentation, filtering and classification engine, and provides the data for analysis. Use CasesHuman Intelligence |
@Plugin(type = BatchSource.PLUGIN_TYPE)
@Name("DataSiftSource")
@Description("A batch connector plugin to read data from DataSift cloud service "
+ "and transform them into structured pipeline records.")
public class DataSiftSource extends BatchSource<NullWritable, DataSiftWritable, StructuredRecord> {
...
}
Internet of Things
ThingsBoard
ThingsBoard
ThingsBoard is an open-source IoT Platform for device management, data collection, processing and visualization. Use Cases(Industrial) Internet-of-Things |
@Plugin(type = StreamingSource.PLUGIN_TYPE)
@Name("ThingsboardSource")
@Description("An Apache Kafka streaming source that supports real-time "
+ "events that originate from Thingsboard.")
public class ThingsboardSource extends StreamingSource<StructuredRecord> {
...
}
@Plugin(type = BatchSink.PLUGIN_TYPE)
@Name("ThingsBoardSink")
@Description("A batch connector plugin to write structured pipelines records to "
+ "ThingsBoard.")
public class ThingsBoardSink extends BatchSource<NullWritable, ThingsBoardWritable, StructuredRecord> {
...
}