Objective

Google CDAP supports PredictiveWorks. by covering two important aspects:

  • As a standardized, cloud-ready platform for building unified and reusable data (analytics) pipelines, Google CDAP is the technical foundation of PredictiveWorks.

  • Google CDAP offers a wide variety of data integration plugins (connectors) that support many prominent databases, storage services, streaming systems, and SaaS applications. Connectors read from data sources and write to data destinations, thereby defining the start and end points of data pipelines.
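To make the idea of sources and sinks as pipeline endpoints concrete, here is a minimal sketch of a two-stage batch pipeline definition, modeled on the JSON format CDAP uses for pipeline configurations (a list of stages plus the connections between them). The plugin names `GCSFile` and `BigQueryTable`, their property keys, and the bucket, dataset, and table names are assumptions for illustration; a real configuration also pins the artifact version and usually needs more properties. The helper `endpoints` is hypothetical and simply finds the stages that start and end the pipeline.

```python
import json

# Hypothetical pipeline: read files from Google Cloud Storage, write them to BigQuery.
# Plugin names and property keys are assumptions; check them against the plugin
# reference of your CDAP version before use.
pipeline = {
    "name": "gcs_to_bigquery",
    "artifact": {"name": "cdap-data-pipeline", "scope": "SYSTEM"},
    "config": {
        "stages": [
            {
                "name": "ReadOrders",  # starting point: a source connector
                "plugin": {
                    "name": "GCSFile",
                    "type": "batchsource",
                    "properties": {"path": "gs://example-bucket/orders/"},
                },
            },
            {
                "name": "WriteOrders",  # end point: a sink connector
                "plugin": {
                    "name": "BigQueryTable",
                    "type": "batchsink",
                    "properties": {"dataset": "sales", "table": "orders"},
                },
            },
        ],
        # Connections wire stages together; here the source feeds the sink directly.
        "connections": [{"from": "ReadOrders", "to": "WriteOrders"}],
    },
}


def endpoints(config):
    """Return (sources, sinks): stages with no incoming and no outgoing connection."""
    names = [stage["name"] for stage in config["stages"]]
    has_incoming = {c["to"] for c in config["connections"]}
    has_outgoing = {c["from"] for c in config["connections"]}
    sources = [n for n in names if n not in has_incoming]
    sinks = [n for n in names if n not in has_outgoing]
    return sources, sinks


print(json.dumps(pipeline, indent=2))
print(endpoints(pipeline["config"]))  # (['ReadOrders'], ['WriteOrders'])
```

Transform stages (parsing, cleansing, analytics) would sit between the two endpoints as additional stages and connections.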

Implementation

The number of connector plugins provided by Google CDAP is growing continuously.

Data Store Plugins
Amazon S3 as Source

Reads data from Amazon S3 and converts them into CDAP's standard pipeline data format.
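Many source plugins in this list convert their input into CDAP's standard pipeline data format, that is, records that conform to a declared schema. The sketch below illustrates that idea for CSV text such as an S3 object might contain; it is not the plugin's code. The schema is written in an Avro-style JSON form resembling the one CDAP uses, and `SCHEMA`, `CONVERTERS`, `to_record`, and the field names are hypothetical.

```python
import csv
import io

# Hypothetical record schema in an Avro-style JSON form. Field names are illustrative.
SCHEMA = {
    "type": "record",
    "name": "order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "customer", "type": "string"},
        {"name": "amount", "type": ["double", "null"]},  # nullable field
    ],
}

# Map primitive type names to Python converters.
CONVERTERS = {"long": int, "int": int, "double": float, "float": float, "string": str}


def to_record(raw, schema):
    """Convert a dict of raw strings into a typed record that follows `schema`."""
    record = {}
    for field in schema["fields"]:
        ftype = field["type"]
        nullable = isinstance(ftype, list) and "null" in ftype
        # For a union such as ["double", "null"], the non-null member is the value type.
        base = next(t for t in ftype if t != "null") if isinstance(ftype, list) else ftype
        value = raw.get(field["name"])
        if value in ("", None):
            if not nullable:
                raise ValueError(f"field {field['name']!r} is required")
            record[field["name"]] = None
        else:
            record[field["name"]] = CONVERTERS[base](value)
    return record


raw_csv = "id,customer,amount\n1,acme,19.5\n2,globex,\n"
records = [to_record(row, SCHEMA) for row in csv.DictReader(io.StringIO(raw_csv))]
print(records)
# [{'id': 1, 'customer': 'acme', 'amount': 19.5}, {'id': 2, 'customer': 'globex', 'amount': None}]
```

Declaring the schema up front lets every downstream stage rely on field names and types, regardless of which connector produced the records.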

Amazon S3 as Sink

Writes pipeline data records to Amazon S3.

Apache Cassandra as Source

Reads data from Cassandra Key-Value tables and converts them into CDAP's standard pipeline data format.

Apache Cassandra as Sink

Writes pipeline data records into Cassandra tables.

Apache HBase as Source

Reads data from HBase tables and converts them into CDAP's standard pipeline data format.

Apache HBase as Sink

Writes pipeline data records into HBase tables.

Apache Hive as Source

Reads data from Hive tables and converts them into CDAP's standard pipeline data format.

Apache Hive as Sink

Writes pipeline data records into Hive tables.

Apache Kudu as Source

Pulls rows from Kudu tables using the Kudu native client and converts them into CDAP's standard pipeline data format.

Apache Kudu as Sink

Writes data records from batch and real-time pipelines into Kudu tables.

Azure Blob Storage as Source

Reads data from Azure Blob Storage and converts it into CDAP's standard pipeline data format.

Azure Data Lake as Source

Reads data from Azure Data Lake store files and converts it into CDAP's standard pipeline data format.

Azure Data Lake as Sink

Writes pipeline data records to an Azure Data Lake store in Avro, ORC, or text format.

Apache Phoenix as Source

Reads rows from Phoenix tables and converts them into CDAP's standard pipeline data format.

Apache Phoenix as Sink

Writes pipeline data records into Phoenix tables.

Couchbase as Source

Reads documents from Couchbase buckets. A filter can be specified so that only documents meeting specific criteria are output.

Couchbase as Sink

Writes data records as documents to Couchbase buckets.

DynamoDB as Source

Reads data from Amazon DynamoDB tables and converts them into CDAP's standard pipeline data format.

DynamoDB as Sink

Writes pipeline data records to Amazon DynamoDB tables.

Elastic as Source

Reads indexed documents from Elasticsearch and converts them into CDAP's standard pipeline data format.

Elastic as Sink

Writes pipeline data records as documents to Elasticsearch indices.

Google BigQuery as Source

Reads the entire contents of a table from Google's BigQuery data warehouse. Data from the BigQuery table is first exported to a temporary location on Google Cloud Storage, then read into the pipeline from there.

Google BigQuery as Sink

Writes pipeline data records to a BigQuery table. Data is first written to a temporary location on Google Cloud Storage, then loaded into BigQuery from there.
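Both BigQuery plugins move data through a temporary location on Google Cloud Storage. The following minimal, local sketch shows the general shape of such a staging pattern, with a local temporary directory standing in for Cloud Storage and a caller-supplied `load_fn` standing in for the export or load job. `staged_load` and `fake_load` are hypothetical names, and the actual plugins may manage their temporary data differently.

```python
import json
import shutil
import tempfile
from pathlib import Path


def staged_load(records, load_fn):
    """Stage records as JSON lines in a temporary directory, run load_fn on it, then clean up."""
    staging_dir = Path(tempfile.mkdtemp(prefix="staging-"))
    try:
        with (staging_dir / "part-00000.jsonl").open("w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")
        # Stand-in for the load step that reads the staged files into the destination.
        return load_fn(staging_dir)
    finally:
        # Delete the temporary data whether or not the load succeeded.
        shutil.rmtree(staging_dir, ignore_errors=True)


# Usage: a fake load step that reads the staged files back in.
loaded, seen_dirs = [], []


def fake_load(staging_dir):
    seen_dirs.append(staging_dir)
    for part in sorted(staging_dir.glob("*.jsonl")):
        for line in part.read_text(encoding="utf-8").splitlines():
            loaded.append(json.loads(line))
    return len(loaded)


count = staged_load([{"id": 1}, {"id": 2}], fake_load)
print(count, loaded)  # 2 [{'id': 1}, {'id': 2}]
```

Staging decouples the pipeline from the warehouse: the bulk export or load job reads files in one pass, and the `finally` block keeps temporary data from accumulating when a load fails.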

Google Bigtable as Source

Reads data from Google Cloud Bigtable. Cloud Bigtable is Google's NoSQL Big Data database service.

Google Bigtable as Sink

Writes pipeline data records to Google Cloud Bigtable.

Google Cloud Datastore as Source

Reads data from Google Cloud Datastore (Datastore mode). Datastore is a NoSQL document database built for automatic scaling and high performance.

Google Cloud Datastore as Sink

Writes pipeline data records to Google Cloud Datastore (Datastore mode).

Google Cloud Storage as Source

Reads objects from Google Cloud Storage buckets. Cloud Storage allows world-wide storage and retrieval of any amount of data at any time.

Google Cloud Storage as Sink

Writes pipeline data records to one or more files in a directory on Google Cloud Storage. Files can be written in various formats, such as CSV, Avro, Parquet, and JSON.
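As a rough local illustration of writing records to one or more files in a directory, the sketch below splits records into part files in CSV or JSON-lines form using only the Python standard library (Avro and Parquet need third-party libraries and are left out). The function `write_parts`, the `records_per_file` parameter, and the `part-NNNNN` file naming are assumptions, not the plugin's documented behavior.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_parts(records, out_dir, fmt="json", records_per_file=1000):
    """Write records as CSV or JSON lines into part files inside out_dir; return the paths."""
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported format: {fmt}")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(records), records_per_file):
        chunk = records[start:start + records_per_file]
        path = out / f"part-{len(paths):05d}.{fmt}"
        with path.open("w", newline="", encoding="utf-8") as f:
            if fmt == "csv":
                # The header row is taken from the keys of the first record in the chunk.
                writer = csv.DictWriter(f, fieldnames=list(chunk[0]))
                writer.writeheader()
                writer.writerows(chunk)
            else:
                for record in chunk:
                    f.write(json.dumps(record) + "\n")
        paths.append(path)
    return paths


rows = [{"id": i, "name": f"item-{i}"} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    written = write_parts(rows, tmp, fmt="csv", records_per_file=2)
    print([p.name for p in written])  # ['part-00000.csv', 'part-00001.csv', 'part-00002.csv']
```

Splitting output into several files lets parallel workers each write their own part, which is why a sink targets a directory rather than a single file.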

MongoDB as Source

Reads documents from MongoDB collections and converts them into CDAP's standard pipeline data format.

MongoDB as Sink

Writes pipeline data records as documents to MongoDB collections.

OrientDB as Sink

Writes pipeline data records as vertices and edges to OrientDB using the Graph API.

SAP OData Service as Source

Reads data from an SAP OData service and converts it into CDAP's standard pipeline data format.

Data Streaming
Amazon Kinesis as Source

Reads real-time events from Amazon Kinesis streams and transforms them into CDAP's standard pipeline data format.

Amazon Kinesis as Sink

Writes pipeline data records to Amazon Kinesis streams.

Apache Kafka as Source

Consumes real-time events from Kafka topics and transforms them into CDAP's standard pipeline data format.

Apache Kafka as Sink

Publishes pipeline data records as real-time events to Kafka topics.

Azure Event Hub as Source

Reads real-time events from Azure Event Hubs and transforms them into CDAP's standard pipeline data format.

Google Pub/Sub as Source

Reads from a Google Cloud Pub/Sub subscription in real time. Cloud Pub/Sub brings the scalability, flexibility, and reliability of enterprise message-oriented middleware to the cloud.

Google Pub/Sub as Sink

Writes pipeline data records to a Google Cloud Pub/Sub topic.

MapR Streams as Source

Reads real-time events from MapR Streams and transforms them into CDAP's standard pipeline data format.

MQTT as Source

Subscribes to a topic on an MQTT broker and consumes its real-time events, which are transformed into CDAP's standard pipeline data format.
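Subscriptions to an MQTT broker are made with topic filters. Whether this plugin accepts wildcard filters is not stated here, so treat the following as background: a small sketch of the MQTT 3.1.1 topic-matching rules, where `+` matches exactly one topic level and a trailing `#` matches the parent level and any number of child levels. The function name `topic_matches` is hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a topic filter (MQTT 3.1.1 rules)."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Wildcards in the first level do not match topics that start with '$' (e.g. $SYS).
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            # '#' is only valid as the last level; a filter with '#' elsewhere matches nothing.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False  # the filter has more levels than the topic
        if level != "+" and level != t_levels[i]:
            return False  # a literal level must match exactly
    # Without a trailing '#', the filter and topic must have the same number of levels.
    return len(f_levels) == len(t_levels)


print(topic_matches("sport/+/player1", "sport/tennis/player1"))  # True
print(topic_matches("sport/#", "sport"))                         # True
print(topic_matches("sport/+", "sport"))                         # False
```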

PubNub Cloud as Source

PubNub is a global data stream network (DSN) and real-time network-as-a-service offering. This plugin reads real-time messages from PubNub cloud channels.

SaaS Applications
Google Ads as Source

Retrieves reports of a specified report type from a Google Ads account.

Google Analytics as Batch Source

Queries the Google Analytics Reporting API for metrics and dimensions selected from a built-in set.

Google Analytics as Streaming Source

Queries the Google Analytics Real Time Reporting API for metrics and dimensions selected from a built-in set.

Google Shopping as Source

Reads product data from the Google Shopping Content API.

Salesforce as Batch Source

Reads sObjects, such as accounts, contacts, and leads, either through SOQL (Salesforce Object Query Language) queries or by selecting an sObject with incremental or range date filters.
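An incremental or range date filter on an sObject amounts to a SOQL query with a condition on a date field. The sketch below builds such a query; it only illustrates the query shape and is not how the plugin constructs its queries. The function `soql_with_date_filter` and its parameters are hypothetical, and `LastModifiedDate` is used as the default filter field.

```python
from datetime import datetime, timezone


def soql_with_date_filter(sobject, fields, start=None, end=None, date_field="LastModifiedDate"):
    """Build a SOQL query limited to records whose date_field lies in [start, end)."""
    def literal(dt):
        # SOQL datetime literals are unquoted ISO 8601 values, e.g. 2024-01-01T00:00:00Z.
        # Pass timezone-aware datetimes; naive ones are interpreted as local time.
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    query = f"SELECT {', '.join(fields)} FROM {sobject}"
    conditions = []
    if start is not None:
        conditions.append(f"{date_field} >= {literal(start)}")
    if end is not None:
        conditions.append(f"{date_field} < {literal(end)}")
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


jan = datetime(2024, 1, 1, tzinfo=timezone.utc)
feb = datetime(2024, 2, 1, tzinfo=timezone.utc)
print(soql_with_date_filter("Account", ["Id", "Name"], start=jan, end=feb))
# SELECT Id, Name FROM Account WHERE LastModifiedDate >= 2024-01-01T00:00:00Z AND LastModifiedDate < 2024-02-01T00:00:00Z
```

For an incremental run, the `end` of the previous run would become the next `start`; the half-open range keeps a record on the boundary from being read twice.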

Salesforce as Batch Sink

Inserts pipeline data records as sObjects into Salesforce. Currently, only inserts (no upserts) are supported.
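Because the sink only inserts, writing the same input twice creates a second copy of each record rather than updating the first. The in-memory sketch below contrasts insert-only behavior with an upsert keyed on a matching field (in Salesforce, upserts typically match on an external ID field). The functions `insert` and `upsert`, the list used as a store, and the `Email` key are hypothetical.

```python
def insert(store, records):
    """Insert-only: every record becomes a new object with a newly generated Id."""
    for r in records:
        store.append(dict(r, Id=f"obj-{len(store) + 1}"))


def upsert(store, records, key):
    """Upsert: update the object whose `key` matches, otherwise insert a new one."""
    for r in records:
        match = next((o for o in store if o.get(key) == r[key]), None)
        if match is not None:
            match.update(r)  # keeps the existing Id
        else:
            store.append(dict(r, Id=f"obj-{len(store) + 1}"))


batch = [{"Email": "ada@example.com", "LastName": "Lovelace"}]

inserted = []
insert(inserted, batch)
insert(inserted, batch)  # re-running the same batch
print(len(inserted))     # 2: the contact now exists twice

upserted = []
upsert(upserted, batch, key="Email")
upsert(upserted, batch, key="Email")
print(len(upserted))     # 1: the second run updated the existing object
```

With an insert-only sink, re-running a pipeline over the same input therefore needs de-duplication upstream.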

Salesforce as Streaming Source

Tracks updates of sObjects using PushTopic events and transforms them into CDAP's standard pipeline data format.

Salesforce Marketing Cloud as Sink

Inserts pipeline data records into a Salesforce Marketing Cloud Data Extension. The sink requires Server-to-Server integration with the Salesforce Marketing Cloud API.

Zuora as Batch Source

Fetches data from Zuora and transforms it into CDAP's standard pipeline data format. Zuora is a subscription management platform for order-to-cash processes.

Zuora as Batch Sink

Writes pipeline data records to Zuora objects; each record is written to an object collection.

Zuora as Streaming Source

Fetches data from Zuora periodically and transforms it into CDAP's standard pipeline data format.
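A streaming source that fetches data periodically typically polls with a watermark so that each poll returns only records it has not emitted yet. The sketch below shows that general polling pattern; it is not the plugin's implementation. `poll`, `fake_fetch`, and the `updated` field are hypothetical, and `fetch_since` stands in for whatever API call returns records changed after the watermark.

```python
import time
from typing import Callable, Iterable, Iterator


def poll(fetch_since: Callable[[int], Iterable[dict]], interval_s: float,
         max_polls: int, start: int = 0) -> Iterator[dict]:
    """Periodically fetch records updated after a watermark and yield only new ones."""
    watermark = start
    for i in range(max_polls):
        batch = list(fetch_since(watermark))
        yield from batch
        if batch:
            # Advance the watermark so the next poll skips records already emitted.
            watermark = max(r["updated"] for r in batch)
        if i < max_polls - 1:
            time.sleep(interval_s)


# Usage with an in-memory source; a new record arrives between the first and second poll.
calls = {"n": 0}
source = [{"id": "A", "updated": 1}, {"id": "B", "updated": 2}]


def fake_fetch(watermark):
    calls["n"] += 1
    if calls["n"] == 2:
        source.append({"id": "C", "updated": 3})
    return [r for r in source if r["updated"] > watermark]


ids = [r["id"] for r in poll(fake_fetch, interval_s=0, max_polls=3)]
print(ids)  # ['A', 'B', 'C']
```

Comparing strictly against the highest value seen is simple, but it can miss records that share the watermark value and arrive later; a production connector might overlap polling windows and de-duplicate instead.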