Built-in Connectors

Objective

Google CDAP supports PredictiveWorks. by covering two important aspects:

Google CDAP is the technical foundation of PredictiveWorks. as it provides a standardized cloud-ready platform to build unified and reusable data (analytics) pipelines.
Google CDAP offers a wide variety of data integration plugins (connectors), supporting many prominent databases and SaaS applications. Connectors read from data sources and write to data destinations, thereby defining the start and end points of data pipelines.
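
To make the source-and-sink idea concrete, here is a minimal Python sketch of how a two-stage pipeline might be described as a configuration document: one source stage, one sink stage, and a connection between them. The stages-plus-connections layout follows the general shape of CDAP pipeline configurations, but the helper functions, plugin names, property keys, and bucket or table values are illustrative assumptions and are not taken from this page.

```python
import json


def make_stage(name, plugin_name, plugin_type, properties):
    """Describe one pipeline stage (hypothetical helper, not a CDAP API)."""
    return {
        "name": name,
        "plugin": {
            "name": plugin_name,
            "type": plugin_type,
            "properties": properties,
        },
    }


def make_pipeline(source, sink):
    """Connect a source stage to a sink stage by their stage names."""
    return {
        "stages": [source, sink],
        "connections": [{"from": source["name"], "to": sink["name"]}],
    }


# Plugin names and property keys below are assumptions for illustration only.
source = make_stage(
    "read-s3", "S3", "batchsource", {"path": "s3a://example-bucket/input/"}
)
sink = make_stage(
    "write-bq",
    "BigQueryTable",
    "batchsink",
    {"dataset": "example_dataset", "table": "example_table"},
)

pipeline = make_pipeline(source, sink)
print(json.dumps(pipeline, indent=2))
```

The connection entry is what ties a connector acting as the start point to one acting as the end point; a real pipeline would typically place transformation stages between the two.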

Implementation

Google CDAP continuously adds new connector plugins, so the number of supported connectors keeps growing.

Data Store Plugins
	Amazon S3 as Source Reads data from Amazon S3 and converts them into CDAP's standard pipeline data format.
	Amazon S3 as Sink Writes pipeline data records to Amazon S3.
	Apache Cassandra as Source Reads data from Cassandra Key-Value tables and converts them into CDAP's standard pipeline data format.
	Apache Cassandra as Sink Writes pipeline data records into Cassandra tables.
	Apache HBase as Source Reads data from HBase tables and converts them into CDAP's standard pipeline data format.
	Apache HBase as Sink Writes pipeline data records into HBase tables.
	Apache Hive as Source Reads data from Hive tables and converts them into CDAP's standard pipeline data format.
	Apache Hive as Sink Writes pipeline data records into Hive tables.
	Apache Kudu as Source Pulls rows from Kudu tables using the Kudu native client and converts them into CDAP's standard pipeline data format.
	Apache Kudu as Sink Writes data records from batch and real-time pipelines into Kudu tables.
	Azure Blob Storage as Source Reads data from Azure Blob Storage and converts it into CDAP's standard pipeline data format.
	Azure Data Lake as Source Reads data from Azure Data Lake store files and converts it into CDAP's standard pipeline data format.
	Azure Data Lake as Sink Writes pipeline data records to an Azure Data Lake store in Avro, ORC, or text format.
	Apache Phoenix as Source Reads rows from Phoenix tables and converts them into CDAP's standard pipeline data format.
	Apache Phoenix as Sink Writes pipeline data records into Phoenix tables.
	Couchbase as Source Reads documents from Couchbase buckets. A filter can be specified to output only documents that meet specific criteria.
	Couchbase as Sink Writes data records as documents to Couchbase buckets.
	DynamoDB as Source Reads data from Amazon DynamoDB tables and converts them into CDAP's standard pipeline data format.
	DynamoDB as Sink Writes pipeline data records to Amazon DynamoDB tables.
	Elastic as Source Reads indexed documents from Elasticsearch and converts them into CDAP's standard pipeline data format.
	Elastic as Sink Writes pipeline data records as documents to Elasticsearch indices.
	Google BigQuery as Source Reads the entire contents of a table from Google's BigQuery data warehouse. Data from the BigQuery table is first exported to a temporary location on Google Cloud Storage, then read into the pipeline from there.
	Google BigQuery as Sink Writes pipeline data records to a BigQuery table. Data is first written to a temporary location on Google Cloud Storage, then loaded into BigQuery from there.
	Google Bigtable as Source Reads data from Google Cloud Bigtable. Cloud Bigtable is Google's NoSQL big data database service.
	Google Bigtable as Sink Writes pipeline data records to Google Cloud Bigtable.
	Google Cloud Datastore as Source Reads data from Google Cloud Datastore (Datastore mode). Datastore is a NoSQL document database built for automatic scaling and high performance.
	Google Cloud Datastore as Sink Writes pipeline data records to Google Cloud Datastore (Datastore mode).
	Google Cloud Storage as Source Reads objects from Google Cloud Storage buckets. Cloud Storage allows worldwide storage and retrieval of any amount of data at any time.
	Google Cloud Storage as Sink Writes pipeline data records to one or more files in a directory on Google Cloud Storage. Files can be written in various formats such as CSV, Avro, Parquet, and JSON.
	MongoDB as Source Reads documents from MongoDB collections and converts them into CDAP's standard pipeline data format.
	MongoDB as Sink Writes pipeline data records as documents to MongoDB collections.
	OrientDB as Sink Writes pipeline data records as vertices and edges to OrientDB using the Graph API.
	SAP OData Service as Source Reads data from an SAP OData service and converts it into CDAP's standard pipeline data format.
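
Many of the source plugins above are described as converting external data into CDAP's standard pipeline data format, which in practice means records that conform to a schema. The sketch below models that idea in plain Python: a raw row of strings is coerced into a typed record according to a list of fields. The `Field` class, the `to_record` helper, and the example schema are hypothetical names introduced for illustration; they are not part of CDAP.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Field:
    """One schema field: a name plus a function that casts the raw string."""
    name: str
    cast: Callable[[str], object]


def to_record(schema: List[Field], raw: Dict[str, str]) -> Dict[str, object]:
    """Build a typed record containing exactly the fields named in the schema."""
    record = {}
    for field in schema:
        if field.name not in raw:
            raise KeyError(f"missing field: {field.name}")
        record[field.name] = field.cast(raw[field.name])
    return record


# Example schema and input row are assumptions for illustration only.
schema = [Field("id", int), Field("name", str), Field("price", float)]
raw_row = {"id": "42", "name": "widget", "price": "9.5", "ignored": "x"}

record = to_record(schema, raw_row)
print(record)  # {'id': 42, 'name': 'widget', 'price': 9.5}
```

Once every source emits records of this kind, downstream stages and sinks can be written against the schema rather than against each external system's native format.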
Data Streaming
	Amazon Kinesis as Source Reads real-time events and transforms them into CDAP's standard pipeline data format.
	Amazon Kinesis as Sink Writes pipeline data records to Amazon Kinesis streams.
	Apache Kafka as Source Consumes real-time events from Kafka streams and transforms them into CDAP's standard pipeline data format.
	Apache Kafka as Sink Produces real-time events for Kafka streams from pipeline data records.
	Azure Event Hub as Source Reads real-time events and transforms them into CDAP's standard pipeline data format.
	Google Pub/Sub as Source Reads from a Google Cloud Pub/Sub subscription in real time. Cloud Pub/Sub brings the scalability, flexibility, and reliability of enterprise message-oriented middleware to the cloud.
	Google Pub/Sub as Sink Writes pipeline data records to a Google Cloud Pub/Sub topic.
	MapR Streams as Source Reads real-time events and transforms them into CDAP's standard pipeline data format.
	MQTT as Source Subscribes to an MQTT broker and consumes real-time events for a given topic. Events are transformed into CDAP's standard pipeline data format.
	PubNub Cloud as Source PubNub is a global data stream network (DSN) and real-time network-as-a-service offering. This plugin reads real-time messages from PubNub cloud channels.
SaaS Applications
	Google Ads as Source Allows users to retrieve reports of a specified report type from their Google Ads account.
	Google Analytics as Batch Source Allows users to query the Google Analytics Reporting API for metrics and dimensions from its built-in set.
	Google Analytics as Streaming Source Allows users to query the Google Analytics Real Time Reporting API for metrics and dimensions from its built-in set.
	Google Shopping as Source Reads product data from the Google Shopping Content API.
	Salesforce as Batch Source Reads sObjects, such as accounts, contacts, leads and others, by specifying SOQL (Salesforce Object Query Language) queries, or using sObject with incremental or range date filters.
	Salesforce as Batch Sink Inserts pipeline data records as sObjects into Salesforce. Currently, only inserts (no upserts) are supported.
	Salesforce as Streaming Source Tracks updates of sObjects using PushTopic events and transforms them into CDAP's standard pipeline data format.
	Salesforce Marketing Cloud as Sink Inserts pipeline data records into a Salesforce Marketing Cloud Data Extension. The sink requires Server-to-Server integration with the Salesforce Marketing Cloud API.
	Zuora as Batch Source Fetches data from Zuora and transforms it into CDAP's standard pipeline data format. Zuora is a subscription management platform designed to meet your modern order-to-cash needs.
	Zuora as Batch Sink Writes records to Zuora objects. Each record will be written to an object collection.
	Zuora as Streaming Source Fetches data from Zuora periodically and transforms it into CDAP's standard pipeline data format.
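
The Salesforce batch source above mentions SOQL queries and incremental or range date filters. As a rough illustration of what such a filter amounts to, the sketch below builds a SOQL string that restricts an sObject query to a `LastModifiedDate` range. The function names `soql_datetime` and `build_soql` and their parameters are hypothetical, and the choice of `LastModifiedDate` as the filter field is an assumption; how the plugin itself constructs its queries is not described here.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def soql_datetime(value: datetime) -> str:
    """Format a timezone-aware datetime as an unquoted SOQL dateTime literal in UTC."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_soql(
    sobject: str,
    fields: Sequence[str],
    modified_after: Optional[datetime] = None,
    modified_before: Optional[datetime] = None,
) -> str:
    """Build a SOQL query, optionally bounded by a LastModifiedDate range."""
    query = f"SELECT {', '.join(fields)} FROM {sobject}"
    conditions = []
    if modified_after is not None:
        conditions.append(f"LastModifiedDate > {soql_datetime(modified_after)}")
    if modified_before is not None:
        conditions.append(f"LastModifiedDate < {soql_datetime(modified_before)}")
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


# Example values are assumptions for illustration only.
start = datetime(2020, 1, 1, tzinfo=timezone.utc)
query = build_soql("Account", ["Id", "Name"], modified_after=start)
print(query)
# SELECT Id, Name FROM Account WHERE LastModifiedDate > 2020-01-01T00:00:00Z
```

An incremental run would pass the timestamp of the previous run as `modified_after`, so that only sObjects changed since then are read.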
