Objective
Google CDAP supports PredictiveWorks. in two important ways:

- Google CDAP is the technical foundation of PredictiveWorks. It provides a standardized, cloud-ready platform for building unified and reusable data (analytics) pipelines.
- Google CDAP offers a wide variety of data integration plugins (connectors) that support many prominent databases and SaaS applications. Connectors can read from data sources and write to data destinations, thereby defining the start and end points of data pipelines.
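
To make the roles of source and sink connectors concrete, here is a minimal Python sketch. It assumes a hypothetical `Record` type standing in for CDAP's standard pipeline data format, plus hypothetical `CsvLinesSource`, `InMemorySink`, and `run_pipeline` names. None of this is CDAP's plugin API; real CDAP plugins are JVM-based and implement CDAP's own interfaces. The sketch only illustrates why a shared record format lets any source feed any sink.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Protocol


@dataclass
class Record:
    """Hypothetical stand-in for CDAP's standard pipeline record: named fields with values."""
    fields: Dict[str, Any]


class Source(Protocol):
    def read(self) -> Iterator[Record]: ...


class Sink(Protocol):
    def write(self, records: Iterable[Record]) -> None: ...


class CsvLinesSource:
    """Hypothetical source: converts raw CSV lines from an external system into Records."""

    def __init__(self, header: List[str], lines: List[str]) -> None:
        self.header = header
        self.lines = lines

    def read(self) -> Iterator[Record]:
        for line in self.lines:
            values = line.split(",")
            yield Record(dict(zip(self.header, values)))


class InMemorySink:
    """Hypothetical sink: stores Records as plain dicts, standing in for a database table."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []

    def write(self, records: Iterable[Record]) -> None:
        for record in records:
            self.rows.append(dict(record.fields))


def run_pipeline(source: Source, sink: Sink) -> None:
    # The source is the pipeline's start point and the sink its end point.
    # Because both use the same Record format, any source can feed any sink.
    sink.write(source.read())


source = CsvLinesSource(["id", "name"], ["1,alice", "2,bob"])
sink = InMemorySink()
run_pipeline(source, sink)
print(sink.rows)  # [{'id': '1', 'name': 'alice'}, {'id': '2', 'name': 'bob'}]
```

Swapping `CsvLinesSource` for a different source class with the same `read()` signature would require no change to the sink. This is the decoupling that a common pipeline data format is meant to provide.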
Implementation
Google CDAP continuously adds new connector plugins.

| Data Store Plugins | Description |
|---|---|
| Amazon S3 as Source | Reads data from Amazon S3 and converts it into CDAP's standard pipeline data format. |
| Amazon S3 as Sink | Writes pipeline data records to Amazon S3. |
| Apache Cassandra as Source | Reads data from Cassandra key-value tables and converts it into CDAP's standard pipeline data format. |
| Apache Cassandra as Sink | Writes pipeline data records into Cassandra tables. |
| Apache HBase as Source | Reads data from HBase tables and converts it into CDAP's standard pipeline data format. |
| Apache HBase as Sink | Writes pipeline data records into HBase tables. |
| Apache Hive as Source | Reads data from Hive tables and converts it into CDAP's standard pipeline data format. |
| Apache Hive as Sink | Writes pipeline data records into Hive tables. |
| Apache Kudu as Source | Pulls rows from Kudu tables using the Kudu native client and converts them into CDAP's standard pipeline data format. |
| Apache Kudu as Sink | Writes data records from batch and real-time pipelines into Kudu tables. |
| Azure Blob Storage as Source | Reads data from Azure Blob Storage and converts it into CDAP's standard pipeline data format. |
| Azure Data Lake as Source | Reads data from Azure Data Lake Store files and converts it into CDAP's standard pipeline data format. |
| Azure Data Lake as Sink | Writes pipeline data records to an Azure Data Lake Store in Avro, ORC, or text format. |
| Apache Phoenix as Source | Reads rows from Phoenix tables and converts them into CDAP's standard pipeline data format. |
| Apache Phoenix as Sink | Writes pipeline data records into Phoenix tables. |
| Couchbase as Source | Reads documents from Couchbase buckets. An optional filter restricts the output to documents that meet specific criteria. |
| Couchbase as Sink | Writes data records as documents to Couchbase buckets. |
| DynamoDB as Source | Reads data from Amazon DynamoDB tables and converts it into CDAP's standard pipeline data format. |
| DynamoDB as Sink | Writes pipeline data records to Amazon DynamoDB tables. |
| Elastic as Source | Reads indexed documents from Elasticsearch and converts them into CDAP's standard pipeline data format. |
| Elastic as Sink | Writes pipeline data records as documents to Elasticsearch indices. |
| Google BigQuery as Source | Reads the entire contents of a table from Google's BigQuery data warehouse. The table data is first exported to a temporary location on Google Cloud Storage and then read into the pipeline from there. |
| Google BigQuery as Sink | Writes pipeline data records to a BigQuery table. The data is first written to a temporary location on Google Cloud Storage and then loaded into BigQuery from there. |
| Google Bigtable as Source | Reads data from Google Cloud Bigtable, Google's NoSQL big data database service. |
| Google Bigtable as Sink | Writes pipeline data records to Google Cloud Bigtable. |
| Google Cloud Datastore as Source | Reads data from Google Cloud Datastore (Datastore mode). Datastore is a NoSQL document database built for automatic scaling and high performance. |
| Google Cloud Datastore as Sink | Writes pipeline data records to Google Cloud Datastore (Datastore mode). |
| Google Cloud Storage as Source | Reads objects from Google Cloud Storage buckets. Cloud Storage allows worldwide storage and retrieval of any amount of data at any time. |
| Google Cloud Storage as Sink | Writes pipeline data records to one or more files in a directory on Google Cloud Storage. Files can be written in various formats, such as CSV, Avro, Parquet, and JSON. |
| MongoDB as Source | Reads documents from MongoDB collections and converts them into CDAP's standard pipeline data format. |
| MongoDB as Sink | Writes pipeline data records as documents to MongoDB collections. |
| OrientDB as Sink | Writes pipeline data records as vertices and edges to OrientDB using the Graph API. |
| SAP OData Service as Source | Reads data from an SAP OData service and converts it into CDAP's standard pipeline data format. |

| Data Streaming | Description |
|---|---|
| Amazon Kinesis as Source | Reads real-time events from Amazon Kinesis streams and transforms them into CDAP's standard pipeline data format. |
| Amazon Kinesis as Sink | Writes pipeline data records to Amazon Kinesis streams. |
| Apache Kafka as Source | Consumes real-time events from Kafka topics and transforms them into CDAP's standard pipeline data format. |
| Apache Kafka as Sink | Produces real-time events to Kafka topics from pipeline data records. |
| Azure Event Hub as Source | Reads real-time events from Azure Event Hubs and transforms them into CDAP's standard pipeline data format. |
| Google Pub/Sub as Source | Reads from a Google Cloud Pub/Sub subscription in real time. Cloud Pub/Sub brings the scalability, flexibility, and reliability of enterprise message-oriented middleware to the cloud. |
| Google Pub/Sub as Sink | Writes pipeline data records to a Google Cloud Pub/Sub topic. |
| MapR Streams as Source | Reads real-time events from MapR Streams and transforms them into CDAP's standard pipeline data format. |
| MQTT as Source | Subscribes to a topic on an MQTT broker and consumes its real-time events, transforming them into CDAP's standard pipeline data format. |
| PubNub Cloud as Source | Reads real-time messages from PubNub cloud channels. PubNub is a global data stream network (DSN) and real-time network-as-a-service offering. |

| SaaS Applications | Description |
|---|---|
| Google Ads as Source | Retrieves reports of a specified report type from a Google Ads account. |
| Google Analytics as Batch Source | Queries the Google Analytics Reporting API to retrieve metrics and dimensions from a built-in set. |
| Google Analytics as Streaming Source | Queries the Google Analytics Real Time Reporting API to retrieve metrics and dimensions from a built-in set. |
| Google Shopping as Source | Reads product data from the Google Shopping Content API. |
| Salesforce as Batch Source | Reads sObjects, such as accounts, contacts, and leads, either through SOQL (Salesforce Object Query Language) queries or by selecting an sObject with incremental or date-range filters. |
| Salesforce as Batch Sink | Inserts pipeline data records as sObjects into Salesforce. Currently, only inserts are supported (no upserts). |
| Salesforce as Streaming Source | Tracks sObject updates using PushTopic events and transforms them into CDAP's standard pipeline data format. |
| Salesforce Marketing Cloud as Sink | Inserts pipeline data records into a Salesforce Marketing Cloud Data Extension. The sink requires Server-to-Server integration with the Salesforce Marketing Cloud API. |
| Zuora as Batch Source | Fetches data from Zuora and transforms it into CDAP's standard pipeline data format. Zuora is a subscription management platform designed for modern order-to-cash needs. |
| Zuora as Batch Sink | Writes records to Zuora objects. Each record is written to an object collection. |
| Zuora as Streaming Source | Periodically fetches data from Zuora and transforms it into CDAP's standard pipeline data format. |
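
The catalog distinguishes batch sources from streaming sources. The following minimal Python sketch illustrates the difference under stated assumptions. It models a hypothetical poll-based connector, similar to the periodic fetching described for the Zuora streaming source. The names `read_batch`, `poll_stream`, and `fetch_since` are hypothetical, and the sketch does not reflect how CDAP actually schedules or executes streaming sources.

```python
from typing import Callable, Iterator, List


def read_batch(fetch_all: Callable[[], List[dict]]) -> List[dict]:
    """Batch source (hypothetical): reads a bounded snapshot once, then the run ends."""
    return fetch_all()


def poll_stream(fetch_since: Callable[[int], List[dict]], max_polls: int) -> Iterator[List[dict]]:
    """Streaming source (hypothetical): polls repeatedly, emitting only records newer than the last seen id.

    A real streaming source would wait between polls and run indefinitely;
    max_polls bounds the loop so this sketch terminates.
    """
    last_id = 0
    for _ in range(max_polls):
        new_records = fetch_since(last_id)
        if new_records:
            last_id = max(r["id"] for r in new_records)
        yield new_records  # one micro-batch per poll (possibly empty)


# Simulated external system whose data grows between polls.
store = [{"id": 1}, {"id": 2}]


def fetch_since(last_id: int) -> List[dict]:
    return [r for r in store if r["id"] > last_id]


batches = []
for i, batch in enumerate(poll_stream(fetch_since, max_polls=3)):
    batches.append([r["id"] for r in batch])
    store.append({"id": 3 + i})  # new data arrives before the next poll

print(batches)  # [[1, 2], [3], [4]]
```

Tracking the last seen id keeps each micro-batch incremental, so records are not re-emitted across polls. A batch source, by contrast, would simply call `read_batch` once per pipeline run.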