Machine Learning

Works ML

PredictiveWorks. externalizes Apache Spark ML machine learning as standardized plugins that can be seamlessly combined with any other plugin.

Feature engineering, classification, regression and more are available as pipeline components via point-and-click selection. Works ML targets Apache Spark ML v2.1.3 to remain compatible with Google CDAP's pipeline technology.

The following machine learning tasks are supported by Works ML plugins:

ML Tasks

Classification

The classification scope of Apache Spark ML is externalized as standardized model building and prediction plugins that share the same technology.

Trained and retrained classification models are immediately available for production pipelines. Supported models & predictions:

Decision Tree
Gradient-Boosted Tree
Logistic Regression
Multi-Layer Perceptron
Naive Bayes
Random Forest

Clustering

The clustering scope of Apache Spark ML is externalized as standardized model building and prediction plugins that share the same technology.

Trained and retrained clustering models are immediately available for production pipelines. Supported models & predictions:

Bisecting K-Means
Gaussian Mixture
K-Means
Latent Dirichlet Allocation

Data Mining

The data mining scope of Apache Spark ML is externalized as standardized plugin components. Supported mining tasks:

Frequent Pattern (FP-Growth)
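
As a rough illustration of what a frequent pattern mining task produces, the sketch below counts itemset support by brute force in plain Python. It is not FP-Growth itself and not the plugin's code: FP-Growth reaches the same frequent itemsets far more efficiently by building a prefix tree. The function name `frequent_itemsets`, the `min_support` parameter and the sample baskets are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support.

    Brute-force enumeration, only suitable for tiny examples.
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # de-duplicate and fix the item order
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    # Support is the fraction of transactions that contain the itemset.
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }


# Hypothetical shopping baskets used as input.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
frequent = frequent_itemsets(transactions, min_support=0.5)
```

With these four baskets, every single item and every pair meets the 50% threshold, while the triple of all three items appears in only one basket and is dropped.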

Feature Engineering

The feature engineering scope of Apache Spark ML is externalized as standardized pipeline plugins that can be seamlessly combined with model building and prediction plugins to cover all facets of a machine learning process.

PredictiveWorks. complements Apache Spark ML with a pipeline plugin for SMOTE Sampling, which helps handle imbalanced training sets in ML classification tasks. Supported feature engineering tasks:

Binarizer
Bucketed LSH
Bucketizer
Chi-Squared Selector
Count Vectorizer
Discrete Cosine Transformation (DCT)
Hashing TF
Index to String
Min-Hash LSH
N-Gram Tokenizer
Normalizer
One-Hot Encoder
Principal Component Analysis (PCA)
Quantile Discretizer
Scaling
SMOTE Sampling
String to Index
TF-IDF
Tokenizer
Vector Assembler
Vector Indexer
Word-to-Vec Embeddings
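
To make the idea behind SMOTE Sampling more concrete, here is a minimal sketch in plain Python. It assumes the basic SMOTE scheme, in which each synthetic sample is interpolated between a minority-class sample and one of its nearest minority-class neighbours. It is not the plugin's implementation; the function name `smote`, its parameters and the sample points are hypothetical.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
synthetic = smote(minority, n_new=6)
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region already covered by the minority class instead of duplicating existing rows.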

Recommendation

The recommendation scope of Apache Spark ML is externalized as standardized model building and prediction plugins that share the same technology.

Trained and retrained recommendation models are immediately available for production pipelines. Supported models & predictions:

Collaborative Filtering (ALS)

Regression

The regression scope of Apache Spark ML is externalized as standardized model building and prediction plugins that share the same technology.

Trained and retrained regression models are immediately available for production pipelines. Supported models & predictions:

Decision Tree
Gradient-Boosted Tree
Generalized Linear Regression
Isotonic Regression
Linear Regression
Random Forest
Survival (AFT)

Integration
