Objective
PredictiveWorks. supports end-to-end management of the machine learning lifecycle and integrates model building and prediction pipelines:
- Model building pipelines track the parameters and metrics of each pipeline run and support automated versioning, which keeps model experiments clearly separated.
- Prediction pipelines have direct access to trained models. This minimizes the time from model building and evaluation to production.
Sample: Logistic Regression
The sample illustrates how the PredictiveWorks. model recorder is used to track a logistic regression model experiment.
...
Dataset<Row> predictions = model.transform(testset);
String metricsJson = Evaluator.evaluate(predictions, labelCol, predictionCol);
/*
 * Store trained logistic regression model including
 * its associated parameters and metrics
 */
String modelName = config.modelName;
new LRRecorder().track(context, modelName, paramsJson, metricsJson, model);
The same recorder can then be used to retrieve a trained model instance by name for prediction purposes.
@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("LRPredictor")
@Description("A prediction stage that leverages a trained Apache Spark ML Logistic "
    + "Regression classifier model.")
public class LRPredictor extends PredictorCompute {
    ...
    @Override
    public void initialize(SparkExecutionPluginContext context) throws Exception {
        config.validate();
        classifier = new LRRecorder().read(context, config.modelName);
        if (classifier == null)
            throw new IllegalArgumentException(String.format("[%s] A classifier model with name "
                + "'%s' does not exist.", this.getClass().getName(), config.modelName));
    }
    ...
}
Storage
Model experiments are organized as versioned time series and registered with the PredictiveWorks. model registry. Each registered model has a unique name, one or more versions, and additional metadata.
The registry is a centralized store based on Google CDAP’s File & Table API. It registers model artifacts as files and model metadata (parameters and metrics) as table entries.
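To make the split between artifact and metadata concrete, the following is a minimal sketch of what a single registry entry might carry. It does not use the CDAP API and is not the actual PredictiveWorks. schema: the class `ModelEntrySketch`, its field names, the column names in `toRow()`, and the `name/vN` path layout are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical registry entry: one model version with its metadata. */
final class ModelEntrySketch {
    final String name;        // unique model name
    final int version;        // version within that name
    final long timestamp;     // registration time, epoch milliseconds
    final String paramsJson;  // model parameters, serialized as JSON
    final String metricsJson; // evaluation metrics, serialized as JSON

    ModelEntrySketch(String name, int version, long timestamp,
                     String paramsJson, String metricsJson) {
        this.name = name;
        this.version = version;
        this.timestamp = timestamp;
        this.paramsJson = paramsJson;
        this.metricsJson = metricsJson;
    }

    /** Assumed layout of the artifact location inside the file store. */
    String artifactPath() {
        return name + "/v" + version;
    }

    /** Metadata row as it might be written to a table (column names are assumptions). */
    Map<String, String> toRow() {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", name);
        row.put("version", String.valueOf(version));
        row.put("timestamp", String.valueOf(timestamp));
        row.put("params", paramsJson);
        row.put("metrics", metricsJson);
        row.put("fsPath", artifactPath());
        return row;
    }
}
```

Keeping the artifact path in the metadata row is one way to let a lookup by model name resolve to the stored file without scanning the file store.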
Versioning
Each registered model can have one or more versions. When a model is first added to the model registry, it is added as version 1. Each subsequent model registered under the same model name increments the version number.
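The versioning rule can be sketched with a small in-memory stand-in for the registry. This is an illustration of the rule only, not the PredictiveWorks. implementation: the class `VersionedRegistrySketch` and its methods are hypothetical, artifacts are represented by plain strings, and the sketch is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory registry that assigns version numbers per model name. */
final class VersionedRegistrySketch {
    // Model name -> artifact references; list index i holds version i + 1.
    private final Map<String, List<String>> versions = new HashMap<>();

    /** Registers an artifact under the given name and returns the assigned version. */
    int register(String modelName, String artifactRef) {
        List<String> list = versions.computeIfAbsent(modelName, k -> new ArrayList<>());
        list.add(artifactRef);
        return list.size(); // first registration yields 1, then 2, 3, ...
    }

    /** Returns the latest version number, or 0 if the name is unknown. */
    int latestVersion(String modelName) {
        List<String> list = versions.get(modelName);
        return list == null ? 0 : list.size();
    }

    /** Returns the artifact reference for a version, or null if it does not exist. */
    String get(String modelName, int version) {
        List<String> list = versions.get(modelName);
        if (list == null || version < 1 || version > list.size()) {
            return null;
        }
        return list.get(version - 1);
    }
}
```

In this sketch, version numbers are counted independently for each model name, so registering a model under a new name starts again at version 1.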