Objective

PredictiveWorks. supports end-to-end management of the machine learning lifecycle and integrates model building and prediction pipelines:

  • Model building tracks the parameters and metrics of each pipeline run and supports automated versioning, so that each model experiment is clearly separated from the others.

  • Prediction pipelines have direct access to trained models. This reduces the time from model building and evaluation to production to a minimum.

Sample: Logistic Regression

This sample illustrates how the PredictiveWorks. model recorder is used to track a logistic regression model experiment.

...

Dataset<Row> predictions = model.transform(testset);
String metricsJson = Evaluator.evaluate(predictions, labelCol, predictionCol);
/*
 * Store trained logistic regression model including
 * its associated parameters and metrics
 */
String modelName = config.modelName;
new LRRecorder().track(context, modelName, paramsJson, metricsJson, model);
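The track call receives parameters and metrics as JSON strings, but the snippet does not show how paramsJson is produced. The following is a minimal sketch under that assumption: RunJson and toFlatJson are hypothetical names, not part of PredictiveWorks., and the helper only serializes a flat map of hyperparameters (numbers, booleans, strings) into a JSON object string of the kind passed as paramsJson.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of PredictiveWorks.): serializes flat parameter
 * or metric maps into JSON strings such as the paramsJson argument of track().
 */
public class RunJson {

  /** Serializes a flat map of numbers, booleans and strings into a JSON object. */
  public static String toFlatJson(Map<String, ?> values) {
    StringJoiner json = new StringJoiner(",", "{", "}");
    // TreeMap sorts the keys, so identical runs always produce identical strings
    for (Map.Entry<String, Object> e : new TreeMap<String, Object>(values).entrySet()) {
      json.add(quote(e.getKey()) + ":" + toJsonValue(e.getValue()));
    }
    return json.toString();
  }

  private static String toJsonValue(Object value) {
    if (value == null) return "null";
    if (value instanceof Number || value instanceof Boolean) return value.toString();
    return quote(value.toString());
  }

  private static String quote(String s) {
    // Escapes backslashes and double quotes only; control characters are not handled
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  public static void main(String[] args) {
    Map<String, Object> params = Map.of("maxIter", 10, "regParam", 0.3, "elasticNetParam", 0.8);
    System.out.println(toFlatJson(params));
    // prints {"elasticNetParam":0.8,"maxIter":10,"regParam":0.3}
  }
}
```

Sorting the keys is a deliberate choice: two runs with the same parameters then yield byte-identical strings, which makes stored experiments easy to compare. A real pipeline would more likely use a JSON library for nested values.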

The same recorder can then be used to retrieve a specific model instance for prediction.

@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Name("LRPredictor")
@Description("A prediction stage that leverages a trained Apache Spark ML Logistic "
+ "Regression classifier model.")
public class LRPredictor extends PredictorCompute {

  ...

  @Override
  public void initialize(SparkExecutionPluginContext context) throws Exception {
    config.validate();

    classifier = new LRRecorder().read(context, config.modelName);
    if (classifier == null) {
      // Fail fast at pipeline start if no model with this name could be read
      throw new IllegalArgumentException(String.format("[%s] A classifier model with name "
        + "'%s' does not exist.", this.getClass().getName(), config.modelName));
    }
  }

  ...
}

Storage

Model experiments are organized as versioned time series and registered with the PredictiveWorks. model registry. Each model has a unique name and holds its versions together with further metadata.

The registry is a centralized store built on Google CDAP's File and Table APIs: model artifacts are registered as files, and model metadata (parameters and metrics) is registered in tables.
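The split between artifact files and metadata tables can be illustrated with a small, self-contained sketch. It does not use any CDAP API: as an assumption for illustration, a local directory stands in for CDAP file storage and an in-memory map stands in for a CDAP table. The class name, the row key layout "name:version" and the column names are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the registry's storage split. A local directory stands in
 * for CDAP file storage and an in-memory map stands in for a CDAP table.
 */
public class RegistryStorageSketch {

  private final Path artifactRoot;
  // Row key "<modelName>:<version>" -> column name -> value
  private final Map<String, Map<String, String>> metadataTable = new HashMap<>();

  public RegistryStorageSketch(Path artifactRoot) {
    this.artifactRoot = artifactRoot;
  }

  /** Writes the serialized model to a file and records its metadata as one table row. */
  public Path store(String modelName, int version, byte[] artifact,
                    String paramsJson, String metricsJson) throws IOException {
    Path file = artifactRoot.resolve(modelName).resolve("v" + version);
    Files.createDirectories(file.getParent());
    Files.write(file, artifact);

    Map<String, String> row = new HashMap<>();
    row.put("params", paramsJson);
    row.put("metrics", metricsJson);
    row.put("artifactPath", file.toString());
    metadataTable.put(modelName + ":" + version, row);
    return file;
  }

  /** Returns the metadata row for one model version, or null if it was never stored. */
  public Map<String, String> metadata(String modelName, int version) {
    return metadataTable.get(modelName + ":" + version);
  }

  public static void main(String[] args) throws IOException {
    RegistryStorageSketch storage =
        new RegistryStorageSketch(Files.createTempDirectory("registry"));
    Path file = storage.store("example-lr", 1, new byte[] {1, 2, 3},
        "{\"maxIter\":10}", "{\"accuracy\":0.9}");
    System.out.println(file + " -> " + storage.metadata("example-lr", 1).get("metrics"));
  }
}
```

Keeping bulky binary artifacts in files and small, queryable metadata in table rows reflects the division described above; the row stores the artifact path so a lookup by name and version can locate the file.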

Versioning

Each registered model can have one or more versions. When a new model is added to the model registry, it is registered as version 1. Each subsequent model registered under the same name increments the version number.
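The numbering rule can be made concrete with a minimal in-memory sketch. VersionedRegistrySketch, register and latest are hypothetical names, not the PredictiveWorks. API; the sketch only assumes what the paragraph states: the first model under a name becomes version 1 and each later registration under that name gets the next number. Returning null for an unknown name mirrors how LRPredictor treats a missing model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of per-name versioning: the first model registered under a
 * name is version 1, and each later registration under that name increments it.
 */
public class VersionedRegistrySketch {

  /** One registered model version; the artifact is an opaque placeholder here. */
  public record ModelVersion(String name, int version, long registeredAt, Object artifact) {}

  private final Map<String, List<ModelVersion>> versionsByName = new HashMap<>();

  /** Registers a model under the given name and returns the version number it received. */
  public synchronized int register(String name, Object artifact) {
    List<ModelVersion> versions = versionsByName.computeIfAbsent(name, n -> new ArrayList<>());
    int version = versions.size() + 1; // 1 for a new name, otherwise previous version + 1
    versions.add(new ModelVersion(name, version, System.currentTimeMillis(), artifact));
    return version;
  }

  /** Returns the most recent version registered under the name, or null if there is none. */
  public synchronized ModelVersion latest(String name) {
    List<ModelVersion> versions = versionsByName.get(name);
    return (versions == null || versions.isEmpty()) ? null : versions.get(versions.size() - 1);
  }

  public static void main(String[] args) {
    VersionedRegistrySketch registry = new VersionedRegistrySketch();
    System.out.println(registry.register("churn-lr", "model-a")); // 1
    System.out.println(registry.register("churn-lr", "model-b")); // 2
    System.out.println(registry.register("fraud-lr", "model-c")); // 1
    System.out.println(registry.latest("churn-lr").artifact());   // model-b
  }
}
```

Deriving the version from the list size assumes versions are never deleted; a persistent registry would more likely keep an explicit counter per model name.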

Integration