Collaborative Filtering


@Plugin(type = SparkSink.PLUGIN_TYPE)
@Description("A building stage for an Apache Spark ML Collaborative Filtering model. "
+ "This technique is commonly used for recommender systems and aims to fill in the "
+ "missing entries of a User-Item association matrix. "
+ "Spark ML uses the ALS (Alternating Least Squares) algorithm.")
public class ALSSink extends RecommenderSink {




Model Name: The unique name of the recommendation model.
User Field: The name of the input field that defines the user identifiers. The values must be within the integer value range.
Item Field: The name of the input field that defines the item identifiers. The values must be within the integer value range.
Rating Field: The name of the input field that defines the item ratings. The values must be within the integer value range.
Data Split: The ratio used to split the dataset into training and test data, e.g. 80:20. Default is 70:30.
Model Configuration
Factorization Rank: A positive integer that defines the rank of the matrix factorization. Default is 10.
Nonnegative Constraints: Indicates whether to apply nonnegativity constraints to the least squares problem. Supported values are 'true' and 'false'. Default is 'false'.
Maximum Iterations: The maximum number of iterations used to train the ALS model. Default is 10.
Regularization Parameter: The nonnegative regularization parameter. Default is 0.1.
User Blocks: The number of blocks into which the users are partitioned to parallelize computation. Default is 10.
Item Blocks: The number of blocks into which the items are partitioned to parallelize computation. Default is 10.
Implicit Preference: Indicates whether to use implicit preference. Supported values are 'true' and 'false'. Default is 'false'.
Alpha Parameter: The nonnegative alpha parameter in the implicit preference formulation. Default is 1.0.
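
The Data Split parameter is given as a ratio string such as 80:20. The following is a minimal sketch of how such a value could be converted into the training and test fractions. The class name DataSplit and the method parse are hypothetical and are not taken from the plugin; how the plugin itself interprets the value is an assumption here.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the plugin: parses a "train:test" split such as "80:20". */
public class DataSplit {

    /** Returns {trainFraction, testFraction}, normalized so that both values sum to 1.0. */
    public static double[] parse(String split) {
        String[] tokens = split.trim().split(":");
        if (tokens.length != 2) {
            throw new IllegalArgumentException("Expected a split like '70:30', got: " + split);
        }
        double train = Double.parseDouble(tokens[0].trim());
        double test = Double.parseDouble(tokens[1].trim());
        if (train <= 0 || test <= 0) {
            throw new IllegalArgumentException("Both parts of the split must be positive: " + split);
        }
        // Normalize, so "80:20" and "4:1" describe the same split.
        double total = train + test;
        return new double[] {train / total, test / total};
    }

    public static void main(String[] args) {
        // Uses the documented default split of 70:30.
        System.out.println(Arrays.toString(parse("70:30")));
    }
}
```

Fractions in this form match the weights array accepted by Spark's Dataset.randomSplit, which is one plausible way to produce the training and test sets.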


@Plugin(type = SparkCompute.PLUGIN_TYPE)
@Description("A prediction stage that leverages a trained Apache Spark ML ALS recommendation model.")
public class ALSPredictor extends RecommenderCompute {




Model Name: The unique name of the recommendation model.
User Field: The name of the input field that defines the user identifiers. The values must be within the integer value range.
Item Field: The name of the input field that defines the item identifiers. The values must be within the integer value range.
Prediction Field: The name of the field in the output schema that contains the predicted rating.
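
For orientation, a matrix factorization model such as ALS predicts a rating as the dot product of the latent factor vector of the user and that of the item. The sketch below illustrates only this arithmetic with made-up factor values; the class name FactorPrediction and the method predict are hypothetical, and the code calls neither the plugin nor Spark.

```java
/** Hypothetical illustration, not part of the plugin: rating = userFactors . itemFactors. */
public class FactorPrediction {

    /** Computes the dot product of two latent factor vectors of equal length (the rank). */
    public static double predict(double[] userFactors, double[] itemFactors) {
        if (userFactors.length != itemFactors.length) {
            throw new IllegalArgumentException("Factor vectors must have the same rank");
        }
        double rating = 0.0;
        for (int i = 0; i < userFactors.length; i++) {
            rating += userFactors[i] * itemFactors[i];
        }
        return rating;
    }

    public static void main(String[] args) {
        // Made-up factors for a rank-2 model.
        double[] user = {1.0, 2.0};
        double[] item = {3.0, 0.5};
        System.out.println(predict(user, item));
    }
}
```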

Smart Adaptive Recommendations

The Smart Adaptive Recommendation (SAR) algorithm is not part of Apache Spark ML. It is currently part of the commercial offering of Dr. Krusche & Partner and is scheduled to be open sourced by mid-2020.