ML.NET: How to resolve "Column with role MatrixColumnIndex should be a known cardinality U4 key, but is instead 'UInt32'"

Nat*_*ins 5 .net f# ml.net

I'm trying to adapt the following ML.NET F# product recommendation sample to my own use case: https://github.com/dotnet/machinelearning-samples/tree/master/samples/fsharp/getting-started/MatrixFactorization_ProductRecommendation

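However, my dataset doesn't have two numeric IDs. Instead, I have a UserId (numeric) and a ProductId (string). Since key values apparently have to be numeric, I tried mapping the ProductId with the MapValueToKey function. However, I still get the following error: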

Unhandled Exception: System.InvalidOperationException: Column 'UserId' with role MatrixColumnIndex should be a known cardinality U4 key, but is instead 'UInt32'
   at Microsoft.ML.Recommender.RecommenderUtils.CheckRowColumnType(RoleMappedData data, ColumnRole role, Column& col, Boolean isDecode)
   at Microsoft.ML.Recommender.RecommenderUtils.CheckAndGetMatrixIndexColumns(RoleMappedData data, Column& matrixColumnIndexColumn, Column& matrixRowIndexColumn, Boolean isDecode)
   at Microsoft.ML.Trainers.MatrixFactorizationTrainer.TrainCore(IChannel ch, RoleMappedData data, RoleMappedData validData)
   at Microsoft.ML.Trainers.MatrixFactorizationTrainer.Fit(IDataView trainData, IDataView validationData)
   at Microsoft.ML.Trainers.MatrixFactorizationTrainer.Fit(IDataView input)
   at <StartupCode$Recommender>.$Program.main@() in /Users/nat/Projects/Recommender/Recommender/Program.fs:line 75
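The error message is about the column's type, not its values. The trainer wants a key column backed by `uint32` ("U4") whose number of possible values is known, and a plain `UInt32` number column does not qualify even if every value is a valid ID. The plain-F# model below is a hypothetical simplification of that rule: `ColumnType` and `checkMatrixIndexColumn` are illustrative names, not ML.NET's type hierarchy, and treating a count of 0 as "unknown cardinality" is a modelling choice. In the question, the raw `UserId` corresponds to `UInt32Number`, and a `MapValueToKey` output such as `UserIdEncoded` corresponds to `UInt32Key`.

```fsharp
// Hypothetical, simplified model of the check described by the error message.
// ColumnType and checkMatrixIndexColumn are illustrative names, not ML.NET types.
type ColumnType =
    | UInt32Number               // plain numbers, e.g. a column loaded as DataKind.UInt32 without a key count
    | UInt32Key of count: uint64 // a key backed by uint32; in this model, count = 0 means "cardinality unknown"
    | Text                       // e.g. ProductId before MapValueToKey

/// A matrix index column must be a uint32 ("U4") key with a known, non-zero cardinality.
let checkMatrixIndexColumn (name: string) (columnType: ColumnType) : Result<unit, string> =
    match columnType with
    | UInt32Key count when count > 0UL -> Ok ()
    | other ->
        Error (sprintf "Column '%s' should be a known cardinality U4 key, but is instead %A" name other)

// The raw UserId column is rejected; an encoded key column is accepted.
let rawUserId = checkMatrixIndexColumn "UserId" UInt32Number
let encodedUserId = checkMatrixIndexColumn "UserIdEncoded" (UInt32Key 6248UL)

printfn "%A" rawUserId
printfn "%A" encodedUserId
```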

The schema of my data looks like this:

UserId,ProductId
1,test-product-id
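To see why MapValueToKey is the right tool for a string column like ProductId, it may help to picture what it produces. The following is a conceptual sketch in plain F#, not ML.NET's implementation. It assumes, as ML.NET's default behaviour does to the best of my knowledge, that keys are 1-based `uint32` values assigned in order of first occurrence, with 0 standing for a missing or unseen value. `fitValueToKey` and `toKey` are hypothetical helper names.

```fsharp
// Conceptual sketch only, not ML.NET's implementation.
// fitValueToKey and toKey are hypothetical helper names.

/// "Fit": give each distinct value a 1-based uint32 key in order of first occurrence.
let fitValueToKey (values: seq<'T>) : Map<'T, uint32> =
    values
    |> Seq.distinct
    |> Seq.mapi (fun i v -> v, uint32 (i + 1))
    |> Map.ofSeq

/// "Transform": look up a value's key; unseen values map to 0, which stands for "missing".
let toKey (mapping: Map<'T, uint32>) (value: 'T) : uint32 =
    mapping |> Map.tryFind value |> Option.defaultValue 0u

let productIds = [ "test-product-id"; "other-product-id"; "test-product-id" ]
let productKeys = fitValueToKey productIds

// The number of distinct values is known after fitting: this is the key's cardinality.
let keyCount = productKeys.Count

printfn "%d %d %d (key count %d)"
    (toKey productKeys "test-product-id")
    (toKey productKeys "other-product-id")
    (toKey productKeys "never-seen")
    keyCount
// prints "1 2 0 (key count 2)"
```

The point of the "fit" step is that the cardinality becomes known once the mapping is built, which is exactly what a plain numeric column lacks.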

The failing code is adapted from the linked sample.

Another link I've been using as a guide is https://medium.com/machinelearningadvantage/build-a-product-recommender-using-c-and-ml-net-machine-learning-ab890b802d25

I've been trying to get this working for hours. What exactly am I doing wrong?


Update

I've made some progress by making my program more similar to the official .NET sample.

It now fails on this line: let predictionengine = mlContext.Model.CreatePredictionEngine<ProductEntry, Prediction>(model)

with the error:

Unhandled Exception: System.ArgumentOutOfRangeException: UserIdEncoded column 'MatrixColumnIndex' not found
Parameter name: schema
   at Microsoft.ML.Data.RoleMappedSchema.MapFromNames(DataViewSchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Data.RoleMappedSchema..ctor(DataViewSchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Data.GenericScorer.Bindings.Create(IHostEnvironment env, ISchemaBindableMapper bindable, DataViewSchema input, IEnumerable`1 roles, String suffix, Boolean user)
   at Microsoft.ML.Data.GenericScorer.Bindings.ApplyToSchema(IHostEnvironment env, DataViewSchema input)
   at Microsoft.ML.Data.GenericScorer..ctor(IHostEnvironment env, GenericScorer transform, IDataView data)
   at Microsoft.ML.Data.GenericScorer.ApplyToDataCore(IHostEnvironment env, IDataView newSource)
   at Microsoft.ML.Data.RowToRowScorerBase.ApplyToData(IHostEnvironment env, IDataView newSource)
   at Microsoft.ML.Data.PredictionTransformerBase`1.Microsoft.ML.ITransformer.GetRowToRowMapper(DataViewSchema inputSchema)
   at Microsoft.ML.PredictionEngineBase`2..ctor(IHostEnvironment env, ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.PredictionEngine`2..ctor(IHostEnvironment env, ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.PredictionEngineExtensions.CreatePredictionEngine[TSrc,TDst](ITransformer transformer, IHostEnvironment env, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.ModelOperationsCatalog.CreatePredictionEngine[TSrc,TDst](ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)

Zru*_*uty 2

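I believe you've gotten past the initial hurdle: you've successfully trained the model, and now you need to assemble all the trained assets into a prediction engine.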

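Note that you have "trained" two transformers: the preprocessing pipeline (the result of calling pipeline.Fit(traindata)) and the recommender itself (the result of calling est.Fit(mappedDataView)).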

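However, the prediction engine you created wraps only the second transformer, so it only works if you feed it the output of the first transformer. Raw ProductEntry input has no UserIdEncoded column, hence the "column not found" error.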
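The failure can be reproduced with a toy model in plain F#, where a fitted transformer is simply a function from row to row. This is an illustrative assumption, not how ML.NET represents transformers: `Row`, `preprocess` and `recommender` are hypothetical names, and the score is a placeholder. The recommender only understands the encoded column, so feeding it raw input fails, while composing the preprocessing step in front of it succeeds. Appending the trainer to the preprocessing estimator gives ML.NET the equivalent of that composition.

```fsharp
// Toy model (hypothetical, not ML.NET's API): a row maps column names to boxed values,
// and a fitted transformer is a function Row -> Row.
type Row = Map<string, obj>

// Stand-in for the fitted preprocessing pipeline (MapValueToKey on UserId).
let preprocess (userKeys: Map<uint32, uint32>) (row: Row) : Row =
    let userId = unbox<uint32> row.["UserId"]
    let key = userKeys |> Map.tryFind userId |> Option.defaultValue 0u
    row |> Map.add "UserIdEncoded" (box key)

// Stand-in for the fitted recommender: it can only read the encoded column.
let recommender (row: Row) : Row =
    match Map.tryFind "UserIdEncoded" row with
    | Some _ -> row |> Map.add "Score" (box 0.5f) // placeholder score
    | None -> failwith "column 'UserIdEncoded' not found"

let userKeys = Map.ofList [ 13854u, 1u ]
let rawRow : Row = Map.ofList [ "UserId", box 13854u; "ProductId", box "farfetch-13470673" ]

// Like a prediction engine built from the recommender alone: raw input lacks UserIdEncoded.
let recommenderOnly =
    try Ok (recommender rawRow)
    with ex -> Error ex.Message

// Like a model fitted from pipeline.Append(trainer): preprocessing runs first.
let fullModel = preprocess userKeys >> recommender
let scored = fullModel rawRow

printfn "%A" recommenderOnly
printfn "%A" (unbox<uint32> scored.["UserIdEncoded"])
```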

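A better approach is to build a single estimator from the preprocessor and the recommender (apologies for any mistakes, F# is not my first language):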

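open Microsoft.ML
open Microsoft.ML.Data
open Microsoft.ML.Trainers

// Assumes mlContext (an MLContext) and trainDataPath are defined as in the official sample.
// Preprocessing: map the raw UserId and ProductId values to key columns.
let pipeline = 
    EstimatorChain().Append(
        mlContext.Transforms.Conversion
            .MapValueToKey(inputColumnName="UserId",outputColumnName="UserIdEncoded"))
        .Append(
            mlContext.Transforms.Conversion
                .MapValueToKey(inputColumnName="ProductId",outputColumnName="ProductIdEncoded"))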


let traindata =
    let columns = 
        [|
            TextLoader.Column("Label", DataKind.Single, 0)
            TextLoader.Column("UserId", DataKind.UInt32, source = [|TextLoader.Range(0)|], keyCount = KeyCount 6248UL) 
            TextLoader.Column("ProductId", DataKind.String, source = [|TextLoader.Range(1)|]) 
        |]
    mlContext.Data.LoadFromTextFile(trainDataPath, columns, hasHeader=true, separatorChar=',')

// No need to do it: 
// let mappedDataView = pipeline.Fit(traindata).Transform(traindata)

let options = MatrixFactorizationTrainer.Options(MatrixColumnIndexColumnName = "UserIdEncoded", 
                                                 MatrixRowIndexColumnName = "ProductIdEncoded",
                                                 LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass,
                                                 LabelColumnName = "Label",
                                                 Alpha = 0.01,
                                                 Lambda = 0.025)

// Rather than this:
// let est = mlContext.Recommendation().Trainers.MatrixFactorization(options)
// Do this:
let est = pipeline.Append(mlContext.Recommendation().Trainers.MatrixFactorization(options))

// Now train the whole pipeline.
let model = est.Fit(traindata)

// The rest should now work.
let predictionengine = mlContext.Model.CreatePredictionEngine<ProductEntry, Prediction>(model)
let prediction = predictionengine.Predict {ProductId = "farfetch-13470673"; UserId = (uint32 13854); Label = 0.f}
