Why concatenate features in machine learning?

Question

I'm learning the Microsoft ML framework and am confused about why features need to be concatenated. In Microsoft's iris example here: https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/iris-clustering

...the features are concatenated:

string featuresColumnName = "Features";
var pipeline = mlContext.Transforms
    .Concatenate(featuresColumnName, "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
    ...

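Are multiple features treated as a single feature for calculations such as linear regression? If so, how accurate is that? What happens behind the scenes?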

Answer 1

Hap*_*lop 2

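According to the official documentation,

Concatenation is necessary because trainers take feature vectors as input.

It essentially converts features held in separate columns into a single feature-vector column. The feature values themselves are unchanged; only their format and type change. An example makes this clearer: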

Before the transformation:

        var samples = new List<InputData>()
        {
            new InputData(){ Feature1 = 0.1f, Feature2 = new[]{ 1.1f, 2.1f, 3.1f }, Feature3 = 1 },
            new InputData(){ Feature1 = 0.2f, Feature2 = new[]{ 1.2f, 2.2f, 3.2f }, Feature3 = 2 },
            new InputData(){ Feature1 = 0.3f, Feature2 = new[]{ 1.3f, 2.3f, 3.3f }, Feature3 = 3 },
            new InputData(){ Feature1 = 0.4f, Feature2 = new[]{ 1.4f, 2.4f, 3.4f }, Feature3 = 4 },
            new InputData(){ Feature1 = 0.5f, Feature2 = new[]{ 1.5f, 2.5f, 3.5f }, Feature3 = 5 },
            new InputData(){ Feature1 = 0.6f, Feature2 = new[]{ 1.6f, 2.6f, 3.6f }, Feature3 = 6 },
        };

After:

    //  "Features" column obtained post-transformation.
    //  0.1 1.1 2.1 3.1 1
    //  0.2 1.2 2.2 3.2 2
    //  0.3 1.3 2.3 3.3 3
    //  0.4 1.4 2.4 3.4 4
    //  0.5 1.5 2.5 3.5 5
    //  0.6 1.6 2.6 3.6 6
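
On the "treated as a single feature" part of the question: as far as I understand, no. Below is a rough, self-contained sketch in plain C# (top-level statements, no ML.NET calls) of what one concatenated row looks like and how a linear model could consume it. `ConcatenateRow` and the weights are made up for illustration; they are not part of ML.NET or of any trained model.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an ML.NET API): flattens one row's columns
// into a single vector, mirroring the "Features" column shown above.
static float[] ConcatenateRow(float feature1, float[] feature2, float feature3) =>
    new[] { feature1 }.Concat(feature2).Append(feature3).ToArray();

// First row of the sample data above.
float[] features = ConcatenateRow(0.1f, new[] { 1.1f, 2.1f, 3.1f }, 1f);
Console.WriteLine(string.Join(" ", features));

// Assumed, made-up weights: a linear model keeps one weight per slot,
// so the five values are still five separate features.
float[] weights = { 0.5f, -1.0f, 2.0f, 0.0f, 1.5f };

// Linear score = sum of (value * weight) over every slot of the vector.
float score = features.Zip(weights, (x, w) => x * w).Sum();
Console.WriteLine(score);
```

Each slot of the vector keeps its own weight, so concatenation only changes how the values are packaged for the trainer; it does not merge them into one number.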