Pandas: convert multiple categories to dummies

I have a table in which each row can belong to multiple categories, for example:

import pandas as pd

test = pd.DataFrame({
    'name': ['a', 'b'],
    'category': [['cat1', 'cat2'], ['cat1', 'cat3']]
})

How can I convert each category into a dummy variable, so that the table above becomes the following?

test_res = pd.DataFrame({
    'name': ['a', 'b'],
    'cat1': [1, 1],
    'cat2': [1, 0],
    'cat3': [0, 1]
})

I tried pd.get_dummies(test['category']), but it raised the following error:

TypeError: unhashable type: 'list'
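
The error arises because the cells of the column are Python lists, which are not hashable. One possible workaround, offered only as a minimal sketch, is to join each list into a delimited string and let Series.str.get_dummies split it back into indicator columns. The '|' separator is an assumption: it must not occur inside any category label.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['a', 'b'],
    'category': [['cat1', 'cat2'], ['cat1', 'cat3']],
})

# Join each list into one delimited string, e.g. 'cat1|cat2', so that
# Series.str.get_dummies can split it into one 0/1 column per label.
# Assumption: no category label contains the '|' character.
dummies = test['category'].str.join('|').str.get_dummies(sep='|')

# Place the indicator columns next to the remaining columns.
test_res = pd.concat([test.drop(columns='category'), dummies], axis=1)
print(test_res)
```

The indicator columns come out in sorted label order, which here matches the cat1, cat2, cat3 layout asked for in the question.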

Tags: python, pandas
Asked by Ste*_*reo | 6 votes | 1 answer | 2336 views

C++11: using transform with a shared_ptr to a vector inside a class

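I am trying to apply a transform to the contents of a shared_ptr and store the result in another shared_ptr, while also using a function from within a class.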

I created this example:

#include <vector>
#include <iostream>
#include <memory>
#include <algorithm>

using namespace std;

class MyClass {
public:
    int factor = 0;
    MyClass(const int factor_) : factor(factor_) {}

    shared_ptr<vector<int> > mult(shared_ptr<vector<int> > numbers) {
        shared_ptr<vector<int> > result(new vector<int>() );

        transform(numbers->begin(), numbers->end(), result->begin(),
            [this](int x){ return factor * x; });

        return result;
    }
};

int main()
{
    shared_ptr<vector<int> > numbers(new vector<int>());
    shared_ptr<vector<int> > res(new vector<int>());
    MyClass times_two(2);

    numbers->push_back(1);
    numbers->push_back(2);
    numbers->push_back(3);

    res = times_two.mult(numbers);

    cout << "{";
    for (unsigned …

Tags: c++, lambda, transform, shared-ptr, c++11
Asked by Ste*_*reo | 5 votes | 1 answer | 309 views

Convert an R data.table column from JSON to a data.table

I have a column that contains JSON data, as in the following example:

library(data.table)
test <- data.table(a = list(1,2,3), 
           info = list("{'duration': '10', 'country': 'US'}", 
                       "{'duration': '20', 'country': 'US'}",
                       "{'duration': '30', 'country': 'GB', 'width': '20'}"))

I want to convert the last column into its equivalent R representation, which would look similar to:

res <- data.table(a = list(1, 2, 3),
                  duration = list(10, 20, 30),
                  country = list('US', 'US', 'GB'),
                  width = list(NA, NA, 20))

Since I have 500K rows with varying content, I am looking for a fast way to do this.

Tags: json, r, data.table
Asked by Ste*_*reo on 2016-10-25 | 3 votes | 1 answer | 2720 views

Using gensim word2vec in a scikit-learn pipeline

I am trying to use word2vec in a scikit-learn pipeline.

from sklearn.base import BaseEstimator, TransformerMixin
import pandas as pd
import numpy as np

class ItemSelector(BaseEstimator, TransformerMixin):
    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        return self

    def transform(self, data_dict):
        return data_dict[self.key]


from sklearn.pipeline import Pipeline
from gensim.sklearn_api import W2VTransformer
pipeline_word2vec = Pipeline([
                ('selector', ItemSelector(key='X')),
                ('w2v', W2VTransformer()),
            ])

pipeline_word2vec.fit(pd.DataFrame({'X':['hello world','is amazing']}), np.array([1,0]))

This gives me:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-9e2dd309d07c> in <module>()
     23                 ('w2v', W2VTransformer()),
     24             ])
---> 25 pipeline_word2vec.fit(pd.DataFrame({'X':['hello world','is amazing']}), np.array([1,0])) …

Tags: python, gensim, scikit-learn, word2vec
Asked by Ste*_*reo on 2018-06-02 | 3 votes | 1 answer | 3195 views

Tagging Dutch sentences with NLTK

I am getting started with NLTK and want to tag Dutch sentences, but I am having trouble specifying the corpus.

from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
from nltk.corpus import alpino

pos_tag(word_tokenize("Python is een goede data science taal."), tagset = 'alpino')

gives:

[('Python', 'UNK'),
 ('is', 'UNK'),
 ('een', 'UNK'),
 ('goede', 'UNK'),
 ('data', 'UNK'),
 ('science', 'UNK'),
 ('taal', 'UNK'),
 ('.', 'UNK')]

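Clearly, I am not specifying the corpus correctly. I have downloaded the alpino corpus. Can anyone help me work out how to specify the corpus correctly?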

Tags: python, nltk
Asked by Ste*_*reo | 2 votes | 1 answer | 4597 views

Collapsing duplicate columns with pandas

I have a data frame with duplicate column names. I want to collapse all columns that share a name into a single column.

The data, in CSV form, is:

id,col1,col2,col1,col2
'a',1,0,1,0
'b',0,1,1,0
'c',1,0,0,0

The result I am looking for is:

id,col1,col2
'a',2,0
'b',1,1
'c',1,0

That is, I want to sum the columns that share a name.

I am new to pandas and cannot work out how to aggregate the values correctly. Note that I have about 4000 columns.
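
One way this aggregation could be expressed, as a minimal sketch rather than a definitive answer, is to transpose the frame so that the duplicate names become row labels, group on those labels, sum, and transpose back. The frame below is built by hand from the CSV shown above, and using id as the index is an assumption.

```python
import pandas as pd

# The data from the question, built directly so that the duplicate column
# names are preserved; 'id' is used as the index (an assumption).
df = pd.DataFrame(
    [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 0]],
    index=pd.Index(['a', 'b', 'c'], name='id'),
    columns=['col1', 'col2', 'col1', 'col2'],
)

# Transpose so the duplicate names become row labels, sum the rows that
# share a label, then transpose back to the original orientation.
collapsed = df.T.groupby(level=0).sum().T
print(collapsed)
```

Note that pd.read_csv renames repeated headers by default (col1, col1.1, and so on), so a frame loaded from such a file would need its column names restored before grouping.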

Tags: python, pandas
Asked by Ste*_*reo on 2016-10-24 | 2 votes | 1 answer | 667 views

`[]` operator causes a compile error on a map

I am trying to get an element from a map inside a for loop. Following the example on cppreference, I tried this:

#include <iostream>
#include <map>

using namespace std;

int main()
{
    map<int, int> mapping;

    mapping.insert(pair<int, int>(11,1));
    mapping.insert(pair<int, int>(12,2));
    mapping.insert(pair<int, int>(13,3));

    for (const auto &it : mapping)
        mapping[it]++;


    cout << "array: ";
    for (const auto &it : mapping)
        cout << it.second << " ";

    return 0;
}

This gives the following compile error with gcc:

main.cpp: In function 'int main()':
main.cpp:15:16: error: no match for 'operator[]' (operand types are 'std::map<int, int>' and 'const std::pair<const int, int>')
         mapping[it]++;

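If I understand correctly, the problem is that auto resolves to std::pair<const int, int>, for which no [] operator is defined. I would like to know whether there is a way to make this work. …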

Tags: c++, stdmap, c++11
Asked by Ste*_*reo | 0 votes | 1 answer | 50 views

Comparing two values in Rcpp without casting to a specific type

I am trying to compare two generic R values in C++ using Rcpp. How can I compare two values without casting them to a specific type in C++?

The code that illustrates my problem is as follows:

require("Rcpp")
require("inline")
src <- "return wrap(x1 == x2);"

fun <- cxxfunction(signature(x1 = "SEXP", x2 = "SEXP"), src, plugin = "Rcpp")

fun("a", "a")

to_cmp <- "a"

fun(to_cmp, to_cmp)

It currently gives me FALSE and TRUE, whereas I want it to yield TRUE and TRUE.

Since my goal is to implement a data structure in C++, I would prefer to use a potentially user-defined == method.

Possible approach

One approach I tried is:

require("Rcpp")

src <- '
Language call("\`==\`", x1, x2);

return call.eval();
'

fun <- cxxfunction(signature(x1 = "SEXP", x2 = "SEXP"), src, plugin = "Rcpp")

fun("a", "a")

to_cmp <- "a"

fun(to_cmp, to_cmp)

However, when I run this, I get Error: could not find function …

Tags: c++, r, rcpp
Asked by Ste*_*reo on 2016-10-04 | 0 votes | 1 answer | 130 views