Posts by mez*_*ezz

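Importing a text file: No columns to parse from file

I am trying to read input from sys.stdin. This is a map-reduce program for Hadoop. The input file is in txt format. A preview of the dataset: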

196 242 3   881250949
186 302 3   891717742
22  377 1   878887116
244 51  2   880606923
166 346 1   886397596
298 474 4   884182806
115 265 2   881171488
253 465 5   891628467
305 451 3   886324817
6   86  3   883603013
62  257 2   879372434
286 1014    5   879781125
200 222 5   876042340
210 40  3   891035994
224 29  3   888104457
303 785 3   879485318
122 387 5   879270459
194 274 2   879539794
291 1042    4   874834944

The code I have been trying - …
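The asker's code is cut off above, so the following is only a minimal sketch of one way such rows could be read with pandas, not a reconstruction of the original attempt. The column names and the helper `read_ratings` are assumptions made for illustration; the source does not name the columns. The sketch also guards against empty input, which is one situation in which pandas reports "No columns to parse from file".

```python
import io

import pandas as pd

# Assumed column names, for illustration only; the source does not name them.
COLUMNS = ["user_id", "item_id", "rating", "timestamp"]


def read_ratings(stream):
    """Parse whitespace-separated rows from a file-like object (hypothetical helper)."""
    text = stream.read()
    if not text.strip():
        # Empty input: return an empty frame instead of letting read_csv raise.
        return pd.DataFrame(columns=COLUMNS)
    # sep=r"\s+" accepts any run of spaces or tabs between fields;
    # header=None tells pandas the first row is data, not column names.
    return pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, names=COLUMNS)


# Demo on the first two rows of the preview above.
sample = "196 242 3   881250949\n186 302 3   891717742\n"
df = read_ratings(io.StringIO(sample))
print(df)
```

In a streaming mapper, the same helper could presumably be called as `read_ratings(sys.stdin)` after `import sys`, since `sys.stdin` is also a file-like object.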

python pandas hadoop-streaming

mez*_*ezz

2016 10-22

8
votes

1
answer

40k
views

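Creating a new column in a pandas DataFrame with str.contains

I am exploring a huge dataset and want to create a column that groups similar names. For example, any name containing "Charles" should appear as "ch", because I want to use these names later to perform some group-bys. I created a function with the following code: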

def cont(Name):
    for a in Name:
        if a.str.contains('Charles'):
            return('Ch')

Then I applied it like this:

titanic['namest']=titanic['Name'].apply(cont,axis=1)

Error: 'str' object has no attribute 'str'

notebook_link
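The error message is consistent with `a` being a plain Python string: `.str` is an accessor on a pandas Series, not on an individual `str`. A minimal sketch of a vectorized alternative follows. It calls `str.contains` on the whole `Name` column instead of inside a per-element function. The sample rows are made up for illustration, and keeping the original name for non-matching rows is an assumption about what the asker wants for the later group-by.

```python
import pandas as pd

# Hypothetical stand-in for the titanic DataFrame from the question.
titanic = pd.DataFrame(
    {"Name": ["Mr. Charles Smith", "Miss. Anna Lee", "Charles, Master. John"]}
)

# Boolean mask: True where the name contains "Charles"; missing names count as False.
mask = titanic["Name"].str.contains("Charles", na=False)

# Series.where keeps values where the condition is True and substitutes "Ch" elsewhere,
# so rows matching "Charles" become "Ch" and all other rows keep their original name.
titanic["namest"] = titanic["Name"].where(~mask, "Ch")

print(titanic)
```

If non-matching rows should be left empty instead, `titanic.loc[mask, "namest"] = "Ch"` on a fresh column would be one alternative.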

python dataframe python-3.x pandas

mez*_*ezz

2016 04-18

4
votes

1
answer

2149
views

Tag statistics

pandas ×2

python ×2

dataframe ×1

hadoop-streaming ×1

python-3.x ×1
