jmh*_*ead 6 python pivot-table pandas
关于这个错误有很多问题,但在环顾四周之后,我仍然无法找到/解决问题.我正在尝试使用字符串来转动数据框,以使某些行数据成为列,但到目前为止还没有成功.
我的df的形状
<class 'pandas.core.frame.DataFrame'>
Int64Index: 515932 entries, 0 to 515931
Data columns (total 5 columns):
id 515932 non-null object
cc_contact_id 515932 non-null object
Network_Name 515932 non-null object
question 515932 non-null object
response_answer 515932 non-null object
dtypes: object(5)
memory usage: 23.6+ MB
Run Code Online (Sandbox Code Playgroud)
样本格式
id contact_id question response_answer
16 137519 2206 State Ca
17 137520 2206 State Ca
18 137521 2206 State Ca
19 137522 2206 State Ca
20 137523 2208 City Lancaster
21 137524 2208 City Lancaster
22 137525 2208 City Lancaster
23 137526 2208 City Lancaster
24 137527 2208 Trip_End Location Home
25 137528 2208 Trip_End Location Home
26 137529 2208 Trip_End Location Home
27 137530 2208 Trip_End Location Home
Run Code Online (Sandbox Code Playgroud)
我想转向什么
id contact_id State City Trip_End Location
16 137519 2206 Ca None None None
20 137523 2208 None Lancaster None None
24 137527 2208 None None None Home
etc. etc.
Run Code Online (Sandbox Code Playgroud)
如果问题值成为列,则response_answer位于其对应的列中,并保留ID
我试过了什么
unified_df = pd.DataFrame(unified_data, columns=target_table_headers, dtype=object)
pivot_table = unified_df.pivot_table('response_answer',['id','cc_contact_id'],'question')
# OR
pivot_table = unified_df.pivot_table('response_answer','question')
Run Code Online (Sandbox Code Playgroud)
DataError:无需聚合的数字类型
使用字符串值转动数据框的方法是什么?
cwh*_*and 23
默认aggfunc
的pivot_table
是np.sum
,它不知道如何处理字符串做,你有没有指定的指数应该是什么正常.尝试类似的东西:
pivot_table = unified_df.pivot_table(index=['id', 'contact_id'],
columns='question',
values='response_answer',
aggfunc=lambda x: ' '.join(x))
Run Code Online (Sandbox Code Playgroud)
这明确地为每id, contact_id
对设置一行并在该组上创建response_answer
值question
.在aggfunc
刚刚确保如果您有多个答案中的原始数据相同的问题,我们只是用空格串联在一起.语法pivot_table
可能因您的熊猫版本而异.
这是一个简单的例子:
In [24]: import pandas as pd
In [25]: import random
In [26]: df = pd.DataFrame({'id':[100*random.randint(10, 50) for _ in range(100)], 'question': [str(random.randint(0,3)) for _ in range(100)], 'response': [str(random.randint(100,120)) for _ in range(100)]})
In [27]: df.head()
Out[27]:
id question response
0 3100 1 116
1 4500 2 113
2 5000 1 120
3 3900 2 103
4 4300 0 117
In [28]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 3 columns):
id 100 non-null int64
question 100 non-null object
response 100 non-null object
dtypes: int64(1), object(2)
memory usage: 3.1+ KB
In [29]: df.pivot_table(index='id', columns='question', values='response', aggfunc=lambda x: ' '.join(x)).head()
Out[29]:
question 0 1 2 3
id
1000 110 120 NaN 100 NaN
1100 NaN 106 108 104 NaN
1200 104 113 119 NaN 101
1300 102 NaN 116 108 120
1400 NaN NaN 116 NaN
Run Code Online (Sandbox Code Playgroud)
小智 8
有几种方法.
df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate(lambda x: x).unstack().reset_index()
df1.columns=df1.columns.tolist()
print (df1)
Run Code Online (Sandbox Code Playgroud)
df1 = df.set_index(["id","contact_id","Network_Name","question"])['response_answer'].unstack().reset_index()
df1.columns=df1.columns.tolist()
print (df1)
Run Code Online (Sandbox Code Playgroud)
df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate('first').unstack().reset_index()
df1.columns=df1.columns.tolist()
print (df1)
Run Code Online (Sandbox Code Playgroud)
df1 = df.pivot_table(index=["id","contact_id","Network_Name"], columns='question', values=['response_answer'], aggfunc='first')
df1.columns = df1.columns.droplevel()
df1 = df1.reset_index()
df1.columns=df1.columns.tolist()
print (df1)
Run Code Online (Sandbox Code Playgroud)
同样的.
id contact_id Network_Name City State Trip_End_Location
0 16 137519 2206 None Ca None
1 17 137520 2206 None Ca None
2 18 137521 2206 None Ca None
3 19 137522 2206 None Ca None
4 20 137523 2208 Lancaster None None
5 21 137524 2208 Lancaster None None
6 22 137525 2208 Lancaster None None
7 23 137526 2208 Lancaster None None
8 24 137527 2208 None None Home
9 25 137528 2208 None None Home
10 26 137529 2208 None None Home
11 27 137530 2208 None None Home
Run Code Online (Sandbox Code Playgroud)