use*_*105 136 python tolist pandas
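I am extracting a subset of data from a column based on a condition being met in another column.
I can get the correct values back, but they are in a pandas.core.frame.DataFrame. How do I convert that to a list?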
import pandas as pd
tst = pd.read_csv('C:\\SomeCSV.csv')
lookupValue = tst['SomeCol'] == "SomeValue"
ID = tst[lookupValue][['SomeCol']]
#How To convert ID to a list
Aka*_*all 231
Use .values to get a numpy.array, then .tolist() to get a list.
For example:
import pandas as pd
df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9],
'b':[3,5,6,2,4,6,7,8,7,8,9]})
Result:
>>> df['a'].values.tolist()
[1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]
Or you can just use
>>> df['a'].tolist()
[1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]
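One practical difference worth knowing, sketched here with a shortened version of the frame above: both tolist() variants convert the elements to native Python ints, whereas calling the built-in list() on the NumPy array keeps NumPy scalar types. The variable names native, direct, and wrapped are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5], 'b': [3, 5, 6]})

native = df['a'].values.tolist()   # ndarray.tolist() converts to Python ints
direct = df['a'].tolist()          # Series.tolist() does the same in one step
wrapped = list(df['a'].values)     # list() over an ndarray keeps NumPy scalars

print(native == direct)                     # True
print(type(native[0]) is int)               # True
print(isinstance(wrapped[0], np.integer))   # True
```

This can matter when the list is handed to code that expects plain Python types, such as a JSON encoder.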
To drop duplicates, you can do either of the following:
>>> df['a'].drop_duplicates().values.tolist()
[1, 3, 5, 7, 4, 6, 8, 9]
>>> list(set(df['a'])) # as pointed out by EdChum
[1, 3, 4, 5, 6, 7, 8, 9]
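The two approaches differ in ordering: drop_duplicates() keeps the first occurrence of each value in its original position, while a set makes no ordering promise. A minimal sketch, assuming you want either the order of appearance or an explicitly sorted result (the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]})

# Keeps the first occurrence of each value, in original order.
in_order = df['a'].drop_duplicates().tolist()

# Series.unique() also preserves order of appearance; it returns an array.
also_in_order = df['a'].unique().tolist()

# A set has no guaranteed order, so sort explicitly if order matters.
sorted_unique = sorted(set(df['a']))

print(in_order)        # [1, 3, 5, 7, 4, 6, 8, 9]
print(also_in_order)   # [1, 3, 5, 7, 4, 6, 8, 9]
print(sorted_unique)   # [1, 3, 4, 5, 6, 7, 8, 9]
```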
Mar*_*ese 21
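I'd like to clarify a few things:
1. The simplest thing to do is use pandas.Series.tolist(). I'm not sure why the top-voted answer leads off with pandas.Series.values.tolist() since, as far as I can tell, it adds syntax and confusion with no added benefit.
2. tst[lookupValue][['SomeCol']] is a DataFrame (as stated in the question), not a Series (as stated in a comment on the question). This is because tst[lookupValue] is a DataFrame, and slicing it with [['SomeCol']] asks for a list of columns (a list that happens to have length 1), so a DataFrame is returned. If you remove the extra set of brackets, as in tst[lookupValue]['SomeCol'], you are asking for just that one column rather than a list of columns, and so you get a Series back.
3. You need a Series to use pandas.Series.tolist(), so you should definitely skip the second set of brackets in this case. FYI, if you ever end up with a one-column DataFrame that isn't easily avoidable like this, you can use pandas.DataFrame.squeeze() to convert it to a Series.
4. tst[lookupValue]['SomeCol'] gets a subset of a particular column via chained slicing. It slices once to get a DataFrame with only certain rows left, then slices again to get a certain column. You can get away with it here because you are only reading, not writing, but the proper way to do it is tst.loc[lookupValue, 'SomeCol'] (which returns a Series).
5. Using the syntax from point 4, you can reasonably do everything in one line: ID = tst.loc[tst['SomeCol'] == 'SomeValue', 'SomeCol'].tolist()
Demo code: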
import pandas as pd
df = pd.DataFrame({'colA':[1,2,1],
                   'colB':[4,5,6]})
filter_value = 1
print("df")
print(df)
print(type(df))
# Boolean mask: True for the rows where colA equals filter_value
rows_to_keep = df['colA'] == filter_value
print("\ndf['colA'] == filter_value")
print(rows_to_keep)
print(type(rows_to_keep))
# Single brackets select one column and return a Series
result = df[rows_to_keep]['colB']
print("\ndf[rows_to_keep]['colB']")
print(result)
print(type(result))
# Double brackets select a list of columns and return a DataFrame
result = df[rows_to_keep][['colB']]
print("\ndf[rows_to_keep][['colB']]")
print(result)
print(type(result))
# squeeze() turns the one-column DataFrame into a Series
result = df[rows_to_keep][['colB']].squeeze()
print("\ndf[rows_to_keep][['colB']].squeeze()")
print(result)
print(type(result))
# .loc selects rows and column in one step, without chained slicing
result = df.loc[rows_to_keep, 'colB']
print("\ndf.loc[rows_to_keep, 'colB']")
print(result)
print(type(result))
result = df.loc[df['colA'] == filter_value, 'colB']
print("\ndf.loc[df['colA'] == filter_value, 'colB']")
print(result)
print(type(result))
ID = df.loc[rows_to_keep, 'colB'].tolist()
print("\ndf.loc[rows_to_keep, 'colB'].tolist()")
print(ID)
print(type(ID))
ID = df.loc[df['colA'] == filter_value, 'colB'].tolist()
print("\ndf.loc[df['colA'] == filter_value, 'colB'].tolist()")
print(ID)
print(type(ID))
Result:
df
   colA  colB
0     1     4
1     2     5
2     1     6
<class 'pandas.core.frame.DataFrame'>

df['colA'] == filter_value
0     True
1    False
2     True
Name: colA, dtype: bool
<class 'pandas.core.series.Series'>

df[rows_to_keep]['colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df[rows_to_keep][['colB']]
   colB
0     4
2     6
<class 'pandas.core.frame.DataFrame'>

df[rows_to_keep][['colB']].squeeze()
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[df['colA'] == filter_value, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB'].tolist()
[4, 6]
<class 'list'>

df.loc[df['colA'] == filter_value, 'colB'].tolist()
[4, 6]
<class 'list'>
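One caveat with squeeze(), sketched below with a made-up one-row frame: called without arguments it squeezes every length-1 axis, so a one-column DataFrame that happens to contain a single row collapses to a scalar rather than a Series. Passing axis='columns', or selecting the column by position with .iloc[:, 0], should keep a Series regardless of how many rows matched. The names one_row, scalar, series, and by_position are illustrative only.

```python
import pandas as pd

one_row = pd.DataFrame({'colB': [4]})   # one row, one column

scalar = one_row.squeeze()                 # both axes have length 1, so a scalar comes back
series = one_row.squeeze(axis='columns')   # only the column axis is squeezed
by_position = one_row.iloc[:, 0]           # first column, selected by position

print(isinstance(scalar, pd.Series))   # False
print(isinstance(series, pd.Series))   # True
print(series.tolist())                 # [4]
print(by_position.tolist())            # [4]
```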
zhq*_*907 17
You can use pandas.Series.tolist
For example:
import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
Run:
>>> df['a'].tolist()
You will get
[1, 2, 3]
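Tying this back to the question, here is a minimal end-to-end sketch. The CSV content and the ID column are assumptions made up for illustration (the question does not show the file), and io.StringIO stands in for the real path so the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the file in the question.
csv_text = "ID,SomeCol\n10,SomeValue\n11,Other\n12,SomeValue\n"
tst = pd.read_csv(io.StringIO(csv_text))

mask = tst['SomeCol'] == 'SomeValue'   # boolean Series, one entry per row
ids = tst.loc[mask, 'ID'].tolist()     # pick rows and one column, then convert

print(ids)         # [10, 12]
print(type(ids))   # <class 'list'>
```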