Related questions and solutions (0)

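Iterating over a pandas DataFrame with itertuples

I am iterating over a pandas DataFrame using itertuples. I also want to capture the row number while iterating.

Sample code: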

for row in df.itertuples():
    print(row.name)  # itertuples yields namedtuples, so use attribute access


Expected output:

1 larry
2 barry
3 michael


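Here 1, 2, 3 are the row numbers. I want to get the row number without maintaining a counter myself. Is there a simple way to do this with pandas?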
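One possible approach, sketched under the assumption that the DataFrame has a `name` column as in the expected output: wrapping `itertuples` in `enumerate(..., start=1)` supplies a 1-based row number without a hand-written counter. The helper name `numbered_names` is hypothetical.

```python
import pandas as pd

# Sample data assumed from the expected output in the question
df = pd.DataFrame({"name": ["larry", "barry", "michael"]})

def numbered_names(frame):
    # enumerate(..., start=1) yields a 1-based position alongside each row tuple
    return [
        f"{number} {row.name}"
        for number, row in enumerate(frame.itertuples(index=False), start=1)
    ]

lines = numbered_names(df)
print("\n".join(lines))
```

When the index label itself is the wanted number, `itertuples()` with its default `index=True` also exposes it as `row.Index`.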

python pandas

Sun*_*Sun

2019 03-11

17
Score

3
Answers

30k
Views

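Inferring which columns are datetime

I have a huge DataFrame with many columns, many of which are of type datetime.datetime. The problem is that many of them also have mixed types, for instance datetime.datetime values alongside None values (and possibly other invalid values):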

0         2017-07-06 00:00:00
1         2018-02-27 21:30:05
2         2017-04-12 00:00:00
3         2017-05-21 22:05:00
4         2018-01-22 00:00:00
                 ...         
352867    2019-10-04 00:00:00
352868                   None
352869            some_string
Name: colx, Length: 352872, dtype: object

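The result is an object-typed column. This can be fixed with df.colx.fillna(pd.NaT). The problem is that the DataFrame is too big to inspect column by column.

Another approach is pd.to_datetime(col, errors='coerce'), but that would also cast to datetime many columns that contain numerical values.

I could also do df.fillna(float('nan'), inplace=True), but the columns containing dates would still be of object type, so the same problem would remain.

What approach could I follow to cast to datetime only those columns whose values really are datetime values, but which may also contain None and possibly some invalid values? (I mention the invalid values because otherwise pd.to_datetime inside a try/except clause would do.) Something like a flexible version of pd.to_datetime(col).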
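A minimal sketch of one way this could be approached, assuming that "really contains datetime values" means "has at least one `datetime.datetime` instance": only object-typed columns that pass that check are coerced, so purely numeric columns are left alone. The helper name `coerce_datetime_columns` is hypothetical.

```python
import datetime
import pandas as pd

def coerce_datetime_columns(frame):
    """Return a copy where object columns holding datetimes become datetime64."""
    out = frame.copy()
    # Only object-typed columns can hold the mixed values described above
    for col in out.columns[out.dtypes == object]:
        has_datetime = out[col].map(
            lambda value: isinstance(value, datetime.datetime)
        ).any()
        if has_datetime:
            # None and unparseable values become NaT
            out[col] = pd.to_datetime(out[col], errors="coerce")
    return out

# Small stand-in for the large DataFrame in the question
sample = pd.DataFrame({
    "colx": [datetime.datetime(2017, 7, 6), None, "some_string"],
    "num": [1, 2, 3],
})
result = coerce_datetime_columns(sample)
```

A stricter rule, such as requiring a minimum share of datetime values per column, could replace the `.any()` check if a single stray datetime should not trigger the conversion.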

python pandas

yat*_*atu

2019 10-28

15
Score

2
Answers

369
Views

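Adding labels to an x-y scatter plot with seaborn

I have spent hours trying to do what I thought was a simple task: adding labels to an XY plot while using seaborn.

Here is my code: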

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

df_iris=sns.load_dataset("iris") 

sns.lmplot(x='sepal_length', # Horizontal axis
           y='sepal_width', # Vertical axis
           data=df_iris, # Data source
           fit_reg=False, # Don't fit a regression line
           height=8, # 'size' was renamed to 'height' in seaborn 0.9
           aspect=2) # height and aspect ratio

plt.title('Example Plot')
# Set x-axis label
plt.xlabel('Sepal Length')
# Set y-axis label
plt.ylabel('Sepal Width')

I would like to add the text from the "species" column to each point in the plot.

I have seen many examples that use matplotlib, but none that use seaborn.

Any ideas? Thank you.
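As a rough illustration of the labelling step only, the sketch below uses plain matplotlib and a tiny made-up stand-in for the iris data (the three rows are illustrative values, not the real dataset). Each point gets its text through `ax.text`; since seaborn's axes-level functions such as `sns.scatterplot` return a matplotlib Axes, the same loop should apply there.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Tiny stand-in for the iris data (illustrative values only)
points = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "sepal_width": [3.5, 3.2, 3.3],
    "species": ["setosa", "versicolor", "virginica"],
})

fig, ax = plt.subplots()
ax.scatter(points["sepal_length"], points["sepal_width"])

# Place the species name at the coordinates of each point
for row in points.itertuples(index=False):
    ax.text(row.sepal_length, row.sepal_width, row.species)

labels = [text.get_text() for text in ax.texts]
```

With 150 overlapping points the labels will collide, so a small offset or labelling only a subset may be needed in practice.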

python plot seaborn

Tre*_*eha

2017 09-04

13
Score

4
Answers

20k
Views

How to get the max and min of the lists in a column?

I have a DataFrame like the following:

import pandas as pd
import numpy as np

# Named 'data' rather than 'dict' to avoid shadowing the built-in
data = {
        "A": [[1,2,3,4],[3],[2,8,4],[5,8]]
}

dt = pd.DataFrame(data)

I want column B to hold the minimum and maximum of the list in each row. My desired output is:

              A    B
0  [1, 2, 3, 4]    [1,4]
1           [3]    [3,3] 
2     [2, 8, 4]    [2,8] 
3        [5, 8]    [5,8]

I have tried the following code, which does not work:

dt["B"] =[np.min(dt.A), np.max(dt.A)]
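A minimal sketch of one row-wise alternative, assuming every list in column A is non-empty: `Series.apply` runs the built-in `min` and `max` on each list separately, whereas the attempt above asks NumPy for a single minimum and maximum over the whole column of lists.

```python
import pandas as pd

dt = pd.DataFrame({"A": [[1, 2, 3, 4], [3], [2, 8, 4], [5, 8]]})

# Build a [min, max] pair for the list stored in each row
dt["B"] = dt["A"].apply(lambda values: [min(values), max(values)])
print(dt)
```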


python list pandas

Jef*_*eff

2020 06-05

9
Score

3
Answers

218
Views

Iterating over a pandas Series

I want to loop over the index of a Series:

In [44]: type(ed1)
Out[44]: pandas.core.series.Series

In [43]: for _, row  in ed1.iterrows():
...:     print(row.name)

I get the error:

  AttributeError: 'Series' object has no attribute 'iterrows'

Does a Series have a similar iteration method? Many thanks.
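For illustration, a short sketch with a made-up Series standing in for `ed1`: `Series.items()` yields `(index label, value)` pairs, and the index alone can be iterated directly.

```python
import pandas as pd

# Hypothetical stand-in for ed1
ed1 = pd.Series([10, 20, 30], index=["a", "b", "c"])

# items() yields (index label, value) pairs
pairs = [(label, value) for label, value in ed1.items()]

# The index alone can be iterated directly
labels = list(ed1.index)

for label in labels:
    print(label)
```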

python series pandas

Ala*_*lan

2018 05-10

8
Score

1
Answers

8191
Views

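Best way to iterate over the elements of a pandas Series

All of the following seem to work for iterating over the elements of a pandas Series, and I am sure there are more ways to do it. What are the differences, and which way is best?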

import pandas


arr = pandas.Series([1, 1, 1, 2, 2, 2, 3, 3])

# 1
for el in arr:
    print(el)

# 2 (originally arr.iteritems(), which was removed in pandas 2.0)
for _, el in arr.items():
    print(el)

# 3
for el in arr.array:
    print(el)

# 4
for el in arr.values:
    print(el)

# 5
for i in range(len(arr)):
    print(arr.iloc[i])


python pandas

d.b*_*d.b

2021 08-06

8
Score

1
Answers

624
Views

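Pandas alternative to apply: creating a new column based on multiple columns

I have a pandas DataFrame and I would like to add a new column based on the values of the other columns. Below is a minimal example illustrating my use case.

import random
import pandas as pd

df = pd.DataFrame([[4,5,19],[1,2,0],[2,5,9],[8,2,5]], columns=['a','b','c'])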
df

    a   b   c
---------------
0   4   5   19
1   1   2   0
2   2   5   9
3   8   2   5

x = df.sample(n=2)
x

    a   b   c
---------------
3   8   2   5
1   1   2   0

def get_new(row):
    a, b, c = row
    return random.choice(df[(df['a'] != a) & (df['b'] == b) & (df['c'] != c)]['c'].values)

y = x.apply(lambda row: get_new(row), axis=1)
x['new'] = y
x

    a   b   c   new
--------------------
3   8 …


python numpy apply dataframe pandas

swa*_*his

2018 03-02

6
Score

1
Answers

3949
Views

Parsing a set of dictionaries into single rows in pandas (Python)

Hi, I have a pandas df similar to the one below:

information         record
name                apple
size                {'weight':{'gram':300,'oz':10.5},'description':{'height':10,'width':15}}
country             America
partiesrelated      [{'nameOfFarmer':'John Smith'},{'farmerID':'A0001'}]

I want to convert the df into another df like this:

information                  record
name                         apple
size_weight_gram             300
size_weight_oz               10.5
size_description_height      10
size_description_width       15 
country                      America
partiesrelated_nameOfFarmer  John Smith
partiesrelated_farmerID      A0001

In this case, each dictionary is parsed into single rows, where for example size_weight_gram holds the corresponding value.

Code for df:

import pandas as pd

df = pd.DataFrame({'information': ['name', 'size', 'country', 'partiesrelated'], 
                   'record': ['apple', {'weight':{'gram':300,'oz':10.5},'description':{'height':10,'width':15}}, 'America', [{'nameOfFarmer':'John Smith'},{'farmerID':'A0001'}]]})
df = df.set_index('information')
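One way this could be sketched is with a small recursive helper (the name `flatten` is hypothetical) that walks dictionaries and lists and joins the keys with underscores. It assumes the fourth label is spelled `partiesrelated`, as in the tables above, and that list items are themselves dictionaries or scalars.

```python
import pandas as pd

def flatten(prefix, value):
    """Return (label, scalar) pairs for a possibly nested record."""
    if isinstance(value, dict):
        rows = []
        for key, inner in value.items():
            rows.extend(flatten(f"{prefix}_{key}", inner))
        return rows
    if isinstance(value, list):
        rows = []
        for item in value:
            # List items keep the parent prefix
            rows.extend(flatten(prefix, item))
        return rows
    return [(prefix, value)]

df = pd.DataFrame({
    "information": ["name", "size", "country", "partiesrelated"],
    "record": [
        "apple",
        {"weight": {"gram": 300, "oz": 10.5},
         "description": {"height": 10, "width": 15}},
        "America",
        [{"nameOfFarmer": "John Smith"}, {"farmerID": "A0001"}],
    ],
}).set_index("information")

flat_rows = []
for label, record in df["record"].items():
    flat_rows.extend(flatten(label, record))

flat = pd.DataFrame(flat_rows, columns=["information", "record"]).set_index("information")
print(flat)
```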


python dataframe pandas

Pla*_*nor

2018 08-02

5
Score

1
Answers

71
Views

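Iterating over the first N rows in pandas

What is the suggested way to iterate over the rows in pandas as you would over the lines of a file? For example: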

LIMIT = 100
for row_num, row in enumerate(open('file','r')):
    print (row)
    if row_num == LIMIT: break

I was thinking of doing something like:

for n in range(LIMIT):
    print (df.loc[n].tolist())

But is there a built-in way to do this in pandas?
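Two possible variants, sketched with a made-up DataFrame: `DataFrame.head(LIMIT)` slices the first rows before iterating, while `itertools.islice` stops a lazy `itertuples` iterator after `LIMIT` rows. Unlike `df.loc[n]`, neither depends on the index being `0..N-1`.

```python
from itertools import islice

import pandas as pd

LIMIT = 3
# Hypothetical data for illustration
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

# Variant 1: slice first, then iterate
first_rows = [list(row) for row in df.head(LIMIT).itertuples(index=False)]

# Variant 2: iterate lazily and stop after LIMIT rows
lazy_rows = [list(row) for row in islice(df.itertuples(index=False), LIMIT)]

for row in first_rows:
    print(row)
```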

python pandas

Dav*_*542

lucky-day

5
Score

2
Answers

4247
Views

How to convert a pandas DataFrame to a numpy array with column names

This must use a vectorized approach, with no iteration.

I want to create a numpy array from a pandas DataFrame.

My code:

import pandas as pd
_df = pd.DataFrame({'item': ['book', 'book' , 'car', 'car', 'bike', 'bike'], 'color': ['green', 'blue' , 'red', 'green' , 'blue', 'red'], 'val' : [-22.7, -109.6, -57.19, -11.2, -25.6, -33.61]})
 
item     color    val
book    green   -22.70
book    blue    -109.60
car     red     -57.19
car     green   -11.20
bike    blue    -25.60
bike    red     -33.61

There are roughly 120 million rows.

I need to create a numpy array such as:

item    green    blue     red
book    -22.70  -109.60   null
car     -11.20   null     -57.19
bike    null …
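A minimal sketch of the reshaping step, assuming each (item, color) pair occurs at most once: `DataFrame.pivot` spreads the colors into columns without an explicit Python loop, and `to_numpy()` then returns the values. A plain NumPy array carries no labels, so the row and column names are kept separately here. Note that `pivot` sorts the labels, so the row and column order differs from the table above.

```python
import numpy as np
import pandas as pd

_df = pd.DataFrame({
    "item": ["book", "book", "car", "car", "bike", "bike"],
    "color": ["green", "blue", "red", "green", "blue", "red"],
    "val": [-22.7, -109.6, -57.19, -11.2, -25.6, -33.61],
})

# One row per item, one column per color; missing pairs become NaN
wide = _df.pivot(index="item", columns="color", values="val")

arr = wide.to_numpy()            # values only
row_labels = list(wide.index)    # item names, in row order
col_labels = list(wide.columns)  # color names, in column order
print(wide)
```

If duplicate (item, color) pairs can occur, `pivot` raises an error; `pivot_table` with an aggregation function would be one alternative in that case.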


python numpy dataframe pandas pytorch

use*_*011

2020 11-15

5
Score

1
Answers

2313
Views

Tag statistics

python ×10

pandas ×9

dataframe ×3

numpy ×2

apply ×1

list ×1

plot ×1

pytorch ×1

seaborn ×1

series ×1
