Posts by Ger*_*rry

Conditionally filling a pandas DataFrame

I have a DataFrame df with float values in column A. I want to add another column, B, such that:

  1. B[0] = A[0]

    For i > 0...

  2. B[i] = if(np.isnan(A[i])) then A[i] else Step3
  3. B[i] = if(abs((B[i-1] - A[i]) / B[i-1]) < 0.3) then B[i-1] else A[i]

A sample DataFrame df can be generated as follows:

import numpy as np
import pandas as pd
df = pd.DataFrame(1000*(2+np.random.randn(500, 1)), columns=list('A'))
df.loc[1, 'A'] = np.nan
df.loc[15, 'A'] = np.nan
df.loc[240, 'A'] = np.nan
df.loc[241, 'A'] = np.nan
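One way to read rules 1 to 3 is as a plain loop, since each B[i] depends on B[i-1] and so does not vectorize directly. The sketch below is only an illustration of that reading: the helper name fill_b and the threshold parameter are hypothetical, and it assumes that when B[i-1] is NaN the comparison in step 3 is simply false, so B[i] falls back to A[i].

```python
import numpy as np
import pandas as pd


def fill_b(a, threshold=0.3):
    """Hypothetical helper: build B from A following rules 1-3 literally."""
    a = np.asarray(a, dtype=float)
    b = np.empty_like(a)
    if len(a) == 0:
        return b
    b[0] = a[0]                                  # rule 1
    for i in range(1, len(a)):
        prev = b[i - 1]
        if np.isnan(a[i]):                       # rule 2: NaN in A stays NaN in B
            b[i] = a[i]
        elif abs((prev - a[i]) / prev) < threshold:
            b[i] = prev                          # rule 3: small relative change, keep B[i-1]
        else:
            b[i] = a[i]                          # rule 3: large change (or prev is NaN), take A[i]
    return b


# Small hand-checkable input instead of the random sample data.
df = pd.DataFrame({"A": [100.0, np.nan, 110.0, 120.0, 200.0, 150.0]})
df["B"] = fill_b(df["A"])
```

If B[i-1] is exactly 0 the division yields inf or NaN (with a NumPy warning), which again makes the comparison false; whether that is the desired behaviour is not stated in the question.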

python dataframe pandas

5
votes
1
answer
535
views

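Dropping the last n rows of each group in a pandas DataFrame groupby

I have a DataFrame df in which I want to drop the last n rows within each group defined by a set of columns. For example, say df is defined as below and the groups are formed by columns a and b: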

>>> import pandas as pd
>>> df = pd.DataFrame({'a':['abd']*4 + ['pqr']*5 + ['xyz']*7, 'b':['john']*7 + ['doe']*9, 'c':range(16), 'd':range(1000,1016)})
>>> df
      a     b   c     d
0   abd  john   0  1000
1   abd  john   1  1001
2   abd  john   2  1002
3   abd  john   3  1003
4   pqr  john   4  1004
5   pqr  john   5  1005
6   pqr  john   6  1006
7   pqr   doe   7  1007
8   pqr   doe   8  1008 …
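A possible approach, sketched here under the assumption that "last" means last in the existing row order within each (a, b) group, is to number the rows of each group from the end with groupby(...).cumcount(ascending=False) and keep only rows whose number is at least n. The helper name drop_last_n is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['abd'] * 4 + ['pqr'] * 5 + ['xyz'] * 7,
    'b': ['john'] * 7 + ['doe'] * 9,
    'c': range(16),
    'd': range(1000, 1016),
})


def drop_last_n(frame, keys, n):
    """Hypothetical helper: drop the last n rows of every group defined by keys."""
    # cumcount(ascending=False) gives 0 for the last row of a group,
    # 1 for the second-to-last row, and so on.
    from_end = frame.groupby(keys).cumcount(ascending=False)
    return frame[from_end >= n]


result = drop_last_n(df, ['a', 'b'], n=2)
```

Groups with n or fewer rows disappear entirely under this rule, for example the two-row (pqr, doe) group when n=2.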

python group-by dataframe pandas pandas-groupby

5
votes
1
answer
1649
views

Adding new column with most popular string value in each row in Pandas DataFrame

I have a very large (15 million rows) pandas dataframe df with sample being given below:

import pandas as pd
df = pd.DataFrame({'a':['ar', 're' ,'rw', 'rew', 'are'], 'b':['gh', 're', 'ww', 'rew', 'all'], 'c':['ar', 're', 'ww', '', 'different']})
df
     a    b          c
0   ar   gh         ar
1   re   re         re
2   rw   ww         ww
3  rew  rew         
4  are  all  different

I want to add another column d which has the most common value from the other 3 columns (could be …
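As a rough sketch of one option, DataFrame.mode(axis=1) computes the per-row mode, and its first result column can be taken as d. How ties should be resolved is cut off in the excerpt above, so the tie behaviour here (whatever mode lists first) is an assumption, and this row-wise call may be slow on 15 million rows.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['ar', 're', 'rw', 'rew', 'are'],
    'b': ['gh', 're', 'ww', 'rew', 'all'],
    'c': ['ar', 're', 'ww', '', 'different'],
})

# mode(axis=1) returns one column per tied mode; column 0 holds the first one.
# Rows where all three values differ (row 4 here) have no unique winner.
df['d'] = df[['a', 'b', 'c']].mode(axis=1)[0]
```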

string dataframe python-3.x pandas

2
votes
1
answer
68
views

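Secondary y-axis using seaborn

I am trying to draw a bar chart and a line chart in a single figure, and I prefer seaborn because of its nice formatting. However, when I call df1.plot(kind='bar',...) followed by df1.plot(kind='line',..., secondary_y=True), I get something like the result below: the line plot is missing, but there is no error.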

from datetime import datetime  # needed for datetime(2020,1,1) below
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Sample dataframe.
df1 = pd.DataFrame({'date':pd.date_range(datetime(2020,1,1), periods=699).tolist(), 'amount':range(1,700), 'balance':np.cumsum(range(1,700))})
df1.loc[:, 'month'] = df1['date'].dt.to_period("M")
df1.loc[:, 'month_str'] = df1['date'].dt.year.astype(str) + '-' + df1['date'].dt.month.astype(str)
df1.loc[:, 'month_dt'] = pd.to_datetime(df1.month.dt.year*10000+df1.month.dt.month*100+1,format='%Y%m%d')

# Case-1: This doesn't work.
df2 = df1.groupby(['month']).agg({'amount':'sum','balance':'sum'})
sns.barplot(x='month', y='amount', data=df2.reset_index(), palette="Blues_d")
ax2 = plt.twinx()
sns.lineplot(x='month', y='balance', data=df2.reset_index(), color='red', markers=True, ax=ax2)

# Case-2: This doesn't work (as intended, if months grow …

matplotlib python-3.x seaborn

0
votes
1
answer
3478
views