I have a dataframe df with float values in column A. I want to add another column B such that:
B[0] = A[0]
for i > 0...
B[i] = if(np.isnan(A[i])) then A[i] else Step3
B[i] = if(abs((B[i-1] - A[i]) / B[i-1]) < 0.3) then B[i-1] else A[i]
A sample dataframe df can be generated as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame(1000*(2+np.random.randn(500, 1)), columns=list('A'))
df.loc[1, 'A'] = np.nan
df.loc[15, 'A'] = np.nan
df.loc[240, 'A'] = np.nan
df.loc[241, 'A'] = np.nan
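Because each B[i] depends on the previously computed B[i-1], the rule above is a recurrence and cannot be written as a single column-wise expression. A minimal sketch with an explicit loop is given here; the helper name build_b and its threshold parameter are hypothetical, "Step3" is read as the line that follows it, and the input is assumed to be non-empty. When B[i-1] is NaN the comparison is false, so this sketch falls back to A[i].

```python
import numpy as np
import pandas as pd


def build_b(a, threshold=0.3):
    """Apply the B[i] rule from the question to a 1-D sequence of floats.

    Hypothetical helper: the name and the `threshold` parameter are
    illustrative and do not come from the original question.
    """
    a = np.asarray(a, dtype=float)
    b = np.empty_like(a)
    b[0] = a[0]
    for i in range(1, len(a)):
        if np.isnan(a[i]):
            # A missing value in A is carried into B unchanged.
            b[i] = a[i]
        elif abs((b[i - 1] - a[i]) / b[i - 1]) < threshold:
            # Relative change below the threshold: keep the previous B.
            b[i] = b[i - 1]
        else:
            # Large change, or previous B is NaN (comparison is False).
            b[i] = a[i]
    return b


# Small illustrative frame (not the random sample from the question).
demo = pd.DataFrame({'A': [100.0, np.nan, 110.0, 120.0, 200.0, 190.0]})
demo['B'] = build_b(demo['A'])
print(demo)
```

On the question's own frame the same call would be df['B'] = build_b(df['A']). A plain Python loop over 500 rows is cheap; for much larger inputs the same loop body could be compiled, but that is outside this sketch.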
I have a dataframe df in which I want to drop the last n rows of each group defined by a set of columns. For example, say df is defined as below and the groups are formed by columns a and b:
>>> import pandas as pd
>>> df = pd.DataFrame({'a':['abd']*4 + ['pqr']*5 + ['xyz']*7, 'b':['john']*7 + ['doe']*9, 'c':range(16), 'd':range(1000,1016)})
>>> df
a b c d
0 abd john 0 1000
1 abd john 1 1001
2 abd john 2 1002
3 abd john 3 1003
4 pqr john 4 1004
5 pqr john 5 1005
6 pqr john 6 1006
7 pqr doe 7 1007
8 pqr doe 8 1008 …
I have a very large (15 million rows) pandas dataframe df; a sample is given below:
import pandas as pd
df = pd.DataFrame({'a':['ar', 're' ,'rw', 'rew', 'are'], 'b':['gh', 're', 'ww', 'rew', 'all'], 'c':['ar', 're', 'ww', '', 'different']})
df
a b c
0 ar gh ar
1 re re re
2 rw ww ww
3 rew rew
4 are all different
I want to add another column d which has the most common value from the other 3 columns (could be …
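One possible way to get a row-wise most common value is DataFrame.mode with axis=1, sketched below on the sample frame. The tie-breaking rule is cut off in the question, so taking the first mode returned is an assumption of this sketch, not the asker's requirement. The empty string in column c is treated as an ordinary value.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['ar', 're', 'rw', 'rew', 'are'],
    'b': ['gh', 're', 'ww', 'rew', 'all'],
    'c': ['ar', 're', 'ww', '', 'different'],
})

cols = ['a', 'b', 'c']

# mode(axis=1) returns one column per tied mode; column 0 holds the first.
# Assumption: when all three values differ, any one of them is acceptable.
df['d'] = df[cols].mode(axis=1)[0]
print(df)
```

Row-wise mode may be slow on 15 million rows, so this is meant to show the intended result rather than a tuned solution.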
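I am trying to draw a bar plot and a line plot on a single figure, and I am inclined to use seaborn because of its good formatting capabilities. However, when I call df1.plot(kind='bar',...) followed by df1.plot(kind='line',..., secondary_y=True), I get a result similar to the one below, i.e. no line plot but no error either.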
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
# Sample dataframe.
df1 = pd.DataFrame({'date':pd.date_range(datetime(2020,1,1), periods=699).tolist(), 'amount':range(1,700), 'balance':np.cumsum(range(1,700))})
df1.loc[:, 'month'] = df1['date'].dt.to_period("M")
df1.loc[:, 'month_str'] = df1['date'].dt.year.astype(str) + '-' + df1['date'].dt.month.astype(str)
df1.loc[:, 'month_dt'] = pd.to_datetime(df1.month.dt.year*10000+df1.month.dt.month*100+1,format='%Y%m%d')
# Case-1: This doesn't work.
df2 = df1.groupby(['month']).agg({'amount':'sum','balance':'sum'})
sns.barplot(x='month', y='amount', data=df2.reset_index(), palette="Blues_d")
ax2 = plt.twinx()
sns.lineplot(x='month', y='balance', data=df2.reset_index(), color='red', markers=True, ax=ax2)
# Case-2: This doesn't work (as intended, if months grow …
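A likely reason the line goes missing in Case-1 is that seaborn's barplot places bars at categorical positions 0, 1, 2, ... while the line is drawn at period or datetime coordinates, so the two end up far apart on the shared x axis. The sketch below is one possible workaround, not a confirmed fix for the truncated cases above: it labels the bars with month strings and draws the line at the same integer positions on a twin axis. The non-interactive Agg backend, the colours, and the variable names fig, ax and positions are choices made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data shaped like the question's df1.
df1 = pd.DataFrame({
    'date': pd.date_range('2020-01-01', periods=699),
    'amount': range(1, 700),
    'balance': np.cumsum(range(1, 700)),
})
df1['month'] = df1['date'].dt.to_period('M')

df2 = (df1.groupby('month')
          .agg({'amount': 'sum', 'balance': 'sum'})
          .reset_index())
df2['month_str'] = df2['month'].astype(str)

fig, ax = plt.subplots(figsize=(10, 4))

# Bars sit at categorical positions 0..n-1, labelled by month_str.
sns.barplot(x='month_str', y='amount', data=df2, color='steelblue', ax=ax)
ax.tick_params(axis='x', rotation=90)

# Draw the line at the same integer positions on a secondary y axis.
ax2 = ax.twinx()
positions = list(range(len(df2)))
ax2.plot(positions, df2['balance'], color='red', marker='o')
ax2.set_ylabel('balance')

fig.tight_layout()
fig.savefig('bar_and_line.png')
```

Plotting the line against explicit positions keeps both artists on one coordinate system, at the cost of losing a true datetime axis.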