我有一个DataFrame
:
>>> df
STK_ID EPS cash
STK_ID RPT_Date
601166 20111231 601166 NaN NaN
600036 20111231 600036 NaN 12
600016 20111231 600016 4.3 NaN
601009 20111231 601009 NaN NaN
601939 20111231 601939 2.5 NaN
000001 20111231 000001 NaN NaN
Run Code Online (Sandbox Code Playgroud)
然后我只想要那些EPS
不是NaN
,df.drop(....)
即将返回数据帧的记录,如下所示:
STK_ID EPS cash
STK_ID RPT_Date
600016 20111231 600016 4.3 NaN
601939 20111231 601939 2.5 NaN
Run Code Online (Sandbox Code Playgroud)
我怎么做?
我刚刚将我的Pandas从0.11升级到0.13.0rc1.现在,该应用程序正在弹出许多新的警告.其中一个是这样的:
E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE
Run Code Online (Sandbox Code Playgroud)
我想知道究竟是什么意思?我需要改变什么吗?
如果我坚持使用,我应该如何暂停警告quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE
?
def _decode_stock_quote(list_of_150_stk_str):
"""decode the webpage and return dataframe"""
from cStringIO import StringIO
str_of_all = "".join(list_of_150_stk_str)
quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]
quote_df['TClose'] = quote_df['TPrice']
quote_df['RT'] …
Run Code Online (Sandbox Code Playgroud) 我有一个Python pandas DataFrame rpt
:
rpt
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 47518 entries, ('000002', '20120331') to ('603366', '20091231')
Data columns:
STK_ID 47518 non-null values
STK_Name 47518 non-null values
RPT_Date 47518 non-null values
sales 47518 non-null values
Run Code Online (Sandbox Code Playgroud)
我可以过滤库存ID '600809'
如下的行:rpt[rpt['STK_ID'] == '600809']
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25 entries, ('600809', '20120331') to ('600809', '20060331')
Data columns:
STK_ID 25 non-null values
STK_Name 25 non-null values
RPT_Date 25 non-null values
sales 25 non-null values
Run Code Online (Sandbox Code Playgroud)
我想把一些股票的所有行放在一起,例如['600809','600141','600329']
.这意味着我想要这样的语法:
stk_list = ['600809','600141','600329']
rst = rpt[rpt['STK_ID'] in stk_list] # …
Run Code Online (Sandbox Code Playgroud) 假设我有一个df
列'ID', 'col_1', 'col_2'
.我定义了一个函数:
f = lambda x, y : my_function_expression
.
现在我想应用f
to df
的两列'col_1', 'col_2'
来逐元素地计算一个新列'col_3'
,有点像:
df['col_3'] = df[['col_1','col_2']].apply(f)
# Pandas gives : TypeError: ('<lambda>() takes exactly 2 arguments (1 given)'
Run Code Online (Sandbox Code Playgroud)
怎么做 ?
** 添加详细示例如下 ***
import pandas as pd
df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']
def get_sublist(sta,end):
return mylist[sta:end+1]
#df['col_3'] = df[['col_1','col_2']].apply(get_sublist,axis=1)
# expect above to output df as below
ID col_1 col_2 col_3
0 1 0 …
Run Code Online (Sandbox Code Playgroud) 我有一个数据帧df:
>>> df
sales discount net_sales cogs
STK_ID RPT_Date
600141 20060331 2.709 NaN 2.709 2.245
20060630 6.590 NaN 6.590 5.291
20060930 10.103 NaN 10.103 7.981
20061231 15.915 NaN 15.915 12.686
20070331 3.196 NaN 3.196 2.710
20070630 7.907 NaN 7.907 6.459
Run Code Online (Sandbox Code Playgroud)
然后我想删除具有列表中指示的某些序列号的行,假设此处为[1,2,4],
左:
sales discount net_sales cogs
STK_ID RPT_Date
600141 20060331 2.709 NaN 2.709 2.245
20061231 15.915 NaN 15.915 12.686
20070630 7.907 NaN 7.907 6.459
Run Code Online (Sandbox Code Playgroud)
如何或有什么功能可以做到这一点?
我有熊猫数据帧df1
和df2
(DF1是vanila数据帧,DF2由"STK_ID"和"RPT_Date"索引):
>>> df1
STK_ID RPT_Date TClose sales discount
0 000568 20060331 3.69 5.975 NaN
1 000568 20060630 9.14 10.143 NaN
2 000568 20060930 9.49 13.854 NaN
3 000568 20061231 15.84 19.262 NaN
4 000568 20070331 17.00 6.803 NaN
5 000568 20070630 26.31 12.940 NaN
6 000568 20070930 39.12 19.977 NaN
7 000568 20071231 45.94 29.269 NaN
8 000568 20080331 38.75 12.668 NaN
9 000568 20080630 30.09 21.102 NaN
10 000568 20080930 26.00 30.769 NaN
>>> df2 …
Run Code Online (Sandbox Code Playgroud) 当我运行程序时,Pandas每次都给出如下所示的"未来警告".
D:\Python\lib\site-packages\pandas\core\frame.py:3581: FutureWarning: rename with inplace=True will return None from pandas 0.11 onward
" from pandas 0.11 onward", FutureWarning)
Run Code Online (Sandbox Code Playgroud)
我得到了消息,但我只想阻止Pandas一次又一次地显示这样的消息,是否有任何buildin参数我可以设置让Pandas不会弹出'Future warning'?
我使用"$ ipython notebook --pylab inline"来启动ipython笔记本.显示matplotlib图形尺寸对我来说太大了,我必须手动调整它.如何设置单元格中显示的图形的默认大小?
这听起来有点奇怪,但我需要将Pandas控制台输出字符串保存到png图片.例如:
>>> df
sales net_pft ROE ROIC
STK_ID RPT_Date
600809 20120331 22.1401 4.9253 0.1651 0.6656
20120630 38.1565 7.8684 0.2567 1.0385
20120930 52.5098 12.4338 0.3587 1.2867
20121231 64.7876 13.2731 0.3736 1.2205
20130331 27.9517 7.5182 0.1745 0.3723
20130630 40.6460 9.8572 0.2560 0.4290
20130930 53.0501 11.8605 0.2927 0.4369
Run Code Online (Sandbox Code Playgroud)
有没有什么方法df.output_as_png(filename='df_data.png')
可以生成一个只显示上面内容的图片文件?
我使用matplotlib绘制散点图:
并根据matplotlib中的提示使用透明框标记气泡:如何在分散自动放置箭头上注释点?
这是代码:
if show_annote:
for i in range(len(x)):
annote_text = annotes[i][0][0] # STK_ID
ax.annotate(annote_text, xy=(x[i], y[i]), xytext=(-10,3),
textcoords='offset points', ha='center', va='bottom',
bbox=dict(boxstyle='round,pad=0.2', fc='yellow', alpha=0.2),
fontproperties=ANNOTE_FONT)
Run Code Online (Sandbox Code Playgroud)
以及由此产生的情节:
但是仍然存在减少重叠的改进空间(例如标签框偏移固定为(-10,3)).是否有算法可以:
我只是想让人眼看起来很简单,所以有些重叠是可以的,不像http://en.wikipedia.org/wiki/Automatic_label_placement所暗示的那样严格.并且图表中的气泡数量大多数时间都不到150.
我发现所谓的Force-based label placement
http://bl.ocks.org/MoritzStefaner/1377729 非常有趣.我不知道是否有任何python代码/包可用于实现该算法.
我不是一个学术人员而且没有寻找最佳解决方案,我的python代码需要标记许多图表,因此速度/内存在考虑范围内.
我正在寻找一种快速有效的解决方案.有关此主题的任何帮助(代码,算法,提示,想法)?谢谢.