I am trying to do this:
from sklearn.model_selection import train_test_split
and I get an error:
In [31]: from sklearn.model_selection import train_test_split
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-31-73edc048c06b> in <module>()
----> 1 from sklearn.model_selection import train_test_split
/usr/local/lib/python3.6/site-packages/sklearn/model_selection/__init__.py in <module>()
----> 1 from ._split import BaseCrossValidator
2 from ._split import KFold
3 from ._split import GroupKFold
4 from ._split import StratifiedKFold
5 from ._split import TimeSeriesSplit
/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_split.py in <module>()
29 from ..externals.six import with_metaclass
30 from ..externals.six.moves import zip
---> 31 from ..utils.fixes import signature, comb
32 from ..base …

I have a CSV that looks like this:
Date,Open,High,Low,Close,Adj Close,Volume
2007-07-25,4.929000,4.946000,4.896000,4.904000,4.904000,0
2007-07-26,4.863000,4.867000,4.759000,4.777000,4.777000,0
2007-07-27,4.741000,4.818000,4.741000,4.788000,4.788000,0
2007-07-30,4.763000,4.810000,4.763000,4.804000,4.804000,0
After running
data = pd.read_csv(file, index_col='Date').drop(['Open','Close','Adj Close','Volume'], axis=1)
I end up with a df that looks like this:
High Low
Date
2007-07-25 4.946000 4.896000
2007-07-26 4.867000 4.759000
2007-07-27 4.818000 4.741000
2007-07-30 4.810000 4.763000
2007-07-31 4.843000 4.769000
Now I want to get High - Low. I tried:
np.diff(data.values, axis=1)
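but I get an error: unsupported operand type(s) for -: 'str' and 'str'
I am not sure why the values in the df are str in the first place. Any solution is appreciated.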
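A minimal sketch of one way to get High minus Low, assuming the string values come from a non-numeric entry somewhere in the High or Low columns. The excerpt above does not show such an entry, so the "-" row below is hypothetical, as is the name `csv_text`. The sketch coerces both columns with `pd.to_numeric` and subtracts them by name. Note that `np.diff(..., axis=1)` on columns ordered High, Low would compute Low minus High, not High minus Low.

```python
import io

import pandas as pd

# Rows taken from the question, plus one HYPOTHETICAL row with a
# non-numeric placeholder ("-") to reproduce string-typed columns.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2007-07-25,4.929000,4.946000,4.896000,4.904000,4.904000,0
2007-07-26,4.863000,4.867000,4.759000,4.777000,4.777000,0
2007-07-27,-,-,-,-,-,-
2007-07-30,4.763000,4.810000,4.763000,4.804000,4.804000,0
"""

# Read only the columns that are needed instead of dropping the rest.
data = pd.read_csv(
    io.StringIO(csv_text),
    index_col="Date",
    usecols=["Date", "High", "Low"],
)

# Because of the placeholder row, High and Low are read as strings.
# Coerce them to numbers; entries that cannot be parsed become NaN.
numeric = data.apply(pd.to_numeric, errors="coerce")

# Subtract by column name so the direction (High - Low) is explicit.
spread = numeric["High"] - numeric["Low"]
print(spread)
```

Rows that could not be parsed come out as NaN in `spread`; checking `numeric.isna().any(axis=1)` is one way to find which rows of the real file to inspect.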

I have a DataFrame that looks like this:
28 91 182
Date
2017-09-07 0.97 1.05 1.15
2017-09-08 0.95 1.04 1.14
2017-09-11 0.96 1.06 1.16
2017-09-12 0.99 1.04 1.16
2017-09-13 0.99 1.04 1.16
From this DataFrame I want to get the values of the last row as a list:
[0.99, 1.04, 1.16]
I tried using
np.array(tbill.iloc[-1:].values).tolist()
which returns
[[0.99, 1.04, 1.16]]
but that is a nested list and feels overly complicated.
Is there a simpler way?
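One possible simplification, sketched under the assumption that `tbill` is an ordinary numeric DataFrame like the one shown: selecting with `iloc[-1]` (a scalar position, not the slice `-1:`) returns a Series, and `Series.tolist()` gives a flat list. The DataFrame below is rebuilt from the sample values in the question, with the column labels written as strings for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (column labels as strings here).
tbill = pd.DataFrame(
    {
        "28": [0.97, 0.95, 0.96, 0.99, 0.99],
        "91": [1.05, 1.04, 1.06, 1.04, 1.04],
        "182": [1.15, 1.14, 1.16, 1.16, 1.16],
    },
    index=pd.to_datetime(
        ["2017-09-07", "2017-09-08", "2017-09-11", "2017-09-12", "2017-09-13"]
    ),
)
tbill.index.name = "Date"

# iloc[-1] selects the last row as a Series; tolist() flattens it.
last_row = tbill.iloc[-1].tolist()
print(last_row)  # [0.99, 1.04, 1.16]
```

The slice `iloc[-1:]` keeps the result two-dimensional, which is why the original attempt produced a list inside a list.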

My DataFrame looks like this:
In [120]: data.head()
Out[120]:
date open high low close volume
0 2017-08-07 2.276 2.276 2.253 2.257 0.0
1 2017-08-08 2.260 2.291 2.253 2.283 0.0
2 2017-08-09 2.225 2.249 2.212 2.241 0.0
3 2017-08-10 2.241 2.241 2.210 2.212 0.0
4 2017-08-11 2.199 2.222 2.182 2.189 0.0
After doing:
data.index = pd.to_datetime(data['date'])
I end up with this:
In [122]: data.head()
Out[122]:
date open high low close volume
date
2017-08-07 2017-08-07 2.276 2.276 2.253 2.257 0.0
2017-08-08 2017-08-08 2.260 2.291 2.253 2.283 0.0
2017-08-09 2017-08-09 2.225 2.249 …

I have the following DataFrame with boolean values:
Out[25]:
0 1 2
Date
2007-01-03 False True False
2007-01-04 False False True
2007-01-05 False True False
2007-01-08 True False False
2007-01-09 False True False
I would like to get a DataFrame that gives, for each row, the column index where the value is True.
Required output:
0
Date
2007-01-03 1
2007-01-04 2
2007-01-05 1
2007-01-08 0
2007-01-09 1
What is the most Pythonic way to do this?
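One way this could be done, sketched under the assumption that every row contains exactly one True (as in the sample): NumPy's `argmax` along `axis=1` returns the position of the first True in each row, and that position can be mapped back to the column labels. The names `df`, `positions` and `result` are illustrative only. A row with no True at all would silently map to the first column, so real data may need a separate check.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    [
        [False, True, False],
        [False, False, True],
        [False, True, False],
        [True, False, False],
        [False, True, False],
    ],
    index=pd.Index(
        ["2007-01-03", "2007-01-04", "2007-01-05", "2007-01-08", "2007-01-09"],
        name="Date",
    ),
)

# Position of the first True in each row (True counts as the maximum).
positions = df.to_numpy().argmax(axis=1)

# Map positions back to column labels and keep the original Date index.
result = pd.DataFrame({0: df.columns.to_numpy()[positions]}, index=df.index)
print(result)
```

Going through the column labels instead of using the raw positions keeps the sketch valid if the columns are named something other than 0, 1, 2.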