Creating a new column based on a conditional selection from the values of 2 other columns in a Pandas DataFrame

Uni*_*est 6 python pandas python-3.3

I have a DataFrame containing stock prices.

It looks like this:

>>> Data
             Open   High    Low  Close   Volume  Adj Close
Date
2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04

When I try to create a conditional new column using the following if statement:

Data['Test'] =Data['Close'] if Data['Close'] > Data['Open'] else Data['Open']

I get the following error:

Traceback (most recent call last):
  File "<pyshell#116>", line 1, in <module>
    Data[1]['Test'] =Data[1]['Close'] if Data[1]['Close'] > Data[1]['Open'] else Data[1]['Open']
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Then I tried a.all():

Data[1]['Test'] =Data[1]['Close'] if all(Data[1]['Close'] > Data[1]['Open']) else Data[1]['Open']

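The result was that the entire ['Open'] column was selected. That is not the condition I wanted, which is to pick, for each row, the larger of the ['Open'] and ['Close'] values.

Any help is appreciated.

Thanks.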

DSM*_*DSM 4

Starting from a DataFrame like this:

>>> df
         Date   Open   High    Low  Close   Volume  Adj Close
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04

The simplest thing I can think of would be:

>>> df["Test"] = df[["Open", "Close"]].max(axis=1)
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

df.ix[:,["Open", "Close"]].max(axis=1) might be a little faster, but I don't think it's as nice to look at.
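A note for readers on newer pandas: `.ix` was deprecated and later removed, and `.loc` is the label-based indexer that covers this use. The following is a minimal sketch, assuming a small frame rebuilt by hand from the Open and Close values shown above (the names `via_brackets` and `via_loc` are only illustrative). It shows that both spellings select the same two columns and give the same row-wise maximum; it makes no claim about relative speed.

```python
import pandas as pd

# Sample frame rebuilt by hand from the rows above (Open/Close only).
df = pd.DataFrame({
    "Open": [76.91, 77.04, 72.87],
    "Close": [77.04, 72.87, 93.23],
})

# Plain column-list selection, as in the answer.
via_brackets = df[["Open", "Close"]].max(axis=1)

# Label-based selection with .loc, standing in for the old .ix spelling.
via_loc = df.loc[:, ["Open", "Close"]].max(axis=1)

print(via_loc.equals(via_brackets))
```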

Alternatively, you could use .apply on the rows:

>>> df["Test"] = df.apply(lambda row: max(row["Open"], row["Close"]), axis=1)
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

Or fall back to numpy:

>>> import numpy as np
>>> df["Test"] = np.maximum(df["Open"], df["Close"])
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

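The basic problem is that if/else doesn't play nicely with arrays, because if (something) always coerces the something into a single bool. It is not equivalent to "for every element in the array, if the condition holds" or anything like that.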
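To make that concrete, here is a minimal sketch, again assuming a hand-built frame with only the Open and Close values from above (the names `mask`, `raised` and `whole_column` are illustrative, not from the question). It shows the comparison producing one boolean per row, the coercion to a single bool failing, the `all(...)` workaround picking one whole column, and `np.where` as one way to apply the same condition element by element.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the rows above (Open/Close only).
df = pd.DataFrame({
    "Open": [76.91, 77.04, 72.87],
    "Close": [77.04, 72.87, 93.23],
})

# The comparison is element-wise: one boolean per row.
mask = df["Close"] > df["Open"]

# `if mask:` needs a single bool, and pandas refuses to guess one.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

# all(mask) collapses the rows into one bool, so the if/else
# expression returns an entire column rather than a per-row choice.
whole_column = df["Close"] if all(mask) else df["Open"]

# np.where evaluates the condition per element instead.
df["Test"] = np.where(mask, df["Close"], df["Open"])
print(df)
```

With this data `np.where` ends up with the same values as the `max(axis=1)` and `np.maximum` versions; its advantage is that it also handles conditions that are not a simple maximum.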