小编dur*_*hit的帖子

设置pandas数据帧中的列顺序

有没有办法根据我的个人偏好重新排序pandas数据框中的列(即不按字母顺序或数字排序,但更像是遵循某些约定)？

简单的例子:

frame = pd.DataFrame({
        'one thing':[1,2,3,4],
        'second thing':[0.1,0.2,1,2],
        'other thing':['a','e','i','o']})

Run Code Online (Sandbox Code Playgroud)

产生这个:

   one thing other thing  second thing
0          1           a           0.1
1          2           e           0.2
2          3           i           1.0
3          4           o           2.0

Run Code Online (Sandbox Code Playgroud)

但相反,我想这样:

   one thing second thing  other thing
0          1           0.1           a
1          2           0.2           e
2          3           1.0           i
3          4           2.0           o

Run Code Online (Sandbox Code Playgroud)

(请提供一个通用的解决方案,而不是针对这种情况.非常感谢.)

python pandas

dur*_*hit

2017 02-01

89
推荐指数

9
解决办法

14万
查看次数

获取一个pandas数据帧的行的索引作为整数

例如,假设一个简单的数据帧

    A         B
0   1  0.810743
1   2  0.595866
2   3  0.154888
3   4  0.472721
4   5  0.894525
5   6  0.978174
6   7  0.859449
7   8  0.541247
8   9  0.232302
9  10  0.276566

Run Code Online (Sandbox Code Playgroud)

在给定条件的情况下,如何检索行的索引值？例如: dfb = df[df['A']==5].index.values.astype(int) 返回[4],但我想得到的只是4.这导致我在代码中遇到麻烦.

基于某些条件,我想要记录满足该条件的索引,然后选择之间的行.

我试过了

dfb = df[df['A']==5].index.values.astype(int)
dfbb = df[df['A']==8].index.values.astype(int)
df.loc[dfb:dfbb,'B']

Run Code Online (Sandbox Code Playgroud)

获得所需的输出

    A         B
4   5  0.894525
5   6  0.978174
6   7  0.859449

Run Code Online (Sandbox Code Playgroud)

但我明白了 TypeError: '[4]' is an invalid key

python numpy pandas

dur*_*hit

2017 06-21

40
推荐指数

4
解决办法

13万
查看次数

调整 gridspec 中的填充/图形边距

我的子图在图中的位置有些奇怪。顶部、底部和左侧都有大量空间，我不需要这些空间，但右边距距离右轴标签有点太近了。请参阅示例。

我生成它的代码如下。请注意，我使用了ight_layout以及subplots_adjust，但它们都不起作用。

fig = plt.figure(figsize=(7.5,10.))
fig.patch.set_facecolor('silver')
gs = gridspec.GridSpec(3,2,hspace=0.4,wspace=0.1)

ax1 = plt.subplot(gs[:,0])
a1 =  plt.plot(x, y, linestyle='-',color='b')
a1 = plt.ylabel('Y')
a1 = plt.ylim(zmax[c],zmin[c])
a1 = plt.xlabel('X')
a1 = plt.xticks([np.around(min(x),2),np.around(min(x)/2,2),0,np.around(max(x)/2,2),np.around(max(x),2)])
a1 = plt.title('(a)',fontsize=12)  

ax2 = plt.subplot(gs[0,1])
a2 = plt.plot(x2,y2, linestyle='',marker='o',color='b' )
a2 = plt.ylabel('value')
ax2.yaxis.tick_right()
ax2.yaxis.set_ticks_position('both')
ax2.yaxis.set_label_position("right")
a2 = plt.xlabel('number')
a2 = plt.title('(b)',fontsize=12) 

ax3 = plt.subplot(gs[1,1])
a3 = plt.plot(x3,y3, linestyle='--',color='b')
a3 = plt.ylabel('another y')
ax3.yaxis.tick_right()
ax3.yaxis.set_ticks_position('both') 
ax3.yaxis.set_label_position("right")
a3 = plt.xlabel('x')
a3 = plt.title('(c)',fontsize=12) 

ax4 = plt.subplot(gs[2,1])
a4 = plt.plot(x4,y4, linestyle='--',color='b')
a4 …

Run Code Online (Sandbox Code Playgroud)

python matplotlib

dur*_*hit

lucky-day

8
推荐指数

2
解决办法

1万
查看次数

`ValueError:x_new中的值高于插值范围. - 除了不提升值之外还有什么其他原因？

我在scipy interp1d函数中收到此错误.通常,如果x不是单调增加,则会产生此错误.

import scipy.interpolate as spi
def refine(coarsex,coarsey,step):
    finex = np.arange(min(coarsex),max(coarsex)+step,step)
    intfunc = spi.interp1d(coarsex, coarsey,axis=0)
    finey = intfunc(finex)
    return finex, finey

for num, tfile in enumerate(files):
    tfile = tfile.dropna(how='any')
    x = np.array(tfile['col1'])
    y = np.array(tfile['col2'])
    finex, finey = refine(x,y,0.01)

Run Code Online (Sandbox Code Playgroud)

代码是正确的,因为它成功地处理了6个数据文件并将错误扔到了第7个.所以数据一定有问题.但据我所知,数据一直在增加.我很抱歉没有提供示例,因为我无法在示例中重现错误.

有两件事可以帮助我:

一些头脑风暴 - 如果数据确实单调增加,还有什么可能产生这个错误？关于小数的另一个提示可能就是这个问题,但我认为我的解决方案(x的最小值和最大值)足够强大以避免它.或者不是吗？
是否有可能(怎么样？)在抛出时返回x_new的值和它的索引,ValueError: A value in x_new is above the interpolation range.这样我才能真正看到文件中的哪个位置出现问题？

UPDATE

所以问题在于,由于某种原因,max(finex)大于max(coarsex)(一个是.x39而另一个是.x4).我希望将原始值四舍五入到2位有效数字可以解决问题,但事实并非如此,它显示的数字较少,但仍然计算未显示的数字.我能做些什么呢？

python interpolation scipy

dur*_*hit

2017 08-01

8
推荐指数

1
解决办法

2万
查看次数

Pandas - 使用.isnull(),notnull(),dropna()删除缺少数据的行

这真的很奇怪.我已经尝试了几种方法从pandas数据帧中删除丢失数据的行,但它们似乎都没有工作.这是代码(我只是取消注释使用的方法之一 - 但这些是我在不同的修改中使用的三个 - 这是最新的):

import pandas as pd
Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,'NaN',4,5],'C':[1,2,3,'NaT',5]})
print(Test)
#Test = Test.ix[Test.C.notnull()]
#Test = Test.dropna()
Test = Test[~Test[Test.columns.values].isnull()]
print "And now"
print(Test)

Run Code Online (Sandbox Code Playgroud)

但在所有情况下,我得到的只是:

   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5
And now
   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5

Run Code Online (Sandbox Code Playgroud)

我有什么错误吗？或者问题是什么？理想情况下,我想得到这个:

   A    B    C
0 …

Run Code Online (Sandbox Code Playgroud)

python pandas

dur*_*hit

2016 09-06

7
推荐指数

2
解决办法

2万
查看次数

为什么我的熊猫数据框变成了“无”类型？

我看不出我的问题出在哪里，这个代码对于一个非常简单的例子来说工作正常，但是当我开始使用我的真实数据时，我遇到了问题。

我基本上是从各种 csv 文件中提取某些数据，并试图最终将它们组合在一个数据框中。

matplotlib locator_params 或 set_major_locator 不起作用

我正在尝试减少子图中的轴刻度数（每个轴都有不同的值，因此我无法手动设置刻度），但是诸如此或此之类的其他答案不起作用。我创建图形的语法是标准的，如下所示：

fig = plt.figure(figsize=(7,9))
ax = fig.add_subplot(8,2,i+1) # I am plotting in a much larger loop, but I don't think there is anything wrong with the loop, because everything else (axis limits, labels, plotting itself...) works fine.

Run Code Online (Sandbox Code Playgroud)

为了减少 yticks 的数量，我尝试过

ax = plt.locator_params(nbins=4, axis='y')

Run Code Online (Sandbox Code Playgroud)

这引发了错误TypeError: set_params() got an unexpected keyword argument 'nbins'

我尝试过

ax.yaxis.set_major_locator(plt.MaxNLocator(4))

Run Code Online (Sandbox Code Playgroud)

这给出了错误AttributeError: 'NoneType' object has no attribute 'yaxis'

我不明白为什么我的子图被认为是 NoneType。我怀疑这是问题的核心，但我看到的所有示例都具有相同的结构，即

fig = plt.figure()
ax = fig.add_subplot(111)
ax.yaxis.set_major_locator(plt.MaxNLocator(4))

Run Code Online (Sandbox Code Playgroud)

它应该有效。那么为什么我的斧头是 NoneType 呢？

python matplotlib

dur*_*hit

2017 05-23

6
推荐指数

1
解决办法

1万
查看次数

在熊猫中重新采样不规则间隔的数据

是否可以resample在不规则间隔的数据上使用？（我知道文档说它是为了“重新采样常规时间序列数据”，但我想尝试一下它是否也适用于不规则数据。也许它没有，或者我做错了什么。）

在我的真实数据中，我通常每小时有 2 个样本，它们之间的时间差通常在 20 到 40 分钟之间。所以我希望将它们重新采样为定期的每小时系列。

为了测试我是否正确使用它，我使用了一些我已经拥有的随机日期列表，因此它可能不是最好的例子，但至少适用于它的解决方案将非常强大。这里是：

    fraction  number                time
0   0.729797       0 2014-10-23 15:44:00
1   0.141084       1 2014-10-30 19:10:00
2   0.226900       2 2014-11-05 21:30:00
3   0.960937       3 2014-11-07 05:50:00
4   0.452835       4 2014-11-12 12:20:00
5   0.578495       5 2014-11-13 13:57:00
6   0.352142       6 2014-11-15 05:00:00
7   0.104814       7 2014-11-18 07:50:00
8   0.345633       8 2014-11-19 13:37:00
9   0.498004       9 2014-11-19 22:47:00
10  0.131665      10 2014-11-24 15:28:00
11  0.654018      11 2014-11-26 10:00:00
12  0.886092      12 2014-12-04 06:37:00
13 …

Run Code Online (Sandbox Code Playgroud)

python pandas

dur*_*hit

lucky-day

6
推荐指数

1
解决办法

2553
查看次数

如何绘制pandas数据帧的某些行？

我有这个示例数据帧:

      animal gender     name  first  second  third
0     dog      m      Ben      5       6      3
1     dog      f    Lilly      2       3      5
2     dog      m      Bob      3       2      1
3     cat      f     Puss      1       4      4
4     cat      m  Inboots      3       6      5
5    wolf      f     Lady    NaN       0      3
6    wolf      m   Summer      2       2      1
7    wolf      m     Grey      4       2      3
8    wolf      m     Wind      2       3      5
9    lion      f     Elsa      5       1      4
10   lion      m …

Run Code Online (Sandbox Code Playgroud)

python matplotlib pandas seaborn

dur*_*hit

2017 09-26

5
推荐指数

1
解决办法

2384
查看次数

我有Matplotlib版本1.5.1，并且遇到了一个有趣的问题。我想在绘图中添加一个箭头，该箭头的两端各有一个头，并指定了颜色和宽度。但是，在研究Matplotlib文档时，我意识到我可能无法同时拥有这两者。我可以使箭头指向两端，但是该箭头将具有默认的颜色和线宽-如果我将其包含arrowstyle在中arrowprops，则可以使用此选项，也可以arrowstyle在arrow属性中省略并设置颜色和宽度，但是我只有默认箭头。有办法同时获得两者吗？

我有以下代码：

plt.annotate('', xy=(p[0][0]-p[0][2], 0), xycoords='data', xytext=(p[0][0], 0), textcoords='data', arrowprops=dict(arrowstyle: '<|-|>',color='k',lw=2.5))

Run Code Online (Sandbox Code Playgroud)

结果是SyntaxError: invalid syntax。

（注意：p这只是列表的列表，我从中获取x和y值，我正在循环绘制）

python annotate matplotlib

dur*_*hit

lucky-day

4
推荐指数

1
解决办法

5651
查看次数

如何根据条件替换pandas数据帧中的任何值？

关于替换某些行或列或特定值有很多问题,但我还没有找到我要找的东西.想象一下像这样的数据帧,

          a         b         c         d
a  0.354511  0.416929  0.704512  0.598345
b  0.948605  0.473364  0.154856  0.637639
c  0.250829  0.130928  0.682998  0.056049
d  0.504516  0.880731  0.216192  0.314724

Run Code Online (Sandbox Code Playgroud)

现在我想基于条件用其他东西替换所有值(无论它们在哪个列或行中).假设我想将所有值<0.5替换为np.nan.我已经尝试了几件事并且没有任何效果(即没有发生任何事情,数据框保持不变).

这里的示例代码:

frame = pd.DataFrame(np.random.rand(4,4),index=['a','b','c','d'], columns=['a','b','c','d'])
print frame
for row,col in enumerate(frame):
    frame.replace(frame.ix[row,col]<0.5,np.nan,inplace=True)
print frame

Run Code Online (Sandbox Code Playgroud)

要么

for row,col in enumerate(frame):
    if frame.ix[row,col]<=0.5:
        M.ix[row,col]=np.nan
print M

Run Code Online (Sandbox Code Playgroud)

不过最终,

          a         b         c         d
a  0.600701  0.823570  0.159012  0.615898
b  0.234855  0.086080  0.950064  0.982248
c  0.440625  0.960078  0.191975  0.598865
d  0.127866  0.537867  0.434326  0.507635
          a         b         c …

Run Code Online (Sandbox Code Playgroud)

python pandas

dur*_*hit

lucky-day

3
推荐指数

1
解决办法

2066
查看次数

如何在matplotlib散点图中规范化色彩映射？

该matplotlib文档详细讲解了如何正常化的colormaps的pcolormesh,但我怎么能这样做是正确的散点图？

normalize = mcolors.Normalize(vmin=-1, vmax=1)
plt.scatter(x,y,z,cmap=colormap(normalize),marker='*',s=5)

Run Code Online (Sandbox Code Playgroud)

不起作用(TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe')

它只是z数据不是从-1到1,我正在绘制多个数据集,其限制在+/- 0.93 - 98左右,但我希望颜色以零为中心,从-1到1,所以我对所有各种数据集都有相同的参考.

哦,当我不尝试正常化时,我明白了TypeError: scatter() got multiple values for keyword argument 's'.显然,我不知道如何在散点图中使用色彩映射.

python matplotlib colormap

dur*_*hit

lucky-day

3
推荐指数

1
解决办法

8298
查看次数

移动平均线产生不同长度的阵列？

关于如何获得移动平均线,这个问题有很多有用的答案.我已经尝试了numpy卷积和numpy cumsum这两种方法,并且在一个示例数据集上运行良好,但在我的实际数据上生成了一个较短的数组.

数据间隔0.01.示例数据集的长度为50真实数据数万.所以它必须是引起问题的窗口大小,我不太明白函数中发生了什么.

这是我定义函数的方式:

def smoothMAcum(depth,temp, scale): # Moving average by cumsum, scale = window size in m
    dz = np.diff(depth)  
    N = int(scale/dz[0])
    cumsum = np.cumsum(np.insert(temp, 0, 0)) 
    smoothed=(cumsum[N:] - cumsum[:-N]) / N 
    return smoothed

def smoothMAconv(depth,temp, scale): # Moving average by numpy convolution
    dz = np.diff(depth) 
    N = int(scale/dz[0])
    smoothed=np.convolve(temp, np.ones((N,))/N, mode='valid') 
    return smoothed

Run Code Online (Sandbox Code Playgroud)

然后我实现它:

scale = 5.
smooth = smoothMAconv(dep,data, scale)

Run Code Online (Sandbox Code Playgroud)

但是print len(dep), len(smooth) 回归81071 80572

如果我使用其他功能,也会发生同样的情况.如何获得与数据长度相同的平滑数组？

为什么它适用于小型数据集？即使我尝试不同的比例(并且对于示例和数据使用相同的比例),示例中的结果与原始数据具有相同的长度,但在实际应用程序中却没有.我考虑了nan …

python numpy

dur*_*hit

2017 11-26

0
推荐指数

1
解决办法

1524
查看次数

标签统计

python ×13

pandas ×7

matplotlib ×5

numpy ×2

annotate ×1

colormap ×1

interpolation ×1

scipy ×1

seaborn ×1

小编dur_hit的帖子

设置pandas数据帧中的列顺序

获取一个pandas数据帧的行的索引作为整数

调整 gridspec 中的填充/图形边距

`ValueError:x_new中的值高于插值范围. - 除了不提升值之外还有什么其他原因？

Pandas - 使用.isnull(),notnull(),dropna()删除缺少数据的行

为什么我的熊猫数据框变成了“无”类型？

matplotlib locator_params 或 set_major_locator 不起作用

在熊猫中重新采样不规则间隔的数据

如何绘制pandas数据帧的某些行？

matplotlib中的箭头属性注释

如何根据条件替换pandas数据帧中的任何值？

如何在matplotlib散点图中规范化色彩映射？

移动平均线产生不同长度的阵列？

标签统计

标签 统计

小编dur_hit的帖子

标签统计