Posts by Myk*_*tko

Keep the rows of a DataFrame whose value in one column appears in every combination of values of certain other columns

import pandas as pd

df = pd.DataFrame({
    'a': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y'],
    'b': ['z', 'z', 'z', 'w', 'w', 'z', 'z', 'w', 'w', 'w'],
    'c': ['c1', 'c2', 'c3', 'c1', 'c3', 'c1', 'c3', 'c1', 'c2', 'c3'],
    'd': range(1, 11),
})

   a  b   c   d
0  x  z  c1   1
1  x  z  c2   2
2  x  z  c3   3
3  x  w  c1   4
4  x  w  c3   5
5  y  z  c1   6
6  y  z  c3   7
7  y  w  c1   8
8  y  w  c2   9
9  y  w  c3  10

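How can I keep only the rows that, for all combinations of a and b, contain the same values of c? In other words, how can I exclude rows whose c value is present in only some combinations of a and b?

For example, only c1 and c3 appear in every combination of a and b ([x,z] …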
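
One possible approach, sketched under the assumption that "all combinations" means every (a, b) pair that actually occurs in df: count the distinct (a, b) pairs each value of c occurs with, and keep the values whose count equals the total number of pairs. The names n_pairs, pairs_per_c, common_c, and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y'],
    'b': ['z', 'z', 'z', 'w', 'w', 'z', 'z', 'w', 'w', 'w'],
    'c': ['c1', 'c2', 'c3', 'c1', 'c3', 'c1', 'c3', 'c1', 'c2', 'c3'],
    'd': range(1, 11),
})

# Total number of distinct (a, b) combinations present in the data.
n_pairs = df.groupby(['a', 'b']).ngroups

# For each value of c, count the distinct (a, b) pairs it occurs with.
pairs_per_c = df.drop_duplicates(['a', 'b', 'c']).groupby('c').size()

# Keep only the rows whose c value occurs in every combination.
common_c = pairs_per_c[pairs_per_c == n_pairs].index
result = df[df['c'].isin(common_c)]
print(result)
```

With the sample data this keeps the c1 and c3 rows and drops the two c2 rows. If combinations that never occur in the data should also count, n_pairs would need to be computed differently.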

python combinations filter pandas

Hap*_*yPy

2021-01-20

Votes: 20
Answers: 5
Views: 808

How to get the most frequent row in a table

How do I get the most frequent row in a DataFrame? For example, if I have the following table:

   col_1  col_2 col_3
0      1      1     A
1      1      0     A
2      0      1     A
3      1      1     A
4      1      0     B
5      1      0     C

Expected result:

   col_1  col_2 col_3
0      1      1     A

Edit: I need the most frequent row (as a unit), not the most frequent value of each column, which can be computed with the mode() method.
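
A minimal sketch of one way to treat the row as a unit: group on all columns, count the size of each group, and take the label of the largest group. The variable names are illustrative. Two caveats: if several rows tie for the top count, only one of them is returned, and rows containing NaN are dropped by groupby by default.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [1, 1, 0, 1, 1, 1],
    'col_2': [1, 0, 1, 1, 0, 0],
    'col_3': ['A', 'A', 'A', 'A', 'B', 'C'],
})

# Count how often each complete row occurs by grouping on every column.
row_counts = df.groupby(list(df.columns)).size()

# idxmax returns the (col_1, col_2, col_3) tuple of the most frequent row.
top_row = row_counts.idxmax()

# Rebuild a one-row DataFrame with the original column names.
most_frequent = pd.DataFrame([top_row], columns=df.columns)
print(most_frequent)
```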

python numpy mode frequency pandas

Myk*_*tko

2020-09-29

Votes: 15
Answers: 3
Views: 351

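How to downcast numeric columns in Pandas?

How can I optimize a DataFrame's memory footprint and find the optimal (smallest) dtypes for its numeric columns? For example: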

   A        B    C         D
0  1  1000000  1.1  1.111111
1  2 -1000000  2.1  2.111111

>>> df.dtypes
A      int64
B      int64
C    float64
D    float64

Expected result:

>>> df.dtypes
A       int8
B      int32
C    float32
D    float32
dtype: object
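
A hedged sketch of one way to get there, using pd.to_numeric with its downcast parameter column by column. The helper name downcast_numeric is hypothetical, and the sample values are taken from the table above. Note that downcasting floats to float32 reduces precision, so this only suits data where that loss is acceptable.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [1000000, -1000000],
    'C': [1.1, 2.1],
    'D': [1.111111, 2.111111],
})

def downcast_numeric(frame):
    """Return a copy with each numeric column cast to a smaller dtype if possible."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Smallest signed integer type that can hold all values.
            out[col] = pd.to_numeric(out[col], downcast='integer')
        elif pd.api.types.is_float_dtype(out[col]):
            # float32 is the smallest float type to_numeric will choose.
            out[col] = pd.to_numeric(out[col], downcast='float')
    return out

small = downcast_numeric(df)
print(small.dtypes)
```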

python numeric dataframe pandas dtype

Myk*_*tko

lucky-day

Votes: 13
Answers: 1
Views: 9254

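How to convert all float64 columns in Pandas to float32?

Is there a generic way to convert all float64 values in a pandas DataFrame to float32, without changing uint16 to float32? I don't know the signal names in advance; I just want no float64 left.

Something like: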

if float64, then convert to float32, else nothing?

The structure of the data is:

DF.dtypes

Counter               uint16
p_007                 float64
p_006                 float64
p_005                 float64
p_004                 float64
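
The "if float64, then convert" idea can be sketched with select_dtypes, which picks the float64 columns without needing their names, followed by astype with a per-column mapping. The sample values below are made up for illustration; only the column names and dtypes follow the listing above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the dtypes shown above.
DF = pd.DataFrame({
    'Counter': np.array([1, 2, 3], dtype='uint16'),
    'p_007': [0.1, 0.2, 0.3],
    'p_006': [1.5, 2.5, 3.5],
})

# Select only the float64 columns; other dtypes are left untouched.
float64_cols = DF.select_dtypes(include='float64').columns

# astype accepts a {column: dtype} mapping and returns a new DataFrame.
converted = DF.astype({col: 'float32' for col in float64_cols})
print(converted.dtypes)
```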

python type-conversion pandas dtype

Mar*_*arK

2021-09-15

Votes: 13
Answers: 2
Views: 20k

How to import AnalysisException in PySpark

I can't find out how to import AnalysisException in PySpark so that I can catch it. For example:

df = spark.createDataFrame([[1, 2], [1, 2]], ['A', 'A'])

try:
  df.select('A')
except AnalysisException as e:
  print(e)

Error message:

NameError: name 'AnalysisException' is not defined

python exception try-catch apache-spark pyspark

Myk*_*tko

2021-11-11

Votes: 10
Answers: 1
Views: 6311

Negative time difference in Pandas

I get this strange result when subtracting a later timestamp from an earlier one:

pd.to_datetime('2021-05-21 06:00:00') - pd.to_datetime('2021-05-21 06:02:00')

Output:

Timedelta('-1 days +23:58:00')

Expected output:

Timedelta('-0 days 00:02:00')

What is the correct way to compute a negative time difference? Thank you!
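
For context, the value itself is already a correct negative two minutes: pandas shows negative Timedelta values as a negative day count plus a positive time of day. A small sketch of reading it as seconds and of formatting it with an explicit sign follows; the helper format_signed is hypothetical, not a pandas function.

```python
import pandas as pd

delta = pd.to_datetime('2021-05-21 06:00:00') - pd.to_datetime('2021-05-21 06:02:00')

# The numeric value is negative two minutes, regardless of how it is displayed.
seconds = delta.total_seconds()

def format_signed(td):
    """Format a Timedelta as a sign followed by its absolute value."""
    sign = '-' if td < pd.Timedelta(0) else ''
    return f"{sign}{abs(td)}"

print(seconds)
print(format_signed(delta))
```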

python timestamp time-series timedelta pandas

Myk*_*tko

2021-05-21

Votes: 9
Answers: 2
Views: 3692

Check whether the elements of an array exist in a pandas DataFrame

I have a pandas DataFrame and a pandas Series, as shown below.

import pandas as pd

df0 = pd.DataFrame({'col1':['a','b','c','d'],'col2':['b','c','e','f'],'col3':['d','f','g','a']})

  col1 col2 col3
0    a    b    d
1    b    c    f
2    c    e    g
3    d    f    a

df1 = pd.Series(['b','g','g'], index=['col1','col2','col3'])

col1    b
col2    g
col3    g
dtype: object

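As you can see, the columns of df0 and the index of df1 are the same. For each index label of df1, I want to know whether the value at that label exists in the corresponding column of df0. For example, df1.col1 is b, so we only need to look in df0.col1 and check whether b exists there.

Desired output: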

array([True, False, True])

Is there a way to do this without a loop? Perhaps a native numpy or pandas method?
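
One loop-free possibility, sketched here with the data from the question: DataFrame.eq aligns the Series index with the DataFrame columns, so each column is compared against its own target value, and any() then reduces each column to a single boolean. The name found is illustrative.

```python
import pandas as pd

df0 = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'd'],
    'col2': ['b', 'c', 'e', 'f'],
    'col3': ['d', 'f', 'g', 'a'],
})
df1 = pd.Series(['b', 'g', 'g'], index=['col1', 'col2', 'col3'])

# Compare every column of df0 with the matching entry of df1,
# then check whether any row in that column matched.
found = df0.eq(df1).any().to_numpy()
print(found)
```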

python numpy dataframe pandas

作者

2021-09-18

Votes: 8
Answers: 2
Views: 5666

How to cut a list by specific item?

I'm trying to cut a list by specific items in it, for example, I have a list like this:

down = ["a", "b", "c", "d", "b", "e", "r"]

What I want is:

[["a", "b"], ["c", "d", "b"], ["e", "r"]]

which is cut after every occurrence of "b".

I wrote something like this:

down = ["a", "b", "c", "d", "b", "e", "r"]
up = []
while down is not []:
    up, down = up.append(down[:(down.index("b") + 1)]), down[(down.index("b") + 1):]

It throws …
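
Two things in the attempt above are worth noting: list.append returns None, so up is overwritten with None after the first pass, and `down is not []` compares identity with a fresh list, so it is always true. A minimal sketch of a single-pass alternative follows; the function name cut_after is hypothetical.

```python
def cut_after(items, marker):
    """Split items into chunks, closing a chunk after each occurrence of marker."""
    chunks = []
    current = []
    for item in items:
        current.append(item)
        if item == marker:
            # The marker ends the current chunk.
            chunks.append(current)
            current = []
    if current:
        # Keep any trailing items that come after the last marker.
        chunks.append(current)
    return chunks

down = ["a", "b", "c", "d", "b", "e", "r"]
up = cut_after(down, "b")
print(up)
```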

python list

fra*_*g43

2021-01-18

Votes: 7
Answers: 1
Views: 2248

Creating a Tensorflow dataset for multiple time series

I have data for multiple time series, as shown below:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Time': np.tile(np.arange(5), 2),
                   'Object': np.concatenate([[i] * 5 for i in [1, 2]]),
                   'Feature1': np.random.randint(10, size=10),
                   'Feature2': np.random.randint(10, size=10)})

   Time  Object  Feature1  Feature2
0     0       1         3         3
1     1       1         9         2
2     2       1         6         6
3     3       1         4         0
4     4       1         7         7
5     0       2         4         8
6     1       2         3         7
7     2       2         1         1
8     3       2         7         5
9     4       2         1         7

Each object (1 and 2) has its own data (the real data has about 2000 objects). I want to feed this data in chunks …

python time-series deep-learning tensorflow tensorflow-datasets

Myk*_*tko

2022-06-28

Votes: 7
Answers: 1
Views: 689

Group by and find all values belonging to the n unique maximum values

My DataFrame:

data = {'Input':[133217,133217,133217,133217,133217,133217,132426,132426,132426,132426,132426,132426,132426,132426],
 'Font':[30,25,25,21,20,19,50,50,50,38,38,30,30,29]}

     Input  Font
0   133217    30
1   133217    25
2   133217    25
3   133217    21
4   133217    20
5   133217    19
6   132426    50
7   132426    50
8   132426    50
9   132426    38
10  132426    38
11  132426    30
12  132426    30
13  132426    29

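I want to create a new DataFrame containing only the Font values that belong to the 3 unique maximum values. For example, for Input 133217 the 3 largest Font values are 30, 25, and 21.

Expected output: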

op_data = {'Input':[133217,133217,133217,133217,132426,132426,132426,132426,132426,132426,132426],
 'Font':[30,25,25,21,50,50,50,38,38,30,30]}

     Input  Font
0   133217    30
1   133217    25
2   133217    25
3   133217    21
4   132426 …
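
A sketch of one way to express "the 3 unique maximum values per group": a dense rank within each Input group gives equal Font values the same rank with no gaps, so keeping ranks 1 to 3 keeps every row that carries one of the three largest distinct values. The names n, dense_rank, and top_rows are illustrative.

```python
import pandas as pd

data = {
    'Input': [133217] * 6 + [132426] * 8,
    'Font': [30, 25, 25, 21, 20, 19, 50, 50, 50, 38, 38, 30, 30, 29],
}
df = pd.DataFrame(data)

n = 3  # number of distinct largest values to keep per group

# Dense rank in descending order: the largest Font in each group gets rank 1,
# the next distinct value rank 2, and so on; ties share a rank.
dense_rank = df.groupby('Input')['Font'].rank(method='dense', ascending=False)

top_rows = df[dense_rank <= n].reset_index(drop=True)
print(top_rows)
```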

python pandas-groupby

DGS*_*DGS

2019-12-04

Votes: 6
Answers: 1
Views: 135