Posts by Mat*_*hew

How to completely uninstall Minikube on Windows 10 Pro? (Chocolatey)

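I moved to Docker for Windows 10, which now has a built-in Kubernetes option. I want to completely uninstall minikube and use the Kubernetes version that ships with Docker for Windows instead.

How do I completely uninstall minikube on Windows 10?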

windows docker kubernetes

Mat*_*hew

2018-11-14

21 votes · 1 answer · 10k views

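How to drop all columns with null values in a PySpark DataFrame?

I have a large dataset. I want to drop every column that contains null values and return a new DataFrame. How can I do this?

The following only handles a single named column, or drops rows containing null: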

from pyspark.sql.functions import col

df.where(col("dt_mvmt").isNull())  # doesn't work: I don't have all the column names, and there are 1000s of columns
df.filter(df.dt_mvmt.isNotNull())  # same reason as above
df.na.drop()                       # drops rows that contain null, not columns that contain null


For example:

a |  b  | c
1 |     | 0
2 |  2  | 3


In the example above, the whole column b would be dropped because one of its values is null.

python apache-spark apache-spark-sql pyspark

Mat*_*hew

2018-07-24

8 votes · 1 answer · 10k views

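How to bucketize a group of columns in PySpark?

I am trying to bucketize the columns whose names contain the word "road" in a 5k dataset, and create a new DataFrame.

I am not sure how to do it. Here is what I have tried so far: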

from pyspark.ml.feature import Bucketizer

spike_cols = [col for col in df.columns if "road" in col]

for x in spike_cols:
    bucketizer = Bucketizer(splits=[-float("inf"), 10, 100, float("inf")],
                            inputCol=x, outputCol=x + "bucket")

bucketedData = bucketizer.transform(df)


python apache-spark pyspark

Mat*_*hew

2018-07-19

5 votes · 1 answer · 2863 views

How to add my own function as a custom stage in an ML PySpark pipeline?

Sample code from Florian:

+-----------+-----------+-----------+
|ball_column|keep_the   |hall_column|
+-----------+-----------+-----------+
|          0|          7|         14|
|          1|          8|         15|
|          2|          9|         16|
|          3|         10|         17|
|          4|         11|         18|
|          5|         12|         19|
|          6|         13|         20|
+-----------+-----------+-----------+


The first part of the code drops every column whose name contains a word from the banned list:

#first part of the code

banned_list = ["ball","fall","hall"]
condition = lambda col: any(word in col for word in banned_list)
new_df = df.drop(*filter(condition, df.columns))


So the code above should drop ball_column and hall_column.

The second part of the code bucketizes specific columns from the list. For this example, only one column remains: keep_the.

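from pyspark.ml.feature import Bucketizer

bagging = Bucketizer(
    splits=[-float("inf"), 10, 100, float("inf")],
    inputCol='keep_the',
    outputCol='keep_the_bucket')  # must differ from inputCol, otherwise the stage fails because the column already exists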


Now bucketize the column with a pipeline as follows:

from pyspark.ml import Pipeline

model = Pipeline(stages=[bagging]).fit(df)  # stages must be a list

bucketedData …


python apache-spark apache-spark-sql pyspark

Mat*_*hew

2018-07-21

5 votes · 1 answer · 2241 views

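How to print the decision path/rules used to predict a specific row sample in PySpark?

How can I print the decision path for a specific sample in a Spark DataFrame?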

Spark Version: '2.3.1'


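The code below prints the decision path of the whole model. How can I print the decision path for one specific sample, for example the row whose tagvalue ball equals 2?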

import findspark
findspark.init()  # run before any pyspark import so findspark can locate Spark

import pandas as pd
import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.ml import Pipeline, Transformer
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql.functions import monotonically_increasing_id, col, row_number
from pyspark.sql.window import Window

spark = SparkSession.builder.appName('demo')\
    .master('local[*]')\
    .getOrCreate()

data …


apache-spark pyspark apache-spark-ml

Mat*_*hew

2018-08-10

5 votes · 1 answer · 995 views

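Using keras fit/fit_generator with max_queue_size, workers and use_multiprocessing

I am confused about how to use max_queue_size, workers and use_multiprocessing as described in the Keras documentation.

Can someone give an example of how you would set them if you had:

1x GPU (Nvidia Quadro P1000)
6-core CPU with 12 logical processors

Here is how I currently use them, based on an unscientific guess about these three parameters: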

classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000/32,
                         max_queue_size = 10,
                         use_multiprocessing = False,
                         workers=1)


python deep-learning keras tensorflow

Mat*_*hew

2023-11-09

5 votes · 1 answer · 2435 views

What is the fourth channel in an image?

What does this column mean when generating a random image with np.random.randint?

img = np.random.randint(255, size=(4,4,3), dtype='uint8')


This creates a 4 x 4 pixel matrix with 3 columns.

img = np.random.randint(255, size=(4,4,4), dtype='uint8')


This creates a 4 x 4 pixel matrix with 4 columns.

In this case, what role do the columns of the matrix play?
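NumPy itself attaches no meaning to the last axis. It is simply the channel axis, and the library that consumes the array decides what each channel means. The most common convention for four 8-bit channels is RGBA, where the fourth channel is alpha (opacity). For example, Pillow's `Image.fromarray` reads an (H, W, 4) `uint8` array as RGBA. CMYK, where the fourth channel is K (black), applies only when the consuming code is told to use it.

Below is a minimal sketch that assumes the RGBA reading and shows what alpha does by compositing over white. The two hand-set pixels are illustrative. Note that the upper bound of `np.random.randint` is exclusive, so `255` never yields 255; `256` covers the full 0..255 range.

```python
import numpy as np

np.random.seed(0)

# The upper bound is exclusive, so 256 (not 255) gives values 0..255.
rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)   # height x width x (R, G, B)
rgba = np.random.randint(0, 256, size=(4, 4, 4), dtype=np.uint8)  # height x width x (R, G, B, A)

# The last axis indexes channels: rgba[y, x] is a single pixel with 4 values.
rgba[0, 0] = [10, 20, 30, 255]  # fully opaque pixel
rgba[0, 1] = [10, 20, 30, 0]    # fully transparent pixel

# Treat the fourth channel as alpha and composite the image over a white background.
alpha = rgba[..., 3:4].astype(np.float64) / 255.0
white = np.full((4, 4, 3), 255.0)
composited = np.round(rgba[..., :3] * alpha + white * (1.0 - alpha)).astype(np.uint8)

print(composited[0, 0], composited[0, 1])
```

The opaque pixel keeps its colour and the transparent one becomes pure white, which is what "fourth channel = alpha" means in practice.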

python rgb numpy image cmyk

Mat*_*hew

2019-11-17

5 votes · 1 answer · 10k views

How to flatten a pandas DataFrame

This is my pandas DataFrame, and I want to flatten it. How can I do this?

My input:

key column
1 {'health_1': 45, 'health_2': 60, 'health_3': 34, 'health_4': 60, 'name': 'Tom'}   
2 {'health_1': 28, 'health_2': 10, 'health_3': 42, 'health_4': 07, 'name': 'John'}  
3 {'health_1': 86, 'health_2': 65, 'health_3': 14, 'health_4': 52, 'name': 'Adam'}


Expected output:

Every health key and name should become its own column, holding the corresponding values. The column order does not matter.

health_1 health_2 health_3 health_4 name key
45          60       34       60    Tom  1
28          10       42       07    John 2
86          65       14       52    Adam 3

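Below is a minimal sketch, assuming `column` holds real Python dicts. If it holds strings (for example after reading a CSV), they would first need parsing with something like `ast.literal_eval`. The list of dicts is passed to the `DataFrame` constructor, and `key` is copied back alongside. `07` from the question is written as `7`, since `07` is not a valid Python integer literal.

```python
import pandas as pd

# The question's input, rebuilt as a DataFrame with a column of dicts.
df = pd.DataFrame({
    "key": [1, 2, 3],
    "column": [
        {"health_1": 45, "health_2": 60, "health_3": 34, "health_4": 60, "name": "Tom"},
        {"health_1": 28, "health_2": 10, "health_3": 42, "health_4": 7, "name": "John"},
        {"health_1": 86, "health_2": 65, "health_3": 14, "health_4": 52, "name": "Adam"},
    ],
})

# Each dict becomes one row; reusing df's index keeps the rows aligned with "key".
flat = pd.DataFrame(df["column"].tolist(), index=df.index)
flat["key"] = df["key"]
print(flat)
```

`pd.json_normalize(df["column"].tolist())` is a similar option that also flattens nested dicts, if the real data has them.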

python pandas

Mat*_*hew

2018-12-05

4 votes · 1 answer · 6643 views

PyCharm warns "package requirement not satisfied" when using pipenv to install a package

I am trying to install packages into my PyCharm environment using pipenv. However, when I run pipenv install <package name>, a popup appears at the top saying

"Package requirement not satisfied"

and asks me to "install requirements from Pipfile.lock".

When I started the project, I selected Pipenv as my project interpreter. Why is PyCharm asking me to re-install a package from the Pipfile.lock file when I clearly used pipenv from the beginning and set the project to use Pipenv as an …

python django pycharm python-3.x pipenv

Mat*_*hew

2019-03-28

4 votes · 1 answer · 550 views

AttributeError: module 'tensorflow' has no attribute 'random_shuffle'

When I try to run the code from machinelearningmastery, I get

AttributeError: module 'tensorflow' has no attribute 'random_shuffle'

It points to the following:

from mrcnn.model import MaskRCNN
from mrcnn.config import Config

model = MaskRCNN(mode='training', model_dir='./', config=config)


How can I fix this?

python keras tensorflow

Mat*_*hew

lucky-day

3 votes · 1 answer · 7959 views

Tag statistics

python ×8

apache-spark ×4

pyspark ×4

apache-spark-sql ×2

keras ×2

tensorflow ×2

apache-spark-ml ×1

cmyk ×1

deep-learning ×1

django ×1

docker ×1

image ×1

kubernetes ×1

numpy ×1

pandas ×1

pipenv ×1

pycharm ×1

python-3.x ×1

rgb ×1

windows ×1
