Posts by Hil*_*laD

PySpark: add a new row to a DataFrame

I am trying to add a new row to a DataFrame, but I cannot get it to work.

My code:

newRow = Row(id='ID123')
newDF = df.insertInto(newRow)
# or
newDF = df.union(newRow)

Errors:

AttributeError: _jdf

AttributeError: 'DataFrame' object has no attribute 'insertInto'

python apache-spark

Hil*_*laD

7
votes

2
answers

10k
views

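sklearn KFold returns wrong indices in Python

I am using the KFold function from the sklearn package in Python on a df (DataFrame) whose row index is not contiguous.

Here is the code: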

kFold = KFold(n_splits=10, shuffle=True, random_state=None)
for train_index, test_index in kFold.split(dfNARemove):...

I get some train_index or test_index values that do not exist in my df.

What can I do?
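
One likely explanation is that `KFold.split` yields integer positions (0 to n-1), not the DataFrame's index labels, so the values only look wrong when they are used with label-based lookup. A minimal sketch, assuming a hypothetical frame with gaps in its index (the question's `dfNARemove` is not shown), selects rows by position with `.iloc`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical frame with a gappy index, as after dropping NA rows.
df = pd.DataFrame({"x": np.arange(10.0)},
                  index=[0, 2, 3, 5, 8, 9, 12, 15, 20, 21])

k_fold = KFold(n_splits=5, shuffle=True, random_state=0)

folds = []
for train_pos, test_pos in k_fold.split(df):
    # train_pos / test_pos are row positions, so use .iloc rather than .loc.
    train_df = df.iloc[train_pos]
    test_df = df.iloc[test_pos]
    folds.append((train_df, test_df))
```

An alternative with the same effect is to call `df.reset_index(drop=True)` before splitting, at the cost of losing the original labels.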

python scikit-learn

Hil*_*laD

lucky-day

6
votes

1
answers

4010
views

boto3 put_object with new key python

I want to save a CSV file ("test.csv") to S3 using boto3. My bucket is "outputS3Bucket" and the key is "folder/newFolder". I want to check whether "newFolder" exists and, if it does not, create it.

import boto3
client = boto3.client('s3')
s3 = boto3.resource('s3')
bucket = s3.Bucket("outputS3Bucket")

result = client.list_objects(Bucket='outputS3Bucket',Prefix="folder/newFolder")

if len(result)==0:
    key = bucket.new_key("folder/newFolder")
    newKey = key + "/" + "test.csv"

client.put_object(Bucket="outputS3Bucket", Key=newKey, Body=content)
# put_object path: 's3://outputS3Bucket/folder/newFolder/test.csv'

I have a few problems:

if I don't write the full key name (such …

python amazon-s3 boto3

Hil*_*laD

2018 01-28

5
votes

1
answers

10k
views

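Tuning ElasticNet parameters in Python with the sklearn package

I am trying to tune ElasticNet from the sklearn package using GridSearchCV. My data is all numeric! I get an error and I do not understand what the problem is. Linear regression and Lasso ran on the same data without this problem. Can anyone help?

The code: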

import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Use grid search to tune the parameters:
parametersGrid = {"max_iter": [1, 5, 10],
                  "alpha": [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
                  "l1_ratio": np.arange(0.0, 1.0, 0.1)}

eNet = ElasticNet()
grid = GridSearchCV(eNet, parametersGrid, scoring='accuracy', cv=10)
grid.fit(X_train, Y_train)
Y_pred = grid.predict(X_test)

Error:

File "C:\Users\..\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 58, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value …
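
Two things seem worth checking here, though the truncated traceback does not say which one applies: whether the inputs really are finite, and whether the scorer suits a regressor (`'accuracy'` is a classification metric). A minimal sketch on synthetic data, where the data, the shortened grid, and `max_iter=10000` are all assumptions made for illustration:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the question's X_train / Y_train.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(60, 4))
Y_train = X_train @ np.array([1.5, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=60)

# Check for NaN / infinity before fitting; this is what the error complains about.
inputs_finite = bool(np.isfinite(X_train).all() and np.isfinite(Y_train).all())

parameters_grid = {
    "alpha": [0.001, 0.01, 0.1, 1.0],
    "l1_ratio": [0.1, 0.5, 0.9],
}

# Use a regression scorer; a generous max_iter lets coordinate descent converge.
grid = GridSearchCV(ElasticNet(max_iter=10000), parameters_grid,
                    scoring="neg_mean_squared_error", cv=5)
grid.fit(X_train, Y_train)

best_params = grid.best_params_
Y_pred = grid.predict(X_train)
```

On real data, `np.isfinite(X).all()` returning `False` would point to missing or infinite values that need imputing or dropping before the grid search.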

python regression scikit-learn

Hil*_*laD

4
votes

1
answers

8701
views

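Get feature names from the SelectKBest function in Python

I implemented SelectKBest from sklearn and I want to get the names of the K best columns, not just the values of each column.

What do I need to do?

My code: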

X_new = SelectKBest(chi2, k=2).fit_transform(X, y)

X_new.shape

X_new is a numpy.ndarray with k columns but no column names.
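
A fitted selector exposes `get_support()`, a boolean mask over the input columns, which can be used to recover the names. A minimal sketch, assuming `X` is a pandas DataFrame (the small non-negative frame and labels below are made up for illustration):

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical non-negative features (chi2 requires values >= 0) and labels.
X = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [6, 1, 5, 2, 4, 3],
    "c": [0, 0, 1, 1, 5, 5],
    "d": [9, 8, 1, 0, 1, 0],
})
y = [0, 0, 0, 1, 1, 1]

# Keep the fitted selector instead of calling fit_transform directly.
selector = SelectKBest(chi2, k=2).fit(X, y)

# Boolean mask with one entry per input column; True marks a kept column.
mask = selector.get_support()
selected_names = X.columns[mask].tolist()

# Rebuild a labeled DataFrame from the transformed array.
X_new = pd.DataFrame(selector.transform(X), columns=selected_names, index=X.index)
```

Keeping the selector object around is the key change: `fit_transform` alone returns only the array and discards the mask.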

python selection

Hil*_*laD

4
votes

1
answers

6066
views

Save a CSV file to S3 using boto3

I am trying to write and save a CSV file to a specific (existing) folder in S3. Here is my code:

from io import BytesIO
import pandas as pd
import boto3
s3 = boto3.resource('s3')

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

csv_buffer = BytesIO()

bucket = 'bucketName/folder/'
filename = "test3.csv"
df.to_csv(csv_buffer)
content = csv_buffer.getvalue()

def to_s3(bucket,filename,content):
  s3.Object(bucket,filename).put(Body=content)

to_s3(bucket,filename,content)

Here is the error I get:

Invalid bucket name "bucketName/folder/": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"

I also tried:

bucket = bucketName/folder

and:

bucket = bucketName
key = folder/
s3.Object(bucket,key,filename).put(Body=content)

Any suggestions?
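
The error suggests the folder path ended up in the bucket argument. In S3 the bucket name is only the bucket, and "folders" are just prefixes inside the object key, so the folder and file name are joined into one key. A minimal sketch of that split: `build_key`, `upload_csv`, and `RecordingClient` are hypothetical helpers, and the stand-in client exists only so the example runs without AWS credentials; with real access it would be replaced by `boto3.client("s3")`.

```python
import csv
import io


def build_key(prefix, filename):
    """Join a folder-like prefix and a file name into a single S3 object key."""
    prefix = prefix.strip("/")
    return prefix + "/" + filename if prefix else filename


def upload_csv(client, bucket, prefix, filename, rows):
    """Serialize rows to CSV and upload them to bucket under prefix/filename.

    `client` is assumed to behave like boto3.client("s3").
    """
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    key = build_key(prefix, filename)
    client.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue().encode("utf-8"))
    return key


class RecordingClient:
    """Stand-in for an S3 client that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)


client = RecordingClient()  # with AWS access: client = boto3.client("s3")
rows = [["", "col1", "col2"], [0, 1, 3], [1, 2, 4]]
key = upload_csv(client, "bucketName", "folder/", "test3.csv", rows)
```

Since prefixes are not real directories, nothing has to be created beforehand; uploading an object with the key `folder/test3.csv` is enough for `folder/` to appear in the console.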

python csv amazon-s3

Hil*_*laD

4
votes

2
answers

20k
views

sklearn MinMaxScaler: keep row and column headers in Python

I am trying to normalize a df while keeping its row index and column headers.

      Sym1 Sym2 Sym3 Sym4
1     1    1    1    2
8     1    3    3    2
9     1    2    2    2
24    4    2    4    1


scaler = MinMaxScaler(feature_range=(0, 1), copy=True)
scaler.fit(df)
normData = pd.DataFrame(scaler.transform(df))

But I get consecutive default row and column labels:

      0    1    2    3
0     0    0    0    0.8
1     0    1    0.65 0.8
2     0    0.24 0.5  0.2
3     0.5  0.5  0.5  0.25

I want a DataFrame like this:

      Sym1 Sym2 Sym3 Sym4
1     0    0    0    0.8
8     0    1    0.65 0.8
9     0    0.24 0.5  0.2
24 …
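
The scaler returns a plain NumPy array, so the labels are lost when that array is wrapped in a new DataFrame. A minimal sketch that passes the original `index` and `columns` back in, using the input table from the question (the resulting values are not listed here and need not match the illustrative output above):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# The input table from the question, with its non-contiguous row index.
df = pd.DataFrame(
    {
        "Sym1": [1, 1, 1, 4],
        "Sym2": [1, 3, 2, 2],
        "Sym3": [1, 3, 2, 4],
        "Sym4": [2, 2, 2, 1],
    },
    index=[1, 8, 9, 24],
)

scaler = MinMaxScaler(feature_range=(0, 1), copy=True)

# fit_transform returns an ndarray; reattach the original labels explicitly.
norm_data = pd.DataFrame(scaler.fit_transform(df), index=df.index, columns=df.columns)
```

Each column is scaled independently, so every column of `norm_data` spans 0 to 1 while the rows keep the labels 1, 8, 9, and 24.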

python dataframe pandas scikit-learn

Hil*_*laD

2017 10-09

2
votes

1
answers

1133
views

Tag statistics

python ×7

scikit-learn ×3

amazon-s3 ×2

apache-spark ×1

boto3 ×1

csv ×1

dataframe ×1

pandas ×1

regression ×1

selection ×1
