I am trying to add a new row to a DataFrame, but I can't.
My code:
newRow = Row(id='ID123')
newDF= df.insertInto(newRow)
or
newDF= df.union(newRow)
Errors:
AttributeError: _jdf
AttributeError: 'DataFrame' object has no attribute 'insertInto'
I am using the KFold function from Python's sklearn package on a df (DataFrame) whose row index is non-contiguous.
Here is the code:
kFold = KFold(n_splits=10, shuffle=True, random_state=None)
for train_index, test_index in kFold.split(dfNARemove):...
I get some train_index or test_index values that do not exist in my df.
What can I do?
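One possible explanation, offered as a sketch rather than a definitive answer: `KFold.split` yields positional indices (0 to n-1), not the DataFrame's index labels, so they would need to be used with `.iloc` rather than `.loc`. The small frame below is made up for illustration and only stands in for `dfNARemove`.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for dfNARemove: note the non-contiguous index labels.
df = pd.DataFrame(
    {"x": range(10), "y": range(10, 20)},
    index=[1, 8, 9, 24, 30, 31, 40, 52, 60, 77],
)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)

folds = []
for train_pos, test_pos in kfold.split(df):
    # train_pos / test_pos are row positions, so select rows with .iloc.
    train_df = df.iloc[train_pos]
    test_df = df.iloc[test_pos]
    folds.append((train_df, test_df))
```

If the original labels are not needed, calling `df.reset_index(drop=True)` before splitting should make positions and labels coincide.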
I want to save a CSV file ("test.csv") in S3 using boto3. My bucket is "outputS3Bucket" and the key is "folder/newFolder". I want to check whether "newFolder" exists and, if not, create it.
import boto3
client = boto3.client('s3')
s3 = boto3.resource('s3')
bucket = s3.Bucket("outputS3Bucket")
result = client.list_objects(Bucket='outputS3Bucket',Prefix="folder/newFolder")
if len(result)==0:
    key = bucket.new_key("folder/newFolder")
newKey = key + "/" + "test.csv"
client.put_object(Bucket="outputS3Bucket", Key=newKey, Body=content)
# put_object path: 's3://outputS3Bucket/folder/newFolder/test.csv'
I have a few problems:
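For what it is worth, S3 has no real folders: a key such as `folder/newFolder/test.csv` is just an object name containing slashes, so writing the object should be enough for the "folder" to appear. Note also that `client.list_objects` returns a dict, so `len(result) == 0` does not tell you whether anything matched. The sketch below assumes `s3_client` is a `boto3.client("s3")`; the helper names `prefix_exists` and `put_csv` are hypothetical and not part of boto3.

```python
def prefix_exists(s3_client, bucket, prefix):
    """Return True if at least one object key starts with `prefix`."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    # The response is a dict; matching objects, if any, are listed under "Contents".
    return len(response.get("Contents", [])) > 0


def put_csv(s3_client, bucket, prefix, filename, content):
    """Upload `content` under prefix/filename and return the key that was used."""
    key = prefix.rstrip("/") + "/" + filename
    # No folder has to be created first: the slashes are simply part of the key.
    s3_client.put_object(Bucket=bucket, Key=key, Body=content)
    return key
```

With boto3 installed and credentials configured, `put_csv(boto3.client("s3"), "outputS3Bucket", "folder/newFolder", "test.csv", content)` would write to the key `folder/newFolder/test.csv`.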
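I am trying to implement ElasticNet from the sklearn package using GridSearchCV. My data is all numeric! I get an error and I do not understand what the problem is. This is not an issue when I fit linear regression or lasso. Can anyone help?
Code: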
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
# Use grid search to tune the parameters:
parametersGrid = {"max_iter": [1, 5, 10],
"alpha": [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
"l1_ratio": np.arange(0.0, 1.0, 0.1)}
eNet = ElasticNet()
grid = GridSearchCV(eNet, parametersGrid, scoring='accuracy', cv=10)
grid.fit(X_train, Y_train)
Y_pred = grid.predict(X_test)
Error:
File "C:\Users\..\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 58, in _assert_all_finite
" or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value …
I implemented SelectKBest from sklearn, and I want to get the names of the K best columns, not just the values of each column.
What do I need to do?
My code:
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
X_new.shape
X_new is a numpy.ndarray that has k columns but no column names.
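One way this might be approached, assuming X is a pandas DataFrame: keep the fitted selector instead of calling `fit_transform` directly, then use its `get_support()` mask to index the column names. The data below is made up for illustration (chi2 needs non-negative features).

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Made-up, non-negative example data standing in for the real X and y.
X = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [5, 5, 5, 5, 5, 5],
    "c": [0, 1, 0, 9, 8, 9],
    "d": [2, 1, 2, 1, 2, 1],
})
y = [0, 0, 0, 1, 1, 1]

selector = SelectKBest(chi2, k=2).fit(X, y)

# get_support() returns a boolean mask over the input columns.
selected_columns = X.columns[selector.get_support()]

# Rebuild a DataFrame that keeps the selected column names and the row index.
X_new = pd.DataFrame(selector.transform(X), columns=selected_columns, index=X.index)
```

`X[selected_columns]` would give the same columns without going through `transform` at all.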
I am trying to write and save a CSV file to a specific (existing) folder in S3. Here is my code:
from io import BytesIO
import pandas as pd
import boto3
s3 = boto3.resource('s3')
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
csv_buffer = BytesIO()
bucket = 'bucketName/folder/'
filename = "test3.csv"
df.to_csv(csv_buffer)
content = csv_buffer.getvalue()
def to_s3(bucket,filename,content):
    s3.Object(bucket,filename).put(Body=content)
to_s3(bucket,filename,content)
This is the error I get:
Invalid bucket name "bucketName/folder/": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"
I also tried:
bucket = bucketName/folder
and:
bucket = bucketName
key = folder/
s3.Object(bucket,key,filename).put(Body=content)
Any suggestions?
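The error message suggests that the bucket argument may contain only the bucket name, so the "folder" probably belongs at the front of the object key instead. A minimal sketch under that assumption follows; `s3_resource` is expected to be a `boto3.resource("s3")`, and the helpers `split_s3_path` and `dataframe_to_s3` are hypothetical names introduced here, not boto3 functions.

```python
import pandas as pd


def split_s3_path(path):
    """Split 'bucketName/folder/' into ('bucketName', 'folder/')."""
    bucket, _, prefix = path.partition("/")
    return bucket, prefix


def dataframe_to_s3(s3_resource, path, filename, df):
    """Write df as CSV to <bucket>/<prefix><filename>; return (bucket, key)."""
    bucket, prefix = split_s3_path(path)
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    key = prefix + filename
    # to_csv() without a target returns the CSV text; encode it to bytes for S3.
    body = df.to_csv().encode("utf-8")
    # Object() takes exactly two identifiers: the bucket name and the full key.
    s3_resource.Object(bucket, key).put(Body=body)
    return bucket, key


df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
# With boto3 installed and credentials configured, one would call:
# dataframe_to_s3(boto3.resource("s3"), "bucketName/folder/", "test3.csv", df)
```

This also shows why `s3.Object(bucket,key,filename)` is unlikely to work: the folder and file name need to be joined into a single key first.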
I am trying to normalize a df while keeping the column headers and the row index.
Sym1 Sym2 Sym3 Sym4
1 1 1 1 2
8 1 3 3 2
9 1 2 2 2
24 4 2 4 1
scaler = MinMaxScaler(feature_range=(0, 1), copy=True)
scaler.fit(df)
normData = pd.DataFrame(scaler.transform(df))
But I get consecutive (default) row and column labels:
0 1 2 3
0 0 0 0 0.8
1 0 1 0.65 0.8
2 0 0.24 0.5 0.2
3 0.5 0.5 0.5 0.25
I want a DataFrame like this:
Sym1 Sym2 Sym3 Sym4
1 0 0 0 0.8
8 0 1 0.65 0.8
9 0 0.24 0.5 0.2
24 …
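A sketch of one way to keep the labels: the scaler returns a bare NumPy array, so the original `index` and `columns` can be passed back in when rebuilding the DataFrame. The frame below reproduces the input table from the question; the variable name `norm_data` is only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# The example frame from the question, with its non-contiguous row labels.
df = pd.DataFrame(
    {"Sym1": [1, 1, 1, 4], "Sym2": [1, 3, 2, 2],
     "Sym3": [1, 3, 2, 4], "Sym4": [2, 2, 2, 1]},
    index=[1, 8, 9, 24],
)

scaler = MinMaxScaler(feature_range=(0, 1), copy=True)

# fit_transform returns a NumPy array, so reattach the original labels.
norm_data = pd.DataFrame(scaler.fit_transform(df), index=df.index, columns=df.columns)
```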