I am translating the following Kaggle code to Python 3.4.
In the final lines, which write the output CSV file,
predictions_file = open("myfirstforest.csv", "wb")
open_file_object = csv.writer(predictions_file)
open_file_object.writerow(["PassengerId","Survived"])
open_file_object.writerows(zip(ids, output))
predictions_file.close()
print('Done.')
there is a TypeError
TypeError: 'str' does not support the buffer interface
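which occurs on the line open_file_object.writerow(["PassengerId","Survived"]).
I believe this is because opening a file in binary mode to write CSV data does not work in Python 3. However, adding encoding='utf8' to the open() line does not work either.
What is the standard way to do this in Python 3.4?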
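A minimal sketch of how this is usually handled in Python 3: the csv module expects a file opened in text mode, with newline="" so that the module controls line endings itself. The helper name write_predictions and the sample ids and output values are placeholders for illustration, not part of the original script.

```python
import csv


def write_predictions(path, ids, output):
    """Write PassengerId/Survived rows to a CSV file (Python 3 style)."""
    # Text mode ("w") instead of binary ("wb"); newline="" lets the csv
    # module handle line endings on every platform.
    with open(path, "w", newline="", encoding="utf8") as predictions_file:
        writer = csv.writer(predictions_file)
        writer.writerow(["PassengerId", "Survived"])
        writer.writerows(zip(ids, output))


# Placeholder data standing in for the real ids and predictions.
write_predictions("myfirstforest.csv", [892, 893, 894], [0, 1, 0])
print("Done.")
```

The with block also closes the file automatically, so the explicit predictions_file.close() call is no longer needed.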
This problem is really strange, because the same thing works fine for other datasets.
Full code:
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.cross_validation import train_test_split
# # Split the Learning Set
X_fit, X_eval, y_fit, y_eval= train_test_split(
train, target, test_size=0.2, random_state=1
)
clf = xgb.XGBClassifier(missing=np.nan, max_depth=6,
n_estimators=5, learning_rate=0.15,
subsample=1, colsample_bytree=0.9, seed=1400)
# fitting
clf.fit(X_fit, y_fit, early_stopping_rounds=50, eval_metric="logloss", eval_set=[(X_eval, y_eval)])
#print y_pred
y_pred= clf.predict_proba(test)[:,1]
The last line causes the error below (full output provided):
Will train until validation_0 error hasn't decreased in 50 rounds.
[0] validation_0-logloss:0.554366
[1] validation_0-logloss:0.451454
[2] validation_0-logloss:0.372142
[3] validation_0-logloss:0.309450
[4] validation_0-logloss:0.259002
Traceback (most recent call …
I have been using Jupyter Notebook to learn principal component analysis from Kaggle, but when I run this code
from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))
I get the error below
FileNotFoundError Traceback (most recent call last)
<ipython-input-3-de0e39ca3ab8> in <module>()
1 from subprocess import check_output
----> 2 print(check_output(["ls", "C:/Users/wanglei/Documents/input"]).decode("utf8"))
D:\Anaconda3\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
624
625 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 626 **kwargs).stdout
627
628
D:\Anaconda3\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
691 kwargs['stdin'] = PIPE
692
--> 693 with Popen(*popenargs, **kwargs) as process:
694 try:
695 stdout, stderr = process.communicate(input, timeout=timeout)
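D:\Anaconda3\lib\subprocess.py in __init__(self, args, bufsize, …
I installed kaggle-cli just fine... everything went smoothly, or so pip said.
However, when I try to run the kg command, or just kg --version,
I get kg command not found
I can see kaggle in the Python system packages, and all the py and pyc files are there. But there is no bin directory or anything like it.
I tried to find a similar problem online without success, so I thought I would ask here.
I am on the Ubuntu app on Windows 10. Everything else in my machine learning setup (Python, Keras, Theano, etc.) works fine.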
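One thing that may be worth checking is whether the kg script was installed into a directory that is not on PATH. The sketch below lists the directories where pip commonly places console scripts and reports whether each is on PATH. The helper names are hypothetical, and it is an assumption that kaggle-cli installs its kg script into one of these locations.

```python
import os
import shutil
import site
import sysconfig


def candidate_script_dirs():
    """Return directories where pip typically places console scripts."""
    return [
        sysconfig.get_path("scripts"),            # system or virtualenv install
        os.path.join(site.getuserbase(), "bin"),  # `pip install --user` on Linux
    ]


def find_script(name):
    """Look for `name` on PATH first, then in the candidate directories."""
    found = shutil.which(name)
    if found:
        return found
    for directory in candidate_script_dirs():
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


print("kg found at:", find_script("kg"))
path_entries = os.environ.get("PATH", "").split(os.pathsep)
for directory in candidate_script_dirs():
    status = "on PATH" if directory in path_entries else "not on PATH"
    print(directory, "(%s)" % status)
```

If the script turns up in a directory reported as not on PATH, adding that directory to PATH in the shell profile would be the natural next step.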
I am trying to use the Kaggle API to pull data from the Kaggle website. I am using Ubuntu 18.04. When I enter the following command:
kaggle competitions download -c home-credit-default-risk
I get the following error:
Traceback (most recent call last):
File "/home/hduser/anaconda3/bin/kaggle", line 5, in <module>
from kaggle.cli import main
File "/home/hduser/anaconda3/lib/python3.7/site-packages/kaggle/__init__.py", line 23, in <module>
api.authenticate()
File "/home/hduser/anaconda3/lib/python3.7/site-packages/kaggle/api/kaggle_api_extended.py", line 149, in authenticate
self.config_file, self.config_dir))
OSError: Could not find kaggle.json. Make sure it's located in /home/hduser/.kaggle/kaggle.json. Or use the environment method.
This is strange, because I saved the kaggle.json file in the correct directory and I have all permissions on it.
ls ~/.kaggle
returns:
kaggle.json
Am I missing something here?
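The traceback names /home/hduser/.kaggle/kaggle.json, so one possible check is whether the shell in which ls ~/.kaggle was run resolves ~ to the same home directory as the Python process. The sketch below reports the path it would expect and whether the file is readable. It assumes the client honours the KAGGLE_CONFIG_DIR environment variable and otherwise falls back to ~/.kaggle, and describe_kaggle_config is a hypothetical helper name.

```python
import os
import stat


def describe_kaggle_config(config_dir=None):
    """Report where kaggle.json is expected and whether it is readable."""
    # Assumption: the client reads KAGGLE_CONFIG_DIR if set, and otherwise
    # uses ~/.kaggle for the user running this process.
    if config_dir is None:
        default_dir = os.path.join(os.path.expanduser("~"), ".kaggle")
        config_dir = os.environ.get("KAGGLE_CONFIG_DIR", default_dir)
    path = os.path.join(config_dir, "kaggle.json")
    info = {"path": path, "exists": os.path.isfile(path)}
    if info["exists"]:
        info["readable"] = os.access(path, os.R_OK)
        info["mode"] = oct(stat.S_IMODE(os.stat(path).st_mode))
    return info


print(describe_kaggle_config())
```

If the reported path differs from the one in the error message, the command is probably running as a different user or with a different HOME than the shell where the file was saved.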
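My dataset is MNIST from Kaggle.
I am trying to use the image function to visualize, say, the first digit in the training set. Unfortunately I get the following error: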
>image(1:28, 1:28, im, col=gray((0:255)/255))
Error in image.default(1:28, 1:28, im, col = gray((0:255)/255)) :
'z' must be numeric or logical
Adding some code:
rawfile<-read.csv("D://Kaggle//MNIST//train.csv",header=T) #Reading the csv file
im<-matrix((rawfile[1,2:ncol(rawfile)]), nrow=28, ncol=28) #For the 1st Image
image(1:28, 1:28, im, col=gray((0:255)/255))
Error in image.default(1:28, 1:28, im, col = gray((0:255)/255)) :
'z' must be numeric or logical
While using a Kaggle notebook, I ran into a problem. The following code block:
from nltk import ngrams
def grams(tokens):
return list(ngrams(tokens, 3))
negative_grams = preprocessed_negative_tweets.apply(grams)
results in a red box containing
/opt/conda/bin/ipython:5: DeprecationWarning: generator 'ngrams' raised StopIteration
The variable preprocessed_negative_tweets is a Pandas DataFrame containing tokens.
Does anyone know how to make it go away?
(The full notebook is here)
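The warning appears to be related to PEP 479, under which a StopIteration raised inside a generator is deprecated. One way to sidestep it, besides trying a newer NLTK release, is to build the trigrams without a generator. The sketch below replaces nltk.ngrams with a plain zip over shifted slices. It is not the NLTK implementation, and it assumes each entry is a list of tokens.

```python
def trigrams(tokens, n=3):
    """Return the list of n-grams (trigrams by default) for a token sequence."""
    tokens = list(tokens)
    # zip over n shifted views of the list; inputs shorter than n give [].
    return list(zip(*(tokens[i:] for i in range(n))))


print(trigrams(["not", "a", "good", "day"]))
# [('not', 'a', 'good'), ('a', 'good', 'day')]
```

With this helper, the call from the question would become preprocessed_negative_tweets.apply(trigrams), assuming the object holds one token list per row.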
When trying to download the "Cats_vs_Dogs" TensorFlow dataset using the tfds module, I get the following error
DownloadError Traceback (most recent call last)
<ipython-input-2-244305a07c33> in <module>()
7 split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
8 with_info=True,
----> 9 as_supervised=True,
10 )
21 frames
/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/download/downloader.py in _assert_status(response)
257 if response.status_code != 200:
258 raise DownloadError('Failed to get url {}. HTTP code: {}.'.format(
--> 259 response.url, response.status_code))
DownloadError: Failed to get url https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip. HTTP code: 404.
The code I used is
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
# split the data manually into 80% training, 10% testing, 10% validation
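(raw_train, raw_validation, raw_test), metadata = …
I am new to Colab. I recently downloaded a dataset from Kaggle to Google Colab. On my next visit, the dataset was gone, and my kaggle installation had been removed as well. Does anyone know why?
I tried looking in /content, where I remember saving and editing it!
I managed to download a dataset from Kaggle using the Kaggle API. The data is stored in the /databricks/driver directory.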
%sh pip install kaggle
%sh
export KAGGLE_USERNAME=my_name
export KAGGLE_KEY=my_key
kaggle competitions download -c ncaaw-march-mania-2021
%sh unzip ncaaw-march-mania-2021.zip
The question is: how do I use these files in DBFS? Below is how I read the data, and the error I get when trying to read the csv file with pyspark:
spark.read.csv('/databricks/driver/WDataFiles_Stage1/Cities.csv')
AnalysisException: Path does not exist: dbfs:/databricks/driver/WDataFiles_Stage1/Cities.csv
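The error message shows the path being resolved against dbfs:, while the unzipped files sit on the driver's local disk. A common approach is to pass an explicit file: URI instead. The sketch below only builds that URI; to_driver_local_uri is a hypothetical helper, and it is an assumption that the cluster permits reading driver-local files this way.

```python
def to_driver_local_uri(path):
    """Prefix an absolute driver-local path with the file: scheme."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path, got %r" % path)
    return "file:" + path


csv_uri = to_driver_local_uri("/databricks/driver/WDataFiles_Stage1/Cities.csv")
print(csv_uri)

# In a Databricks notebook, where `spark` is predefined, one would then try:
# df = spark.read.csv(csv_uri, header=True)
```

An alternative is to copy the files into DBFS first and read them from there, which keeps them available after the cluster restarts.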