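I am trying to read a CSV file with dask, and it gives me the error below. The catch is that I want my ARTICLE_ID column to be object (string). Can anyone help me read the data successfully?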
The traceback is as follows:
ValueError: Mismatched dtypes found in `pd.read_csv`/`pd.read_table`.
+------------+--------+----------+
| Column | Found | Expected |
+------------+--------+----------+
| ARTICLE_ID | object | int64 |
+------------+--------+----------+
The following columns also raised exceptions on conversion:
ARTICLE_ID:
ValueError("invalid literal for int() with base 10: ' July 2007 and 31 March 2008. Diagnostic practices of the medical practitioners for establishing the diagnosis of different types of EPTB were studied. Results: For the diagnosi\\\\'",)
Usually this is due to dask's dtype …

My data looks like this:
+-------+-------+------+----------+
|book_id|user_id|rating|prediction|
+-------+-------+------+----------+
| 148| 588| 4| 3.953999|
| 148| 28767| 3| 2.5816362|
| 148| 41282| 3| 4.185532|
| 148| 18313| 4| 3.6297297|
| 148| 11272| 3| 3.0962112|
+-------+-------+------+----------+
I want to create a new column named "pred_class" by rounding the values in the prediction column. I ran this code:
results.withColumn('pred_class',round(results['prediction']))
It gives me this error:
TypeError: type Column doesn't define __round__ method
Can anyone help me? Thanks!