Posts by Cof*_*Liu

Dask read_csv - Mismatched dtypes found in `pd.read_csv`/`pd.read_table`

I am trying to read a CSV file with Dask, and it gives me the error below. The problem is that I want ARTICLE_ID to be an object (string). Can someone help me read the data successfully?

The traceback is as follows:

ValueError: Mismatched dtypes found in `pd.read_csv`/`pd.read_table`.

+------------+--------+----------+
| Column     | Found  | Expected |
+------------+--------+----------+
| ARTICLE_ID | object | int64    |
+------------+--------+----------+

The following columns also raised exceptions on conversion:

ARTICLE_ID:
ValueError("invalid literal for int() with base 10: ' July 2007 and 31 March 2008. Diagnostic practices of the medical practitioners for establishing the diagnosis of different types of EPTB were studied. Results: For the diagnosi\\\\'",)

Usually this is due to dask's dtype …
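For reference, Dask infers each column's dtype from a sample at the head of the file, so a column that looks like int64 early on but contains free text further down fails in exactly this way. A minimal sketch of the usual workaround, assuming a hypothetical file name articles.csv, is to declare the dtype explicitly:

import dask.dataframe as dd

# Telling Dask up front that ARTICLE_ID is a string stops it from
# inferring int64 from the sampled rows and then failing on later partitions.
# "articles.csv" is a placeholder for the actual file path.
df = dd.read_csv("articles.csv", dtype={"ARTICLE_ID": "object"})
print(df.dtypes)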

python dataframe dask

Score: 5 · Answers: 1 · Views: 3260

TypeError: type Column doesn't define __round__ method

My data looks like this:

+-------+-------+------+----------+
|book_id|user_id|rating|prediction|
+-------+-------+------+----------+
|    148|    588|     4|  3.953999|
|    148|  28767|     3| 2.5816362|
|    148|  41282|     3|  4.185532|
|    148|  18313|     4| 3.6297297|
|    148|  11272|     3| 3.0962112|
+-------+-------+------+----------+

I want to create a new column named "pred_class" by rounding the values in the prediction column. I ran this code:

results.withColumn('pred_class',round(results['prediction']))

It gives me an error like this:

TypeError: type Column doesn't define __round__ method

Can anyone help me? Thanks!
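The error comes from Python's builtin round(), which expects a plain number and cannot operate on a Spark Column. A minimal sketch of the usual fix, assuming results is the DataFrame shown above, uses pyspark.sql.functions.round instead:

from pyspark.sql.functions import round as spark_round

# spark_round operates column-wise on the DataFrame; aliasing the import
# avoids shadowing Python's builtin round().
results = results.withColumn("pred_class", spark_round(results["prediction"]))
results.show(5)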

apache-spark-sql pyspark

Score: 1 · Answers: 1 · Views: 2620

Tag statistics

apache-spark-sql ×1

dask ×1

dataframe ×1

pyspark ×1

python ×1