Dask read_csv: Mismatched dtypes found in `pd.read_csv`/`pd.read_table`

Question

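I am trying to read a CSV file with dask, and it gives me the error below. The thing is, I want my ARTICLE_ID column to be object (string). Can anyone help me read the data successfully?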

The traceback is as follows:

ValueError: Mismatched dtypes found in `pd.read_csv`/`pd.read_table`.

+------------+--------+----------+
| Column     | Found  | Expected |
+------------+--------+----------+
| ARTICLE_ID | object | int64    |
+------------+--------+----------+

The following columns also raised exceptions on conversion:

ARTICLE_ID:
ValueError("invalid literal for int() with base 10: ' July 2007 and 31 March 2008. Diagnostic practices of the medical practitioners for establishing the diagnosis of different types of EPTB were studied. Results: For the diagnosi\\\\'",)

Usually this is due to dask's dtype inference failing, and
*may* be fixed by specifying dtypes manually by adding:

dtype={'ARTICLE_ID': 'object'}

to the call to `read_csv`/`read_table`.


Answer 1

mdu*_*ant 9

The message suggests that you change your call from

df = dd.read_csv('mylocation.csv', ...)


to

df = dd.read_csv('mylocation.csv', ..., dtype={'ARTICLE_ID': 'object'})


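Change the file location here to your own, and keep any other arguments you were already passing. If that still doesn't solve the problem, please update your question.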

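When a folder contains multiple files and you are reading all of them, some files will not have a particular column. How would you handle that case? (2 upvotes)
Also, passing `dtype='object'` as an argument to read_csv works too. (2 upvotes)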

Archived: 7 years, 5 months ago
Views: 3260
Last activity: 6 years, 3 months ago