read_csv converters for unknown columns

Question

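I'm trying to read a CSV file that has several values in each cell, and I want to encode each cell as a single integer to store in a pandas cell (e.g. (1, 1) -> 771). For that I would like to use the converters parameter of read_csv. The problem is that I don't know the column names beforehand, and the value passed to converters has to be a dict with the column names as keys. In fact, I want to convert every column with the same converter function. So it would be nicer to write: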

read_csv(fhand, converter=my_endocing_function)

than:

read_csv(fhand, converters={'col1':my_endocing_function,
                            'col2':my_endocing_function,
                            'col3':my_endocing_function,})

Is that possible? As a workaround, this is what I'm doing right now:

dataframe = read_csv(fhand)
enc_func = numpy.vectorize(encoder.encode_genotype)
dataframe = dataframe.apply(enc_func, axis=1)

But I suspect this approach is less efficient. By the way, I have a similar doubt about the formatters used by the to_string method.

Answer 1

Wes*_*ney 3

You can pass integers (0, 1, 2) instead of the names. From the docstring:

converters : dict. optional
    Dict of functions for converting values in certain columns. Keys can either
    be integers or column labels
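
A minimal sketch of how integer keys could be used when the column names are not known in advance. The encoder `encode_cell`, the "a/b" cell layout and the sample data are hypothetical stand-ins for illustration; this encoder does not reproduce the (1, 1) -> 771 encoding from the question. The column count is obtained by reading only the header row.

```python
import io

import pandas as pd


def encode_cell(cell: str) -> int:
    """Hypothetical encoder: pack a cell like "1/2" into one integer."""
    first, second = (int(part) for part in cell.split("/"))
    return (first << 8) | second


# Sample data invented for this sketch; a real file handle works the same way.
csv_text = "s1,s2,s3\n1/1,0/1,2/0\n0/0,1/2,1/1\n"

# Read only the header to learn how many columns there are.
n_columns = pd.read_csv(io.StringIO(csv_text), nrows=0).shape[1]

# Key the converters by column position, so no names are needed.
converters = {index: encode_cell for index in range(n_columns)}
dataframe = pd.read_csv(io.StringIO(csv_text), converters=converters)
```

Each converter receives the raw cell text as a string, so the encoding happens during parsing rather than in a second pass over the DataFrame.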

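On the to_string side note from the question: the formatters argument also accepts a list applied by position, so one function can likely be reused for every column without naming them. A rough sketch, where `as_hex` and the sample frame are made up for illustration:

```python
import pandas as pd


def as_hex(value) -> str:
    """Hypothetical formatter: show an encoded integer as zero-padded hex."""
    return f"{int(value):#06x}"


# Sample frame invented for this sketch.
frame = pd.DataFrame({"s1": [257, 0], "s2": [1, 258]})

# One formatter per column, matched by position rather than by name.
text = frame.to_string(formatters=[as_hex] * frame.shape[1])
```

The list has to contain exactly one formatter per column, which is why it is built from the frame's own column count.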

Archived:	13 years, 11 months ago
Views:	4251
Last activity:	9 years, 1 month ago