I am using the numpy library in Python to import data from a CSV file into an ndarray, as follows:
import numpy as np

data = np.genfromtxt('mydata.csv',
                     delimiter=',', dtype=None, names=True)
The result has the following column names:
print(data.dtype.names)
('row_label',
'MyDataColumn1_0',
'MyDataColumn1_1')
The original column names are:
row_label, My-Data-Column-1.0, My-Data-Column-1.1
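It seems NumPy is forcing my column names into C-style variable name format. However, in many cases my Python scripts need to access columns by name, so I need the column names to stay unchanged. To make that work, either NumPy has to preserve the original column names, or I have to convert my column names to the format NumPy is using.
Is there a way to preserve the original column names during import?
If not, is there an easy way to convert column labels to the format NumPy is using, preferably with some NumPy functionality?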
If you set names=True, the first line of your data file is passed through this function:
validate_names = NameValidator(excludelist=excludelist,
deletechars=deletechars,
case_sensitive=case_sensitive,
replace_space=replace_space)
These are the options you can supply:
excludelist : sequence, optional
A list of names to exclude. This list is appended to the default list
['return','file','print']. Excluded names are appended an underscore:
for example, `file` would become `file_`.
deletechars : str, optional
A string combining invalid characters that must be deleted from the
names.
defaultfmt : str, optional
A format used to define default field names, such as "f%i" or "f_%02i".
autostrip : bool, optional
Whether to automatically strip white spaces from the variables.
replace_space : char, optional
Character(s) used in replacement of white spaces in the variables
names. By default, use a '_'.
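To see what these options do to a header, the validator can also be called on its own. This is only a small sketch: it assumes `NameValidator` is importable from the private module `numpy.lib._iotools`, which is not a public API and may move between NumPy releases. The sample names are made up for illustration.

```python
from numpy.lib._iotools import NameValidator  # private module, may change

# A validator with default settings, i.e. what genfromtxt builds
# when none of the options listed above are overridden.
validator = NameValidator()

# Hypothetical header fields: one excluded name, one name with a space.
names = validator(['file', 'field2', 'with space', 'CaSe'])
print(names)  # ('file_', 'field2', 'with_space', 'CaSe')
```

The excluded name `file` gets a trailing underscore and the space is replaced by `_`, matching the option descriptions above.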
You could try passing an empty string as your own deletechars. But you would be better off modifying and passing this:
defaultdeletechars = set("""~!@#$%^&*()-=+~\|]}[{';: /?.>,<""")
Just take the period and the minus sign out of that set and pass it as:
np.genfromtxt(..., names=True, deletechars=r"""~!@#$%^&*()=+~\|]}[{';: /?>,<""")
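For a quick check of this approach, here is a minimal end-to-end sketch. The in-memory CSV and its values are invented for illustration, and `encoding="utf-8"` is an assumption added so the label column is read as text; neither comes from the original question.

```python
import io

import numpy as np

# Hypothetical CSV content using the header from the question.
csv_text = (
    "row_label, My-Data-Column-1.0, My-Data-Column-1.1\n"
    "a, 1.5, 2.5\n"
    "b, 3.5, 4.5\n"
)

# The default delete set with '-' and '.' taken out (raw string so that
# the backslash is not treated as an escape sequence).
keep_dash_and_dot = r"""~!@#$%^&*()=+~\|]}[{';: /?>,<"""

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    dtype=None,
    names=True,
    deletechars=keep_dash_and_dot,
    encoding="utf-8",
)

print(data.dtype.names)
print(data["My-Data-Column-1.0"])
```

With the two characters removed from the delete set, the columns should be reachable by their original labels.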
Here is the source: https://github.com/numpy/numpy/blob/master/numpy/lib/_iotools.py#L245
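If you would rather keep NumPy's converted names and translate your own labels to match, one possible approach is to run the labels through the same validator class. This is a sketch, not an official recipe: it assumes `NameValidator` can be imported from the private module `numpy.lib._iotools`, and the exact converted names may differ between NumPy versions, so the mapping is built at run time instead of being hard-coded.

```python
from numpy.lib._iotools import NameValidator  # private module, may change

# Original labels from the question's header.
original = ["row_label", "My-Data-Column-1.0", "My-Data-Column-1.1"]

# Default settings, mirroring genfromtxt when no naming options are passed.
converted = NameValidator()(original)

# Look up the NumPy field name for each original label.
to_numpy_name = dict(zip(original, converted))

for label, field in to_numpy_name.items():
    print(f"{label!r} -> {field!r}")
```

A column can then be read as `data[to_numpy_name["My-Data-Column-1.0"]]`, provided `data` was loaded with the same default naming options.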