Tokenizing a string with an unequal number of spaces between fields

gat*_*rad 1 python split tokenize

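I'm trying to tokenize the entries in a file, but I can't use line.split(" ") because the number of spaces between the fields varies. Here are a few lines copied from my file: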

"08-09-2010 21:21:46      00:22:7f:a6:9b:69                                 -79"
"08-09-2010 21:21:46      04:4f:aa:b4:49:49                                 -79"
"08-09-2010 21:21:46      04:4f:aa:31:4e:59   tikona 18002090044            -83"
"08-09-2010 21:21:46      00:22:7f:26:9b:69   tikona 18002090044            -74"
"08-09-2010 21:21:46      04:4f:aa:34:0d:c9   tikona 18002090044            -82"
"08-09-2010 21:21:46      04:4f:aa:71:4e:59                                 -85"
"08-09-2010 21:21:46      04:4f:aa:34:21:89   tikona 18002090044            -75"
"08-09-2010 21:21:46      04:4f:aa:34:49:49   tikona 18002090044            -77"
"08-09-2010 21:21:46      04:4f:aa:74:0d:c9                                 -85"
"08-09-2010 21:22:47      18 APs were seen
"

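I need to access the first column (a datetime), the second column (00:22...) and the last column (-79 etc.). I can get the first and second columns easily, but not the last one. When I do info=line.split(" "), I can't tell which token index the last column will have, because the third column may or may not have an entry.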

How can I access the 4th column? Is there a way I could use something like info[i].contains(" -")?
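For reference, a minimal sketch of why split(" ") misbehaves here and what split() with no argument does instead. It assumes the double quotes around each sample line were added when pasting and are not in the file. Note that Python strings have no .contains() method; a substring test is written " -" in s.

```python
# Two lines from the question, without the surrounding quotes
# (assumption: the quotes were added when pasting, not part of the file).
with_ssid = "08-09-2010 21:21:46      04:4f:aa:31:4e:59   tikona 18002090044            -83"
without_ssid = "08-09-2010 21:21:46      00:22:7f:a6:9b:69                                 -79"

# split(" ") yields an empty string for every extra space, so the index
# of the last column depends on how much padding precedes it.
tokens = with_ssid.split(" ")
print(tokens[:4])  # ['08-09-2010', '21:21:46', '', '']

# split() with no argument treats each run of whitespace as one separator
# and drops empty strings, so index -1 is always the last column, whether
# or not the optional third column is present.
for line in (with_ssid, without_ssid):
    fields = line.split()
    timestamp = " ".join(fields[:2])  # date and time are separate tokens
    mac = fields[2]
    signal = fields[-1]
    print(timestamp, mac, signal)
# 08-09-2010 21:21:46 04:4f:aa:31:4e:59 -83
# 08-09-2010 21:21:46 00:22:7f:a6:9b:69 -79
```

The catch is the summary line: split() turns 18 APs were seen into four tokens, so index -1 gives seen. Checking the last token (for example with int()) before using it keeps such lines from being treated as data.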

mar*_*cog 7

The columns look fixed-width, in which case you can use string slicing and then .strip() to remove the padding spaces:

>>> for line in data.split('\n'):
...     print (line[1:25].strip(), line[26:45].strip(), line[46:69].strip(), line[70:-1].strip())
... 
('08-09-2010 21:21:46', '00:22:7f:a6:9b:69', '', '-79')
('08-09-2010 21:21:46', '04:4f:aa:b4:49:49', '', '-79')
('08-09-2010 21:21:46', '04:4f:aa:31:4e:59', 'tikona 18002090044', '-83')
('08-09-2010 21:21:46', '00:22:7f:26:9b:69', 'tikona 18002090044', '-74')
('08-09-2010 21:21:46', '04:4f:aa:34:0d:c9', 'tikona 18002090044', '-82')
('08-09-2010 21:21:46', '04:4f:aa:71:4e:59', '', '-85')
('08-09-2010 21:21:46', '04:4f:aa:34:21:89', 'tikona 18002090044', '-75')
('08-09-2010 21:21:46', '04:4f:aa:34:49:49', 'tikona 18002090044', '-77')
('08-09-2010 21:21:46', '04:4f:aa:74:0d:c9', '', '-85')
('08-09-2010 21:22:47', '18 APs were seen', '', '')
('', '', '', '')

The ('', '', '', '') comes from the final input line, which consists only of ".
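A self-contained Python 3 sketch of the same slicing idea, with the column boundaries kept in one place. The offsets are copied from the slices above and assume, like the answer's data, that every line starts with a double quote and that the columns really are fixed-width. The column names and the parse_fixed_width helper are hypothetical. Lines whose signal field comes out empty (the 18 APs were seen summary and the lone ") are skipped, and the signal is converted to an int.

```python
# Column boundaries copied from the answer's slices. They assume every line
# starts with a '"' (as in the pasted sample) and that the columns really are
# fixed-width; the column names are illustrative, not from the original file.
COLUMNS = {
    "timestamp": slice(1, 25),
    "mac": slice(26, 45),
    "ssid": slice(46, 69),
    "signal": slice(70, -1),  # -1 drops the closing '"'
}


def parse_fixed_width(line):
    """Cut one line into named fields, stripping the padding spaces."""
    return {name: line[cols].strip() for name, cols in COLUMNS.items()}


sample = [
    '"08-09-2010 21:21:46      00:22:7f:a6:9b:69                                 -79"',
    '"08-09-2010 21:21:46      04:4f:aa:31:4e:59   tikona 18002090044            -83"',
    '"08-09-2010 21:22:47      18 APs were seen',
    '"',
]

records = []
for line in sample:
    record = parse_fixed_width(line)
    # The summary line and the lone '"' have no signal column; skip them.
    if not record["signal"]:
        continue
    record["signal"] = int(record["signal"])
    records.append(record)

for record in records:
    print(record["timestamp"], record["mac"], record["ssid"], record["signal"])
```

Keeping the slices in one dict means a change in the file layout needs only one edit; if the real file has no leading quote, the offsets would have to be adjusted accordingly.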

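If the columns are not fixed-width, you can still use .split() and take the last column with index -1. Be careful with .split(), though, because doing it "properly" gets a bit messy. I suggest splitting on a double space so that the 18 APs were seen case stays in one piece, but note that this changes the index of the second column: it becomes fields[3], because the six spaces after the timestamp leave two empty strings.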

>>> for line in data.split('\n'):
...     fields = line.split('  ')
...     print (fields[0], fields[3], fields[-1])
... 
('"08-09-2010 21:21:46', '00:22:7f:a6:9b:69', ' -79"')
('"08-09-2010 21:21:46', '04:4f:aa:b4:49:49', ' -79"')
('"08-09-2010 21:21:46', '04:4f:aa:31:4e:59', '-83"')
('"08-09-2010 21:21:46', '00:22:7f:26:9b:69', '-74"')
('"08-09-2010 21:21:46', '04:4f:aa:34:0d:c9', '-82"')
('"08-09-2010 21:21:46', '04:4f:aa:71:4e:59', ' -85"')
('"08-09-2010 21:21:46', '04:4f:aa:34:21:89', '-75"')
('"08-09-2010 21:21:46', '04:4f:aa:34:49:49', '-77"')
('"08-09-2010 21:21:46', '04:4f:aa:74:0d:c9', ' -85"')
('"08-09-2010 21:22:47', '18 APs were seen', '18 APs were seen')
Traceback (most recent call last):
  File "<input>", line 3, in <module>
IndexError: list index out of range

The IndexError comes from your last input line (the lone "). If that line really occurs in your input, you should catch this error.
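One way to combine the double-space idea with catching that error, as a sketch: split on runs of two or more whitespace characters with re, so the padding produces no empty fields while single spaces inside values (tikona 18002090044, 18 APs were seen) stay intact. It assumes no real value contains two consecutive spaces; parse_line and COLUMN_SEP are hypothetical names. Lines that do not yield a numeric last field come back as None instead of raising.

```python
import re

# Assumption: columns are separated by at least two whitespace characters
# and no column value itself contains two consecutive spaces.
COLUMN_SEP = re.compile(r"\s{2,}")


def parse_line(line):
    """Return (timestamp, mac, signal) for a data line, or None otherwise."""
    # Drop the newline and the surrounding quotes seen in the sample.
    fields = COLUMN_SEP.split(line.strip().strip('"'))
    try:
        # The optional SSID column may or may not be present, so the
        # signal strength is taken from the end with index -1.
        return fields[0], fields[1], int(fields[-1])
    except (IndexError, ValueError):
        # IndexError: too few fields (e.g. the lone '"' line).
        # ValueError: last field is not a number ("18 APs were seen").
        return None


lines = [
    '"08-09-2010 21:21:46      00:22:7f:a6:9b:69                                 -79"',
    '"08-09-2010 21:21:46      04:4f:aa:31:4e:59   tikona 18002090044            -83"',
    '"08-09-2010 21:22:47      18 APs were seen',
    '"',
]

records = [rec for rec in map(parse_line, lines) if rec is not None]
for timestamp, mac, signal in records:
    print(timestamp, mac, signal)
```

To read a real file, the same call works on each line of open(path); strip() also removes the trailing newline.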