gat*_*rad · 1 · tags: python, split, tokenize
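I'm trying to tokenize the entries in a file, but I can't use line.split(" ") because the number of spaces between the fields varies. Here are a few lines copied from my file: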
"08-09-2010 21:21:46 00:22:7f:a6:9b:69 -79"
"08-09-2010 21:21:46 04:4f:aa:b4:49:49 -79"
"08-09-2010 21:21:46 04:4f:aa:31:4e:59 tikona 18002090044 -83"
"08-09-2010 21:21:46 00:22:7f:26:9b:69 tikona 18002090044 -74"
"08-09-2010 21:21:46 04:4f:aa:34:0d:c9 tikona 18002090044 -82"
"08-09-2010 21:21:46 04:4f:aa:71:4e:59 -85"
"08-09-2010 21:21:46 04:4f:aa:34:21:89 tikona 18002090044 -75"
"08-09-2010 21:21:46 04:4f:aa:34:49:49 tikona 18002090044 -77"
"08-09-2010 21:21:46 04:4f:aa:74:0d:c9 -85"
"08-09-2010 21:22:47 18 APs were seen
"
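I need to access the first column (a date/time value), the second column (00:22...) and the last column (-79 etc.). The first and second columns are easy to get, but the last one is not: when I do info = line.split(" "), I can't tell which token number it is, because the third column may or may not have an entry.
How can I access the 4th column? Is there a way I could use something like info[i].contains(" -")?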
The columns look fixed-width. In that case you can use string slicing and then call .strip() to remove the padding spaces:
>>> for line in data.split('\n'):
...     print (line[1:25].strip(), line[26:45].strip(), line[46:69].strip(), line[70:-1].strip())
...
('08-09-2010 21:21:46', '00:22:7f:a6:9b:69', '', '-79')
('08-09-2010 21:21:46', '04:4f:aa:b4:49:49', '', '-79')
('08-09-2010 21:21:46', '04:4f:aa:31:4e:59', 'tikona 18002090044', '-83')
('08-09-2010 21:21:46', '00:22:7f:26:9b:69', 'tikona 18002090044', '-74')
('08-09-2010 21:21:46', '04:4f:aa:34:0d:c9', 'tikona 18002090044', '-82')
('08-09-2010 21:21:46', '04:4f:aa:71:4e:59', '', '-85')
('08-09-2010 21:21:46', '04:4f:aa:34:21:89', 'tikona 18002090044', '-75')
('08-09-2010 21:21:46', '04:4f:aa:34:49:49', 'tikona 18002090044', '-77')
('08-09-2010 21:21:46', '04:4f:aa:74:0d:c9', '', '-85')
('08-09-2010 21:22:47', '18 APs were seen', '', '')
('', '', '', '')
The ('', '', '', '') comes from the final input line, which contains only a ".
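The slicing can be wrapped in a small helper so the column boundaries are defined in one place. This is a sketch: split_fixed, make_line and COLUMNS are hypothetical names, and the boundaries are the ones from the slicing loop, which assume an opening quote at index 0 and space-padded columns. Measure your real file before relying on them.

```python
# Hypothetical column boundaries (start, end), copied from the slicing loop.
# They assume index 0 holds the opening quote and that every column is
# padded with spaces to a fixed width.
COLUMNS = [(1, 25), (26, 45), (46, 69), (70, -1)]


def split_fixed(line, columns=COLUMNS):
    """Cut a fixed-width line into columns and strip the padding from each."""
    return tuple(line[start:end].strip() for start, end in columns)


def make_line(stamp, mac, ssid, rssi):
    """Build a padded sample line in the assumed layout (demonstration only)."""
    return '"' + stamp.ljust(25) + mac.ljust(20) + ssid.ljust(24) + rssi + '"'


with_ssid = make_line("08-09-2010 21:21:46", "04:4f:aa:31:4e:59",
                      "tikona 18002090044", "-83")
without_ssid = make_line("08-09-2010 21:21:46", "00:22:7f:a6:9b:69", "", "-79")

print(split_fixed(with_ssid))
# → ('08-09-2010 21:21:46', '04:4f:aa:31:4e:59', 'tikona 18002090044', '-83')
print(split_fixed(without_ssid))
# → ('08-09-2010 21:21:46', '00:22:7f:a6:9b:69', '', '-79')
```

If a column width ever changes, only COLUMNS needs to be updated.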
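If the columns are not fixed-width, you can still use .split() and get the last column with index -1. Use .split() with some care, though, because doing it "properly" gets a bit messy. I suggest using a double space as the delimiter so that the 18 APs were seen case is handled, but note that this changes the index of the second column (the MAC address ends up at fields[3]).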
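A minimal sketch of the .split() plus index -1 idea, assuming fields are separated by runs of whitespace and each record is wrapped in double quotes. Calling str.split() with no argument splits on any run of whitespace, so the date and time are tokens 0 and 1, the MAC address is token 2, and the signal strength is token -1 whether or not the SSID columns are present. The function name parse_line is hypothetical, and the day-first date format "%d-%m-%Y" is an assumption (use "%m-%d-%Y" if the month comes first).

```python
from datetime import datetime


def parse_line(line):
    """Return (timestamp, mac, rssi) for one log line, or None if it doesn't fit."""
    # Drop surrounding whitespace and the double quotes around the record,
    # then split on runs of whitespace (no argument to split()).
    fields = line.strip().strip('"').split()
    if len(fields) < 4:
        return None  # too short to hold date, time, MAC and RSSI
    try:
        # Assumed day-first date format; swap %d and %m if needed.
        timestamp = datetime.strptime(fields[0] + " " + fields[1],
                                      "%d-%m-%Y %H:%M:%S")
        rssi = int(fields[-1])  # last token is the signal strength, e.g. -79
    except ValueError:
        return None  # e.g. the "18 APs were seen" summary line
    return timestamp, fields[2], rssi


print(parse_line('"08-09-2010 21:21:46 04:4f:aa:31:4e:59 tikona 18002090044 -83"'))
print(parse_line('"08-09-2010 21:22:47 18 APs were seen'))  # None
```

Lines that do not fit the pattern (the summary line, the lone trailing quote) come back as None, so a loop over the file can simply skip them.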
>>> for line in data.split('\n'):
...     fields = line.split('  ')
...     print (fields[0], fields[3], fields[-1])
...
('"08-09-2010 21:21:46', '00:22:7f:a6:9b:69', ' -79"')
('"08-09-2010 21:21:46', '04:4f:aa:b4:49:49', ' -79"')
('"08-09-2010 21:21:46', '04:4f:aa:31:4e:59', '-83"')
('"08-09-2010 21:21:46', '00:22:7f:26:9b:69', '-74"')
('"08-09-2010 21:21:46', '04:4f:aa:34:0d:c9', '-82"')
('"08-09-2010 21:21:46', '04:4f:aa:71:4e:59', ' -85"')
('"08-09-2010 21:21:46', '04:4f:aa:34:21:89', '-75"')
('"08-09-2010 21:21:46', '04:4f:aa:34:49:49', '-77"')
('"08-09-2010 21:21:46', '04:4f:aa:74:0d:c9', ' -85"')
('"08-09-2010 21:22:47', '18 APs were seen', '18 APs were seen')
Traceback (most recent call last):
  File "<input>", line 3, in <module>
IndexError: list index out of range
The IndexError comes from your last input line. If that line really appears in your input, you should catch this error.
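A minimal sketch of catching that error, assuming the same double-space delimiter as the loop above; extract_fields is a hypothetical name. Lines with too few fields, like the lone " at the end of the sample, are skipped instead of stopping the whole loop.

```python
def extract_fields(lines):
    """Yield (first, fourth, last) fields of each line, skipping short lines."""
    for line in lines:
        fields = line.split("  ")  # double-space delimiter, as in the loop above
        try:
            record = (fields[0], fields[3], fields[-1])
        except IndexError:
            # Too few fields, e.g. a final line that contains only '"'.
            continue
        yield record


# Sample padded with runs of spaces so that the MAC lands at fields[3].
sample = ['"08-09-2010 21:21:46' + " " * 6 + "00:22:7f:a6:9b:69" + " " * 3 + '-79"',
          '"']
for record in extract_fields(sample):
    print(record)
```

An explicit "if len(fields) < 4: continue" check does the same job without raising the exception; which one reads better is a matter of taste.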