ukb*_*baz 6 python data-analysis dataframe python-3.x pandas
I have log files containing many lines of the following form:
LogLevel [13/10/2015 00:30:00.650] [Message Text]
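My goal is to convert each line of the log file into a tidy DataFrame. I have tried to do this by splitting the lines on the [ character, but I still don't get a neat DataFrame.
My code: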
level = []
time = []
text = []
with open(filename) as inf:
    for line in inf:
        parts = line.split('[')
        if len(parts) > 1:
            level = parts[0]
            time = parts[1]
            text = parts[2]
            print (parts[0],parts[1],parts[2])
s1 = pd.Series({'Level':level, 'Time': time, 'Text':text})
df = pd.DataFrame(s1).reset_index()
Here's my printed DataFrame:
Info 10/08/16 10:56:09.843] In Function CCatalinaPrinter::ItemDescription()]
Info 10/08/16 10:56:09.843] Sending UPC Description Message ]
How can I improve this to strip the whitespace and the extra ']' character?
Thanks
jez*_*ael 10
You can use read_csv with the separator \s*\[ (optional whitespace followed by [):
import pandas as pd
from io import StringIO
temp=u"""LogLevel [13/10/2015 00:30:00.650] [Message Text]
LogLevel [13/10/2015 00:30:00.650] [Message Text]
LogLevel [13/10/2015 00:30:00.650] [Message Text]
LogLevel [13/10/2015 00:30:00.650] [Message Text]"""
# after testing, replace StringIO(temp) with filename
df = pd.read_csv(StringIO(temp), sep=r"\s*\[", names=['Level','Time','Text'], engine='python')
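To see what that separator does to a single line, here is a small sketch using re.split from the standard library with the same pattern. It only illustrates the splitting; how read_csv applies the pattern internally is not shown here. Note that each bracketed field keeps its trailing ].

```python
import re

line = "LogLevel [13/10/2015 00:30:00.650] [Message Text]"

# Split on optional whitespace followed by a literal '['.
parts = re.split(r"\s*\[", line)
print(parts)
```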
Then remove the ] with strip and convert the Time column with to_datetime:
df.Time = pd.to_datetime(df.Time.str.strip(']'), format='%d/%m/%Y %H:%M:%S.%f')
df.Text = df.Text.str.strip(']')
print (df)
      Level                    Time          Text
0  LogLevel 2015-10-13 00:30:00.650  Message Text
1  LogLevel 2015-10-13 00:30:00.650  Message Text
2  LogLevel 2015-10-13 00:30:00.650  Message Text
3  LogLevel 2015-10-13 00:30:00.650  Message Text
print (df.dtypes)
Level            object
Time     datetime64[ns]
Text             object
dtype: object
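If you would rather keep the line-by-line loop from the question, one possible alternative is to match each line against a regular expression and collect one dict per line. This is only a sketch: the LINE_RE pattern and the parse_log helper are hypothetical names, the pattern assumes every line has exactly the Level [Time] [Text] layout, and the sample lines are made up for illustration. Lines that do not match are silently skipped.

```python
import re
import pandas as pd

# Hypothetical pattern: a level word followed by two bracketed fields.
LINE_RE = re.compile(
    r"^(?P<Level>\S+)\s*\[(?P<Time>[^\]]*)\]\s*\[(?P<Text>.*)\]\s*$"
)

def parse_log(lines):
    """Build a DataFrame from an iterable of log lines (hypothetical helper)."""
    rows = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            # Strip stray whitespace inside the brackets.
            rows.append({k: v.strip() for k, v in m.groupdict().items()})
    df = pd.DataFrame(rows, columns=["Level", "Time", "Text"])
    df["Time"] = pd.to_datetime(df["Time"], format="%d/%m/%Y %H:%M:%S.%f")
    return df

# Made-up sample input; with a real file, pass the open file object instead.
sample = [
    "LogLevel [13/10/2015 00:30:00.650] [Message Text]\n",
    "LogLevel [13/10/2015 00:30:00.650] [Message Text ]\n",
    "not a log line\n",
]
df = parse_log(sample)
print(df)
```

One row is appended per matching line, which also avoids the problem in the question's loop, where level, time and text are overwritten on every iteration so only the last line survives.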