我无法想出一种方法来进一步减少该程序的内存使用量.这是我迄今为止最有效的实现:
columns = ['eventName', 'sessionId', "eventTime", "items", "currentPage", "browserType"]
df = pd.DataFrame(columns=columns)
l = []
for i, file in enumerate(glob.glob("*.log")):
print("Going through log file #%s named %s..." % (i+1, file))
with open(file) as myfile:
l += [json.loads(line) for line in myfile]
tempdata = pd.DataFrame(l)
for column in tempdata.columns:
if not column in columns:
try:
tempdata.drop(column, axis=1, inplace=True)
except ValueError:
print ("oh no! We've got a problem with %s column! It don't exist!" % (badcolumn))
l = []
df = df.append(tempdata, …Run Code Online (Sandbox Code Playgroud)