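As described here, Polars introduced an automatic caching mechanism for LazyFrames that appear more than once in a logical plan, so users no longer have to cache them explicitly.
However, while trying out this new mechanism, I ran into a case where the automatic caching is not applied optimally:
Without explicit caching: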
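import polars as pl

# Three small LazyFrames that share an 'id' column
df1 = pl.DataFrame({'id': [0, 5, 6]}).lazy()
df2 = pl.DataFrame({'id': [0, 8, 6]}).lazy()
df3 = pl.DataFrame({'id': [7, 8, 6]}).lazy()

# df4 is used twice in the concat below; df1 is reached three times (twice via df4, once directly)
df4 = df1.join(df2, on='id')
print(pl.concat([df4.join(df3, on='id'), df1, df4]).explain())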
This gives the logical plan:
UNION
  PLAN 0:
    INNER JOIN:
    LEFT PLAN ON: [col("id")]
      INNER JOIN:
      LEFT PLAN ON: [col("id")]
        CACHE[id: a4bcf9591fefc837, count: 3]
          DF ["id"]; PROJECT 1/1 COLUMNS; SELECTION: "None"
      RIGHT PLAN ON: [col("id")]
        CACHE[id: 8cee8e3a6f454983, count: 1]
          DF ["id"]; PROJECT 1/1 COLUMNS; SELECTION: "None"
      END INNER JOIN
    RIGHT PLAN ON: …
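
To make plans like this easier to compare, the CACHE nodes can be pulled out of the text that `explain()` prints. The sketch below is not part of Polars: `summarize_caches` and `CACHE_RE` are hypothetical names, and the pattern assumes the `CACHE[id: ..., count: N]` line format seen in the output above, which may differ between Polars versions. It only reports the `id` and `count` values as printed and does not assume what `count` means internally.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the plan above:
#   CACHE[id: a4bcf9591fefc837, count: 3]
CACHE_RE = re.compile(r"CACHE\[id: ([0-9a-f]+), count: (\d+)\]")


def summarize_caches(plan: str) -> dict:
    """Group the CACHE nodes found in a plan string by their id.

    For each id, record how many times a CACHE line with that id occurs
    in the text and the list of `count` values printed on those lines.
    """
    summary = defaultdict(lambda: {"occurrences": 0, "counts": []})
    for cache_id, count in CACHE_RE.findall(plan):
        entry = summary[cache_id]
        entry["occurrences"] += 1
        entry["counts"].append(int(count))
    return dict(summary)


# The two CACHE nodes visible in the (truncated) plan above.
sample_plan = """\
CACHE[id: a4bcf9591fefc837, count: 3]
DF ["id"]; PROJECT 1/1 COLUMNS; SELECTION: "None"
CACHE[id: 8cee8e3a6f454983, count: 1]
DF ["id"]; PROJECT 1/1 COLUMNS; SELECTION: "None"
"""

for cache_id, info in summarize_caches(sample_plan).items():
    print(cache_id, info)
```

With Polars installed, the same function can be given the string returned by `explain()` on any LazyFrame, for example `summarize_caches(pl.concat([...]).explain())`, which makes it easy to see which subplans ended up behind a CACHE node and which did not.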