Best way to handle paths with pandas

Question

When I have a pd.DataFrame with paths, I end up doing a lot of .map(lambda path: Path(path).{method_name}) or apply(axis=1), for example:

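import pandas as pd
from pathlib import Path

(
    pd.DataFrame({'base_dir': ['dir_A', 'dir_B'], 'file_name': ['file_0', 'file_1']})
    # row-wise apply, only to join two columns into a Path
    .assign(full_path=lambda df: df.apply(lambda row: Path(row.base_dir) / row.file_name, axis=1))
)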
  base_dir file_name     full_path
0    dir_A    file_0  dir_A/file_0
1    dir_B    file_1  dir_B/file_1

This seems odd to me, especially since pathlib does implement the / operator, so something like df.base_dir / df.file_name would be more Pythonic and natural.
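
A minimal sketch of why df.base_dir / df.file_name fails as written, assuming the columns hold plain Python str objects (dtype=object is passed explicitly so this holds regardless of the pandas version's default string dtype). On object columns pandas applies / element by element, str has no / operator, and converting only the left-hand column to Path is enough because Path / str returns a Path.

```python
import pandas as pd
from pathlib import Path

# dtype=object keeps the columns as plain Python str objects.
df = pd.DataFrame(
    {"base_dir": ["dir_A", "dir_B"], "file_name": ["file_0", "file_1"]},
    dtype=object,
)

# pandas applies `/` element by element on object columns; str does not
# support `/`, so joining two str columns this way raises TypeError.
try:
    df.base_dir / df.file_name
    str_division_failed = False
except TypeError:
    str_division_failed = True

# Converting only the left-hand column is enough: Path.__truediv__ accepts a str.
full_path = df.base_dir.map(Path) / df.file_name

print(str_division_failed)  # True
print(full_path.tolist())
```

This is still element-wise Python work under the hood; it only makes the syntax look column-wise.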

I haven't found any path type implemented in pandas. Is there something I'm missing?

Edit

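I've found it works better to convert the columns to Path once up front (the equivalent of astype(Path)); after that, the path join with pathlib can at least be written as a column-wise operation: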

import pandas as pd
from pathlib import Path

(
    pd.DataFrame({'base_dir': ['dir_A', 'dir_B'], 'file_name': ['file_0', 'file_1']})
    # this is where I would expect `astype({'base_dir': Path})`
    # `col=col_name` binds each column name now; a bare closure would only see the last one
    .assign(**{col_name: lambda df, col=col_name: df[col].map(Path) for col_name in ["base_dir", "file_name"]})
    .assign(full_path=lambda df: df.base_dir / df.file_name)
)
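
A sketch of the same convert-once idea without per-column lambdas, using DataFrame.pipe with a small helper. The name paths_from_columns is hypothetical, not part of pandas. Because the dict comprehension indexes df[col] immediately inside the helper, there is no closure that could pick up the wrong column name.

```python
import pandas as pd
from pathlib import Path


def paths_from_columns(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with `columns` mapped to Path objects."""
    # df[col] is evaluated right here for each column, so no lambdas
    # (and no late-binding closures) are involved.
    return df.assign(**{col: df[col].map(Path) for col in columns})


result = (
    pd.DataFrame({"base_dir": ["dir_A", "dir_B"], "file_name": ["file_0", "file_1"]})
    .pipe(paths_from_columns, ["base_dir", "file_name"])
    # Both columns now hold Path objects, so `/` joins them element by element.
    .assign(full_path=lambda df: df.base_dir / df.file_name)
)
print(result)
```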

Answer 1

Roy*_*012 1

The simplest approach seems to be:

df.base_dir.map(Path) / df.file_name.map(Path)

This saves you the lambda functions, but you still have to map to Path.
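
If the repeated .map(Path) and .map(lambda path: Path(path).{method_name}) calls are the main annoyance, pandas lets you register a custom accessor with pd.api.extensions.register_series_accessor. The sketch below assumes a hypothetical accessor named path with hypothetical methods join and call; it is not a built-in pandas feature, and it still does element-wise Python work, so it tidies the syntax rather than speeding anything up.

```python
import pandas as pd
from pathlib import Path


# Hypothetical accessor name "path": registers `Series.path` for this session.
@pd.api.extensions.register_series_accessor("path")
class PathAccessor:
    def __init__(self, series: pd.Series) -> None:
        # Convert once; Path() also accepts values that are already Path objects.
        self._paths = series.map(Path)

    def __getattr__(self, name: str) -> pd.Series:
        # Forward attribute access (e.g. .name, .suffix, .parent) to each Path.
        # Private/dunder names are refused so copy/pickle probing cannot recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._paths.map(lambda p: getattr(p, name))

    def join(self, other) -> pd.Series:
        # Element-wise `/`. A Series argument is mapped to Path first so both
        # sides are object dtype; a scalar str or Path is used as-is.
        if isinstance(other, pd.Series):
            other = other.map(Path)
        return self._paths / other

    def call(self, method: str, *args, **kwargs) -> pd.Series:
        # Call a Path method on every element, e.g. .path.call("with_suffix", ".txt").
        return self._paths.map(lambda p: getattr(p, method)(*args, **kwargs))


df = pd.DataFrame({"base_dir": ["dir_A", "dir_B"], "file_name": ["file_0", "file_1"]})
full = df.base_dir.path.join(df.file_name)
print(full.path.name.tolist())  # ['file_0', 'file_1']
print(full.path.call("with_suffix", ".txt").tolist())
```

Attribute forwarding only suits Path properties; methods go through call, since forwarding a method name would just return a Series of bound methods.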

Alternatively, just do:

df.base_dir.str.cat(df.file_name, sep="/")

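The latter hard-codes "/" as the separator, so it won't give you Windows-style paths (who cares, right? :)), but it may run faster. It also returns plain strings rather than Path objects.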
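
A quick comparison of the two approaches on the example data, as a sketch: str.cat returns plain str values joined with "/", while the pathlib route returns Path objects that use the running platform's separator. Path.as_posix() renders a Path as a "/"-joined string, so the two agree textually on any OS.

```python
import pandas as pd
from pathlib import Path

df = pd.DataFrame(
    {"base_dir": ["dir_A", "dir_B"], "file_name": ["file_0", "file_1"]},
    dtype=object,
)

# str.cat: plain string concatenation with a hard-coded "/" separator.
joined_str = df.base_dir.str.cat(df.file_name, sep="/")

# pathlib: Path objects that use the separator of the running platform.
joined_path = df.base_dir.map(Path) / df.file_name

# as_posix() always renders with "/", so this comparison holds on any OS.
same_text = joined_str.tolist() == [p.as_posix() for p in joined_path]

print(type(joined_str.iloc[0]).__name__, type(joined_path.iloc[0]).__name__)
print(same_text)  # True
```

If you need OS-native strings from the pathlib result, joined_path.map(str) converts the Path objects back to strings.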

Asked:	5 years, 4 months ago
Viewed:	572 times
Last activity:	4 years, 6 months ago