Sam*_*ron 28 .net c# filesystems
I'm writing a directory scanner in .NET.
For each file/directory I need the following information:
class Info {
    public bool IsDirectory;
    public string Path;
    public DateTime ModifiedDate;
    public DateTime CreatedDate;
}
I have this function:
static List<Info> RecursiveMovieFolderScan(string path){
    var info = new List<Info>();
    var dirInfo = new DirectoryInfo(path);
    foreach (var dir in dirInfo.GetDirectories()) {
        info.Add(new Info() {
            IsDirectory = true,
            CreatedDate = dir.CreationTimeUtc,
            ModifiedDate = dir.LastWriteTimeUtc,
            Path = dir.FullName
        });
        info.AddRange(RecursiveMovieFolderScan(dir.FullName));
    }
    foreach (var file in dirInfo.GetFiles()) {
        info.Add(new Info()
        {
            IsDirectory = false,
            CreatedDate = file.CreationTimeUtc,
            ModifiedDate = file.LastWriteTimeUtc,
            Path = file.FullName
        });
    }
    return info;
}
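It turns out this implementation is quite slow. Is there any way to speed it up? I'm thinking of hand-coding this with FindFirstFileW, but I would like to avoid that if there is a faster built-in way.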
Sam*_*ron 39
This implementation, which needs a bit of tweaking, is 5 to 10 times faster.
static List<Info> RecursiveScan2(string directory) {
    IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
    WIN32_FIND_DATAW findData;
    IntPtr findHandle = INVALID_HANDLE_VALUE;
    var info = new List<Info>();
    try {
        findHandle = FindFirstFileW(directory + @"\*", out findData);
        if (findHandle != INVALID_HANDLE_VALUE) {
            do {
                if (findData.cFileName == "." || findData.cFileName == "..") continue;
                string fullpath = directory + (directory.EndsWith("\\") ? "" : "\\") + findData.cFileName;
                bool isDir = false;
                if ((findData.dwFileAttributes & FileAttributes.Directory) != 0) {
                    isDir = true;
                    info.AddRange(RecursiveScan2(fullpath));
                }
                info.Add(new Info()
                {
                    CreatedDate = findData.ftCreationTime.ToDateTime(),
                    ModifiedDate = findData.ftLastWriteTime.ToDateTime(),
                    IsDirectory = isDir,
                    Path = fullpath
                });
            }
            while (FindNextFile(findHandle, out findData));
        }
    } finally {
        if (findHandle != INVALID_HANDLE_VALUE) FindClose(findHandle);
    }
    return info;
}
The extension method:
public static class FILETIMEExtensions {
    public static DateTime ToDateTime(this System.Runtime.InteropServices.ComTypes.FILETIME filetime) {
        // Both halves are declared as int; cast through uint so that a low half
        // with its top bit set is not sign-extended into the high 32 bits.
        long highBits = (long)(uint)filetime.dwHighDateTime << 32;
        return DateTime.FromFileTimeUtc(highBits | (uint)filetime.dwLowDateTime);
    }
}
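Because `ComTypes.FILETIME` declares its two 32-bit halves as `int`, the low half has to be reinterpreted as unsigned before it is combined with the high half; a plain `(long)` cast sign-extends any low value whose top bit is set and shifts the resulting time. The platform-neutral sketch below isolates just that arithmetic. `FileTimeMath`, `Combine`, `CombineNaive` and `ToUtc` are hypothetical names for illustration, not part of any library.

```csharp
using System;

// Hypothetical helper that isolates the FILETIME -> Int64 arithmetic.
public static class FileTimeMath
{
    // Joins the high and low halves of a Win32 FILETIME
    // (100-ns ticks since 1601-01-01 UTC). Casting each half through
    // uint first means the low half is zero-extended, not sign-extended.
    public static long Combine(int high, int low)
    {
        return ((long)(uint)high << 32) | (uint)low;
    }

    // The naive form for comparison: (long)low sign-extends a negative int.
    public static long CombineNaive(int high, int low)
    {
        long highBits = high;
        highBits = highBits << 32;
        return highBits + (long)low;
    }

    public static DateTime ToUtc(int high, int low)
    {
        return DateTime.FromFileTimeUtc(Combine(high, low));
    }
}
```

With a low half of `-1` (all 32 bits set), `Combine(0, -1)` yields `0xFFFFFFFF` while `CombineNaive(0, -1)` yields `-1`, which is off by 2^32 ticks (roughly seven minutes) for real timestamps.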
The interop definitions are:
[DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
public static extern IntPtr FindFirstFileW(string lpFileName, out WIN32_FIND_DATAW lpFindFileData);

[DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
public static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATAW lpFindFileData);

[DllImport("kernel32.dll")]
public static extern bool FindClose(IntPtr hFindFile);

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct WIN32_FIND_DATAW {
    public FileAttributes dwFileAttributes;
    internal System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
    internal System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
    internal System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
    public int nFileSizeHigh;
    public int nFileSizeLow;
    public int dwReserved0;
    public int dwReserved1;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
    public string cFileName;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
    public string cAlternateFileName;
}
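The .NET file-enumeration methods have a long history of being slow. The problem is that there is no instant way to enumerate a large directory structure. Even the accepted answer here has problems with GC allocations.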
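The best I have been able to do is wrapped up in my library and exposed as the FindFile (source) class in the CSharpTest.Net.IO namespace. This class can enumerate files and folders without unneeded GC allocations and string marshaling.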
Usage is simple; setting the RaiseOnAccessDenied property to false skips the directories and files the user does not have access to:
private static long SizeOf(string directory)
{
    var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
    fcounter.RaiseOnAccessDenied = false;
    long size = 0, total = 0;
    fcounter.FileFound +=
        (o, e) =>
        {
            if (!e.IsDirectory)
            {
                // Update both counters atomically in case FileFound is raised from more than one thread.
                Interlocked.Increment(ref total);
                Interlocked.Add(ref size, e.Length);
            }
        };
    Stopwatch sw = Stopwatch.StartNew();
    fcounter.Find();
    Console.WriteLine("Enumerated {0:n0} files totaling {1:n0} bytes in {2:n3} seconds.",
        total, size, sw.Elapsed.TotalSeconds);
    return size;
}
For my local C:\ drive, it outputs the following:
Enumerated 810,046 files totaling 307,707,792,662 bytes in 232.876 seconds.
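Your mileage may vary with drive speed, but this is the fastest method I have found for enumerating files in managed code. The event argument is a mutable class of type FindFile.FileFoundEventArgs, so be sure not to keep a reference to it: its values change for every event that is raised.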
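The hazard with a reused event-args object can be shown without the library. The sketch below is a hypothetical stand-in: `DemoFinder` and `ReusedFoundArgs` are illustrative names, not FindFile's real types. It raises every result through one shared args instance, so a handler that stores `e` ends up with many references to the same object, while a handler that copies the values out keeps correct data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical args type: one instance is reused for every result.
public sealed class ReusedFoundArgs : EventArgs
{
    public string Name { get; internal set; }
    public long Length { get; internal set; }
}

// Hypothetical finder that mimics "mutate and re-raise" instead of
// allocating a fresh EventArgs per file.
public sealed class DemoFinder
{
    public event EventHandler<ReusedFoundArgs> FileFound;
    private readonly ReusedFoundArgs _args = new ReusedFoundArgs();

    public void Raise(IEnumerable<(string Name, long Length)> entries)
    {
        foreach (var (name, length) in entries)
        {
            _args.Name = name;      // overwrite the shared instance...
            _args.Length = length;  // ...rather than allocating a new one
            FileFound?.Invoke(this, _args);
        }
    }
}
```

In a handler, copy the fields you need (for example into a tuple or your own `Info` object) instead of adding `e` itself to a collection; after enumeration, every stored `e` would report the values of the last file found.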
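You may also notice that the DateTime values are exposed only in UTC. The reason is that converting to local time is moderately expensive. Consider working with the UTC times for better performance instead of converting them to local time.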
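On newer runtimes (.NET Core 2.1 and later, which postdate this answer) a similar "skip what you cannot read" size walk can be written with the built-in `EnumerationOptions` class instead of setting `RaiseOnAccessDenied = false`. This is a sketch of that alternative, not the FindFile library, and its speed relative to FindFile has not been measured here; `BuiltInSizeOf` is a hypothetical name.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class BuiltInSizeOf
{
    // Sums the sizes of all files under 'directory' using only the BCL.
    // Assumes .NET Core 2.1 or later, where EnumerationOptions exists.
    public static long SizeOf(string directory)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            IgnoreInaccessible = true, // skip entries we may not open instead of throwing
            AttributesToSkip = 0       // the default would skip Hidden and System entries
        };

        long size = 0, total = 0;
        Stopwatch sw = Stopwatch.StartNew();
        foreach (FileInfo file in new DirectoryInfo(directory).EnumerateFiles("*", options))
        {
            total++;
            size += file.Length;
        }
        Console.WriteLine("Enumerated {0:n0} files totaling {1:n0} bytes in {2:n3} seconds.",
            total, size, sw.Elapsed.TotalSeconds);
        return size;
    }
}
```

Because the enumeration is lazy and single-threaded, plain `++` and `+=` are sufficient for the counters here.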
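Depending on how much time you are trying to shave off the function, it may be worth calling the Win32 API functions directly, because the existing API does a lot of extra processing to check things you may not be interested in.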
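If you haven't already, and assuming you don't intend to contribute to the Mono project, I strongly recommend downloading Reflector and looking at how Microsoft implemented the API calls you are currently using. That will give you an idea of what you need to call and what you can leave out.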
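For example, you might create an iterator that yields directory names instead of a function that returns a list; that way you don't end up iterating over the same list of names two or three times across all the different levels of code.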
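A minimal sketch of the iterator idea, assuming .NET 4 or later (for `DirectoryInfo.EnumerateFileSystemInfos`) and reusing the question's `Info` class. `LazyScanner.Scan` is a hypothetical name. It uses an explicit stack rather than nested `yield` recursion, so each result is handed to the caller once instead of being re-yielded through every directory level.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// The question's Info class, repeated so the sketch compiles on its own.
public class Info
{
    public bool IsDirectory;
    public string Path;
    public DateTime ModifiedDate;
    public DateTime CreatedDate;
}

public static class LazyScanner
{
    // Lazily yields one Info per entry under 'root' (root itself excluded).
    // Order is depth-first via the stack, not the question's order.
    // Not handled here: UnauthorizedAccessException on protected folders,
    // and loops through directory symlinks/junctions, which are followed.
    public static IEnumerable<Info> Scan(string root)
    {
        var pending = new Stack<DirectoryInfo>();
        pending.Push(new DirectoryInfo(root));
        while (pending.Count > 0)
        {
            DirectoryInfo current = pending.Pop();
            foreach (FileSystemInfo entry in current.EnumerateFileSystemInfos())
            {
                bool isDir = entry is DirectoryInfo;
                if (isDir)
                    pending.Push((DirectoryInfo)entry); // visit later, no recursion
                yield return new Info
                {
                    IsDirectory = isDir,
                    Path = entry.FullName,
                    CreatedDate = entry.CreationTimeUtc,
                    ModifiedDate = entry.LastWriteTimeUtc
                };
            }
        }
    }
}
```

Callers can `foreach` over `LazyScanner.Scan(path)` and stop early, or materialize it once with `new List<Info>(...)` if they really need a list.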