I am trying to index PDF files with Sitecore 7. I installed the IFilter, but I get the following error in the crawling log:
ManagedPoolThread #17 09:24:20 WARN LuceneIndexOperations : Update : Could not build document data 4433434-3443-3223-91c4-233232. Skipping.
Exception: System.Runtime.InteropServices.COMException
Message: Error HRESULT E_FAIL has been returned from a call to a COM component.
Source: mscorlib
at System.Runtime.InteropServices.ComTypes.IPersistFile.Load(String pszFileName, Int32 dwMode)
at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.LoadAndInitIFilter(String fileName, String extension)
at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterReader..ctor(String fileName)
at Sitecore.ContentSearch.ComputedFields.MediaItemIFilterTextExtractor.ComputeFieldValue(IIndexable indexable)
at Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor.ComputeFieldValue(IIndexable indexable)
at Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder.AddComputedIndexFields()
at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.GetIndexData(IIndexable indexable, IIndexable latestVersion, IProviderUpdateContext context)
at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.BuildDataToIndex(IProviderUpdateContext context, IIndexable version, IIndexable latestVersion)
at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.<>c__DisplayClass7.<Update>b__0(Item version)
What do I have to do to make this work? The Sitecore documentation says it should work out of the box.
Answer (score: 5)
I had the same problem and received the following reply from Sitecore support (everything worked fine afterwards):
1) Copy all Adobe iFilter .dll files into the "\System32\Inetsrv" folder. This is the working directory of IIS on Windows Server. By default, the Adobe iFilter .dll files are stored in the "C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin" folder. You can also use the "IFilter Explorer" tool to find the folder where the .dll files are stored: http://www.citeknet.com/Products/IFilters/IFilterExplorer/tabid/62/Default.aspx For details, see the screenshot: http://screencast.com/t/xmWukanM+
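2) Delete all files under the "Website/App_Data/MediaCache" folder;
3) Rebuild the Sitecore search indexes (Sitecore -> Control Panel -> Indexing -> Indexing Manager);
4) Clear the Sitecore caches (using the http://{hostname}/sitecore/admin/cache.aspx tool);
5) Restart IIS.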
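The two file-system steps above (1 and 2) can be scripted if you need to repeat them on several servers. The following is only a rough Python sketch, not something from Sitecore support: the function names `copy_ifilter_dlls` and `clear_media_cache` are made up for illustration, and the folder paths are passed in as parameters because they depend on your installation. Steps 3 to 5 (index rebuild, cache clear, IIS restart) still have to be done as described.

```python
import shutil
from pathlib import Path


def copy_ifilter_dlls(ifilter_bin, inetsrv):
    """Step 1: copy every .dll from the iFilter bin folder into the IIS folder.

    Both arguments are pathlib.Path objects supplied by the caller;
    nothing here is hard-coded to a particular machine.
    Returns the names of the copied files.
    """
    copied = []
    for dll in sorted(ifilter_bin.glob("*.dll")):
        shutil.copy2(dll, inetsrv / dll.name)  # keeps file timestamps
        copied.append(dll.name)
    return copied


def clear_media_cache(media_cache):
    """Step 2: delete everything under MediaCache, keeping the folder itself.

    Returns the number of top-level entries that were removed.
    """
    removed = 0
    for entry in media_cache.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed
```

To use it, call `copy_ifilter_dlls(Path(<iFilter bin folder>), Path(<Inetsrv folder>))` and `clear_media_cache(Path(<MediaCache folder>))` with the folders from steps 1 and 2, from an elevated prompt, since writing to the IIS folder normally requires administrator rights.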