Tim*_*old 1 c# text xps extraction
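I need to extract the text of a specific page from an XPS document. The extracted text should end up in a string, because I want to read it aloud using Microsoft's SpeechLib. Please give examples in C# only.
Thanks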
San*_*jay 10
Add references to ReachFramework and WindowsBase, and the following using statements:
using System.Text;
using System.Windows.Xps.Packaging;
Then use this code:
// Open the XPS file read-only; "/path" is a placeholder for your document.
XpsDocument _xpsDocument = new XpsDocument("/path", System.IO.FileAccess.Read);
IXpsFixedDocumentSequenceReader fixedDocSeqReader =
    _xpsDocument.FixedDocumentSequenceReader;
IXpsFixedDocumentReader _document = fixedDocSeqReader.FixedDocuments[0];

// FixedPages is indexed from 0. documentViewerElement is not defined here
// (presumably a DocumentViewer in the UI); substitute the page index you need.
IXpsFixedPageReader _page =
    _document.FixedPages[documentViewerElement.MasterPageNumber];

StringBuilder _currentText = new StringBuilder();
System.Xml.XmlReader _pageContentReader = _page.XmlReader;
if (_pageContentReader != null)
{
    while (_pageContentReader.Read())
    {
        // Each run of text on the page is stored in a Glyphs element.
        if (_pageContentReader.Name == "Glyphs" && _pageContentReader.HasAttributes)
        {
            string unicodeString = _pageContentReader.GetAttribute("UnicodeString");
            if (unicodeString != null)
            {
                _currentText.Append(unicodeString);
            }
        }
    }
}
string _fullPageText = _currentText.ToString();
_xpsDocument.Close();
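Since the question mentions SpeechLib, handing the resulting string to it could look roughly like the sketch below. It assumes a COM reference to the "Microsoft Speech Object Library" has been added to the project; the class name and the sample string are made up for illustration, and in real code you would pass _fullPageText instead.

```csharp
using SpeechLib; // COM reference: "Microsoft Speech Object Library" (assumed to be added)

static class SpeakPageText
{
    static void Main()
    {
        // Stand-in for the _fullPageText string built above.
        string fullPageText = "Text extracted from the XPS page.";

        SpVoice voice = new SpVoice();
        // SVSFDefault speaks synchronously: the call returns once speech has finished.
        voice.Speak(fullPageText, SpeechVoiceSpeakFlags.SVSFDefault);
    }
}
```

If the UI should stay responsive while the text is read, SpeechVoiceSpeakFlags.SVSFlagsAsync is the flag to look at instead.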
The text lives in the UnicodeString attribute of the Glyphs elements. You have to read it with the XmlReader of the fixed page.
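To see the Glyphs / UnicodeString idea in isolation, here is a minimal sketch that runs an XmlReader over a hand-written page fragment instead of a real XPS file. The sample markup is hypothetical and heavily trimmed (real pages carry fonts, positions and more), and GlyphTextDemo and ExtractGlyphText are names I made up for the example. Unlike the code above, it puts each text run on its own line so that neighbouring runs do not get glued together.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class GlyphTextDemo
{
    // Collects the UnicodeString attribute of every Glyphs element, one run per line.
    public static string ExtractGlyphText(XmlReader reader)
    {
        var text = new StringBuilder();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "Glyphs")
            {
                string run = reader.GetAttribute("UnicodeString");
                if (!string.IsNullOrEmpty(run))
                {
                    text.AppendLine(run);
                }
            }
        }
        return text.ToString();
    }

    static void Main()
    {
        // Hypothetical, trimmed-down page markup used only for this demo.
        const string samplePage =
            "<FixedPage xmlns=\"http://schemas.microsoft.com/xps/2005/06\">" +
            "<Glyphs UnicodeString=\"Hello\" />" +
            "<Path />" +
            "<Glyphs UnicodeString=\"world\" />" +
            "</FixedPage>";

        using (XmlReader reader = XmlReader.Create(new StringReader(samplePage)))
        {
            // Writes "Hello" and "world" on separate lines.
            Console.Write(ExtractGlyphText(reader));
        }
    }
}
```

With a real document you would pass _page.XmlReader to the same method instead of the reader created from the sample string.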