My goal is to find JavaScript that matches a given pattern in the annotations of a PDF. To that end, I have come up with the following code:
public static void main(String[] args) {
    try {
        // Reads and parses a PDF document
        PdfReader reader = new PdfReader("Test.pdf");
        // For each PDF page
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            // Get a page a PDF page
            PdfDictionary page = reader.getPageN(i);
            // Get all the annotations of page i
            PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);
            // If page does not have annotations
            if (page.getAsArray(PdfName.ANNOTS) == null) {
                continue;
            }
            // For each annotation
            for (int j = 0; j < annotsArray.size(); ++j) {
                // For current annotation
                PdfDictionary curAnnot = annotsArray.getAsDict(j);
                // check if has JS as described below
                PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
                // test if it is a JavaScript action
                if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
                    // what here?
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
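As far as I know, comparing strings is done with a StringCompare library. The problem is that it compares two whole strings, whereas I want to know whether the JavaScript action in an annotation starts with (or contains) this string: if (this.hostContainer) { try {
So, how can I check whether the JavaScript in an annotation contains the string above?
EDIT: A sample page with JS is available at: pdf with JS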
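JavaScript actions are defined in ISO 32000-1 as follows:

12.6.4.16 JavaScript Actions

Upon invocation of a JavaScript action, a conforming processor shall execute a script that is written in the JavaScript programming language. Depending on the nature of the script, various interactive form fields in the document may update their values or change their visual appearances. Mozilla Development Center's Client-Side JavaScript Reference and the Adobe JavaScript for Acrobat API Reference (see the Bibliography) give details on the contents and effects of JavaScript scripts. Table 217 shows the action dictionary entries specific to this type of action.

Table 217 - Additional entries specific to a JavaScript action

Key | Type | Value
S | name | (Required) The type of action that this dictionary describes; shall be JavaScript for a JavaScript action.
JS | text string or text stream | (Required) A text string or text stream containing the JavaScript script to be executed. PDFDocEncoding or Unicode encoding (the latter identified by the Unicode prefix U+FEFF) shall be used to encode the contents of the string or stream.

To support the use of parameterized function calls in JavaScript scripts, the JavaScript entry in a PDF document's name dictionary (see 7.7.4, "Name Dictionary") may contain a name tree that maps name strings to document-level JavaScript actions. When the document is opened, all of the actions in this name tree shall be executed, defining JavaScript functions for use by other scripts in the document.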
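The encoding rule in the JS entry above (PDFDocEncoding, or Unicode marked by the U+FEFF prefix) can be illustrated with a small plain-Java sketch. The class and method names are hypothetical and not part of any PDF library. It assumes the Unicode case is UTF-16BE with a byte order mark, and it uses ISO-8859-1 as a stand-in for PDFDocEncoding, which agrees for the printable ASCII characters that script source is normally written in but differs for some other code points.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class PdfTextDecoding {

    /** Decodes the raw bytes of a PDF text string or text stream into a Java String. */
    public static String decode(byte[] raw) {
        // A leading 0xFE 0xFF is the byte order mark U+FEFF: treat the rest as UTF-16BE.
        if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF) {
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        // Assumption: ISO-8859-1 approximates PDFDocEncoding closely enough for script source.
        return new String(raw, StandardCharsets.ISO_8859_1);
    }
}
```

When a PDF library already returns the script as a decoded string, a helper like this is not needed; it only matters when you work with the raw bytes yourself.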
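Thus, if you are interested in knowing whether a JavaScript action in an annotation starts with (or contains) this string: if (this.hostContainer) { try {
then in this case

if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
    // what here?
}

you will probably want to first check whether AnnotationAction.Get(PdfName.JS) is a PdfString or a PdfStream, retrieve the contents as a string in either case, and then use the usual string comparison methods to check whether it, or any function it calls (which might be defined in the JavaScript name tree), contains the string you are searching for.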
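Once the script is available as a string, the "usual string comparison methods" can be as simple as the following sketch. The class and method names are hypothetical. The whitespace-tolerant variant is an addition of this sketch, not something the question asks for; it may help if the script generator does not always emit identical spacing.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
public final class ScriptPatternCheck {

    private static final String NEEDLE = "if (this.hostContainer) { try {";

    // The needle with every run of whitespace replaced by \s+, all other parts quoted literally.
    private static final Pattern TOLERANT = Pattern.compile(
            Arrays.stream(NEEDLE.split("\\s+"))
                  .map(Pattern::quote)
                  .collect(Collectors.joining("\\s+")));

    /** True if the script, ignoring surrounding whitespace, begins with the needle. */
    public static boolean startsWithNeedle(String script) {
        return script.trim().startsWith(NEEDLE);
    }

    /** True if the needle occurs anywhere in the script, character for character. */
    public static boolean containsNeedle(String script) {
        return script.contains(NEEDLE);
    }

    /** True if the needle occurs anywhere, allowing different whitespace between its tokens. */
    public static boolean containsNeedleIgnoringSpacing(String script) {
        return TOLERANT.matcher(script).find();
    }
}
```

For the exact test in the question, String.contains or String.startsWith is all that is needed; the regular expression is only worth the extra code if the spacing can vary.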
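I took your code, cleaned it up a bit (in particular, it was a mix of C# and Java), and added code as described above that inspects the immediate JavaScript code in the annotation action element: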

// Uses the iText 5.x classes from com.itextpdf.text.pdf and java.nio.charset.StandardCharsets
System.out.println("file.pdf - Looking for special JavaScript actions.");
// Reads and parses a PDF document
PdfReader reader = new PdfReader(resource);

// For each PDF page
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    System.out.printf("\nPage %d\n", i);
    // Get a PDF page
    PdfDictionary page = reader.getPageN(i);
    // Get all the annotations of page i
    PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);

    // If page does not have annotations
    if (annotsArray == null)
    {
        System.out.println("No annotations.");
        continue;
    }

    // For each annotation
    for (int j = 0; j < annotsArray.size(); ++j)
    {
        System.out.printf("Annotation %d - ", j);

        // For current annotation
        PdfDictionary curAnnot = annotsArray.getAsDict(j);

        // check if has JS as described below
        PdfDictionary annotationAction = curAnnot.getAsDict(PdfName.A);
        if (annotationAction == null)
        {
            System.out.print("no action");
        }
        // test if it is a JavaScript action
        else if (PdfName.JAVASCRIPT.equals(annotationAction.get(PdfName.S)))
        {
            PdfObject scriptObject = annotationAction.getDirectObject(PdfName.JS);
            if (scriptObject == null)
            {
                System.out.println("missing JS entry");
                continue;
            }
            final String script;
            if (scriptObject.isString())
                script = ((PdfString)scriptObject).toUnicodeString();
            else if (scriptObject.isStream())
            {
                // Streams read by PdfReader are PRStream instances; getStreamBytes applies their filters
                byte[] bytes = PdfReader.getStreamBytes((PRStream)scriptObject);
                script = new String(bytes, StandardCharsets.ISO_8859_1);
            }
            else
            {
                System.out.println("malformed JS entry");
                continue;
            }

            if (script.contains("if (this.hostContainer) { try {"))
                System.out.print("contains test string - ");

            System.out.printf("\n---\n%s\n---", script);
            // what here?
        }
        else
        {
            System.out.print("no JavaScript action");
        }
        System.out.println();
    }
}

(Test SearchActionJavaScript, method testSearchJsActionInFile)

The same in C#:

// Uses iTextSharp.text.pdf, System and System.Text
using (PdfReader reader = new PdfReader(sourcePath))
{
    Console.WriteLine("file.pdf - Looking for special JavaScript actions.");

    // For each PDF page
    for (int i = 1; i <= reader.NumberOfPages; i++)
    {
        Console.Write("\nPage {0}\n", i);
        // Get a PDF page
        PdfDictionary page = reader.GetPageN(i);
        // Get all the annotations of page i
        PdfArray annotsArray = page.GetAsArray(PdfName.ANNOTS);

        // If page does not have annotations
        if (annotsArray == null)
        {
            Console.WriteLine("No annotations.");
            continue;
        }

        // For each annotation
        for (int j = 0; j < annotsArray.Size; ++j)
        {
            Console.Write("Annotation {0} - ", j);

            // For current annotation
            PdfDictionary curAnnot = annotsArray.GetAsDict(j);

            // check if has JS as described below
            PdfDictionary annotationAction = curAnnot.GetAsDict(PdfName.A);
            if (annotationAction == null)
            {
                Console.Write("no action");
            }
            // test if it is a JavaScript action
            else if (PdfName.JAVASCRIPT.Equals(annotationAction.Get(PdfName.S)))
            {
                PdfObject scriptObject = annotationAction.GetDirectObject(PdfName.JS);
                if (scriptObject == null)
                {
                    Console.WriteLine("missing JS entry");
                    continue;
                }
                String script;
                if (scriptObject.IsString())
                    script = ((PdfString)scriptObject).ToUnicodeString();
                else if (scriptObject.IsStream())
                {
                    // MemoryStream.ToString() would only return the type name, so decode the bytes explicitly
                    byte[] bytes = PdfReader.GetStreamBytes((PRStream)scriptObject);
                    script = Encoding.GetEncoding("ISO-8859-1").GetString(bytes);
                }
                else
                {
                    Console.WriteLine("malformed JS entry");
                    continue;
                }

                if (script.Contains("if (this.hostContainer) { try {"))
                    Console.Write("contains test string - ");

                Console.Write("\n---\n{0}\n---", script);
                // what here?
            }
            else
            {
                Console.Write("no JavaScript action");
            }
            Console.WriteLine();
        }
    }
}

When run against the sample file, either version produces:

file.pdf - Looking for special JavaScript actions.

Page 1
Annotation 0 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_vii', 0]);
} catch(e) { console.println(e); }};
---
Annotation 1 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_ix', 0]);
} catch(e) { console.println(e); }};
---
Annotation 2 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_xi', 0]);
} catch(e) { console.println(e); }};
---
Annotation 3 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_3', 0]);
} catch(e) { console.println(e); }};
---
Annotation 4 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_15', 0]);
} catch(e) { console.println(e); }};
---
Annotation 5 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_37', 0]);
} catch(e) { console.println(e); }};
---
Annotation 6 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_57', 0]);
} catch(e) { console.println(e); }};
---
Annotation 7 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_81', 0]);
} catch(e) { console.println(e); }};
---
Annotation 8 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_111', 0]);
} catch(e) { console.println(e); }};
---
Annotation 9 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_136', 0]);
} catch(e) { console.println(e); }};
---
Annotation 10 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_160', 0]);
} catch(e) { console.println(e); }};
---
Annotation 11 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_197', 0]);
} catch(e) { console.println(e); }};
---
Annotation 12 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_179', 0]);
} catch(e) { console.println(e); }};
---
Annotation 13 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_201', 0]);
} catch(e) { console.println(e); }};
---
Annotation 14 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_223', 0]);
} catch(e) { console.println(e); }};
---

Page 2
No annotations.

Page 3
No annotations.