Unk*_*ech 32 php ms-word read-write
Is it possible to read and write Word (2003 and 2007) files in PHP without using a COM object? I know that I can do this:
$file = fopen('c:\file.doc', 'w+');
fwrite($file, $text);
fclose($file);
but Word then reads it as an HTML file rather than as a native .doc file.
Ste*_*rig 29
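Reading binary Word documents would mean writing a parser based on the published file format specification for the DOC format. I don't think that is a really feasible solution.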
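Since the answers in this thread deal with three different containers, it can help to check which one a file actually is before picking an approach. A minimal sketch, assuming the standard signatures: a binary .doc is an OLE2 compound file starting with the bytes `D0 CF 11 E0 A1 B1 1A E1`, a .docx is a ZIP archive starting with `PK\x03\x04`, and a Word 2003 XML document is plain XML. `detectWordFormat()` is a hypothetical helper name; it only looks at the first bytes and does not validate anything (any ZIP file would be reported as `docx`).

```php
<?php
// Hypothetical helper: guess the Word container format from a file's first bytes.
function detectWordFormat(string $bytes): string
{
    // OLE2 compound file signature used by binary .doc (Word 97-2003)
    if (strncmp($bytes, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'doc';
    }
    // ZIP local file header signature used by .docx (Word 2007+)
    if (strncmp($bytes, "PK\x03\x04", 4) === 0) {
        return 'docx';
    }
    // Word 2003 XML: optional UTF-8 BOM and whitespace, then an XML declaration
    $head = ltrim(preg_replace('/^\xEF\xBB\xBF/', '', substr($bytes, 0, 64)));
    if (strncmp($head, '<?xml', 5) === 0) {
        return 'xml';
    }
    return 'unknown';
}

// Usage (the path is a placeholder); only the first 64 bytes are read:
// echo detectWordFormat(file_get_contents('c:\file.doc', false, null, 0, 64));
```

A file produced the way the question does it (plain text or HTML written to a .doc name) would come back as `unknown`, which is a quick way to tell it is not a real Word document.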
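You could use the Microsoft Office XML formats to read and write Word files; they are compatible with both the 2003 and 2007 versions of Word. For reading, you have to make sure the Word documents are saved in the right format (in Word 2007 it is called Word 2003 XML Document). For writing, you just have to follow the publicly available XML schema. I have never used this format to write Office documents from PHP, but I do use it to read an Excel worksheet (saved, naturally, as XML Spreadsheet 2003) and display its data on a web page. Since the files are plain XML, navigating them and working out how to extract the data you need is no problem.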
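A minimal sketch of the reading side, assuming the document was saved as Word 2003 XML Document and therefore uses the WordprocessingML 2003 namespace `http://schemas.microsoft.com/office/word/2003/wordml`. The function name `wordXml2003ToText()` is hypothetical; it only collects the text of the `<w:t>` elements paragraph by paragraph and ignores all formatting.

```php
<?php
// Sketch: extract paragraph text from a "Word 2003 XML Document".
// Text sits in <w:t> elements inside runs (<w:r>) inside paragraphs (<w:p>).
function wordXml2003ToText(string $xml): string
{
    $ns = 'http://schemas.microsoft.com/office/word/2003/wordml';

    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Not well-formed XML');
    }
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', $ns);

    $paragraphs = [];
    // "//w:body//w:p" also finds paragraphs nested in sections and tables
    foreach ($xpath->query('//w:body//w:p') as $p) {
        $text = '';
        // Concatenate every text element of every run in this paragraph
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return implode("\n", $paragraphs);
}

// Usage (file name is a placeholder):
// echo wordXml2003ToText(file_get_contents('report.xml'));
```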
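The other option, which works only with Word 2007 (unless OpenXML file format support is installed in Word 2003), would be to resort to OpenXML. As databyss pointed out, the DOCX file format is just a ZIP archive containing XML files. There are a lot of resources on MSDN about the OpenXML file formats, so you should be able to work out how to read the data you need. Writing will be much more complicated, I think; it depends on how much time you are willing to invest.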
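For DOCX, a sketch along the same lines, assuming PHP's zip extension (`ZipArchive`) is available: the main body text lives in the archive part `word/document.xml`, in `<w:t>` elements under the `http://schemas.openxmlformats.org/wordprocessingml/2006/main` namespace. Both function names are hypothetical, and headers, footers and footnotes (which are stored in other parts of the archive) are not read.

```php
<?php
// Sketch: read the main body text of a .docx (OpenXML / Word 2007+).
const W_MAIN_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Extract paragraph text from the contents of word/document.xml.
function docxXmlToText(string $xml): string
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('word/document.xml is not well-formed');
    }
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', W_MAIN_NS);

    $paragraphs = [];
    foreach ($xpath->query('//w:body//w:p') as $p) {
        $text = '';
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return implode("\n", $paragraphs);
}

// Open the ZIP container and hand word/document.xml to docxXmlToText().
function readDocxText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('No word/document.xml in the archive');
    }
    return docxXmlToText($xml);
}
```

Typical call: `echo readDocxText('report.docx');` (the file name is a placeholder). Keeping the XML step separate from the ZIP step means it can be tried on a string without a real archive.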
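Perhaps you could have a look at PHPExcel, a library that can write Excel 2007 files and read them using the OpenXML standard. It will give you an idea of the work involved in reading and writing OpenXML Word documents.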
Anonymous 18
This works for versions before Office 2007 and it is pure PHP, no COM nonsense. I'm still trying to figure out 2007.
<?php
/*****************************************************************
This approach uses detection of NUL (chr(00)) and end line (chr(13))
to decide where the text is:
- divide the file contents up by chr(13)
- reject any slices containing a NUL
- stitch the rest together again
- clean up with a regular expression
*****************************************************************/
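function parseWord($userDoc)
{
    // Read the whole binary file into a string (no file handle left open)
    $contents = file_get_contents($userDoc);
    if ($contents === false) {
        return "";
    }
    // Split on carriage returns (chr(13)), which Word uses as paragraph marks
    $lines = explode(chr(0x0D), $contents);
    $outtext = "";
    foreach ($lines as $thisline) {
        // Keep only non-empty slices that contain no NUL byte
        if ($thisline !== "" && strpos($thisline, chr(0x00)) === false) {
            $outtext .= $thisline . " ";
        }
    }
    // Strip everything outside a conservative set of ASCII characters
    $outtext = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", "", $outtext);
    return $outtext;
}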
$userDoc = "cv.doc";
$text = parseWord($userDoc);
echo $text;
?>
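The NUL filter has a side effect worth knowing about: Word 97-2003 files can store text as 16-bit Unicode, in which case every ASCII letter is followed by a NUL byte and the whole chunk is thrown away; the character whitelist also strips accented letters. The sketch below applies the same rules to an in-memory string (`filterWordBytes()` is a hypothetical name) so the behaviour can be checked without a .doc file.

```php
<?php
// Same rules as parseWord() above, applied to a byte string instead of a file.
function filterWordBytes(string $bytes): string
{
    $out = '';
    foreach (explode("\r", $bytes) as $chunk) {
        // Drop empty chunks and any chunk containing a NUL byte
        if ($chunk !== '' && strpos($chunk, "\0") === false) {
            $out .= $chunk . ' ';
        }
    }
    // Same whitelist as the original regular expression
    return preg_replace('/[^a-zA-Z0-9\s,.\-\n\r\t@\/_()]/', '', $out);
}

var_export(filterWordBytes("Hello\rWorld")); // 'Hello World '
echo "\n";
var_export(filterWordBytes("H\0i\0\rOK"));   // 'OK ' (UTF-16LE "Hi" is dropped)
echo "\n";
var_export(filterWordBytes("caf\xE9"));      // 'caf ' (Latin-1 e-acute stripped)
echo "\n";
```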
You can use Antiword, a free MS Word reader for Linux and most popular operating systems.
$document_file = 'c:\file.doc';
// escapeshellarg() quotes the path so spaces or shell metacharacters are safe
$text_from_doc = shell_exec('/usr/local/bin/antiword ' . escapeshellarg($document_file));
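If you go this way, it may be worth checking antiword's exit status rather than trusting whatever `shell_exec()` returns (it gives `null` or an empty string on many kinds of failure). A sketch, assuming antiword is installed; the function name is hypothetical, the binary defaults to `antiword` on the PATH, and a full path such as `/usr/local/bin/antiword` can be passed instead. Any non-zero exit status is treated as failure.

```php
<?php
// Sketch: run antiword on a .doc file, returning null instead of silent garbage.
function docToTextWithAntiword(string $docFile, string $antiword = 'antiword'): ?string
{
    if (!is_file($docFile)) {
        return null; // nothing to convert
    }
    // Quote both the binary and the file path for the shell
    $cmd = escapeshellarg($antiword) . ' ' . escapeshellarg($docFile);
    $output = [];
    $status = 0;
    // exec() collects stdout line by line (trailing whitespace per line is stripped)
    exec($cmd, $output, $status);
    return $status === 0 ? implode("\n", $output) : null;
}

// Usage (path is a placeholder):
// $text = docToTextWithAntiword('/path/to/file.doc', '/usr/local/bin/antiword');
```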
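I don't know of a way to read native Word documents in PHP, but if you want to write Word documents with PHP, WordprocessingML (aka WordML) might be a good solution. All you have to do is create an XML document in the correct format. I believe Word 2003 and 2007 both support WordML.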
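A minimal writing sketch, assuming the WordprocessingML 2003 namespace and Word's `mso-application` processing instruction (which Windows uses to open such .xml files with Word). `buildWordMl()` is a hypothetical name; it writes one unformatted run per paragraph, and building the tree with DOMDocument takes care of escaping characters such as `&` and `<`.

```php
<?php
// Sketch: build a minimal "Word 2003 XML Document" from an array of paragraphs.
function buildWordMl(array $paragraphs): string
{
    $ns = 'http://schemas.microsoft.com/office/word/2003/wordml';
    $dom = new DOMDocument('1.0', 'UTF-8');

    // Processing instruction that associates the file with Word
    $dom->appendChild($dom->createProcessingInstruction('mso-application', 'progid="Word.Document"'));

    $root = $dom->createElementNS($ns, 'w:wordDocument');
    $dom->appendChild($root);
    $body = $dom->createElementNS($ns, 'w:body');
    $root->appendChild($body);

    // One paragraph -> one run -> one text element
    foreach ($paragraphs as $text) {
        $p = $dom->createElementNS($ns, 'w:p');
        $r = $dom->createElementNS($ns, 'w:r');
        $t = $dom->createElementNS($ns, 'w:t');
        $t->appendChild($dom->createTextNode($text));
        $r->appendChild($t);
        $p->appendChild($r);
        $body->appendChild($p);
    }
    return $dom->saveXML();
}

// Usage (file name is a placeholder):
// file_put_contents('out.xml', buildWordMl(['Hello', 'Tom & Jerry']));
```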
Anonymous 6
Just an updated version of the code:
<?php
/*****************************************************************
This approach uses detection of NUL (chr(00)) and end line (chr(13))
to decide where the text is:
- divide the file contents up by chr(13)
- reject any slices containing a NUL
- stitch the rest together again
- clean up with a regular expression
*****************************************************************/
function parseWord($userDoc)
{
    // Read the whole binary file into a string
    $word_text = file_get_contents($userDoc);
    if ($word_text === false) {
        return "";
    }
    $line = "";
    $tam = strlen($word_text);
    $nulos = 0;
    // Heuristic: start at offset 1536 (0x600) and stop after a long run of NUL bytes
    for ($i = 1536; $i < $tam; $i++) {
        $line .= $word_text[$i];
        // Compare with the NUL byte itself; "== 0" compares a string with an integer
        if ($word_text[$i] === chr(0)) {
            $nulos++;
        } else {
            $nulos = 0;
        }
        if ($nulos > 1996) {
            break;
        }
    }
    $lines = explode(chr(0x0D), $line);
    $outtext = "";
    foreach ($lines as $thisline) {
        $tam = strlen($thisline);
        if (!$tam) {
            continue;
        }
        $new_line = "";
        for ($i = 0; $i < $tam; $i++) {
            $onechar = $thisline[$i];
            if ($onechar > chr(240)) {
                continue;
            }
            // Keep printable bytes
            if ($onechar >= chr(0x20)) {
                $new_line .= $onechar;
            }
            // 0x14 separates a field's code from its displayed result
            if ($onechar === chr(0x14)) {
                $new_line .= "</a>";
            }
            // Treated as a table cell mark; two in a row end the table row
            if ($onechar === chr(0x07)) {
                $new_line .= "\t";
                if (isset($thisline[$i + 1]) && $thisline[$i + 1] === chr(0x07)) {
                    $new_line .= "\n";
                }
            }
        }
        // Turn HYPERLINK field codes into links
        $new_line = str_replace("HYPERLINK", "<a href=", $new_line);
        $new_line = str_replace("\\o", ">", $new_line);
        $new_line .= "\n";
        // Turn INCLUDEPICTURE field codes into image tags
        $new_line = str_replace("INCLUDEPICTURE", "<br><img src=", $new_line);
        $new_line = str_replace("\\*", "><br>", $new_line);
        $new_line = str_replace("MERGEFORMATINET", "", $new_line);
        $outtext .= nl2br($new_line);
    }
    return $outtext;
}
$userDoc = "Cultura.doc";
$text = parseWord($userDoc);
echo $text;
?>
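The updated code rewrites field codes with plain string replacements, which leaves the URL, any switches and the display text mixed together. Word marks a field with the byte 0x13 (field begin), 0x14 (separator between the field code and its displayed result) and 0x15 (field end). A sketch that uses those delimiters to turn `HYPERLINK` fields into proper links, assuming 8-bit text and no nested fields; it would have to run on each raw line before the character loop above discards the control bytes. `hyperlinkFieldsToHtml()` is a hypothetical name.

```php
<?php
// Sketch: convert Word HYPERLINK fields (0x13 code 0x14 result 0x15) to <a> tags.
function hyperlinkFieldsToHtml(string $line): string
{
    // Capture the quoted URL, skip any switches (e.g. \o "tooltip"), capture the result
    $pattern = '/\x13\s*HYPERLINK\s+"([^"]*)"[^\x14\x15]*\x14([^\x15]*)\x15/';
    return preg_replace_callback($pattern, function (array $m): string {
        $url = htmlspecialchars($m[1], ENT_QUOTES);
        $label = htmlspecialchars($m[2], ENT_QUOTES);
        return '<a href="' . $url . '">' . $label . '</a>';
    }, $line);
}

$raw = "See \x13 HYPERLINK \"http://example.com/?a=1&b=2\" \\o \"tip\" \x14the site\x15 now";
echo hyperlinkFieldsToHtml($raw), "\n";
// See <a href="http://example.com/?a=1&amp;b=2">the site</a> now
```

Fields that do not match the pattern (other field types, nested fields) are left untouched, so the existing replacements can still handle them.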