How to decode a .bin file into a .pdf

Mat*_*nez 2 pdf base64 python-3.x

I extracted an embedded object from an Excel spreadsheet. The object is a PDF, but the Excel zip archive saves embedded objects as binary (.bin) files.

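I am trying to read the binary file and return it to its original format (PDF). I took some code from another question with a similar problem, but when I try to open the resulting PDF, Adobe reports the error "cannot be opened because the file is damaged ... was not correctly decoded ..."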

Does anyone know of a way to do this?

import base64
import os

with open('oleObject1.bin', 'rb') as f:
    binaryData = f.read()
print(binaryData)

# This is the attempt that produces the damaged file
with open(os.path.expanduser('test1.pdf'), 'wb') as fout:
    fout.write(base64.decodebytes(binaryData))

Link to the object file on GitHub

Mat*_*nez 5

Thanks Ryan, I see what you mean. Here is the solution for future reference.

import os

str1 = b'%PDF-'  # Begin PDF
str2 = b'%%EOF'  # End PDF

with open('oleObject1.bin', 'rb') as f:
    binary_data = f.read()

# Convert bytes to a mutable bytearray
binary_byte_array = bytearray(binary_data)

# Find where the PDF begins
result1 = binary_byte_array.find(str1)
print(result1)
if result1 == -1:
    raise ValueError('No %PDF- header found')

# Remove everything before the PDF begins
del binary_byte_array[:result1]

# Find where the PDF ends
result2 = binary_byte_array.find(str2)
print(result2)
if result2 == -1:
    raise ValueError('No %%EOF marker found')

# Remove everything after the end marker (add len(str2) == 5 to keep the %%EOF characters).
# Deleting from a start index is also safe when nothing follows %%EOF.
del binary_byte_array[result2 + len(str2):]
print(len(binary_byte_array))

with open(os.path.expanduser('test1.pdf'), 'wb') as fout:
    fout.write(binary_byte_array)
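For reuse, the same marker-based trimming could be wrapped in a small function that works on bytes already in memory. This is only a sketch under a few assumptions: the name `extract_pdf` is hypothetical, the sample bytes are synthetic stand-ins rather than the real `oleObject1.bin`, and it searches for the last `%%EOF` with `rfind` instead of the first one, which is a variation on the approach above rather than part of it.

```python
PDF_START = b"%PDF-"  # PDF header marker
PDF_END = b"%%EOF"    # PDF end-of-file marker


def extract_pdf(data: bytes) -> bytes:
    """Return the bytes from the first %PDF- marker through the last %%EOF marker.

    Hypothetical helper for illustration; raises ValueError if a marker is missing.
    """
    start = data.find(PDF_START)
    if start == -1:
        raise ValueError("no %PDF- header found")
    # rfind: take the last end marker, since a PDF may contain more than one
    end = data.rfind(PDF_END)
    if end < start:
        raise ValueError("no %%EOF marker found after the PDF header")
    return data[start:end + len(PDF_END)]


# Synthetic stand-in for an embedded object: wrapper bytes around a fake PDF body
sample = b"\x00wrapper\x00" + b"%PDF-1.4\nbody\n%%EOF" + b"\x00trailing\x00"
pdf_bytes = extract_pdf(sample)
print(pdf_bytes)
```

With a real file, the bytes read from `oleObject1.bin` would be passed in and the result written out in `'wb'` mode. Taking the last `%%EOF` is a design choice: a PDF that was saved incrementally can contain several end markers, and cutting at the first one would drop the later revisions.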