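I have been working on stripping Exif metadata from images before feeding them into some machine learning algorithms.
My sample image is this one, a small 100x100 image that contains more than 500 kB of metadata, downloaded as backpack.jpg. The file size on disk is 584 kB.
First thing: load the image and save it back to disk: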
open System
open System.Drawing
open System.Drawing.Imaging
// Image from http://www.aedsuperstore.com/assets/images/PAD-BAG-02-T-Thumb.jpg
// downloaded as c:/temp/backpack.jpg, File size 584kB
let img = Bitmap.FromFile "c:/temp/backpack.jpg"
// Saves into a file of 563kB
img.Save "c:/temp/backpack_unchanged.jpg"
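Strangely, the file size dropped by about 20 kB, to 563 kB, but I initially ignored that (I blamed it on the default encoder quality).
The image has one metadata item that takes up more than 500,000 bytes: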
> img.GetPropertyItem 34675;;
val it : PropertyItem =
  System.Drawing.Imaging.PropertyItem
    {Id = 34675;
     Len = 557168;
     Type = 1s;
     ...
To remove the metadata, I went through all the property items and called RemovePropertyItem on each one:
let ids = img.PropertyIdList
for id in ids …
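The snippet above is cut off, so the following is only a minimal sketch of what such a removal loop might look like, not necessarily the code from the original post. The helper name `stripPropertyItems` is hypothetical; `PropertyIdList` and `RemovePropertyItem` are the `System.Drawing.Image` members already mentioned in the question. It assumes a reference to System.Drawing is available, as in the earlier snippet.

```fsharp
// Assumption: System.Drawing is already referenced, as in the question's own code
// (in F# Interactive on .NET Framework this may require: #r "System.Drawing.dll").
open System.Drawing

// Hypothetical helper, not from the original post: removes every property
// item that the image currently reports in PropertyIdList.
let stripPropertyItems (img: Image) =
    // The id array is read once, before the loop starts, so removing
    // items inside the loop does not disturb the iteration.
    for propId in img.PropertyIdList do
        img.RemovePropertyItem propId
```

After a call such as `stripPropertyItems img`, `img.PropertyIdList` should be empty; whether the saved file then shrinks accordingly is exactly what the question goes on to examine.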
I have the following function that converts a CSV file into a specific txt format (the one expected by the CNTKTextFormat Reader):
open System.IO
open FSharp.Data;
open Deedle;
let convert (inFileName : string) =
    let data = Frame.ReadCsv(inFileName)
    let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
    use outFile = new StreamWriter(outFileName, false)
    data.Rows.Observations
    |> Seq.map(fun kvp ->
        let row = kvp.Value |> Series.observations |> Seq.map(fun (k,v) -> v) |> Seq.toList
        match row with
        | label::data ->
            let body = data |> List.map string |> String.concat " "
            outFile.WriteLine(sprintf "|labels %A |features %s" label body)
            printf "%A" label
        | _ …
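To make the target format concrete, here is a minimal sketch of the row formatting on its own, without Deedle. The helpers `formatRow` and `writeRows` are hypothetical names introduced for illustration, and the sketch assumes the label and features have already been converted to strings. It uses the same `|labels ... |features ...` layout as the function above.

```fsharp
open System.IO

// Hypothetical helper: formats one row as a single line in the
// "|labels <label> |features <v1> <v2> ..." layout used above.
let formatRow (label: string) (features: seq<string>) : string =
    sprintf "|labels %s |features %s" label (String.concat " " features)

// Hypothetical helper: writes every row to the given path.
// Seq.iter runs the write for each row immediately.
let writeRows (path: string) (rows: seq<string * string list>) =
    use outFile = new StreamWriter(path, false)
    rows
    |> Seq.iter (fun (label, features) ->
        outFile.WriteLine(formatRow label features))
```

For example, `formatRow "1" ["0.5"; "2"]` gives `|labels 1 |features 0.5 2`. One design point worth noting: `Seq.map` is lazy, so side effects inside it only run when the resulting sequence is enumerated, whereas `Seq.iter` performs them right away. Since the original function is truncated, it is not possible to say whether that difference matters there.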