Clustering photos in R?

Shr*_*nik 8 r image image-processing

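I have a general R question:

With a digital camera we tend to take many photos, some of which are near-duplicates. These waste online space when shared on Picasa and add overhead when we later try to delete the unwanted ones.

Is it possible to cluster photos using R? I know Matlab has some clustering functions for image processing, but is anything similar available in R, or are there any suggestions for doing this in R?

If anyone has ideas on this topic, please share them.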

dim*_*ura 11

If you look at CRAN, there are various packages (I count about 10) for reading image data, and of course there are various packages for clustering. In theory, you could just plug the raw image data into the clustering algorithms, but in practice that wouldn't work very well: it would be very slow, and the accuracy would probably be pretty bad too. Modern techniques for clustering image data rely on specialized features extracted from the images and operate on those. The best features are application dependent, but some of the best known are SIFT, SURF, and HOG. Older techniques used color histograms of the image as features. That is quite doable with the aforementioned R packages, but it is not very accurate: it can hardly distinguish between a picture of the sea and a picture of a blue room.
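To make the color-histogram limitation concrete, here is a minimal sketch in plain Python (standard library only). The function names, the 4-bins-per-channel choice, and the three tiny "images" are all made up for illustration; a real pipeline would read pixels from actual files.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized joint RGB histogram as a flat list.

    pixels: sequence of (r, g, b) tuples with values in 0..255.
    """
    step = 256 // bins_per_channel
    counts = Counter()
    n = 0
    for r, g, b in pixels:
        # Map the three per-channel bin indices to one flat index.
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        counts[idx] += 1
        n += 1
    if n == 0:
        raise ValueError("image has no pixels")
    return [counts[i] / n for i in range(bins_per_channel ** 3)]

def l1_distance(a, b):
    """Sum of absolute differences between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical toy "images": flat pixel lists, spatial layout is ignored.
sea = [(10, 60, 200)] * 90 + [(240, 240, 240)] * 10        # blue water, white foam
blue_room = [(240, 240, 240)] * 10 + [(20, 50, 210)] * 90  # white ceiling, blue walls
forest = [(20, 160, 40)] * 100                             # mostly green

h_sea, h_room, h_forest = map(color_histogram, (sea, blue_room, forest))
print(l1_distance(h_sea, h_room))    # the two blue scenes land in the same bins
print(l1_distance(h_sea, h_forest))  # the green scene lands in a different bin
```

Because the histogram throws away where the colors are, the sea and the blue room end up with the same feature vector here, which is exactly the failure mode described above.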

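So what to do? It depends on your end goal, really. One approach is to use one of the various open source feature extractors, save the data as text or in some other R-readable format, and then do the data processing in R as usual.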
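As a rough sketch of the hand-off step, the snippet below writes one feature vector per image to a CSV file that R can read. It assumes the features have already been extracted by some external tool; the function name, the column names (`image`, `f0`, `f1`, ...), and the file name are hypothetical.

```python
import csv

def write_features_csv(path, features):
    """Write image features to CSV, one row per image.

    features: dict mapping image name -> list of numbers.
    All vectors must have the same length.
    """
    lengths = {len(v) for v in features.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one image and equal-length vectors")
    dim = lengths.pop()
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        # Header row: image name followed by one column per feature.
        writer.writerow(["image"] + [f"f{i}" for i in range(dim)])
        for name, vec in sorted(features.items()):
            writer.writerow([name] + [repr(float(x)) for x in vec])

# Made-up feature values, just to show the file layout.
write_features_csv("features.csv", {
    "IMG_0002.jpg": [0.5, 1.0],
    "IMG_0001.jpg": [0.0, 0.25],
})
```

On the R side, a file like this can be loaded with `read.csv("features.csv")`, and the numeric columns passed to `kmeans()` or to `hclust(dist(...))`.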

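A good open source C library for feature extraction that also has a command-line interface is vlfeat. If you use it, I suggest dense SIFT extraction on each of the three color channels. Then represent each image by the concatenation of its SIFT vectors, and apply your favorite clustering technique (one that can handle vectors with thousands of dimensions). This will hardly give you state-of-the-art performance, but it is a start.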
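The concatenate-then-cluster step can be sketched as follows. This is only an illustration under stated assumptions: the per-channel descriptors are tiny made-up vectors rather than real dense SIFT output (vlfeat extraction itself is not shown), the image names are hypothetical, and the clustering is a bare-bones k-means written out by hand instead of a library routine.

```python
import random

def concat_channels(r_desc, g_desc, b_desc):
    """Represent an image by its R, G and B descriptors joined end to end."""
    return list(r_desc) + list(g_desc) + list(b_desc)

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Made-up 2-value descriptors per channel; real ones are far longer.
images = {
    "beach_1":  concat_channels([0.10, 0.20], [0.30, 0.40], [0.90, 0.80]),
    "beach_2":  concat_channels([0.10, 0.25], [0.30, 0.40], [0.85, 0.80]),
    "forest_1": concat_channels([0.10, 0.10], [0.90, 0.80], [0.10, 0.20]),
    "forest_2": concat_channels([0.15, 0.10], [0.90, 0.85], [0.10, 0.20]),
}
names = list(images)
labels = kmeans([images[n] for n in names], k=2)
for name, label in zip(names, labels):
    print(name, label)
```

Images that share a label are candidates for being near-duplicates. With real data, the number of clusters `k` is something you would have to choose or estimate.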

This page has reference implementations of various feature extractors, but only as binaries.

Note: in my experience, R doesn't scale well to large, high-dimensional datasets (sizes in the GB range). I love R to death, but I use C++ for this kind of thing.