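I am new to MapReduce, and I want to understand what sequence file data input is. I have been studying the Hadoop book, but I find it hard to understand.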
I have to find similar URLs such as
'http://teethwhitening360.com/teeth-whitening-treatments/18/'
'http://teethwhitening360.com/laser-teeth-whitening/22/'
'http://teethwhitening360.com/teeth-whitening-products/21/'
'http://unwanted-hair-removal.blogspot.com/2008/03/breakthroughs-in-unwanted-hair-remo'
'http://unwanted-hair-removal.blogspot.com/2008/03/unwanted-hair-removal-products.html'
'http://unwanted-hair-removal.blogspot.com/2008/03/unwanted-hair-removal-by-shaving.ht'
and group or cluster them. My question:
I would appreciate any advice.
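To make the goal concrete, here is a minimal sketch of the kind of grouping I mean. It is plain Python rather than Hadoop code, and it assumes that "similar" simply means "same host", which may well be too crude. The function names `map_url` and `group_urls` are made up for illustration; they only mimic the map step (emit a key per URL) and the reduce step (collect values per key).

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_url(url):
    """Map step: emit (host, url). Assumes 'similar' means 'same host'."""
    clean = url.strip()
    host = urlparse(clean).netloc.lower()
    return host, clean


def group_urls(urls):
    """Reduce step: collect every URL under its host key."""
    groups = defaultdict(list)
    for url in urls:
        host, clean = map_url(url)
        groups[host].append(clean)
    return dict(groups)


if __name__ == "__main__":
    # URLs copied from the post; truncated ones are left as they are.
    sample = [
        "http://teethwhitening360.com/teeth-whitening-treatments/18/",
        "http://teethwhitening360.com/laser-teeth-whitening/22/",
        "http://unwanted-hair-removal.blogspot.com/2008/03/breakthroughs-in-unwanted-hair-remo",
        "http://unwanted-hair-removal.blogspot.com/2008/03/unwanted-hair-removal-by-shaving.ht",
    ]
    for host, members in group_urls(sample).items():
        print(host, len(members))
```

In a real job the host would presumably be the map output key and the framework would do the grouping, but I am not sure how that fits with sequence file input.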