In modern Apache Hadoop versions, the various HDFS limits are controlled by configuration properties with fs-limits in their names, all of which have reasonable default values. This question asks specifically about the number of children of a directory. That is defined by dfs.namenode.fs-limits.max-directory-items, and its default value is 1048576.
For the full list of fs-limits configuration properties and their default values, see the Apache Hadoop documentation for hdfs-default.xml. Copied here for convenience:
<property>
<name>dfs.namenode.fs-limits.max-component-length</name>
<value>255</value>
<description>Defines the maximum number of bytes in UTF-8 encoding in each
component of a path. A value of 0 will disable the check.</description>
</property>
<property>
<name>dfs.namenode.fs-limits.max-directory-items</name>
<value>1048576</value>
<description>Defines the maximum number of items that a directory may
contain. Cannot set the property to a value less than 1 or more than
6400000.</description>
</property>
<property>
<name>dfs.namenode.fs-limits.min-block-size</name>
<value>1048576</value>
<description>Minimum block size in bytes, enforced by the Namenode at create
time. This prevents the accidental creation of files with tiny block
sizes (and thus many blocks), which can degrade
performance.</description>
</property>
<property>
<name>dfs.namenode.fs-limits.max-blocks-per-file</name>
<value>1048576</value>
<description>Maximum number of blocks per file, enforced by the Namenode on
write. This prevents the creation of extremely large files which can
degrade performance.</description>
</property>
<property>
<name>dfs.namenode.fs-limits.max-xattrs-per-inode</name>
<value>32</value>
<description>
Maximum number of extended attributes per inode.
</description>
</property>
<property>
<name>dfs.namenode.fs-limits.max-xattr-size</name>
<value>16384</value>
<description>
The maximum combined size of the name and value of an extended attribute
in bytes. It should be larger than 0, and less than or equal to maximum
size hard limit which is 32768.
</description>
</property>
All of these settings use reasonable default values decided by the Apache Hadoop community. It is generally recommended that users do not tune these values except under very unusual circumstances.
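If a special case really does require a different directory limit, the usual approach is to override the property in hdfs-site.xml on the NameNode, which typically requires a NameNode restart to take effect. You can check the effective value on a running cluster with hdfs getconf -confKey dfs.namenode.fs-limits.max-directory-items. A minimal sketch follows; the value shown is only an illustrative number, and per the description above it must stay between 1 and 6400000:
<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <!-- illustrative override only; the default of 1048576 is appropriate for most clusters -->
  <value>2097152</value>
</property>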
From http://blog.cloudera.com/blog/2009/02/the-small-files-problem/:
Every file, directory and block in HDFS is represented as an object in the namenode's memory, each of which occupies 150 bytes, as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware. Certainly a billion files is not feasible.
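To see where that 3 GB figure comes from: a file that fits in a single block costs roughly two namenode objects (the file inode plus its block), so 10,000,000 files × 2 objects × ~150 bytes ≈ 3 × 10^9 bytes, or about 3 GB of namenode heap.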