Tuesday, December 24, 2013

Hadoop: How to set HDFS quota

HDFS supports disk quotas just like other filesystems. As a Hadoop administrator, you can use the "dfsadmin" command to limit the physical disk space that the files under an HDFS directory may consume.

For example, to set a space quota on the /user/tony directory:

$ hadoop fs -count -q /user/tony
        none             inf            none             inf            8            4              12257 /user/tony

$ hdfs dfsadmin -setSpaceQuota 10G /user/tony

$ hadoop fs -count -q /user/tony
        none             inf     10737418240     10737393196            8            4              12257 /user/tony

Column one (none) is the file count quota, which is not set. Column two (inf) is the remaining file count quota; "inf" means there is no limit on how many more files and directories may be created under this directory. The third column (10737418240, i.e. 10G) is the space quota in bytes, and the fourth column (10737393196, roughly 9.9G) is the space still available. Once the quota is exceeded, any attempt to put new files into this directory is denied and an error message is returned. For example, with the space quota set to only 10 bytes:
$ hadoop fs -put ./shell-cmd.txt /user/tony/
put: The DiskSpace quota of /user/tony is exceeded: quota = 10 B = 10 B but diskspace consumed = 25044 B = 24.46 KB
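
As a rough sketch of how the columns described above line up, the snippet below labels the eight fields of one "hadoop fs -count -q" output line with awk. The function name describe_quota_line and the field labels are made up for this illustration, and it assumes the path contains no whitespace.

```shell
# Label the eight columns printed by `hadoop fs -count -q <path>`.
# describe_quota_line is a hypothetical helper, not part of Hadoop.
# Assumption: the path has no whitespace, so it fits in field 8.
describe_quota_line() {
  # Reads one line of -count -q output from stdin.
  awk '{
    printf "name_quota=%s\n", $1
    printf "name_quota_remaining=%s\n", $2
    printf "space_quota_bytes=%s\n", $3
    printf "space_quota_remaining_bytes=%s\n", $4
    printf "dir_count=%s\n", $5
    printf "file_count=%s\n", $6
    printf "content_size_bytes=%s\n", $7
    printf "path=%s\n", $8
  }'
}

# Sample line copied from the output above. On a real cluster you would
# pipe in `hadoop fs -count -q /user/tony` instead.
sample='        none             inf     10737418240     10737393196            8            4              12257 /user/tony'
printf '%s\n' "$sample" | describe_quota_line
```

Labeling the fields this way makes it easier to script a check against the remaining space column rather than counting columns by eye.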

HDFS quota accounting:
Because HDFS is a distributed filesystem and many clients can be writing data to a directory at once, it would be difficult to evaluate each byte written against the remaining quota. Instead, HDFS assumes that an entire block will be filled as soon as the block is allocated, which can produce unintuitive error messages. For example, suppose "/user/tony" has a quota of 2M. Writing an 8 KB file with a block size of 128 MB will cause a quota violation, because HDFS charges the write as 3 x 128 MB = 384 MB instead of 3 x 8 KB = 24 KB (assuming a replication factor of 3).

$ hdfs dfsadmin -setSpaceQuota 2M /user/tony
$ hadoop fs -count -q /user/tony
        none             inf         2097152         2072108            8            4              12257 /user/tony
$ hdfs getconf -confKey dfs.blocksize
134217728
$ hdfs getconf -confKey dfs.replication
3
$ fallocate -l 8K foo
$ hadoop fs -put ./foo /user/tony
13/12/24 16:01:51 WARN hdfs.DFSClient: DataStreamer Exception

org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /user/tony is exceeded: quota = 2097152 B = 2 MB but diskspace consumed = 402678228 B = 384.02 MB
at org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:161)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:1633)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1369)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:351)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:2662)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2326)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformatio
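
The arithmetic behind that error can be sketched in a few lines of shell. This is only an illustration of the accounting described above, not how the NameNode implements it: would_exceed_quota is a hypothetical helper, and it assumes that allocating the first block is charged as one full block times the replication factor.

```shell
# would_exceed_quota REMAINING_BYTES BLOCK_SIZE_BYTES REPLICATION
# Hypothetical helper: prints the space charged for allocating one block
# and returns success (0) if that charge is larger than the remaining quota.
# Assumption: a new block is charged as block size x replication factor.
would_exceed_quota() {
  charged=$(( $2 * $3 ))
  echo "charged=${charged} remaining=$1"
  [ "$charged" -gt "$1" ]
}

# Values taken from the transcript above: 2072108 bytes of quota left,
# dfs.blocksize = 134217728, dfs.replication = 3.
if would_exceed_quota 2072108 134217728 3; then
  echo "allocating the first block would exceed the quota"
fi
```

The exception reports 402678228 bytes consumed, which is this 402653184-byte charge plus the 25044 bytes already counted against the quota (2097152 - 2072108).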

To remove the space quota:
$ hdfs dfsadmin -clrSpaceQuota /user/tony

To set a file count quota, use "hdfs dfsadmin -setQuota <number> <path>"; to remove it, use "hdfs dfsadmin -clrQuota <path>".
