Hadoop File System Commands

In this blog post I will explain the different HDFS commands that are commonly used to access HDFS while working as a Big Data Developer.

  • Hadoop provides command line interface to access HDFS.
  • Most of the commands are similar to UNIX file system commands.
[npntraining@centos8 Desktop]$ hdfs -help
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  classpath            prints the classpath
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                                                Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version

List of all HDFS Commands

[naveen@npntraining ~]$ hdfs dfs -help
Usage: hadoop fs [generic options]
    [-appendToFile <localsrc> ... <dst>]
    [-cat [-ignoreCrc] <src> ...]
    [-checksum <src> ...]
    [-chgrp [-R] GROUP PATH...]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
    [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-count [-q] [-h] <path> ...]
    [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
    [-createSnapshot <snapshotDir> [<snapshotName>]]
    [-deleteSnapshot <snapshotDir> <snapshotName>]
    [-df [-h] [<path> ...]]
    [-du [-s] [-h] <path> ...]
    [-expunge]
    [-find <path> ... <expression> ...]
    [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-getfacl [-R] <path>]
    [-getfattr [-R] {-n name | -d} [-e en] <path>]
    [-getmerge [-nl] <src> <localdst>]
    [-help [cmd ...]]
    [-ls [-d] [-h] [-R] [<path> ...]]
    [-mkdir [-p] <path> ...]
    [-moveFromLocal <localsrc> ... <dst>]
    [-moveToLocal <src> <localdst>]
    [-mv <src> ... <dst>]
    [-put [-f] [-p] [-l] <localsrc> ... <dst>]
    [-renameSnapshot <snapshotDir> <oldName> <newName>]
    [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
    [-setfattr {-n name [-v value] | -x name} <path>]
    [-setrep [-R] [-w] <rep> <path> ...]
    [-stat [format] <path> ...]
    [-tail [-f] <file>]
    [-test -[defsz] <path>]
    [-text [-ignoreCrc] <src> ...]
    [-touchz <path> ...]
    [-truncate [-w] <length> <path> ...]
    [-usage [cmd ...]]

-ls

This command is used for listing the directories and files present under a specific directory in HDFS.

Usage : hdfs dfs [generic options] -ls [-d] [-h] [-R] [<path> ...]

[npntraining ~]$ hdfs dfs -ls /user/$USER/hdfs_commands

Note

  • -d is used to list the directories as plain files.
  • -h is used to print file size in human readable format.
  • -R is used to recursively list the content of the directories.
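For example, combining these flags to recursively list a home directory with human-readable sizes (the path is illustrative):

[npntraining ~]$ hdfs dfs -ls -h -R /user/$USER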

-mkdir

This command is similar to that of Unix mkdir and is used to create a directory in HDFS.

Usage :

hdfs dfs -mkdir [-p] /hdfs-path
Options:

  • -p : do not fail if the directory already exists; create intermediate parent directories as needed.
[npntraining ~]$ hdfs dfs -mkdir /user/$USER/hdfs_commands

Note

  • If the directory already exists, or if intermediate directories don't exist, the command throws an error. To overcome this we use -p (parent), which not only ignores the error when the directory already exists but also creates any missing intermediate directories, as shown below.
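A minimal sketch, assuming the nested path does not exist yet:

[npntraining ~]$ hdfs dfs -mkdir -p /user/$USER/hdfs_commands/input/raw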
[npntraining ~]$ hdfs dfs -ls /

-cat

This command is used for displaying the contents of a file on the console.

Usage : hdfs dfs -cat [-ignoreCrc] <src> ...

Note

  • -ignoreCrc option will disable the checksum verification.
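For example (the file path is illustrative):

[npntraining ~]$ hdfs dfs -cat /user/$USER/hdfs_commands/file1.txt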

-copyFromLocal

This command is used to copy files from the local file system to the HDFS file system.

Usage

hdfs dfs [generic options] -copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>

  • -f overwrites the destination if it already exists.
  • -p preserves access and modification times, ownership and the permissions.
  • -l allows the DataNode to lazily persist the file to disk; forces a replication factor of 1.
  • -d skips creation of the temporary file with the suffix .COPYING.

Using -copyFromLocal

[npntraining ~]$ hdfs dfs -copyFromLocal <localfile_path1> <localfile_path2> /<hdfs-path>

-put

Usage :

hdfs dfs -put [-f] [-p] [-l] [-d] [- | <localsrc> ...] <dst>

  • -f overwrites the destination if it already exists.
  • -p preserves access and modification times, ownership and the permissions.
  • -d skips creation of the temporary file with the suffix .COPYING.
  • -l allows the DataNode to lazily persist the file to disk; forces a replication factor of 1.
[npntraining~]$> hdfs dfs -put file1.txt hdfs://localhost.localdomain:9000/file1.txt

Important Note:

The fundamental difference between -copyFromLocal and -put is that -put can take input from stdin:
[npntraining]$> echo "Hello Naveen" | hdfs dfs -put - /file1.txt

-moveFromLocal

This command will move the file from local file system to HDFS. It ensures that the local copy is deleted.

Usage

hdfs dfs -moveFromLocal <localsrc> ... <dst>

[npntraining]$> hdfs dfs -moveFromLocal file1.txt /<hdfs-path>

-mv

Similar to the -moveFromLocal command in that it follows a cut-and-paste approach, but -mv moves files within the HDFS file system itself.

[npntraining]$ > hdfs dfs -mv file1.txt /file1.txt

-get

This command is used to copy files from HDFS to the local file system.

Usage

hdfs dfs [generic options] -get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>

[npntraining]$> hdfs dfs -get /<hdfs-path> <localfile_path1>
[npntraining]$> hdfs dfs -copyToLocal /<hdfs-path> <localfile_path1>

-copyToLocal

This command is similar to the -get command except that the destination is restricted to a local file reference. It is used to copy a file from HDFS to the local file system.

Usage

hdfs dfs -copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>

-rm

This command is used to remove a file or directory from HDFS.

Usage

hdfs dfs -rm [-f] [-r |-R] [-skipTrash] [-safely] URI [URI …]

  • -rm removes only files; directories cannot be deleted without -r.
  • -skipTrash bypasses the trash and immediately deletes the source.
  • -f does not display a diagnostic message or modify the exit status if the file does not exist.
  • -r (or -R) recursively deletes directories.
  • -safely requires a safety confirmation before deleting a directory whose total number of files is greater than hadoop.shell.delete.limit.num.files (in core-site.xml, default: 100). It can be used with -skipTrash to prevent accidental deletion of large directories.
[npntraining]$> hdfs dfs -rm /file.txt
[npntraining]$> hdfs dfs -rm -r /directory
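For example, to permanently delete a large directory, bypassing the trash (the path is illustrative):

[npntraining]$> hdfs dfs -rm -r -skipTrash /tmp/large_directory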

-setrep

This command changes the replication factor of a file. If the path is a directory, it recursively changes the replication factor of all files under the directory tree.

Usage

hdfs dfs -setrep [-R] [-w] <numReplicas> <path>
  • -w waits for the replication to complete; this can take a long time.
  • -R option is accepted for backwards compatibility. It has no effect.
[npntraining]$ > hdfs dfs -setrep 4 /file1.txt

Whenever the replication factor is changed, the NameNode automatically schedules the creation or deletion of block replicas to match the new factor.
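For example, to change the replication factor and block until the new replicas have been created (the path is illustrative):

[npntraining]$ > hdfs dfs -setrep -w 3 /file1.txt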

-touchz

This command is used to create a file of zero length. An error is returned if the file exists with non-zero length.

Usage

hdfs dfs -touchz URI [URI …]
[npntraining]$ > hdfs dfs -touchz /file1.txt

-stat

Print statistics about the file/directory at the given path in the specified format. The format may contain the following options:

  1. %b: file size in bytes.
  2. %F: file type.
  3. %g: group name of owner.
  4. %n: name of the file.
  5. %o: block size.
  6. %r: number of replicas.
  7. %u: user name of owner.

[npntraining]$ > hdfs dfs -stat "%n %o %u %g %r" /file.txt

-getmerge

Takes a source directory and a destination file as input, and concatenates the files in the source directory into the destination local file.

[npntraining]$ > hdfs dfs -getmerge /<hdfsdir> file.txt
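The -nl flag (listed in the help output above) adds a newline character at the end of each concatenated file:

[npntraining]$ > hdfs dfs -getmerge -nl /<hdfsdir> file.txt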

-appendToFile

Appends the content of one or more local files to an HDFS file.

[npntraining]$ > hdfs dfs -appendToFile /<local_file> /<local_file> /<hdfs-file>
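Like -put, -appendToFile can also read from stdin when the source is given as - :

[npntraining]$ > echo "more data" | hdfs dfs -appendToFile - /<hdfs-file>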

-text

Takes a source file and outputs the file in text format. The allowed formats are zip and TextRecordInputStream.
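For example, reading a gzip-compressed file directly (the file name is illustrative):

[npntraining]$ > hdfs dfs -text /file1.gz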

-test [-ezd]

The -test command is used for file test operations:

  1. -e: returns 0 if the path exists.
  2. -z: returns 0 if the file is zero length.
  3. -d: returns 0 if the path is a directory.
[npntraining]$ > hdfs dfs -test -e /file.txt
[npntraining]$ > echo $?
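A small sketch of how the exit status can drive a shell script (the path is illustrative):

if hdfs dfs -test -e /file.txt; then
    echo "/file.txt exists in HDFS"
else
    echo "/file.txt does not exist in HDFS"
fi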

-du

This command displays the sizes of the files and directories under the given path. The -s option aggregates the result into a single summary, and -h prints sizes in human-readable format.

hdfs dfs -du -h /
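For example, to get the total size of a home directory as a single human-readable figure (the path is illustrative):

[npntraining ~]$ hdfs dfs -du -s -h /user/$USER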
