{"id":5602,"date":"2021-11-08T14:02:36","date_gmt":"2021-11-08T14:02:36","guid":{"rendered":"https:\/\/www.npntraining.com\/blog\/?p=5602"},"modified":"2022-01-01T19:11:02","modified_gmt":"2022-01-01T19:11:02","slug":"hadoop-file-system-commands","status":"publish","type":"post","link":"https:\/\/www.npntraining.com\/blog\/hadoop-file-system-commands\/","title":{"rendered":"Hadoop File System Commands"},"content":{"rendered":"<p>In this blog post I will explain the HDFS commands that are commonly used when working as a Big Data developer.<\/p>\n<ul>\n<li>Hadoop provides a command-line interface to access HDFS.<\/li>\n<li>Most of the commands are similar to <strong>UNIX file system commands<\/strong>.<\/li>\n<\/ul>\n<pre><code class=\"language-shell\">[npntraining@centos8 Desktop]$ hdfs -help\nUsage: hdfs [--config confdir] [--loglevel loglevel] COMMAND\n       where COMMAND is one of:\n  dfs                  run a filesystem command on the file systems supported in Hadoop.\n  classpath            prints the classpath\n  namenode -format     format the DFS filesystem\n  secondarynamenode    run the DFS secondary namenode\n  namenode             run the DFS namenode\n  journalnode          run the DFS journalnode\n  zkfc                 run the ZK Failover Controller daemon\n  datanode             run a DFS datanode\n  dfsadmin             run a DFS admin client\n  haadmin              run a DFS HA admin client\n  fsck                 run a DFS filesystem checking utility\n  balancer             run a cluster balancing utility\n  jmxget               get JMX exported values from NameNode or DataNode.\n  mover                run a utility to move block replicas across\n                       storage types\n  oiv                  apply the offline fsimage viewer to an fsimage\n  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage\n  oev                  apply the offline edits viewer to an edits file\n  fetchdt            
  fetch a delegation token from the NameNode\n  getconf              get config values from configuration\n  groups               get the groups which users belong to\n  snapshotDiff         diff two snapshots of a directory or diff the\n                       current directory contents with a snapshot\n  lsSnapshottableDir   list all snapshottable dirs owned by the current user\n                                                Use -help to see options\n  portmap              run a portmap service\n  nfs3                 run an NFS version 3 gateway\n  cacheadmin           configure the HDFS cache\n  crypto               configure HDFS encryption zones\n  storagepolicies      list\/get\/set block storage policies\n  version              print the version\n<\/code><\/pre>\n<p><strong>List of all HDFS Commands<\/strong><\/p>\n<pre><code class=\"language-shell\">[naveen@npntraining ~]$ hdfs dfs -help\nUsage: hadoop fs [generic options]\n    [-appendToFile &lt;localsrc&gt; ... &lt;dst&gt;]\n    [-cat [-ignoreCrc] &lt;src&gt; ...]\n    [-checksum &lt;src&gt; ...]\n    [-chgrp [-R] GROUP PATH...]\n    [-chmod [-R] &lt;MODE[,MODE]... | OCTALMODE&gt; PATH...]\n    [-chown [-R] [OWNER][:[GROUP]] PATH...]\n    [-copyFromLocal [-f] [-p] [-l] &lt;localsrc&gt; ... &lt;dst&gt;]\n    [-copyToLocal [-p] [-ignoreCrc] [-crc] &lt;src&gt; ... &lt;localdst&gt;]\n    [-count [-q] [-h] &lt;path&gt; ...]\n    [-cp [-f] [-p | -p[topax]] &lt;src&gt; ... &lt;dst&gt;]\n    [-createSnapshot &lt;snapshotDir&gt; [&lt;snapshotName&gt;]]\n    [-deleteSnapshot &lt;snapshotDir&gt; &lt;snapshotName&gt;]\n    [-df [-h] [&lt;path&gt; ...]]\n    [-du [-s] [-h] &lt;path&gt; ...]\n    [-expunge]\n    [-find &lt;path&gt; ... &lt;expression&gt; ...]\n    [-get [-p] [-ignoreCrc] [-crc] &lt;src&gt; ... 
&lt;localdst&gt;]\n    [-getfacl [-R] &lt;path&gt;]\n    [-getfattr [-R] {-n name | -d} [-e en] &lt;path&gt;]\n    [-getmerge [-nl] &lt;src&gt; &lt;localdst&gt;]\n    [-help [cmd ...]]\n    [-ls [-d] [-h] [-R] [&lt;path&gt; ...]]\n    [-mkdir [-p] &lt;path&gt; ...]\n    [-moveFromLocal &lt;localsrc&gt; ... &lt;dst&gt;]\n    [-moveToLocal &lt;src&gt; &lt;localdst&gt;]\n    [-mv &lt;src&gt; ... &lt;dst&gt;]\n    [-put [-f] [-p] [-l] &lt;localsrc&gt; ... &lt;dst&gt;]\n    [-renameSnapshot &lt;snapshotDir&gt; &lt;oldName&gt; &lt;newName&gt;]\n    [-rm [-f] [-r|-R] [-skipTrash] &lt;src&gt; ...]\n    [-rmdir [--ignore-fail-on-non-empty] &lt;dir&gt; ...]\n    [-setfacl [-R] [{-b|-k} {-m|-x &lt;acl_spec&gt;} &lt;path&gt;]|[--set &lt;acl_spec&gt; &lt;path&gt;]]\n    [-setfattr {-n name [-v value] | -x name} &lt;path&gt;]\n    [-setrep [-R] [-w] &lt;rep&gt; &lt;path&gt; ...]\n    [-stat [format] &lt;path&gt; ...]\n    [-tail [-f] &lt;file&gt;]\n    [-test -[defsz] &lt;path&gt;]\n    [-text [-ignoreCrc] &lt;src&gt; ...]\n    [-touchz &lt;path&gt; ...]\n    [-truncate [-w] &lt;length&gt; &lt;path&gt; ...]\n    [-usage [cmd ...]]<\/code><\/pre>\n<h2>-ls<\/h2>\n<blockquote>\n<p>This command lists the files and directories under a specific directory in HDFS.<\/p>\n<\/blockquote>\n<p><strong><em>Usage<\/em><\/strong> : hdfs dfs [generic options] -ls [-d] [-h] [-R] [&lt;path&gt; \u2026]<\/p>\n<pre><code>[npntraining ~]$ hdfs dfs -ls \/user\/$USER\/hdfs_commands<\/code><\/pre>\n<p><strong>Note<\/strong><\/p>\n<ul>\n<li>-d lists directories as plain files.<\/li>\n<li>-h prints file sizes in a human-readable format.<\/li>\n<li>-R recursively lists the contents of directories.<\/li>\n<\/ul>\n<h2>-mkdir<\/h2>\n<blockquote>\n<p>This command is similar to the Unix mkdir command and creates a directory in HDFS.<\/p>\n<\/blockquote>\n<p><strong>Usage<\/strong> : <\/p>\n<pre><code>hdfs dfs -mkdir [-p] 
\/hdfs-path<\/code><\/pre>\n<table>\n<thead>\n<tr>\n<th><strong>Options<\/strong><\/th>\n<th>Description<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>-p<\/td>\n<td>Do not fail if the directory already exists; create any missing parent directories.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<pre><code>[npntraining ~]$ hdfs dfs -mkdir \/user\/$USER\/hdfs_commands<\/code><\/pre>\n<p><strong>Note<\/strong><\/p>\n<ul>\n<li>If the directory already exists, or if an intermediate directory does not exist, the command fails with an error. To avoid that error, use <strong>-p<\/strong> (parent), which ignores an existing directory and creates any missing intermediate directories.<\/li>\n<\/ul>\n<pre><code>[npntraining ~]$ hdfs dfs -ls \/<\/code><\/pre>\n<h2>-cat<\/h2>\n<blockquote>\n<p>This command displays the contents of a file on the console.<\/p>\n<\/blockquote>\n<p><strong>Usage<\/strong> : hdfs dfs -cat [-ignoreCrc] &lt;src&gt; \u2026<\/p>\n<p><strong>Note<\/strong><\/p>\n<ul>\n<li>The -ignoreCrc option disables checksum verification.<\/li>\n<\/ul>\n<h2>-copyFromLocal<\/h2>\n<blockquote>\n<p>This command copies files from the local file system to HDFS.<\/p>\n<\/blockquote>\n<p><strong>Usage<\/strong><\/p>\n<p>hdfs dfs [generic options] -copyFromLocal [-f] [-p] [-l] &lt;localsrc&gt; \u2026 &lt;dst&gt;<\/p>\n<ul>\n<li>-f overwrites the destination if it already exists.<\/li>\n<li>-p preserves access and modification times, ownership and permissions.<\/li>\n<li>-d skips creation of the temporary file with the suffix <code>._COPYING_<\/code>.<\/li>\n<\/ul>\n<p><u>Using -copyFromLocal<\/u><\/p>\n<pre><code class=\"language-shell\">[npntraining ~]$ hdfs dfs -copyFromLocal &lt;localfile_path1&gt; &lt;localfile_path2&gt; \/&lt;hdfs-path&gt;<\/code><\/pre>\n<h2>-put<\/h2>\n<p><strong>Usage<\/strong> : <\/p>\n<p>hdfs dfs -put [-f] [-p] [-l] [-d] [ - | &lt;localsrc&gt; .. ]. 
&lt;dst&gt;<\/p>\n<ul>\n<li>-f overwrites the destination if it already exists.<\/li>\n<li>-p preserves access and modification times, ownership and permissions.<\/li>\n<li>-d skips creation of the temporary file with the suffix <code>._COPYING_<\/code>.<\/li>\n<li>-l allows the DataNode to lazily persist the file to disk and forces a replication factor of 1.<\/li>\n<\/ul>\n<pre><code>[npntraining~]$&gt; hdfs dfs -put file1.txt hdfs:\/\/localhost.localdomain:9000\/file1.txt<\/code><\/pre>\n<p><strong>Important Note:<\/strong><\/p>\n<p>The fundamental difference between -copyFromLocal and -put is:<\/p>\n<ol>\n<li>-put can also read its input from stdin.<\/li>\n<\/ol>\n<pre><code>[npntraining]$&gt; echo &quot;Hello Naveen&quot; | hdfs dfs -put - \/file1.txt<\/code><\/pre>\n<h2>-moveFromLocal<\/h2>\n<blockquote>\n<p>This command moves a file from the local file system to HDFS and ensures that the local copy is deleted.<\/p>\n<\/blockquote>\n<p><strong>Usage<\/strong><\/p>\n<p>hdfs dfs -moveFromLocal &lt;localsrc&gt; &lt;dst&gt;<\/p>\n<pre><code>[npntraining]$ &gt; hdfs dfs -moveFromLocal file1.txt \/file1.txt<\/code><\/pre>\n<p>This is similar to the -copyFromLocal command, but it follows a cut-and-paste approach: the local source is deleted after it has been copied. To move a file within HDFS itself, use -mv instead.<\/p>\n<pre><code>[npntraining]$&gt; hdfs dfs -mv \/&lt;hdfs-path&gt; \/&lt;hdfs-path&gt;<\/code><\/pre>\n<h2>-get<\/h2>\n<blockquote>\n<p>This command copies files from HDFS to the local file system.<\/p>\n<\/blockquote>\n<p><strong>Usage<\/strong><\/p>\n<p>hdfs dfs [generic options] -get [-f] [-p] [-ignoreCrc] [-crc] &lt;src&gt; \u2026 &lt;localdst&gt;<\/p>\n<pre><code>[npntraining]$&gt; hdfs dfs -get \/&lt;hdfs-path&gt; &lt;localfile_path1&gt;<\/code><\/pre>\n<pre><code>[npntraining]$&gt; hdfs dfs -copyToLocal \/&lt;hdfs-path&gt; &lt;localfile_path1&gt;<\/code><\/pre>\n<h2>-copyToLocal<\/h2>\n<blockquote>\n<p>This command is similar to the -get command, except that the destination is restricted to the local file system. 
It copies the file from HDFS to the local file system.<\/p>\n<\/blockquote>\n<p><strong>Usage<\/strong><\/p>\n<p>hdfs dfs -copyToLocal [-f] [-p] [-ignoreCrc] [-crc] URI &lt;localdst&gt;<\/p>\n<h2>-rm<\/h2>\n<blockquote>\n<p>This command removes a file or directory from HDFS.<\/p>\n<\/blockquote>\n<p><strong>Usage<\/strong><\/p>\n<p>hdfs dfs -rm [-f] [-r|-R] [-skipTrash] [-safely] URI [URI \u2026]<\/p>\n<ul>\n<li>Without -r, -rm removes only files; directories are not deleted.<\/li>\n<li>-skipTrash bypasses the trash, if enabled, and deletes the source immediately.<\/li>\n<li>-f suppresses the diagnostic message and the error exit status if the file does not exist.<\/li>\n<li>-r (or -R) deletes a directory and its contents recursively.<\/li>\n<li>-safely requires confirmation before deleting a directory whose total number of files is greater than hadoop.shell.delete.limit.num.files (in core-site.xml, default: 100). It can be used with -skipTrash to prevent accidental deletion of large directories.<\/li>\n<\/ul>\n<pre><code>[npntraining]$&gt; hdfs dfs -rm \/file.txt<\/code><\/pre>\n<pre><code>[npntraining]$&gt; hdfs dfs -rm -r \/directory<\/code><\/pre>\n<h2>-setrep<\/h2>\n<blockquote>\n<p>This command changes the replication factor of a file. If the path is a directory, it recursively changes the replication factor of all files under that directory.<\/p>\n<\/blockquote>\n<p><strong>Usage<\/strong><\/p>\n<pre><code>hdfs dfs -setrep [-R] [-w] &lt;numReplicas&gt; &lt;path&gt;<\/code><\/pre>\n<ul>\n<li>-w waits for the replication to complete, which can take a long time.<\/li>\n<li>-R option is accepted for backwards compatibility. 
It has no effect.<\/li>\n<\/ul>\n<pre><code>[npntraining]$ &gt; hdfs dfs -setrep 4 \/file1.txt<\/code><\/pre>\n<p>When the replication factor changes, the NameNode schedules the creation of additional replicas or the removal of excess ones in the background.<\/p>\n<h2><strong>-touchz<\/strong><\/h2>\n<blockquote>\n<p>This command creates a file of zero length. An error is returned if the file exists with non-zero length.<\/p>\n<\/blockquote>\n<p><strong>Usage<\/strong><\/p>\n<pre><code>hdfs dfs -touchz URI [URI \u2026]<\/code><\/pre>\n<pre><code>[npntraining]$ &gt; hdfs dfs -touchz \/file1.txt<\/code><\/pre>\n<h2><strong>-stat<\/strong><\/h2>\n<p>Prints statistics about the file or directory at the given path in the specified format. The format may contain the following options:<\/p>\n<ol>\n<li>\n<p>%b: file size in bytes.<\/p>\n<\/li>\n<li>\n<p>%F: file type.<\/p>\n<\/li>\n<li>\n<p>%g: group name of the owner.<\/p>\n<\/li>\n<li>\n<p>%n: name of the file.<\/p>\n<\/li>\n<li>\n<p>%o: block size.<\/p>\n<\/li>\n<li>\n<p>%r: number of replicas.<\/p>\n<\/li>\n<li>\n<p>%u: user name of the owner.<\/p>\n<\/li>\n<\/ol>\n<pre><code>[npntraining]$ &gt; hdfs dfs -stat &quot;%n %o %u %g %r&quot; \/file.txt<\/code><\/pre>\n<h2><strong>-getmerge<\/strong><\/h2>\n<p>Takes a source directory as input and concatenates the files in it into the destination local file.<\/p>\n<pre><code>[npntraining]$ &gt; hdfs dfs -getmerge \/&lt;hdfsdir&gt; file.txt<\/code><\/pre>\n<h2><strong>-appendToFile<\/strong><\/h2>\n<p>Appends the contents of one or more local files to a file in HDFS.<\/p>\n<pre><code>[npntraining]$ &gt; hdfs dfs -appendToFile \/&lt;local_file&gt; \/&lt;local_file&gt; \/&lt;hdfs-file&gt;<\/code><\/pre>\n<p><strong>-text<\/strong><\/p>\n<h2><strong>-test [-ezd]<\/strong><\/h2>\n<p>The -test command performs file test operations:<\/p>\n<ol>\n<li>-e: returns 0 if the path exists.<\/li>\n<li>-z: returns 0 if the file is zero length.<\/li>\n<li>-d: returns 0 if the path is a directory.<\/li>\n<\/ol>\n<pre><code>[npntraining]$ &gt; 
hdfs dfs -test -e \/file.txt\n[npntraining]$ &gt; echo $?<\/code><\/pre>\n<h2><strong>-du<\/strong><\/h2>\n<p>This command is used to check disk usage<\/p>\n<pre><code>hdfs dfs -du -h \/<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In this blog post I will explain different HDFS commands to access HDFS which are commonly used while working as a Big Data Developer Training. Hadoop provides command line interface&hellip;<\/p>\n","protected":false},"author":1,"featured_media":5603,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[58,59,60,65,66],"tags":[],"class_list":["post-5602","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hadoop","category-hive","category-java-programming","category-selenium","category-statistics"],"_links":{"self":[{"href":"https:\/\/www.npntraining.com\/blog\/wp-json\/wp\/v2\/posts\/5602","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.npntraining.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.npntraining.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.npntraining.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.npntraining.com\/blog\/wp-json\/wp\/v2\/comments?post=5602"}],"version-history":[{"count":21,"href":"https:\/\/www.npntraining.com\/blog\/wp-json\/wp\/v2\/posts\/5602\/revisions"}],"predecessor-version":[{"id":7851,"href":"https:\/\/www.npntraining.com\/blog\/wp-json\/wp\/v2\/posts\/5602\/revisions\/7851"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.npntraining.com\/blog\/wp-json\/wp\/v2\/media\/5603"}],"wp:attachment":[{"href":"https:\/\/www.npntraining.com\/blog\/wp-json\/wp\/v2\/media?parent=5602"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.npntraining.com\/blog\/wp-json\/wp\/v2\/categories?post=5602"},{"taxon
omy":"post_tag","embeddable":true,"href":"https:\/\/www.npntraining.com\/blog\/wp-json\/wp\/v2\/tags?post=5602"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}