Saturday, April 4, 2015

Easy way to recover the deleted files/dir in Hadoop hdfs

Sometimes files or directories get deleted by accident.
Is there any way to get them back?

   By default, Hadoop deletes files/dirs permanently. It has a Trash feature, but it is not enabled by default (fs.trash.interval defaults to 0, which means disabled).

   Setting fs.trash.interval and fs.trash.checkpoint.interval in Hadoop's core-site.xml makes Hadoop move deleted files/dirs into the .Trash folder instead of removing them immediately.

   The .Trash folder is located in HDFS at /user/$USER/.Trash

configuring core-site.xml

<property>
<name>fs.trash.interval</name>
<value>120</value> 
</property>

<property>
<name>fs.trash.checkpoint.interval</name>
<value>45</value>
</property>

   With the above configuration, all deleted files/dirs are moved to the .Trash folder and kept for two hours (both values are in minutes).
   A trash checkpoint runs every 45 minutes: it creates a new checkpoint from the current trash contents and permanently deletes checkpoints that are more than 2 hours old.

restart Hadoop
    Once you have modified core-site.xml, stop and start Hadoop so that the change takes effect.
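
   After the restart it is worth checking that the new value is actually picked up. Below is a minimal sketch, assuming the hdfs command from Hadoop 2.x or later is on your PATH; trash_status is just a helper name I made up, not a Hadoop command. Note that it reads the configuration as seen by the client machine you run it on.

```shell
# Report whether HDFS trash is enabled, based on the effective configuration.
# Assumption: the `hdfs` command is on PATH. trash_status is a hypothetical helper name.
trash_status() {
    # `hdfs getconf -confKey <key>` prints the effective value of a config key
    interval=$(hdfs getconf -confKey fs.trash.interval) || return 1
    if [ "$interval" -gt 0 ]; then
        echo "trash enabled: ${interval} min"
    else
        # 0 means deleted files are removed immediately
        echo "trash disabled"
    fi
}
```

   Calling trash_status after the restart should report the 120 minutes configured above if core-site.xml was picked up.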

   Here is an example of the remove-directory command (on newer Hadoop releases -rmr is deprecated in favor of -rm -r):
hadoop@solai# hadoop fs -rmr /testTrash
15/04/05 01:10:14 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 120 minutes, Emptier interval = 45 minutes. Moved: 'hdfs://127.0.0.1:9000/testTrash' to trash at: hdfs://127.0.0.1:9000/user/bdalab/.Trash/Current
   The message clearly says that the deleted folder was moved to /user/bdalab/.Trash/Current, that the data is kept for 2 hours, and that the checkpoint interval is 45 minutes.

List the deleted files/dirs in the .Trash folder using -ls:
hadoop@solai# hadoop fs -ls hdfs://127.0.0.1:9000/user/bdalab/.Trash/Current/testTrash
You can view the contents (with -cat) or move the files back to their original path:
hadoop@solai# hadoop fs -mv hdfs://127.0.0.1:9000/user/bdalab/.Trash/Current/testTrash /testTrash
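
   The restore step can be wrapped in a small script. This is only a sketch under a few assumptions: the deleted path is still under .Trash/Current (not yet rolled into a timestamped checkpoint), the hadoop command is on your PATH, and the helper names trash_path and restore_from_trash are my own, not part of Hadoop.

```shell
# Map an original HDFS path to its location in a user's trash.
# Assumption: the item is still in "Current", not in an older checkpoint.
trash_path() {
    user=$1
    original=$2
    echo "/user/${user}/.Trash/Current${original}"
}

# Move a deleted path from trash back to where it was.
# restore_from_trash is a hypothetical helper name, not a Hadoop command.
restore_from_trash() {
    user=$1
    original=$2
    src=$(trash_path "$user" "$original")
    # `hadoop fs -test -e` returns 0 when the path exists
    if ! hadoop fs -test -e "$src"; then
        echo "not in trash: $src" >&2
        return 1
    fi
    # Refuse to overwrite or nest into something already at the destination
    if hadoop fs -test -e "$original"; then
        echo "destination already exists: $original" >&2
        return 1
    fi
    hadoop fs -mv "$src" "$original"
}
```

   For the example above you would call restore_from_trash bdalab /testTrash. If the parent directory of the original path was deleted too, it has to be recreated before the move can succeed.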
