Thursday, July 23, 2015

Hadoop - Decompress gz files from reducer output

Due to space constraints on our Hadoop clusters, we had to store our output files in compressed gz format.

STORE finaldata INTO '/output/${month}/${folder}.gz/' USING PigStorage('\t'); 
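
PigStorage infers output compression from the extension of the store location, so pointing the STORE at a path ending in .gz makes the part files come out gzip-compressed. A quick sanity check (a sketch, reusing the same path parameters):

hadoop fs -cat /output/${month}/${folder}.gz/part-* | head -c 2 | od -An -tx1

The first two bytes should come back as 1f 8b, the gzip magic number.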

Once we started storing our output files in the compressed format, we had to change the way we merged them and pulled them down to the local system. The getmerge command we had been using copied the files over still in compressed form:

hadoop fs -getmerge /output/${month}/${folder}.gz/part-* /output/handoff.txt
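
That is because getmerge simply concatenates the raw bytes of the part files into one local file, so handoff.txt ends up as a series of gzip members rather than readable text. Concatenated gzip members are still a valid gzip stream, so the merged file could be unpacked afterwards with a local decompression pass (a sketch, assuming GNU gzip is available on the local machine):

gzip -dc /output/handoff.txt > /output/handoff.plain.txt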

To decompress the files and copy them to the local system in a single step, we used the text command instead:

hadoop fs -text /output/${month}/${folder}.gz/part-* > /output/handoff.txt
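
The text command detects that the part files are gzip-compressed and decompresses them on the fly, so the local redirect receives plain text. A minimal wrapper for the hand-off step might look like this (the script name and argument handling are hypothetical):

#!/bin/bash
# handoff.sh <month> <folder> -- decompress and merge the part files for one
# month/folder into a single local text file (local path as in the hand-off above)
month="$1"
folder="$2"
hadoop fs -text "/output/${month}/${folder}.gz/part-*" > /output/handoff.txt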
