Due to space constraints on our Hadoop clusters, we had to store our output files in compressed gzip format. In Pig, that means giving the output path a .gz extension:
STORE finaldata INTO '/output/${month}/${folder}.gz/' USING PigStorage('\t');
Once we started storing our output in compressed format, we had to change the way we merged the output files and pulled them down to the local system. The getmerge command copies the part files as they are, so the merged local file is still compressed:
hadoop fs -getmerge /output/${month}/${folder}.gz/part-* /output/handoff.txt
To decompress the files while copying them over to the local system, we had to use the text command instead and redirect its output to a local file:
hadoop fs -text /output/${month}/${folder}.gz/part-* > /output/handoff.txt
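If a merged file has already been pulled down with getmerge, it may not need to be fetched again. Assuming the part files are gzip-compressed, the merged file is just several gzip streams back to back, and gzip can decompress that as one stream. The sketch below simulates this locally without Hadoop; the part-file names, their contents, and the temporary directory are made up for illustration.

```shell
#!/bin/sh
# Local simulation only: no Hadoop involved. File names and contents are hypothetical.
set -e
workdir=$(mktemp -d)

# Create two gzip-compressed "part" files, standing in for compressed Pig output.
printf 'a\t1\n' | gzip -c > "$workdir/part-r-00000.gz"
printf 'b\t2\n' | gzip -c > "$workdir/part-r-00001.gz"

# Concatenate the raw bytes, which is what a getmerge of compressed parts gives you.
cat "$workdir"/part-r-*.gz > "$workdir/handoff.gz"

# gzip treats concatenated members as one stream and decompresses them in order.
gzip -dc "$workdir/handoff.gz" > "$workdir/handoff.txt"
cat "$workdir/handoff.txt"
```

On a real getmerge result the only step needed is the gzip -dc line, pointed at the merged file.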