SHELL SCRIPTING

Licensing Information: READ LICENSE

This project creates 10,000 files, each with 2,000 rows and 50 columns, and then moves those files onto Hadoop.

  • Here I first create an input file filled with sequential numbers, reshape those numbers into rows and columns, and then convert the resulting files into CSV.

Steps to create files with multiple records in a structured way

  • First, create a file named "filename.sh" in your directory
vi filename.sh
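  • Once the commands in the following steps have been added to filename.sh, one way to run it (an assumed step, not spelled out in the original) is:
#making the script executable and running it
chmod +x filename.sh
./filename.sh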
  • For creating n files, each with m records, arranged in the output file as rows and columns (a small worked example of the reshaping follows the code block below)
#for creating 'n' number of files, here n = 10,000
for ((i=1;i<=10000;i++))
        do
#for creating 'm' number of records per file, here m = 100,000
        for ((j=1;j<=100000;j++))
                do
#appending each record to the input file
                echo $j >>create_input_file$i;
                done;
#'awk' reshapes the 100,000 single-number lines into nr = 2,000 rows; each row picks every 2,000th value, giving 100,000 / 2,000 = 50 columns per row
        awk -v nr=2000 '{a[NR]=$0} END{for (i=1;i<=nr;i++) {for (j=i;j<=NR;j+=nr)printf "%d  ",a[j]; print" "}}' create_input_file$i >output_file$i
        done;
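  • To see what the awk reshaping does, here is a minimal sketch on a tiny input: 6 records with nr = 3 give 3 rows of 2 columns. The file name tiny_input is hypothetical and not part of the script.
#tiny demonstration of the column-major reshaping
seq 6 > tiny_input
awk -v nr=3 '{a[NR]=$0} END{for (i=1;i<=nr;i++) {for (j=i;j<=NR;j+=nr)printf "%d  ",a[j]; print" "}}' tiny_input
#output (trailing spaces trimmed for readability):
#1  4
#2  5
#3  6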
  • Delete the input files created above; only the output files are needed, so remove every create_input_file
#deleting input file, as we only require output file
for delete_input_file in create_input_file*
        do
        rm $delete_input_file;
        done;
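  • An equivalent one-liner, assuming the glob matches only the generated input files, would be:
rm -f create_input_file*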
  • Convert the output files into CSV
#converting the output into CSV
for csv in output_file*
        do
        sed -i -e 's/  /,/g' $csv;
        done;
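  • As a quick illustration of the substitution (the sample line below is hypothetical), each pair of spaces emitted by the awk printf becomes a comma:
printf '1  4  \n' | sed -e 's/  /,/g'
#prints: 1,4,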

#removing the trailing comma (and any trailing blanks) from each row
for last_comma in output_file*
        do
        sed -i -e 's/,[[:blank:]]*$//g' $last_comma;
        done;
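  • To sanity-check one of the generated files, count the comma-separated fields in its first row; 50 is expected with the settings above (output_file1 is simply one of the generated files):
head -1 output_file1 | awk -F',' '{print NF}'
#should print 50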
  • Compress the output files, since all of them need to be combined and sent to Hadoop as a single archive
#compressing the output files into a single archive
tar -zcvf output.archive.tar.gz output*
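  • To verify the archive before sending it, list its contents; it should contain the 10,000 output files, assuming they all matched the glob:
tar -tzf output.archive.tar.gz | wc -l
#should print 10000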
  • Create a directory in Hadoop (HDFS) and copy the archive from the local filesystem into it
#creating the directory "assignment_linux" in Hadoop
hadoop fs -mkdir /user/cloudera/assignment_linux

#copying "output.archive.tar.gz" into Hadoop
hadoop fs -put output.archive.tar.gz /user/cloudera/assignment_linux
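  • To confirm the archive landed in HDFS, list the directory:
hadoop fs -ls /user/cloudera/assignment_linux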
