
How to Split the Huge Data?

Posted: Wed Mar 20, 2013 4:23 am
by pkll
Hi,

I have a 4GB source file, and I split it into two 2GB halves like this.

My filename is CALL_HIST_DETAILS.txt

c:/NRD> du -sh CALL_HIST_DETAILS.txt      # reports the file size (4GB)
c:/NRD> wc -l CALL_HIST_DETAILS.txt       # total line count is 9868002

C:/NRD>head -4934001 CALL_HIST_DETAILS.txt > test.txt

C:/NRD>tail -4934001 CALL_HIST_DETAILS.txt > test1.txt

The tail output (test1.txt) works fine and the job is able to read the data. But with the head output (test.txt), the job is unable to read the data and shows an error.

Can you let me know why the head output (test.txt) is not working? Is this the correct process for splitting the data?
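A quick way to sanity-check a head/tail split like this is to run the same commands on a small demo file and inspect the output byte-by-byte with `od -c`; if a record does not end in a newline, the reader will scan past its delimiter limit and fail. (All filenames below are demo stand-ins, not the real 4GB file.)

```shell
# Build a small stand-in for the 9,868,002-line file and split it in half.
seq 1 10 > sample.txt
head -n 5 sample.txt > part1.txt       # first half, like test.txt
tail -n 5 sample.txt > part2.txt       # second half, like test1.txt
wc -l part1.txt part2.txt              # the two halves should add up to the original
head -c 40 part1.txt | od -c           # every record should end in \n
```

If `od -c` on the real test.txt shows no `\n` near where you expect record boundaries, the stage's record delimiter setting is the thing to fix, not the split itself.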

Posted: Wed Mar 20, 2013 4:44 am
by crystal_pup
Can you paste the exact error that you are getting on using the head command on the test.txt file?

Posted: Wed Mar 20, 2013 5:20 am
by pkll
Hi crystal,
I am getting the error below:

Sequential_File_0,0: Error reading on import.
Sequential_File_0,0: Consumed more than 100,000 bytes looking for record delimiter; aborting
Sequential_File_0,0: Import error at record 0.
Sequential_File_0,0: The runLocally() of the operator failed.

But the tail output works fine...

Is there any alternative way to split the data?

Use the split command to split the file

Posted: Wed Mar 20, 2013 5:38 am
by anbu

Code: Select all

split -l 4934001 CALL_HIST_DETAILS.txt test
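Note that split writes its output to generated names (testaa, testab, ...), so you pick up the pieces by suffix afterwards. The same idea on a small demo file (demo filenames, not the real one):

```shell
# split -l writes fixed-size line chunks to NAMEaa, NAMEab, ...
seq 1 10 > sample.txt
split -l 5 sample.txt test_       # produces test_aa and test_ab
wc -l test_aa test_ab             # 5 lines each
```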

Posted: Wed Mar 20, 2013 5:54 am
by prasson_ibm
Hi,
You can try the sed command:

Code: Select all

   END=`wc -l CALL_HIST_DETAILS.txt|awk -F" " '{print $1}'`
   sed -n '1,4934001p' CALL_HIST_DETAILS.txt > test.txt
   sed -n '4934002,'$END'p' CALL_HIST_DETAILS.txt > test1.txt

Posted: Wed Mar 20, 2013 6:23 am
by anbu
A few changes to prasson_ibm's code: quit at line 4934001 so sed stops reading once the first half is written, and use `$` for the last line, which makes the END lookup unnecessary.

Code: Select all

sed -n '1,4934001p;4934001q' CALL_HIST_DETAILS.txt > test.txt
sed -n '4934002,$p' CALL_HIST_DETAILS.txt > test1.txt
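The early-quit trick above can be tried on a small demo file (demo filenames, assumed); quitting at the split point keeps sed from scanning the rest of a large file:

```shell
seq 1 10 > sample.txt
sed -n '1,5p;5q' sample.txt > first.txt   # print lines 1-5, then stop reading
sed -n '6,$p' sample.txt > rest.txt       # line 6 through end of file
wc -l first.txt rest.txt
```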

Posted: Wed Mar 20, 2013 6:36 am
by daignault
Check your sequential file definition. Either you have not defined the correct delimiter for parsing the columns, or you have not defined the record structure. Fix the definition and DataStage will read the data correctly.

Regards

Ray D

Posted: Wed Mar 20, 2013 7:33 am
by chulett
So this is your unzipped file?

Posted: Wed Mar 20, 2013 9:38 am
by prasannakumarkk
I'm only guessing here, but:
Sequential_File_0,0: Consumed more than 100,000 bytes looking for record delimiter; aborting
Sequential_File_0,0: Import error at record 0
What is the first record in your head file? Does it have a column header?
Does it have more than 100,000 characters? What record delimiter have you specified in the format tab of the Sequential File stage? And is there a newline character in the first record?

Remove the first line and run the job again.
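To drop the first line without editing the file by hand, `tail -n +2` (or `sed '1d'`) does it; a small sketch with demo filenames:

```shell
printf 'HEADER\nrow1\nrow2\n' > with_header.txt
tail -n +2 with_header.txt > no_header.txt   # start output at line 2
```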

Posted: Wed Mar 20, 2013 9:40 am
by prasannakumarkk
Also, in the Director, check the monitor for that run: how many records were read from the Sequential File stage? It should be zero, correct?