
How to Omit last record from a sequential file

Posted: Wed Oct 12, 2005 10:15 am
by dprasanth
Hi,
I am loading a sequential file with around 5 million records into an Oracle database, using a Transformer stage to do some processing.
This is a sample of the data format:

45080469|DET|45080464|60087074|NULL|2|Sep 1 2000 12:00AM|E|M|SYSTEM|09-06-2002|N|END
45080470|DET|45080465|60087074|NULL|2|Oct 5 2000 12:00AM|E|M|SYSTEM|09-06-2002|N|END
45080471|DET|45080466|60087074|NULL|2|Jan 9 2001 12:00AM|A|M|SYSTEM|09-06-2002|N|END

The third column is the Customer ID; it is a NOT NULL column defined as Integer with length 20. But the last record of this file
is as follows:

50000006|TRA|END.
It is a trailer record. Is there any way I can tell DataStage to omit this record? I am
getting the following Oracle error because of it:

RNK_SEQOCI_V2..Sequential_File_14.DSLink17: nls_read_delimited() - row 4910396, column CustomerID, required column missing

Any help will be appreciated. Thanks in Advance.

Regards,
Dprasanth

Posted: Wed Oct 12, 2005 11:19 am
by ucf007
You need to preprocess the file using OS (Unix) commands or a DataStage routine. Personally, I use a DS routine to copy the selected lines (count(lines) - 1) to a new file.

Posted: Wed Oct 12, 2005 7:28 pm
by vmcburney
You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass this into the job as a job parameter, and set "Read First Rows" to this value.

Or you could set your Reject Mode to Continue and let the stage discard the last incomplete row.

What is happening at the moment? Is this last row giving you a warning or abort message?
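The counting step above can be sketched in shell. This is only a sketch: the sample rows, file path, parameter name, and the project/job names in the commented dsjob call are all placeholders, not the real feed.

```shell
#!/bin/sh
# Sketch: count the data rows (total lines minus the trailer) so the
# result can be passed to the job as a parameter.
FILE=/tmp/customer_feed.$$
printf '1|DET|a\n2|DET|b\n3|DET|c\n9|TRA|END\n' > "$FILE"

ROWS=$(wc -l < "$FILE")      # total lines, including the trailer
ROWS=$((ROWS - 1))           # drop the trailer from the count
echo "data rows: $ROWS"

# the count would then be handed to the job as a parameter, e.g.:
# dsjob -run -param ReadFirstRows=$ROWS <project> <job>
rm -f "$FILE"
```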

Posted: Thu Oct 13, 2005 1:05 am
by dprasanth
vmcburney wrote: You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass this into the job as a job parameter, and set "Read First Rows" to this value.

Or you could set your Reject Mode to Continue and let the stage discard the last incomplete row.

What is happening at the moment? Is this last row giving you a warning or abort message?
Thanks for both the replies..

The last row is aborting the job. I will try using the Reject Mode as you mentioned.

Posted: Thu Oct 13, 2005 1:29 am
by dprasanth
ucf007 wrote: You need to preprocess the file using OS (Unix) commands or a DataStage routine. Personally, I use a DS routine to copy the selected lines (count(lines) - 1) to a new file.
As I am new to DataStage, I really don't know how to add this DS routine as a preprocessing step. Would you mind explaining this to me? I mean, where exactly in Designer, and how will I specify it? Sorry for bothering you.

Regards,
Dprasanth

Posted: Thu Oct 13, 2005 1:54 am
by dprasanth
vmcburney wrote: You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass this into the job as a job parameter, and set "Read First Rows" to this value.

Or you could set your Reject Mode to Continue and let the stage discard the last incomplete row.

What is happening at the moment? Is this last row giving you a warning or abort message?
Hi,
The last record of the file doesn't have the key column CUSTOMER-ID at all. How do I express that as a constraint? I mean to say: if a column itself is missing, how do I send that row out as a reject record?

Posted: Thu Oct 13, 2005 6:25 am
by chulett
Vincent is talking about the Sequential File stage in a Parallel job, which will automagically reject / not process any rows where the metadata doesn't match without raising a fuss.

A Server job is different and will complain when there are mismatches. You can handle your problem by switching to the 'Columns' tab in the stage and then scrolling over to the right. There is a column called something like 'Missing' which allows you to override the default behaviour of Error when a column is missing from a file. Change everything from CUSTOMER-ID to the end from Error to Replace and it will read that last record in with the missing columns set to null. Then all you need is a constraint in the job to not process a row where the CUSTOMER_ID is null.

Posted: Thu Oct 13, 2005 6:29 am
by dprasanth
chulett wrote:Vincent is talking about the Sequential File stage in a Parallel job, which will automagically reject / not process any rows where the metadata doesn't match without raising a fuss.

A Server job is different and will complain when there are mismatches. You can handle your problem by switching to the 'Columns' tab in the stage and then scrolling over to the right. There is a column called something like 'Missing' which allows you to override the default behaviour of Error when a column is missing from a file. Change everything from CUSTOMER-ID to the end from Error to Replace and it will read that last record in with the missing columns set to null. Then all you need is a constraint in the job to not process a row where the CUSTOMER_ID is null.
I did try yesterday to put a constraint and use a reject link, but it was still complaining. So I have written a small shell script that will trim the trailer record, and I will use that shell script in DS. I will try out what you have told me right now. Hopefully that will work.

Posted: Thu Oct 13, 2005 11:24 am
by trobinson
Sounds like overkill. If this is a Server job using a Sequential File stage, check the 'Stage uses Filter Command' option and add this filter:
grep -v TRA

Provided the string TRA only shows up in the trailer record, the grep will return everything that is not a trailer.

Posted: Thu Oct 13, 2005 7:54 pm
by djm
A minor refinement to the grep suggestion. Something like the following:

grep -v '^[0-9][0-9]*|TRA|END$'

would output everything bar that which matches the format of the trailer record, so TRA could appear in other data records. (Note the pipes are left unescaped: in a basic regular expression an unescaped | matches the literal field delimiter, whereas \| means alternation in GNU grep, which would match far more than the trailer.)
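A quick sanity check of that pattern, against a few invented sample rows, shows it drops only the trailer even when TRA appears inside an ordinary data record:

```shell
#!/bin/sh
# Demo: the full-trailer pattern removes only the trailer line.
# Sample rows are invented for the demo.
FILE=/tmp/feed_demo.$$
printf '45080469|DET|45080464\nX|TRA|note|keep\n50000006|TRA|END\n' > "$FILE"

# keep everything except lines shaped like the trailer record
KEPT=$(grep -v '^[0-9][0-9]*|TRA|END$' "$FILE")
echo "$KEPT"
rm -f "$FILE"
```

The second row contains TRA but survives, because the pattern requires the whole trailer shape (leading digits, then |TRA|END at the end of the line).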

Posted: Thu Oct 13, 2005 8:09 pm
by kcbland
Write a shell script to line count (wc -l) the file and then issue head -n??? where ??? is the result from the line count operation minus 1. Put this in the filter portion of the sequential stage.
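Assuming the stage's filter pipes the file through the command's stdin, a variation on the same idea that needs no separate row count is sed's last-line delete. A minimal sketch with invented sample input:

```shell
#!/bin/sh
# Sketch: sed '$d' deletes the final line of whatever arrives on
# stdin, so a filter command can drop the trailer without knowing
# the row count in advance. Sample rows are invented.
OUT=$(printf '1|DET|a\n2|DET|b\n9|TRA|END\n' | sed '$d')
echo "$OUT"
```

The trade-off is that sed '$d' blindly removes the last line, whereas the wc/head approach lets you log the counts before trimming.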

Posted: Thu Oct 13, 2005 8:25 pm
by clshore
It seems from your example that the second column of trailer records always contains 'TRA'?
So, since you are using a Transformer, why not just add a constraint like this:

col2 <> "TRA"

Carter

Posted: Thu Oct 13, 2005 9:32 pm
by vmcburney
This thread needs to be moved to the Server forum.

Posted: Fri Oct 14, 2005 1:57 am
by dprasanth
kcbland wrote:Write a shell script to line count (wc -l) the file and then issue head -n??? where ??? is the result from the line count operation minus 1. Put this in the filter portion of the sequential stage.
Hi there, I tried the same logic yesterday and it worked:

FILE=$1
ROWS=`wc -l $FILE | awk '{print $1}'`
(( ROWS = ROWS - 1 ))
head -$ROWS $FILE > $FILE.formated

I used this shell script and it worked. Thanks a lot for all your responses.