How to Omit last record from a sequential file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

How to Omit last record from a sequential file

Post by dprasanth »

Hi,
I am loading a sequential file with around 5 million records into an Oracle database, using a Transformer to do some processing.
Here is a sample of the data format:

45080469|DET|45080464|60087074|NULL|2|Sep 1 2000 12:00AM|E|M|SYSTEM|09-06-2002|N|END
45080470|DET|45080465|60087074|NULL|2|Oct 5 2000 12:00AM|E|M|SYSTEM|09-06-2002|N|END
45080471|DET|45080466|60087074|NULL|2|Jan 9 2001 12:00AM|A|M|SYSTEM|09-06-2002|N|END

The third column is the Customer ID; it is a NOT NULL column defined as Integer with length 20. But the last record of this file is as follows:

50000006|TRA|END.
It is a trailer record. Is there any way I can tell DataStage to omit this record? I am getting the following Oracle error because of it:

RNK_SEQOCI_V2..Sequential_File_14.DSLink17: nls_read_delimited() - row 4910396, column CustomerID, required column missing

Any help will be appreciated. Thanks in Advance.

Regards,
Dprasanth
ucf007
Charter Member
Posts: 18
Joined: Fri Feb 27, 2004 2:25 pm

Post by ucf007 »

You need to preprocess the file using OS (Unix) commands or a DataStage routine. Personally, I use a DS routine to copy the selected lines (count(lines) - 1) to a new file...
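For example, a bare-bones sketch of the Unix-command flavour of that idea (the file names here are just placeholders):

Code:

#!/bin/sh
# Copy everything except the last line (the trailer) to a new file.
SRC=/data/in/customer.dat
OUT=/data/in/customer_notrailer.dat

ROWS=`wc -l < "$SRC"`
ROWS=`expr $ROWS - 1`
head -n $ROWS "$SRC" > "$OUT"

The job then reads the new file instead of the original.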
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass this into the job as a job parameter, and set "Read First Rows" to this value.
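For example, something along these lines from the shell (a rough sketch only; the file, project, job and parameter names are made up, and FirstRows is assumed to be the job parameter wired into the "Read First Rows" property):

Code:

#!/bin/sh
# Count the lines, knock off the trailer, and start the job with the
# result as a parameter value (all names below are placeholders).
FILE=/data/in/customer.dat
ROWS=`wc -l < "$FILE"`
ROWS=`expr $ROWS - 1`

dsjob -run -param FirstRows=$ROWS MyProject MyLoadJob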

Or you could set your Reject Mode to Continue and let the stage discard the last incomplete row.

What is happening at the moment? Is this last row giving you a warning or abort message?
dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

Post by dprasanth »

vmcburney wrote:You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass this into the job as a job parameter, and set "Read First Rows" to this value.

Or you could set your Reject Mode to Continue and let the stage discard the last incomplete row.

What is happening at the moment? Is this last row giving you a warning or abort message?
Thanks for both the replies.

The last row is aborting the job. I will try using the reject mode as you mentioned.
dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

Post by dprasanth »

ucf007 wrote:You need to preprocess the file using OS (Unix) commands or a DataStage routine. Personally, I use a DS routine to copy the selected lines (count(lines) - 1) to a new file...
As I am new to DataStage, I really don't know how to add this DS routine as a preprocessing step. Do you mind explaining where exactly in Designer I would do this and how I would specify it? Sorry for bothering you.

Regards,
Dprasanth
dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

Post by dprasanth »

vmcburney wrote:You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass this into the job as a job parameter, and set "Read First Rows" to this value.

Or you could set your Reject Mode to Continue and let the stage discard the last incomplete row.

What is happening at the moment? Is this last row giving you a warning or abort message?
Hi,
The last record of the file doesn't have the key column CUSTOMER-ID at all. How do I express that as a constraint? I mean, if a column itself is missing, how do I send that row down a reject link?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Vincent is talking about the Sequential File stage in a Parallel job, which will automagically reject / not process any rows where the metadata doesn't match without raising a fuss.

A Server job is different and will complain when there are mismatches. You can handle your problem by switching to the 'Columns' tab in the stage and then scrolling over to the right. There is a column called something like 'Missing' that allows you to override the default behaviour of Error when a column is missing from a file. Change everything from CUSTOMER-ID to the end from Error to Replace and it will read that last record in with the missing columns set to null. Then all you need is a constraint in the job to not process a row where the CUSTOMER_ID is null.
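If it helps, the constraint itself could be as simple as the following (assuming the input link into the Transformer is named DSLink17, as in your error message, and the column is CUSTOMER_ID):

Code:

Not(IsNull(DSLink17.CUSTOMER_ID))

Only rows where the customer id actually arrived will pass down that output link, so the trailer row quietly drops out.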
-craig

"You can never have too many knives" -- Logan Nine Fingers
dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

Post by dprasanth »

chulett wrote:Vincent is talking about the Sequential File stage in a Parallel job, which will automagically reject / not process any rows where the metadata doesn't match without raising a fuss.

A Server job is different and will complain when there are mismatches. You can handle your problem by switching to the 'Columns' tab in the stage and then scrolling over to the right. There is a column called something like 'Missing' that allows you to override the default behaviour of Error when a column is missing from a file. Change everything from CUSTOMER-ID to the end from Error to Replace and it will read that last record in with the missing columns set to null. Then all you need is a constraint in the job to not process a row where the CUSTOMER_ID is null.
I did try putting a constraint and using a reject link yesterday, but it was still complaining. So I have written a small shell script that trims the trailer record; I will use that shell script in DS. I will try out what you have told me right now. Hopefully that should work.
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis

Post by trobinson »

Sounds like overkill. If this is a Server job using a Sequential File stage, check the 'Stage uses Filter Command' option and add this filter:

grep -v TRA

Provided the string TRA only shows up in the trailer record, the grep will return everything that is not a trailer.
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

A minor refinement to the grep suggestion. Something like the following:

Code:

grep -v '^[0-9][0-9]*|TRA|END'

would output everything bar that which matches the format of the trailer record, so TRA could still appear safely in other data records.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Write a shell script to count the lines in the file (wc -l), then issue head -n??? where ??? is the line count minus 1. Put this in the filter portion of the sequential stage.
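If the filter command is handed the data on stdin rather than the file name (which is my recollection of how the Server stage's filter works, so treat this as an assumption), an alternative that needs no row count at all is sed:

Code:

sed '$d'

which simply deletes the last line of whatever is piped through it.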
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
clshore
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

It seems from your example that the second column of Trailer
records always contains 'TRA'?
So, since you are using a Transformer, why not just add a constraint like this:

col2 <> "TRA"

Carter
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

This thread needs to be moved to the Server forum.
dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

Post by dprasanth »

kcbland wrote:Write a shell script to count the lines in the file (wc -l), then issue head -n??? where ??? is the line count minus 1. Put this in the filter portion of the sequential stage.
Hi there, I tried the same logic yesterday and it worked:

FILE=$1
ROWS=`wc -l $FILE | awk '{print $1}'`
(( ROWS=$ROWS -1))
head -$ROWS $FILE > $FILE.formated

I used this shell script and it worked. Thanks a lot for all your responses.