How to Omit last record from a sequential file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

How to Omit last record from a sequential file

Post by dprasanth »

Hi,
I am loading a sequential file with around 5 million records into an Oracle database, using a Transformer to do some processing.
Here is a sample of the data format:

45080469|DET|45080464|60087074|NULL|2|Sep 1 2000 12:00AM|E|M|SYSTEM|09-06-2002|N|END
45080470|DET|45080465|60087074|NULL|2|Oct 5 2000 12:00AM|E|M|SYSTEM|09-06-2002|N|END
45080471|DET|45080466|60087074|NULL|2|Jan 9 2001 12:00AM|A|M|SYSTEM|09-06-2002|N|END

The third column is the Customer ID; it is a NOT NULL column defined as Integer with length 20. But the last record of this file is as follows:

50000006|TRA|END.
It is a trailer record. Is there any way I can tell DataStage to omit this record? I am getting the following Oracle error because of it:

RNK_SEQOCI_V2..Sequential_File_14.DSLink17: nls_read_delimited() - row 4910396, column CustomerID, required column missing

Any help will be appreciated. Thanks in Advance.

Regards,
Dprasanth
ucf007
Charter Member
Posts: 18
Joined: Fri Feb 27, 2004 2:25 pm

Post by ucf007 »

You need to preprocess the file using OS (Unix) commands or a DataStage routine. Personally, I use a DS routine to copy the selected lines (count(lines) - 1) to a new file...
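For example, a bare-bones sketch of the Unix-command flavour of that idea (the file names here are just placeholders):

Code:

#!/bin/sh
# Copy everything except the last line (the trailer) to a new file.
SRC=/data/in/customer.dat
OUT=/data/in/customer_notrailer.dat

ROWS=`wc -l < "$SRC"`
ROWS=`expr $ROWS - 1`
head -n $ROWS "$SRC" > "$OUT"

The job then reads the new file instead of the original.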
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass this into the job as a job parameter, and set "Read First Rows" to this value.
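For example, something along these lines from the shell (a rough sketch only; the file, project, job and parameter names are made up, and FirstRows is assumed to be the job parameter wired into the "Read First Rows" property):

Code:

#!/bin/sh
# Count the lines, knock off the trailer, and start the job with the
# result as a parameter value (all names below are placeholders).
FILE=/data/in/customer.dat
ROWS=`wc -l < "$FILE"`
ROWS=`expr $ROWS - 1`

dsjob -run -param FirstRows=$ROWS MyProject MyLoadJob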

Or you could set your Reject Mode to Continue and let the stage discard the last incomplete row.

What is happening at the moment? Is this last row giving you a warning or abort message?
dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

Post by dprasanth »

vmcburney wrote:You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass this into the job as a job parameter, and set "Read First Rows" to this value.

Or you could set your Reject Mode to Continue and let the stage discard the last incomplete row.

What is happening at the moment? Is this last row giving you a warning or abort message?
Thanks for both the replies.

The last row is aborting the job. I will try using the reject mode as you mentioned.
dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

Post by dprasanth »

ucf007 wrote:You need to preprocess the file using OS (Unix) commands or a DataStage routine. Personally, I use a DS routine to copy the selected lines (count(lines) - 1) to a new file...
As I am new to DataStage, I really don't know how to add this DS routine as a preprocessing step. Do you mind explaining where exactly in Designer I would do this and how I would specify it? Sorry for bothering you.

Regards,
Dprasanth
dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

Post by dprasanth »

vmcburney wrote:You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass this into the job as a job parameter, and set "Read First Rows" to this value.

Or you could set your Reject Mode to Continue and let the stage discard the last incomplete row.

What is happening at the moment? Is this last row giving you a warning or abort message?
Hi,
The last record of the file doesn't have the key column CUSTOMER-ID at all. How do I express that as a constraint? I mean, if a column itself is missing, how do I send that row down a reject link?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Vincent is talking about the Sequential File stage in a Parallel job, which will automagically reject / not process any rows where the metadata doesn't match without raising a fuss.

A Server job is different and will complain when there are mismatches. You can handle your problem by switching to the 'Columns' tab in the stage and then scrolling over to the right. There is a column called something like 'Missing' that allows you to override the default behaviour of Error when a column is missing from a file. Change everything from CUSTOMER-ID to the end from Error to Replace and it will read that last record in with the missing columns set to null. Then all you need is a constraint in the job to not process a row where the CUSTOMER_ID is null.
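If it helps, the constraint itself could be as simple as the following (assuming the input link into the Transformer is named DSLink17, as in your error message, and the column is CUSTOMER_ID):

Code:

Not(IsNull(DSLink17.CUSTOMER_ID))

Only rows where the customer id actually arrived will pass down that output link, so the trailer row quietly drops out.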
-craig

"You can never have too many knives" -- Logan Nine Fingers
dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

Post by dprasanth »

chulett wrote:Vincent is talking about the Sequential File stage in a Parallel job, which will automagically reject / not process any rows where the metadata doesn't match without raising a fuss.

A Server job is different and will complain when there are mismatches. You can handle your problem by switching to the 'Columns' tab in the stage and then scrolling over to the right. There is a column called something like 'Missing' that allows you to override the default behaviour of Error when a column is missing from a file. Change everything from CUSTOMER-ID to the end from Error to Replace and it will read that last record in with the missing columns set to null. Then all you need is a constraint in the job to not process a row where the CUSTOMER_ID is null.
I did try putting a constraint and using a reject link yesterday, but it was still complaining. So I have written a small shell script that trims the trailer record; I will use that shell script in DS. I will try out what you have told me right now. Hopefully that should work.
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis

Post by trobinson »

Sounds like overkill. If this is a Server job using a Sequential File stage, check the 'Stage uses Filter Command' option and add this filter:

grep -v TRA

Provided the string TRA only shows up in the trailer record, the grep will return everything that is not a trailer.
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

A minor refinement to the grep suggestion. Something like the following:

Code:

grep -v '^[0-9][0-9]*|TRA|END'

would output everything bar that which matches the format of the trailer record, so TRA could still appear safely in other data records.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Write a shell script to count the lines in the file (wc -l), then issue head -n??? where ??? is the line count minus 1. Put this in the filter portion of the sequential stage.
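If the filter command is handed the data on stdin rather than the file name (which is my recollection of how the Server stage's filter works, so treat this as an assumption), an alternative that needs no row count at all is sed:

Code:

sed '$d'

which simply deletes the last line of whatever is piped through it.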
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
clshore
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

It seems from your example that the second column of Trailer
records always contains 'TRA'?
So, since you are using a Transformer, why not just add a constraint like this:

col2 <> "TRA"

Carter
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

This thread needs to be moved to the Server forum.
dprasanth
Charter Member
Posts: 90
Joined: Fri Oct 07, 2005 7:55 am

Post by dprasanth »

kcbland wrote:Write a shell script to count the lines in the file (wc -l), then issue head -n??? where ??? is the line count minus 1. Put this in the filter portion of the sequential stage.
Hi there, I tried the same logic yesterday and it worked:

FILE=$1
ROWS=`wc -l $FILE | awk '{print $1}'`
(( ROWS=$ROWS -1))
head -$ROWS $FILE > $FILE.formated

I used this shell script and it worked. Thanks a lot for all your responses.