How to omit the last record from a sequential file
Moderators: chulett, rschirm, roy
Hi,
I am loading a sequential file with around 5 million records into an Oracle database, using a Transformer stage to do some processing.
This is a sample data format
45080469|DET|45080464|60087074|NULL|2|Sep 1 2000 12:00AM|E|M|SYSTEM|09-06-2002|N|END
45080470|DET|45080465|60087074|NULL|2|Oct 5 2000 12:00AM|E|M|SYSTEM|09-06-2002|N|END
45080471|DET|45080466|60087074|NULL|2|Jan 9 2001 12:00AM|A|M|SYSTEM|09-06-2002|N|END
The third column is the Customer ID; it is a NOT NULL column defined as Integer with length 20. But the last record of the file is as follows:
50000006|TRA|END.
It is a trailer record. Is there any way I can tell DataStage to omit this record? I am getting the following Oracle error because of it:
RNK_SEQOCI_V2..Sequential_File_14.DSLink17: nls_read_delimited() - row 4910396, column CustomerID, required column missing
Any help will be appreciated. Thanks in advance.
Regards,
Dprasanth
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass the result into the job as a job parameter, and set "Read First Rows" to that value.
Or you could set your Reject Mode to Continue and let the stage discard the last, incomplete row.
What is happening at the moment? Is this last row giving you a warning or an abort message?
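The counting step above can be sketched in shell. This is a minimal illustration, not DataStage code; the file contents, project, and job names are placeholders I have invented for the example:

```shell
# Build a throwaway sample file: two data rows plus a trailer.
FILE=$(mktemp)
printf 'rec1|DET|1\nrec2|DET|2\n99|TRA|END.\n' > "$FILE"

TOTAL=$(wc -l < "$FILE")       # total number of lines in the file
DATA_ROWS=$((TOTAL - 1))       # everything except the trailer record

echo "$DATA_ROWS"
# The value would then be handed to the job as a parameter, e.g.
# (hypothetical project/job/parameter names):
#   dsjob -run -param ReadFirstRows=$DATA_ROWS MyProject MyJob
rm -f "$FILE"
```

Subtracting 1 assumes exactly one trailer line and no trailing blank line; adjust the arithmetic if your files differ.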
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Thanks for both the replies.
vmcburney wrote: You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass the result into the job as a job parameter, and set "Read First Rows" to that value.
Or you could set your Reject Mode to Continue and let the stage discard the last incomplete row.
What is happening at the moment? Is this last row giving you a warning or an abort message?
The last row is aborting the job. I will try using the Reject Mode as you mentioned.
As I am new to DataStage, I really don't know how to add this DS routine as a preprocessing step; would you mind explaining it? I mean, where exactly in Designer, and how will I specify it? Sorry for bothering you.
ucf007 wrote: You need to preprocess the file using OS (Unix) commands or a DataStage routine; personally I use a DS routine to copy the selected lines (count(lines) - 1) to a new file...
Regards,
Dprasanth
Hi,
vmcburney wrote: You have an option within the parallel Sequential File stage, "Read First Rows", that limits the number of rows read from the file. You would need to use a Unix command to count the number of rows, subtract 1, pass the result into the job as a job parameter, and set "Read First Rows" to that value.
Or you could set your Reject Mode to Continue and let the stage discard the last incomplete row.
What is happening at the moment? Is this last row giving you a warning or an abort message?
The last record of the file doesn't have the key column CUSTOMER-ID at all. How do I include that as a constraint? I mean to say: if a column itself is missing, how do I route that row to the rejects?
Vincent is talking about the Sequential File stage in a Parallel job, which will automagically reject / not process any rows where the metadata doesn't match without raising a fuss.
A Server job is different and will complain when there are mismatches. You can handle your problem by switching to the 'Columns' tab in the stage and then scrolling over to the right. There is a column called something like 'Missing' that allows you to override the default behaviour of Error when a column is missing from a file. Change everything from CUSTOMER-ID to the end from Error to Replace and the stage will read that last record in with the missing columns set to null. Then all you need is a constraint in the job to not process a row where the CUSTOMER_ID is null.
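Outside DataStage, the same "reject rows whose key column is missing" constraint can be approximated with awk. This is only a stand-in sketch on invented sample data: it keeps rows whose third pipe-delimited field is numeric (the CustomerID position) and drops the trailer, whose third field is not:

```shell
# Hypothetical sample file: one data row plus the trailer record.
FILE=$(mktemp)
cat > "$FILE" <<'EOF'
45080469|DET|45080464|60087074|NULL|2
50000006|TRA|END.
EOF

# Keep only rows where field 3 looks like an integer CustomerID;
# the trailer's third field ("END.") fails the test and is dropped.
KEPT=$(awk -F'|' '$3 ~ /^[0-9]+$/' "$FILE")
echo "$KEPT"
rm -f "$FILE"
```

Inside the job itself, the equivalent is simply a Transformer constraint that tests the CUSTOMER_ID column for null, as described above.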
-craig
"You can never have too many knives" -- Logan Nine Fingers
I did try yesterday to put a constraint and use a reject link, but it was still complaining. So I have written a small shell script that will trim the trailer record; I will use that shell script in DS. I will try out what you have told me right now. Hopefully that should work.
chulett wrote: Vincent is talking about the Sequential File stage in a Parallel job, which will automagically reject / not process any rows where the metadata doesn't match without raising a fuss.
A Server job is different and will complain when there are mismatches. You can handle your problem by switching to the 'Columns' tab in the stage and then scrolling over to the right. There is a column called something like 'Missing' that allows you to override the default behaviour of Error when a column is missing from a file. Change everything from CUSTOMER-ID to the end from Error to Replace and the stage will read that last record in with the missing columns set to null. Then all you need is a constraint in the job to not process a row where the CUSTOMER_ID is null.
A minor refinement to the grep suggestion. Something like:
grep -v -E '^[0-9]+\|TRA\|END'
would output everything bar lines matching the format of the trailer record, so TRA could still appear safely in other data records. (Note the -E: in extended regex \| is a literal pipe, whereas GNU grep's basic regex treats \| as alternation, which would make the original pattern match far too many lines.)
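Here is that filter run against a small invented sample, just to show which line survives:

```shell
# Two-line sample: one data row and the trailer record.
FILE=$(mktemp)
cat > "$FILE" <<'EOF'
45080469|DET|45080464|60087074
50000006|TRA|END.
EOF

# -E so \| matches a literal pipe; -v inverts the match, keeping
# every line that does NOT look like the trailer (digits|TRA|END...).
OUT=$(grep -v -E '^[0-9]+\|TRA\|END' "$FILE")
echo "$OUT"
rm -f "$FILE"
```

Only the data row is printed; the trailer is filtered out.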
Write a shell script to line-count the file (wc -l) and then issue head -n ???, where ??? is the result of the line count minus 1. Put this in the Filter portion of the Sequential File stage.
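The wc/head recipe, sketched on a throwaway file (contents invented for the example):

```shell
# Sample file: two data rows plus a trailer.
FILE=$(mktemp)
printf 'a|1\nb|2\n9|TRA|END.\n' > "$FILE"

# Line count minus 1 = number of data rows before the trailer.
N=$(( $(wc -l < "$FILE") - 1 ))
HEADOUT=$(head -n "$N" "$FILE")
echo "$HEADOUT"
rm -f "$FILE"
```

In the stage's Filter field the same idea would be expressed as a single command reading from standard input, since the stage pipes the file through the filter.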
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
This thread needs to be moved to the Server forum.
Hi there, I tried the same logic yesterday and it worked.
kcbland wrote: Write a shell script to line-count the file (wc -l) and then issue head -n ???, where ??? is the result of the line count minus 1. Put this in the Filter portion of the Sequential File stage.
I used this shell script and it worked:
FILE=$1
ROWS=$(wc -l < "$FILE")
ROWS=$((ROWS - 1))
head -n "$ROWS" "$FILE" > "$FILE.formatted"
Thanks a lot for all your responses.
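For what it's worth, the same trimming can be done without any line counting at all: sed's `$d` command deletes the last line of its input. A minimal demonstration on invented sample data:

```shell
# Sample file: two data rows plus a trailer.
FILE=$(mktemp)
printf '1|DET|x\n2|DET|y\n3|TRA|END.\n' > "$FILE"

# '$' addresses the last line; 'd' deletes it.
TRIMMED=$(sed '$d' "$FILE")
echo "$TRIMMED"
rm -f "$FILE"
```

This assumes the trailer is always exactly the final line of the file.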