Stored Proc/Unix Script/DS Routine : which is better ?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
abby
Participant
Posts: 13
Joined: Fri Dec 30, 2005 1:00 am

Stored Proc/Unix Script/DS Routine : which is better ?

Post by abby »

Hi All,

I'm facing an issue to perform a validation in my job.

The job reads an input file and validate one of its fields against a look-up table. While validating I need to see if this value exists in the look-up table and if so, I need to fetch other column's value from the lookup table and mark the record as valid else the record in question is invalid. Also, the input file size will be of the order of 70,000 records.

Now, I have three options to perform this validation :

a. Call a Strored Procedure for each input row and get the validation done ( I have a readymade stord proc for this, which needs some parameters like the value to be looked-up,the look-up table name and its columns).

b. Call a Unix Script which would perform the validation against the extracted data from the look-up table and return me the desired value. This return value will help me decide to accept or reject the record.

c. Call a DS Routine from the job which will validate the data as required.

Issues I anticipate with these approaches are:

First approach : May result in job hanging as it calls a DB object from job as many times as the number of records (here 70,000) since we are already facing job hanging issues even in simple select queries. Also,I need to pass the input link data to stored procedure as parameter value. Not sure how to do this.

Second approach : Is it advisable to call a unix script thousand times for validating data.Also, this again requires to pass the input link data as a parameter value to the unix script. Will there be any job performance issue?

Third approach : It will require reading file (look-up file data) from within a routine to perform validation against input link data. Is it advisable to use file handling withing DS Routines?

Please advise which is a better option.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Fourth, and best, option is to use DataStage.

A standard lookup will not only determine existence but also return any columns from that row that you specify.

Based on the success of failure of the lookup, you can make any further decision required such as setting a value in an IsValid variable or column.

The input file size is irrelevant; they are processed one at a time in a stream.

Best performance will be had if you pre-load the 70,000 records into a hashed file - but load only the columns that you will actually need - the key and the column whose value you want to return. That way you will have a small row size and therefore a small hashed file, which is more likely to be cached in memory successfully.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply