Performance with Numeric and Date Validation

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
dhiren
Premium Member
Premium Member
Posts: 43
Joined: Thu Aug 11, 2005 12:24 am

Performance with Numeric and Date Validation

Post by dhiren »

Hi,
I have a job that validates around 100 columns in the input data file whether they are numeric or date as below

Infile -----Transformer-----> Output File

From the transformer, the error in validation is captured in the Error File

----------------------
Numeric Validation
----------------------
If the input column is non-numeric OR greater than the range Dec (28,12), then it is captured as an Error_Code for that particular column. An example of numeric validation is shown below:

stgUnitParShares =

If IsNull(Lnk_from_lot_receive.UNITS_PAR_SHARES) Then ''
Else If (Not(NUM(Lnk_from_lot_receive.UNITS_PAR_SHARES))
Or (Num(Lnk_from_lot_receive.UNITS_PAR_SHARES) = 1
And Lnk_from_lot_receive.UNITS_PAR_SHARES > 9999999999999999.999999999999))
Then 'Err_DataType_UnitParShares'
Else ''

------------------
Date Validation
------------------
Similarly I have written a Date Routine 'CheckValidDate' to validate if the incoming column is date or not with the help of following Code. There are 15 such columns for date

stgSettleDt=

If Not(IsNull(Lnk_from_lot_receive.SETTLEMENT_DT)) And StringIsSpace(CheckValidDate(Lnk_from_lot_receive.SETTLEMENT_DT)) Then 'DataTypeError_SettlementDate' Else ''

CheckValidDate is a custom routine using ICONV function to check if the incoming date (CCYYMMDD) is valid.

If the Length of all concatenated stage variables like stgUnitParShares is greater than > 1 , then all such variables are concatenated and put into Error File

----------
Problem
----------
The problem is "The combination of these validations is taking an average time of 7 Mins for a sample file of 30000 Rows. (70 Rows/ Sec)

We will be having actual files upto half a million rows.
Any tips to improve the performance of this job will be very much helpful. Thanks in advance.

Thanks and Regards,
Dhiren
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The odds are that your job is bottlenecked on CPU; especially if your source and target files are sequential ones as you've indicated.

If you have a multi-cpu system or one with CPU capacity left, then I would highly recommend using 2 transform stages and splitting the CPU consuming processings across the two stages. If you enable inter-process row buffering you will have split the load across 2 pids instead of 1 and effectively doubled your available processing power for this job. Note, this only works if you have CPU capacity left and are not running flat out on your system.
arjun_004
Participant
Posts: 12
Joined: Wed Feb 07, 2007 2:10 am
Location: Mumbai
Contact:

checking for valid data using routines

Post by arjun_004 »

ArndW wrote:The odds are that your job is bottlenecked on CPU; especially if your source and target files are sequential ones as you've indicated.

If you have a multi-cpu system or one with CPU capacity left ...
Hi Everyone,

I was just looking for the option how to validate the data by using the
the datastage routine and i came across CheckValidDate routine as mentioned earlier in this message and i am not able to find out this routine in my datastage edition.

I need your inputs to find out this routine or How to develope the routine to validate the data types i.e. date

Thanks & Regards
Thanks & Regards
Arjun Kumar
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

It doesnt get shipped with the product. Its a user written routine. Ray wrote a IsValid() routine for server jobs. You can search for that.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Post Reply