Hi,
I have a job that validates around 100 columns in an input data file, checking whether each one is numeric or a valid date, as below:
Infile -----Transformer-----> Output File
From the transformer, the error in validation is captured in the Error File
----------------------
Numeric Validation
----------------------
If the input column is non-numeric, or is numeric but exceeds the range of Dec(28,12), an Error_Code is captured for that particular column. An example of the numeric validation is shown below:
stgUnitParShares =
    If IsNull(Lnk_from_lot_receive.UNITS_PAR_SHARES) Then ''
    Else If (Not(Num(Lnk_from_lot_receive.UNITS_PAR_SHARES))
        Or (Num(Lnk_from_lot_receive.UNITS_PAR_SHARES) = 1
            And Lnk_from_lot_receive.UNITS_PAR_SHARES > 9999999999999999.999999999999))
    Then 'Err_DataType_UnitParShares'
    Else ''
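For readers not familiar with DataStage BASIC, here is a rough Python sketch of what that derivation does (the function name and error code are taken from the post; this is an illustration of the logic, not DataStage code). The original expression checks only the upper bound, so this sketch does too.

```python
from decimal import Decimal, InvalidOperation

# Largest value that fits Dec(28,12): 16 digits before the decimal
# point and 12 after, matching the literal in the derivation above.
DEC_28_12_MAX = Decimal("9999999999999999.999999999999")

def check_units_par_shares(value):
    """Return the error code for UNITS_PAR_SHARES, or '' if it passes.

    Mirrors stgUnitParShares: NULLs pass through, non-numeric values
    and values above the Dec(28,12) maximum are flagged.
    """
    if value is None:
        return ''                                # NULL: no validation error
    try:
        num = Decimal(value)
    except InvalidOperation:
        return 'Err_DataType_UnitParShares'      # not numeric
    if num > DEC_28_12_MAX:
        return 'Err_DataType_UnitParShares'      # out of Dec(28,12) range
    return ''
```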
------------------
Date Validation
------------------
Similarly, I have written a date routine, 'CheckValidDate', to validate whether an incoming column is a valid date, using the code below. There are 15 such date columns.
stgSettleDt =
    If Not(IsNull(Lnk_from_lot_receive.SETTLEMENT_DT))
        And StringIsSpace(CheckValidDate(Lnk_from_lot_receive.SETTLEMENT_DT))
    Then 'DataTypeError_SettlementDate'
    Else ''
CheckValidDate is a custom routine that uses the ICONV function to check whether the incoming date (CCYYMMDD) is valid.
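The post does not show the body of CheckValidDate, so as an illustration only, here is a Python sketch of an equivalent CCYYMMDD check (using `datetime.strptime` in place of ICONV; the function names are hypothetical):

```python
from datetime import datetime

def check_valid_date(value):
    """Return True if value is a real calendar date in CCYYMMDD form,
    roughly what the custom CheckValidDate/ICONV routine decides."""
    try:
        datetime.strptime(value, "%Y%m%d")
        return True
    except ValueError:
        return False

def validate_settlement_dt(value):
    """Mirrors stgSettleDt: flag non-null values that fail the date check."""
    if value is not None and not check_valid_date(value):
        return 'DataTypeError_SettlementDate'
    return ''
```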
If the length of the concatenation of all such stage variables (stgUnitParShares, etc.) is greater than 1, they are concatenated and written to the Error File.
----------
Problem
----------
The problem is that the combination of these validations takes an average of 7 minutes for a sample file of 30,000 rows (about 70 rows/sec).
The actual files will have up to half a million rows.
Any tips to improve the performance of this job would be very much appreciated. Thanks in advance.
Thanks and Regards,
Dhiren
Performance with Numeric and Date Validation
The odds are that your job is bottlenecked on CPU; especially if your source and target files are sequential ones as you've indicated.
If you have a multi-CPU system, or one with CPU capacity to spare, then I would highly recommend using two Transformer stages and splitting the CPU-consuming processing across them. If you enable inter-process row buffering, you will have split the load across two PIDs instead of one and effectively doubled the processing power available to this job. Note that this only works if you have CPU capacity left and are not already running flat out on your system.
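Outside of DataStage, the same idea (splitting CPU-bound row validation across two processes) can be sketched in plain Python; `validate_row` here is just a stand-in for the per-column checks, not the actual job logic:

```python
from multiprocessing import Pool

def validate_row(row):
    """Stand-in for the ~100 per-column checks: return the fields
    in the row that fail a trivial numeric test."""
    return [field for field in row if not field.isdigit()]

def validate_file(rows, workers=2):
    """Spread the validation work across `workers` processes, analogous
    to splitting the load across two Transformer stages with
    inter-process row buffering enabled."""
    with Pool(processes=workers) as pool:
        return pool.map(validate_row, rows)
```

As in the DataStage case, this only helps when the machine has idle CPU capacity; with a single saturated core the extra process just adds overhead.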
Checking for valid data using routines
Hi Everyone,
ArndW wrote: "The odds are that your job is bottlenecked on CPU; especially if your source and target files are sequential ones as you've indicated. If you have a multi-cpu system or one with CPU capacity left ..."
I was just looking for a way to validate data using a DataStage routine, and I came across the CheckValidDate routine mentioned earlier in this thread, but I am not able to find this routine in my DataStage edition.
I need your inputs on where to find this routine, or on how to develop a routine to validate data types such as dates.
Thanks & Regards
Arjun Kumar