Hi,
I have a job that validates around 100 columns in an input data file, checking whether each one is numeric or a valid date, as below:
Infile -----Transformer-----> Output File
From the transformer, the error in validation is captured in the Error File
----------------------
Numeric Validation
----------------------
If the input column is non-numeric, or is numeric but exceeds the range of Dec(28,12), an Error_Code is captured for that particular column. An example of the numeric validation is shown below:
stgUnitParShares =
    If IsNull(Lnk_from_lot_receive.UNITS_PAR_SHARES) Then ''
    Else If (Not(Num(Lnk_from_lot_receive.UNITS_PAR_SHARES))
        Or (Num(Lnk_from_lot_receive.UNITS_PAR_SHARES) = 1
            And Lnk_from_lot_receive.UNITS_PAR_SHARES > 9999999999999999.999999999999))
    Then 'Err_DataType_UnitParShares'
    Else ''
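For readers not familiar with DataStage BASIC, here is a rough Python sketch of what that derivation does (the function name and error code are taken from the post; this is an illustration of the logic, not DataStage code). The original expression checks only the upper bound, so this sketch does too.

```python
from decimal import Decimal, InvalidOperation

# Largest value that fits Dec(28,12): 16 digits before the decimal
# point and 12 after, matching the literal in the derivation above.
DEC_28_12_MAX = Decimal("9999999999999999.999999999999")

def check_units_par_shares(value):
    """Return the error code for UNITS_PAR_SHARES, or '' if it passes.

    Mirrors stgUnitParShares: NULLs pass through, non-numeric values
    and values above the Dec(28,12) maximum are flagged.
    """
    if value is None:
        return ''                                # NULL: no validation error
    try:
        num = Decimal(value)
    except InvalidOperation:
        return 'Err_DataType_UnitParShares'      # not numeric
    if num > DEC_28_12_MAX:
        return 'Err_DataType_UnitParShares'      # out of Dec(28,12) range
    return ''
```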
------------------
Date Validation
------------------
Similarly, I have written a date routine, 'CheckValidDate', to validate whether an incoming column is a valid date, using the code below. There are 15 such date columns.
stgSettleDt =
    If Not(IsNull(Lnk_from_lot_receive.SETTLEMENT_DT))
        And StringIsSpace(CheckValidDate(Lnk_from_lot_receive.SETTLEMENT_DT))
    Then 'DataTypeError_SettlementDate'
    Else ''
CheckValidDate is a custom routine that uses the ICONV function to check whether the incoming date (CCYYMMDD) is valid.
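The post does not show the body of CheckValidDate, so as an illustration only, here is a Python sketch of an equivalent CCYYMMDD check (using `datetime.strptime` in place of ICONV; the function names are hypothetical):

```python
from datetime import datetime

def check_valid_date(value):
    """Return True if value is a real calendar date in CCYYMMDD form,
    roughly what the custom CheckValidDate/ICONV routine decides."""
    try:
        datetime.strptime(value, "%Y%m%d")
        return True
    except ValueError:
        return False

def validate_settlement_dt(value):
    """Mirrors stgSettleDt: flag non-null values that fail the date check."""
    if value is not None and not check_valid_date(value):
        return 'DataTypeError_SettlementDate'
    return ''
```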
If the length of the concatenation of all such stage variables (stgUnitParShares, etc.) is greater than 1, they are concatenated and written to the Error File.
----------
Problem
----------
The problem is that the combination of these validations takes an average of 7 minutes for a sample file of 30,000 rows (about 70 rows/sec).
The actual files will have up to half a million rows.
Any tips to improve the performance of this job would be very much appreciated. Thanks in advance.
Thanks and Regards,
Dhiren
Performance with Numeric and Date Validation
The odds are that your job is bottlenecked on CPU; especially if your source and target files are sequential ones as you've indicated.
If you have a multi-CPU system, or one with CPU capacity to spare, then I would highly recommend using two Transformer stages and splitting the CPU-consuming processing across them. If you enable inter-process row buffering, you will have split the load across two PIDs instead of one and effectively doubled the processing power available to this job. Note that this only works if you have CPU capacity left and are not already running flat out on your system.
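Outside of DataStage, the same idea (splitting CPU-bound row validation across two processes) can be sketched in plain Python; `validate_row` here is just a stand-in for the per-column checks, not the actual job logic:

```python
from multiprocessing import Pool

def validate_row(row):
    """Stand-in for the ~100 per-column checks: return the fields
    in the row that fail a trivial numeric test."""
    return [field for field in row if not field.isdigit()]

def validate_file(rows, workers=2):
    """Spread the validation work across `workers` processes, analogous
    to splitting the load across two Transformer stages with
    inter-process row buffering enabled."""
    with Pool(processes=workers) as pool:
        return pool.map(validate_row, rows)
```

As in the DataStage case, this only helps when the machine has idle CPU capacity; with a single saturated core the extra process just adds overhead.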
Checking for valid data using routines
Hi Everyone,
ArndW wrote: "The odds are that your job is bottlenecked on CPU; especially if your source and target files are sequential ones as you've indicated. If you have a multi-cpu system or one with CPU capacity left ..."
I was just looking for a way to validate data using a DataStage routine, and I came across the CheckValidDate routine mentioned earlier in this thread, but I am not able to find this routine in my DataStage edition.
I need your inputs on where to find this routine, or on how to develop a routine to validate data types such as dates.
Thanks & Regards
Arjun Kumar