Aggregator misbehaves and slows down when we correct it

Post questions here relating to DataStage Enterprise/PX Edition, in areas such as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

Aggregator misbehaves and slows down when we correct it

Post by thamark »

I have a job that calculates the number of records and the sum of the amount field, to ensure that the number of records extracted and inserted is the same. I am facing an issue with it, as follows.

1) The amount differs between extraction and insertion if no data type is defined for the output columns in the Aggregator.

2) The amount matches exactly if I change the data type to Decimal(18,2), but the job slows down drastically (from 4 minutes to 1:30:20).

I can't live with this, since the time window for the whole load is shorter than what this job alone takes to complete.

I hope I will get a solution from you guys.

Thanks in advance
thamark

This is a known problem to many; now I need a solution.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Perhaps the default data type, Double (38,10), is being chosen, which would explain why your result is not as precise as expected. And the implicit conversion from Double to Decimal is probably what slows your job down so drastically in the second case.
You can try Double(18,2) in the output of the Aggregator and use a Modify stage to convert the Double to Decimal explicitly (if required).
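If it helps, the explicit conversion in the Modify stage would be a specification along these lines (DB_AMT here is just an example column name, and the precision and scale are assumptions to adjust to your data):

Code: Select all

DB_AMT:DECIMAL[18,2] = decimal_from_dfloat(DB_AMT)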
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

I saw this performance degradation in version 7.1 after running some comparative tests between aggregators, transformers and build-ops.

kumar is right in that it is the data type conversion that slows the aggregator down significantly.

The build-op was fastest (by a considerable amount), then the transformer with the aggregator slowest.

The tests aggregated about half a dozen decimal fields from a dataset and wrote the results to a dataset.

The other advantage of a build-op or transformer is that you can count rows as well as perform calculations in the same stage, without having to add a field containing a default value of one (as is required with a single Aggregator, which can only count or calculate).

I have not run the same tests on version 7.5 and don't have the exact timings to hand.

You don't say what volumes of data you are processing (input and output) or whether you are using a hash or sort type of aggregator. Have you ensured that the data is correctly and evenly partitioned?
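(From memory of the 7.x documentation, so please double-check: the hash method keeps one accumulator per group in memory and suits a modest number of groups, while the sort method expects its input sorted on the grouping keys and suits high-cardinality keys. Either way the input needs to be hash-partitioned on the grouping keys, otherwise each partition will emit its own partial totals for a group.)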
thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

Thanks for the response

Post by thamark »

Kumar, I will try the option you have mentioned and let you guys know the result.

Thom,

Can you give a bit more info on the build-op?

Thanks & Regards
Thamark
thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

Aggregator issue

Post by thamark »

I tried the option of defining the output column in the Aggregator as Double and converting it in the Modify stage.

This gives the following runtime error.

The derivation in the Modify stage is as follows:

DB_AMT:DECIMAL= decimal_from_dfloat(DB_AMT)

modify(0),3: Un-handled conversion error on field "DB_AMT " from source type "dfloat" to destination type "decimal[1,0]":
source value="2.30891e+13"; the result is non-nullable and there is no handle_null to specify a default value.
Un-handled conversion error on field "CR_AMT " from source type "dfloat" to destination type "decimal[1,0]":
source value="-2.30904e+13"; the result is non-nullable and there is no handle_null to specify a default value.

Please do let me know if you have anything for this.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

decimal[1,0] is a tiny number (at most one digit). I think you need to correct a typing error in your metadata. You are getting much larger numbers (with up to 14 digits) to load into this field.
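To put numbers on it: 2.30891e+13 is roughly 23,089,100,000,000, which needs 14 digits before the decimal point, while decimal[1,0] holds at most one digit. A decimal[18,2] leaves 16 digits before the decimal point, which covers it comfortably.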
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

To avoid the 2nd and 4th warnings, you can either change your target field to Nullable = Yes or use the same Modify stage to handle_null().
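If I remember the Modify syntax correctly, the null-handling form would look roughly like this (DB_AMT and the substitute value 0 are just examples to adjust):

Code: Select all

DB_AMT:DECIMAL[18,2] = handle_null(DB_AMT, 0)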
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

Aggregator

Post by thamark »

ray.wurlod wrote:decimal[1,0] is a tiny number (at most one digit). I think you need to correct a typing error in your metadata. You are getting much larger numbers (with up to 14 digits) to load into this fi ...
Yes, I do define the column's data type as Decimal(18,2), but I am not able to give the same in the specification:

DB_AMT:DECIMAL = decimal_from_dfloat(DB_AMT)

I get an error if I give it as follows:

DB_AMT:DECIMAL(18,2) = decimal_from_dfloat(DB_AMT)

This works fine when I do the conversion in a Transformer, but I would like to avoid the Transformer.

I think I am making some mistake in the specification.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Try using square brackets.

Code: Select all

DB_AMT:DECIMAL[18,2] = decimal_from_dfloat(DB_AMT) 
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

not accurate

Post by thamark »

ray.wurlod wrote:Try using square brackets.

Code: Select all

DB_AMT:DECIMAL[18,2] = decimal_from_dfloat(DB_AMT) 
I did the conversion using the Transformer, but the output is not consistent across runs.

See the output below.

Run 1
Col1(count), Col2(sum), Col3(sum)
38237290, 0023089122060890.50,-0023090433853699.81
Run 2
Col1(count), Col2(sum), Col3(sum)
38237290, 0023089122060890.37,-0023090433853699.81
Run 3
Col1(count), Col2(sum), Col3(sum)
38237290, 0023089122060890.33,-0023090433853699.92

When I define the Decimal(18,2) output property in the Aggregator I get consistent output, but the job completes in 1 hour instead of 6 minutes max.

Please suggest something.

Thanks & Regards
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Did you try doing the conversion using a Modify stage?

Was it a Transformer stage or a BASIC Transformer stage? A BASIC Transformer stage has a limit to the precision that can be represented, which may explain the rounding errors.

Indeed, you can still get rounding errors with decimal[18,2] if, at some point during the proceedings, the data type is float or dfloat: no computer can completely accurately represent floating point numbers.
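As a rough worked example: a dfloat (IEEE double) carries only about 15-16 significant decimal digits, and a total like 23,089,122,060,890.50 already needs 16 of them, so the cents sit right at the edge of what the double can represent. Add the fact that the parallel partitions accumulate and combine their partial sums in a different order on each run, and you get exactly the kind of last-digit wobble shown above.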
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

Not accurate

Post by thamark »

ray.wurlod wrote:Did you try doing the conversion using a Modify stage?

Was it a Transformer stage or a BASIC Transformer stage? A BASIC Transformer stage has a limit to the precision that can be represented, whic ...
Ray

I am using the Transformer stage, not a BASIC Transformer stage.

I tried this option using the Modify stage as well, and the result is not a pleasing one.

Output
38237290, 0023089122060890.12,-0023090433853699.27


Actual should be
38237290, 0023089122060884.77,-0023090433853698.46


I can go ahead and do the aggregation in the database, but in some places the data is in files.

Accuracy and completing this as soon as possible are both important. Do you have any other options?

Thanks & Regards
Thamark
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

As Ray says, you get rounding errors if at some point the data types are float. This is the case with the Aggregator, which uses Doubles. When you say you want a decimal output, the Double gets converted.

You have other options as previously mentioned: use a Transformer or a build-op with non-floating-point data types throughout.