
Aggregator misbehaves and slows down when we correct it

Posted: Thu Jun 08, 2006 10:51 pm
by thamark
I have a job that calculates the number of records and the sum of the amount field, to ensure that what was extracted and what was inserted are the same. I am facing two issues with it, as follows.

1) The amount differs between extraction and insertion if no data type is defined for the output columns in the Aggregator.

2) The amount matches exactly if I change the data type to Decimal(18,2), but the job slows down drastically (from 4 minutes to 1:30:20).

I can't live with the second option, since the time window for the whole load is less than what this job alone takes to complete.
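
To be concrete, the check I am doing boils down to a row count plus a column total on each side of the load. A minimal Python sketch of the idea (the file and column names are made up for illustration; the real job reads the source and target of the load):

Code:

from decimal import Decimal
import csv

def count_and_sum(path, amount_col):
    """Row count and exact decimal total of one column of a CSV file."""
    count, total = 0, Decimal("0")
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            count += 1
            total += Decimal(row[amount_col])
    return count, total

# Hypothetical extract/insert files, for illustration only.
src = count_and_sum("extracted.csv", "AMOUNT")
tgt = count_and_sum("inserted.csv", "AMOUNT")
print("match" if src == tgt else "mismatch: %s vs %s" % (src, tgt))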

I hope I will get a solution from you guys.

Thanks in advance
thamark

This is a known problem to many; now I need a solution

Posted: Thu Jun 08, 2006 11:00 pm
by kumar_s
Perhaps the default data type, Double (38,10), is being chosen, which is why your result is not as precise as expected. And the implicit conversion from Double to Decimal might be what slowed your job down so drastically in the second case.
You can try Double(18,2) in the output of the Aggregator and use a Modify stage to convert Double to Decimal explicitly (if required).
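
To see where the precision goes, here is a quick Python illustration. The value is made up, but it has the same magnitude (16 significant digits) as the totals posted later in this thread, and a double carries only about 15-16 significant decimal digits:

Code:

from decimal import Decimal

# Illustrative 16-significant-digit amount (not a value from the job).
exact = Decimal("23089122060890.37")

# The nearest representable double is already off by a fraction of a cent,
# and 38 million additions compound such errors:
print(Decimal(float(exact)))   # 23089122060890.37109375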

Posted: Fri Jun 09, 2006 5:56 am
by thompsonp
I saw this performance degradation in version 7.1 after running some comparative tests between aggregators, transformers and build-ops.

kumar is right in that it is the data type conversion that slows the aggregator down significantly.

The build-op was fastest (by a considerable amount), then the transformer, with the aggregator slowest.

The tests aggregated about half a dozen decimal fields from a dataset and wrote the results to a dataset.

The other advantage of a build-op or transformer is that you can count rows as well as perform calculations in the same stage, without having to add a field containing a default value of one (as is required with a single Aggregator, which can only count or calculate, not both).
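
To illustrate the point (a Python sketch of the idea, not actual build-op code):

Code:

from collections import defaultdict
from decimal import Decimal

# One pass over (key, amount) rows, keeping count and sum together -
# what a transformer or build-op can do without the extra constant-1 column.
def count_and_sum_by_key(rows):
    totals = defaultdict(lambda: [0, Decimal("0")])
    for key, amount in rows:
        acc = totals[key]
        acc[0] += 1
        acc[1] += amount
    return totals

rows = [("A", Decimal("10.25")), ("A", Decimal("4.75")), ("B", Decimal("1.00"))]
for key, (count, total) in count_and_sum_by_key(rows).items():
    print(key, count, total)   # A 2 15.00, then B 1 1.00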

I have not run the same tests on version 7.5 and don't have the exact timings to hand.

You don't say what volumes of data you are processing (input and output) or whether you are using a hash or sort type of aggregator. Have you ensured that the data is correctly and evenly partitioned?

Thanks for the responses

Posted: Fri Jun 09, 2006 7:58 am
by thamark
Kumar, I will try the option you mentioned and let you guys know the result.

Thom,

Can you give a bit more info on the build-op?

Thanks & Regards
Thamark

Aggregator issue

Posted: Wed Jun 14, 2006 9:00 pm
by thamark
I tried the option of defining the output columns in the Aggregator as Double and converting them in a Modify stage.

This gives a runtime error, as follows.

The derivation in the Modify stage is as follows:

DB_AMT:DECIMAL= decimal_from_dfloat(DB_AMT)

modify(0),3: Un-handled conversion error on field "DB_AMT " from source type "dfloat" to destination type "decimal[1,0]":
source value="2.30891e+13"; the result is non-nullable and there is no handle_null to specify a default value.
Un-handled conversion error on field "CR_AMT " from source type "dfloat" to destination type "decimal[1,0]":
source value="-2.30904e+13"; the result is non-nullable and there is no handle_null to specify a default value.

Please do let me know if you have any suggestions on this.

Posted: Wed Jun 14, 2006 10:37 pm
by ray.wurlod
decimal[1,0] is a tiny number (at most one digit); it is what DECIMAL with no explicit length defaults to. I think you need to correct a typing error in your metadata: you are getting much larger numbers (with up to 14 digits) to load into this field.

Posted: Wed Jun 14, 2006 11:07 pm
by kumar_s
To avoid the 2nd and 4th warnings, you can change your target field to Nullable = Yes, or use the same Modify stage to handle nulls explicitly with handle_null() - something like DB_AMT:decimal[18,2] = handle_null(DB_AMT, 0).

Aggregator

Posted: Thu Jun 15, 2006 4:24 pm
by thamark
ray.wurlod wrote:decimal[1,0] is a tiny number (at most one digit). I think you need to correct a typing error in your metadata. You are getting much larger numbers (with up to 14 digits) to load into this fi ...
Yes, I do define the column's data type as Decimal(18,2), but I am not able to give the same in the specification.

DB_AMT:DECIMAL = decimal_from_dfloat(DB_AMT)

I get an error if I give it as follows:

DB_AMT:DECIMAL(18,2) = decimal_from_dfloat(DB_AMT)

This works fine when I do the conversion in a Transformer, but I would like to avoid the Transformer.

I think I am making some mistake in the specification.

Posted: Thu Jun 15, 2006 5:28 pm
by ray.wurlod
Try using square brackets.

Code:

DB_AMT:DECIMAL[18,2] = decimal_from_dfloat(DB_AMT) 

Not accurate

Posted: Thu Jun 15, 2006 5:35 pm
by thamark
ray.wurlod wrote:Try using square brackets.

Code:

DB_AMT:DECIMAL[18,2] = decimal_from_dfloat(DB_AMT) 
I did the conversion using the Transformer, but the output is not consistent across runs.

See the output below.

Run   Col1 (count)   Col2 (sum)            Col3 (sum)
1     38237290       0023089122060890.50   -0023090433853699.81
2     38237290       0023089122060890.37   -0023090433853699.81
3     38237290       0023089122060890.33   -0023090433853699.92

When I define the Decimal(18,2) output property in the Aggregator I get consistent output, but the job completes in 1 hour instead of 6 minutes max.

Please suggest something.

Thanks & Regards

Posted: Thu Jun 15, 2006 8:42 pm
by ray.wurlod
Did you try doing the conversion using a Modify stage?

Was it a Transformer stage or a BASIC Transformer stage? A BASIC Transformer stage has a limit to the precision that can be represented, which may explain the rounding errors.

Indeed, you can still get rounding errors with decimal[18,2] if, at some point during the proceedings, the data type is float or dfloat: binary floating point cannot represent most decimal fractions exactly.
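
A quick way to see this outside DataStage (a Python sketch; the amounts are random stand-ins for your data):

Code:

import random
from decimal import Decimal

random.seed(7)
amounts = [round(random.uniform(0.0, 2_000_000.0), 2) for _ in range(100_000)]

# Double-precision addition is not associative, so a parallel job that
# combines partial sums in a different order on each run can report a
# slightly different total each time:
print(sum(amounts) == sum(sorted(amounts)))   # often False

# Exact decimal addition gives the same total in any order:
dec = [Decimal(str(a)) for a in amounts]
print(sum(dec) == sum(sorted(dec)))           # True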

Not accurate

Posted: Fri Jun 16, 2006 10:02 am
by thamark
ray.wurlod wrote:Did you try doing the conversion using a Modify stage?

Was it a Transformer stage or a BASIC Transformer stage? A BASIC Transformer stage has a limit to the precision that can be represented, whic ...
Ray

I am using the Transformer stage, not a BASIC Transformer stage.

I tried this option using the Modify stage as well, and the result is not a pleasing one.

Output:
38237290, 0023089122060890.12, -0023090433853699.27

Actual should be:
38237290, 0023089122060884.77, -0023090433853698.46


I can go ahead and do the aggregation in the database, but in some places the data is only in files.

Accuracy and completing this as soon as possible are both important. Do you have any other options?

Thanks & Regards
Thamark

Posted: Mon Jun 19, 2006 1:19 am
by thompsonp
As Ray says, you get rounding errors if at some point the data types are float. This is the case with the Aggregator, which uses Doubles internally. When you ask for a decimal output, the double gets converted.

You have other options, as previously mentioned: use a transformer or a build-op with non-floating-point data types throughout.
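
In other words, the safe pattern is to keep amounts in an exact type for the whole path - parse as decimal, sum as decimal, write as decimal - and never let a float in. A minimal Python sketch of that shape (illustrative only):

Code:

from decimal import Decimal

def exact_total(amount_strings):
    """Parse and sum without ever touching float - exact and repeatable."""
    total = Decimal("0")
    for s in amount_strings:   # amounts straight from the file, as text
        total += Decimal(s)    # Decimal(str) preserves the value exactly
    return total

print(exact_total(["10.25", "4.75", "-1.25"]))   # 13.75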