Aggregator misbehaves and slows down when we correct it

Post questions here relating to DataStage Enterprise/PX Edition, in areas such as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

Aggregator misbehaves and slows down when we correct it

Post by thamark »

I have a job that calculates the number of records and the sum of the amount field, to ensure that the number of records extracted and inserted is the same. I am facing an issue with it, as follows.

1) The amount differs between extraction and insertion if no data type is defined for the output columns in the Aggregator.

2) The amount matches exactly if I change the data type to Decimal(18,2), but the job slows down drastically (from 4 minutes to 1:30:20).

I can't live with this, since the time window for the whole load is shorter than what this job alone takes to complete.

I hope I will get a solution from you guys.

Thanks in advance
thamark

This is a known problem to many; now I need a solution.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Perhaps the default data type, Double (38,10), is being chosen, which would explain why your result is not as precise as expected. And the implicit conversion from Double to Decimal is probably what slows your job down so drastically in the second case.
You can try Double(18,2) in the output of the Aggregator and use a Modify stage to convert the Double to Decimal explicitly (if required).
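If it helps, the explicit conversion in the Modify stage would be a specification along these lines (DB_AMT here is just an example column name, and the precision and scale are assumptions to adjust to your data):

Code: Select all

DB_AMT:DECIMAL[18,2] = decimal_from_dfloat(DB_AMT)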
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

I saw this performance degradation in version 7.1 after running some comparative tests between aggregators, transformers and build-ops.

kumar is right in that it is the data type conversion that slows the aggregator down significantly.

The build-op was fastest (by a considerable amount), then the transformer with the aggregator slowest.

The tests aggregated about half a dozen decimal fields from a dataset and wrote the results to a dataset.

The other advantage of a build-op or transformer is that you can count rows as well as perform calculations in the same stage, without having to add a field containing a default value of one (as is required with a single Aggregator, which can only count or calculate).

I have not run the same tests on version 7.5 and don't have the exact timings to hand.

You don't say what volumes of data you are processing (input and output) or whether you are using a hash or sort type of aggregator. Have you ensured that the data is correctly and evenly partitioned?
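(From memory of the 7.x documentation, so please double-check: the hash method keeps one accumulator per group in memory and suits a modest number of groups, while the sort method expects its input sorted on the grouping keys and suits high-cardinality keys. Either way the input needs to be hash-partitioned on the grouping keys, otherwise each partition will emit its own partial totals for a group.)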
thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

Thanks for the response

Post by thamark »

Kumar, I will try the option you have mentioned and let you guys know the result.

Thom,

Can you give a bit more info on the build-op?

Thanks & Regards
Thamark
thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

Aggregator issue

Post by thamark »

I tried the option of defining the output column in the Aggregator as Double and converting it in the Modify stage.

This gives the following runtime error.

The derivation in the Modify stage is as follows:

DB_AMT:DECIMAL= decimal_from_dfloat(DB_AMT)

modify(0),3: Un-handled conversion error on field "DB_AMT " from source type "dfloat" to destination type "decimal[1,0]":
source value="2.30891e+13"; the result is non-nullable and there is no handle_null to specify a default value.
Un-handled conversion error on field "CR_AMT " from source type "dfloat" to destination type "decimal[1,0]":
source value="-2.30904e+13"; the result is non-nullable and there is no handle_null to specify a default value.

Please do let me know if you have anything for this.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

decimal[1,0] is a tiny number (at most one digit). I think you need to correct a typing error in your metadata. You are getting much larger numbers (with up to 14 digits) to load into this field.
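To put numbers on it: 2.30891e+13 is roughly 23,089,100,000,000, which needs 14 digits before the decimal point, while decimal[1,0] holds at most one digit. A decimal[18,2] leaves 16 digits before the decimal point, which covers it comfortably.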
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

To avoid the 2nd and 4th warnings, you can either change your target field to Nullable = Yes or use the same Modify stage to handle_null().
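If I remember the Modify syntax correctly, the null-handling form would look roughly like this (DB_AMT and the substitute value 0 are just examples to adjust):

Code: Select all

DB_AMT:DECIMAL[18,2] = handle_null(DB_AMT, 0)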
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

Aggregator

Post by thamark »

ray.wurlod wrote:decimal[1,0] is a tiny number (at most one digit). I think you need to correct a typing error in your metadata. You are getting much larger numbers (with up to 14 digits) to load into this fi ...
Yes, I do define the column's data type as Decimal(18,2), but I am not able to give the same in the specification:

DB_AMT:DECIMAL = decimal_from_dfloat(DB_AMT)

I get an error if I give it as follows:

DB_AMT:DECIMAL(18,2) = decimal_from_dfloat(DB_AMT)

This works fine when I do the conversion in a Transformer, but I would like to avoid the Transformer.

I think I am making some mistake in the specification.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Try using square brackets.

Code: Select all

DB_AMT:DECIMAL[18,2] = decimal_from_dfloat(DB_AMT) 
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

not accurate

Post by thamark »

ray.wurlod wrote:Try using square brackets.

Code: Select all

DB_AMT:DECIMAL[18,2] = decimal_from_dfloat(DB_AMT) 
I did the conversion using the Transformer, but the output is not consistent across runs.

See the output below.

Run 1
Col1(count), Col2(sum), Col3(sum)
38237290, 0023089122060890.50,-0023090433853699.81
Run 2
Col1(count), Col2(sum), Col3(sum)
38237290, 0023089122060890.37,-0023090433853699.81
Run 3
Col1(count), Col2(sum), Col3(sum)
38237290, 0023089122060890.33,-0023090433853699.92

When I define the Decimal(18,2) output property in the Aggregator I get consistent output, but the job completes in 1 hour instead of 6 minutes max.

Please suggest something.

Thanks & Regards
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Did you try doing the conversion using a Modify stage?

Was it a Transformer stage or a BASIC Transformer stage? A BASIC Transformer stage has a limit to the precision that can be represented, which may explain the rounding errors.

Indeed, you can still get rounding errors with decimal[18,2] if, at some point during the proceedings, the data type is float or dfloat: no computer can completely accurately represent floating point numbers.
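As a rough worked example: a dfloat (IEEE double) carries only about 15-16 significant decimal digits, and a total like 23,089,122,060,890.50 already needs 16 of them, so the cents sit right at the edge of what the double can represent. Add the fact that the parallel partitions accumulate and combine their partial sums in a different order on each run, and you get exactly the kind of last-digit wobble shown above.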
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
thamark
Premium Member
Posts: 43
Joined: Thu Jan 29, 2004 12:12 am
Location: US

Not accurate

Post by thamark »

ray.wurlod wrote:Did you try doing the conversion using a Modify stage?

Was it a Transformer stage or a BASIC Transformer stage? A BASIC Transformer stage has a limit to the precision that can be represented, whic ...
Ray

I am using the Transformer stage, not a BASIC Transformer stage.

I tried this option using the Modify stage as well, and the result is not a pleasing one.

Output
38237290, 0023089122060890.12,-0023090433853699.27


Actual should be
38237290, 0023089122060884.77,-0023090433853698.46


I can go ahead and do the aggregation in the database, but in some places the data is in files.

Accuracy and completing this as soon as possible are both important. Do you have any other options?

Thanks & Regards
Thamark
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

As Ray says, you get rounding errors if at some point the data types are float. This is the case with the Aggregator, which uses Doubles. When you say you want a decimal output, the Double gets converted.

You have other options as previously mentioned: use a Transformer or a build-op with non-floating-point data types throughout.