Page 1 of 1

Large numbers hanging BASIC routine

Posted: Mon Jan 21, 2008 4:34 pm
by Daddy Doma
Hi All,

I've got some complex calculations to perform that require a loop process. My prototype was built in MS Access and handled the functionality with very few performance issues. I'm trying to implement in DataStage using a server job and routine.

I pass my routine two arguments (ExpectedUsage and SOH) and my code is:

Code: Select all

Ans = 0

**** Determine the starting probability distribution based on zero SOH.

ProbDist = Exp(-ExpectedUsage)

**** Set the initial loop counter to zero.
LoopCounter = 0

**** Climb the cumulative probability distribution curve until the SOH value is reached OR the result exceeds 1 (i.e. 100%).

LOOP UNTIL (LoopCounter >= SOH) OR (ProbDist >= 1)

**** Increment the Loop Counter by one.  Set the Factorial value to one.
	LoopCounter = LoopCounter + 1  
	
**** This section determines the Factorial of the LoopCounter.  Factorial is used in the ProbDist equation.
**** The Factorial function does not exist in vBasic, so this code has been developed.

	Factorial = 1

	FOR FactCounter = 1 To LoopCounter
     	Factorial = FactCounter * Factorial
	NEXT

**** The ProbDist equation is "ProbDist + (Exp(-ExpectedUsage) * Pwr(ExpectedUsage,LoopCounter)) / Factorial".
**** The following section works out the components and then determines the overall ProbDist.  
**** Because ProbDist occurs within a loop, the value is increased each time with the new value for the LoopCounter.

**** Calculate the result of base 'e' raised to the power designated by the value of ExpectedUsage.  ExpectedUsage is set as negative.
	
	ExpectedUsageExp = Exp(-ExpectedUsage)
	
**** Calculate the value of ExpectedUsage when raised to the (power of the) LoopCounter value.
	
	ExpectedUsagePwr = Pwr(ExpectedUsage,LoopCounter)

**** Determine the probability distribution for ExpectedUsage.
	
	ProbDist = ProbDist + (ExpectedUsageExp * ExpectedUsagePwr) / Factorial

REPEAT

**** When the loop is exited, the ProbDist result is adjusted to 1 (i.e. 100%).  The confidence level can not be greater then 100%.

IF ProbDist > 1 THEN
	ProbDist = 1
END

Ans = ProbDist
I've checked this routine against my prototype and the answers are correct, but I the job is hanging on (a few) very large numbers. Specifically, when I try to work out a confidence level for Expected Usage = 13,861 with a SOH of 140,368.

Any help?

Thanks,
Zac

Posted: Mon Jan 21, 2008 5:12 pm
by kcbland
The first thing that stands out is that for every loop pass the factorial is recalculated. When the SOH is large, this is inefficient. You're better off keeping the running factorial and just add the next value in with each pass. This efficiency will probably help out a bunch.

Code: Select all

**** This section determines the Factorial of the LoopCounter.  Factorial is used in the ProbDist equation. 
**** The Factorial function does not exist in vBasic, so this code has been developed. 

   Factorial = 1 

   FOR FactCounter = 1 To LoopCounter 
        Factorial = FactCounter * Factorial 
   NEXT 

Posted: Mon Jan 21, 2008 5:31 pm
by Daddy Doma
The first thing that stands out is that for every loop pass the factorial is recalculated. When the SOH is large, this is inefficient. You're better off keeping the running factorial and just add the next value in with each pass. This efficiency will probably help out a bunch.
Thanks Ken, you were totally right - our routine now breaks a lot quicker!

The problem we have now is the routine returns "0.NaNQ" for high values, i.e. ExpectedUsage=750 and SOH=490.

My MS Access prototype returns 1 (i.e. 100%) for these values - why does DataStage not do the same?

FYI - To give you an idea of what we are trying to acheive, take the following example:

If we expect to use 61 items and we have 83 stock on hand, then we are 99.7% confident that our current stock on hand will satisfy the expected usage.
Alternatively, if we expect to use 10 items and we only have 8 on hand, then we are only 33% confident we can satisfy expected usage.

Posted: Mon Jan 21, 2008 7:33 pm
by kcbland
Research EXACTNUMERIC and PRECISION, as they relate to large and small numbers. This may be your solution, as I don't have time right now to troubleshoot your function myself. My guess is that you have some really tiny or huge number math going on.

Posted: Mon Jan 21, 2008 7:41 pm
by ray.wurlod
Put the following declaration at the top of your code:

Code: Select all

PRECISION 14
Let us know if that improves things.

Otherwise, since you've marked the job type as parallel (even though it's posted in the server forum), you could write the routine in C++ and take advantage of double or even quad precision floating point arithmetic.

Posted: Wed Jan 23, 2008 3:39 pm
by Daddy Doma
Hi Ray,

First of all, my apologies for not marking topic as a Server job. It is a server job, simple sequential file to transformer to sequential file design. I've edited the original post to correct.

We tried the PRECISION argument with no success - same results.

For no good reason other then 'just because' we are not allowed to implement external parallel routines at this client.

However, we've identified that any results returning "0.NaNQ" can be considered a zero value. So we are able to use the results thanks to Ken's suggestion for performance improvement.

I'll call that a 'workaround' ;-)

Thanks for the help,
Zac.

Posted: Wed Jan 23, 2008 7:30 pm
by ray.wurlod
If you have the time it would be interesting to diagnose whether the problem is occurring in Exp() or Pwr() functions or in the final arithmetic.

I have a suspicion that you are hitting a precision limit here, that something, somewhere, only works with numbers that can be described in 32 bits. Dividing by large factorials might also be limited by the precision of the machine (the DataStage Engine - remember, it's a 32-bit application).

What value is EXACTNUMERIC in your uvconfig file?

Posted: Wed Jan 23, 2008 8:59 pm
by Daddy Doma
Hi Ray,

EXACTNUMERIC was set as 15.

I experimented with different values all the way up to 57 and couldn't get rid of the '0.NanQ'.

When playing in Excel (which has an inbuild FACT function) I can work out the factorial up to 170 (7.2574E+306). 171 returns '#NUM!'.

Zac.

Posted: Thu Jan 24, 2008 1:46 am
by ray.wurlod
It's obvious you're hitting a precision limit somewhere. I trust that you did uvregen and restart DataStage after changing EXACTNUMERIC ?