Large numbers hanging BASIC routine

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Daddy Doma
Premium Member
Premium Member
Posts: 62
Joined: Tue Jun 14, 2005 7:17 pm
Location: Australia
Contact:

Large numbers hanging BASIC routine

Post by Daddy Doma »

Hi All,

I've got some complex calculations to perform that require a loop process. My prototype was built in MS Access and handled the functionality with very few performance issues. I'm trying to implement in DataStage using a server job and routine.

I pass my routine two arguments (ExpectedUsage and SOH) and my code is:

Code: Select all

Ans = 0

**** Determine the starting probability distribution based on zero SOH.

ProbDist = Exp(-ExpectedUsage)

**** Set the initial loop counter to zero.
LoopCounter = 0

**** Climb the cumulative probability distribution curve until the SOH value is reached OR the result exceeds 1 (i.e. 100%).

LOOP UNTIL (LoopCounter >= SOH) OR (ProbDist >= 1)

**** Increment the Loop Counter by one.  Set the Factorial value to one.
	LoopCounter = LoopCounter + 1  
	
**** This section determines the Factorial of the LoopCounter.  Factorial is used in the ProbDist equation.
**** The Factorial function does not exist in vBasic, so this code has been developed.

	Factorial = 1

	FOR FactCounter = 1 To LoopCounter
     	Factorial = FactCounter * Factorial
	NEXT

**** The ProbDist equation is "ProbDist + (Exp(-ExpectedUsage) * Pwr(ExpectedUsage,LoopCounter)) / Factorial".
**** The following section works out the components and then determines the overall ProbDist.  
**** Because ProbDist occurs within a loop, the value is increased each time with the new value for the LoopCounter.

**** Calculate the result of base 'e' raised to the power designated by the value of ExpectedUsage.  ExpectedUsage is set as negative.
	
	ExpectedUsageExp = Exp(-ExpectedUsage)
	
**** Calculate the value of ExpectedUsage when raised to the (power of the) LoopCounter value.
	
	ExpectedUsagePwr = Pwr(ExpectedUsage,LoopCounter)

**** Determine the probability distribution for ExpectedUsage.
	
	ProbDist = ProbDist + (ExpectedUsageExp * ExpectedUsagePwr) / Factorial

REPEAT

**** When the loop is exited, the ProbDist result is adjusted to 1 (i.e. 100%).  The confidence level can not be greater then 100%.

IF ProbDist > 1 THEN
	ProbDist = 1
END

Ans = ProbDist
I've checked this routine against my prototype and the answers are correct, but I the job is hanging on (a few) very large numbers. Specifically, when I try to work out a confidence level for Expected Usage = 13,861 with a SOH of 140,368.

Any help?

Thanks,
Zac
Last edited by Daddy Doma on Mon Jan 21, 2008 8:24 pm, edited 1 time in total.
When you know that you are destined for greatness by virtue of your mutant heritage it is difficult to apply yourself to normal life. Why waste the effort when you know that your potential is so tremendous?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The first thing that stands out is that for every loop pass the factorial is recalculated. When the SOH is large, this is inefficient. You're better off keeping the running factorial and just add the next value in with each pass. This efficiency will probably help out a bunch.

Code: Select all

**** This section determines the Factorial of the LoopCounter.  Factorial is used in the ProbDist equation. 
**** The Factorial function does not exist in vBasic, so this code has been developed. 

   Factorial = 1 

   FOR FactCounter = 1 To LoopCounter 
        Factorial = FactCounter * Factorial 
   NEXT 
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Daddy Doma
Premium Member
Premium Member
Posts: 62
Joined: Tue Jun 14, 2005 7:17 pm
Location: Australia
Contact:

Post by Daddy Doma »

The first thing that stands out is that for every loop pass the factorial is recalculated. When the SOH is large, this is inefficient. You're better off keeping the running factorial and just add the next value in with each pass. This efficiency will probably help out a bunch.
Thanks Ken, you were totally right - our routine now breaks a lot quicker!

The problem we have now is the routine returns "0.NaNQ" for high values, i.e. ExpectedUsage=750 and SOH=490.

My MS Access prototype returns 1 (i.e. 100%) for these values - why does DataStage not do the same?

FYI - To give you an idea of what we are trying to acheive, take the following example:

If we expect to use 61 items and we have 83 stock on hand, then we are 99.7% confident that our current stock on hand will satisfy the expected usage.
Alternatively, if we expect to use 10 items and we only have 8 on hand, then we are only 33% confident we can satisfy expected usage.
When you know that you are destined for greatness by virtue of your mutant heritage it is difficult to apply yourself to normal life. Why waste the effort when you know that your potential is so tremendous?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Research EXACTNUMERIC and PRECISION, as they relate to large and small numbers. This may be your solution, as I don't have time right now to troubleshoot your function myself. My guess is that you have some really tiny or huge number math going on.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Put the following declaration at the top of your code:

Code: Select all

PRECISION 14
Let us know if that improves things.

Otherwise, since you've marked the job type as parallel (even though it's posted in the server forum), you could write the routine in C++ and take advantage of double or even quad precision floating point arithmetic.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Daddy Doma
Premium Member
Premium Member
Posts: 62
Joined: Tue Jun 14, 2005 7:17 pm
Location: Australia
Contact:

Post by Daddy Doma »

Hi Ray,

First of all, my apologies for not marking topic as a Server job. It is a server job, simple sequential file to transformer to sequential file design. I've edited the original post to correct.

We tried the PRECISION argument with no success - same results.

For no good reason other then 'just because' we are not allowed to implement external parallel routines at this client.

However, we've identified that any results returning "0.NaNQ" can be considered a zero value. So we are able to use the results thanks to Ken's suggestion for performance improvement.

I'll call that a 'workaround' ;-)

Thanks for the help,
Zac.
When you know that you are destined for greatness by virtue of your mutant heritage it is difficult to apply yourself to normal life. Why waste the effort when you know that your potential is so tremendous?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you have the time it would be interesting to diagnose whether the problem is occurring in Exp() or Pwr() functions or in the final arithmetic.

I have a suspicion that you are hitting a precision limit here, that something, somewhere, only works with numbers that can be described in 32 bits. Dividing by large factorials might also be limited by the precision of the machine (the DataStage Engine - remember, it's a 32-bit application).

What value is EXACTNUMERIC in your uvconfig file?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Daddy Doma
Premium Member
Premium Member
Posts: 62
Joined: Tue Jun 14, 2005 7:17 pm
Location: Australia
Contact:

Post by Daddy Doma »

Hi Ray,

EXACTNUMERIC was set as 15.

I experimented with different values all the way up to 57 and couldn't get rid of the '0.NanQ'.

When playing in Excel (which has an inbuild FACT function) I can work out the factorial up to 170 (7.2574E+306). 171 returns '#NUM!'.

Zac.
When you know that you are destined for greatness by virtue of your mutant heritage it is difficult to apply yourself to normal life. Why waste the effort when you know that your potential is so tremendous?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's obvious you're hitting a precision limit somewhere. I trust that you did uvregen and restart DataStage after changing EXACTNUMERIC ?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply