Generating dummy data sequentially within a range

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

jreddy
Premium Member
Posts: 202
Joined: Tue Feb 03, 2004 5:09 pm

Post by jreddy »

My requirement is to populate a dimension table with the values 0.01 to 80.00.
To implement this, my job has a Column Generator stage with the column defined as decimal(4,2), and in the generator properties I have set the following:
Initial value: 0.01
Increment: 0.01
Limit: 80.00

Since the Column Generator stage needs an input link, I used a Row Generator stage with the number of records set to 1000 (an arbitrary number larger than the number of rows expected).

The output seems to contain data from 0.01 to 4.98 and then restarts at 0.01. So I set the mode to sequential, but the output still has values from 0.01 to 9.99 and then restarts.

Is there some other setup required in the Column Generator stage to make sure the numbers are generated in sequence with the right increment? I have not set any values for Level or Vector; should I?

Or is there another way to implement this requirement?

Thanks in advance for all your suggestions.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There are 8000 values between 0.01 and 80.00 if the increment is 0.01. You need to set the number of rows to generate accordingly.
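Ray's count can be checked with exact decimal arithmetic. A minimal Python sketch, where the helper `rows_needed` is hypothetical and not part of DataStage; it also shows that starting at 0 (as in a later post) needs one extra row:

```python
from decimal import Decimal

def rows_needed(initial: str, limit: str, increment: str) -> int:
    """Count the values initial, initial + increment, ..., up to and including limit."""
    # Decimal keeps 0.01 exact, so the division lands on a whole number of steps.
    steps = (Decimal(limit) - Decimal(initial)) / Decimal(increment)
    return int(steps) + 1  # +1 because both end points are included

print(rows_needed("0.01", "80.00", "0.01"))  # 8000
print(rows_needed("0", "80.00", "0.01"))     # 8001
```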
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by jreddy »

Thanks Ray. I extended the number of rows to 10000, because when I set it to 8000 I missed the last value, 80.00. I figured that since I set the upper limit to 80.00, any extra rows generated beyond it would not be processed.

I did get all the values from 0.01 to 80.00, but some values are duplicated. I added a Sort stage and a Remove Duplicates stage between the Column Generator and the Data Set, but the duplicates still remain.

Any suggestions on why that might be happening?

Post by jreddy »

Actually, what I did to get rid of the duplicates was to set the option 'Allow duplicates' to False in the Sort stage, and it all worked fine.

Thanks for your advice, Ray. I appreciate it.

Post by jreddy »

There is a new problem with this same job. I have the Row Generator and Column Generator operating in sequential mode (initial value: 0, increment: 0.01), and then I remove the duplicates generated.

But now I have realised that a couple of values are consistently missing. This job generates data between 0 and 80, and these values are missing every time I run it:

0.14, 17.9, 72.12

I am unable to figure out why these are missing. I noticed that the Column Generator itself is not generating these three values. I am running with a 2-node configuration.

Has anyone had a similar problem before, and does anyone have a suggestion on how to make sure all values are generated?

Thanks in advance
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

It's probably a silly math issue with the internal algorithms. Maybe the partitioning logic is doing something stupid like using floating point.

Have you considered generating integer values and then dividing by 100 afterwards to get back to the scale you want?
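To make the floating-point guess concrete, here is a hypothetical Python sketch. It does not reproduce the Column Generator, whose internals nobody in this thread knows; it assumes a counter that keeps a running single-precision total and truncates to two decimal places on output. Under those assumptions 0.01 is stored as slightly less than 0.01, so the very first value already truncates to 0.00 and the output no longer matches 1..8000 hundredths, whereas an integer counter stays exact.

```python
import struct
from collections import Counter

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def float_counter_cents(start: float, step: float, count: int) -> list[int]:
    """HYPOTHETICAL generator: a running single-precision total, truncated to two
    decimal places on output (returned as integer hundredths)."""
    total, step32 = to_float32(start), to_float32(step)
    cents = []
    for _ in range(count):
        cents.append(int(total * 100))      # int() truncates, e.g. 0.9999... -> 0
        total = to_float32(total + step32)  # accumulate in single precision
    return cents

expected = list(range(1, 8001))             # integer counter: 1..8000, exact
drifted = float_counter_cents(0.01, 0.01, 8000)

counts = Counter(drifted)
missing = sorted(set(expected) - set(counts))
duplicated = sorted(v for v, n in counts.items() if n > 1)
print("first values:", drifted[:5])
print("missing:", len(missing), "duplicated:", len(duplicated))
```

Truncation is the key assumption here: rounding to the nearest hundredth instead would hide drift this small.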
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle

Post by jreddy »

Thanks Kenneth,

I still can't understand why, but doing what you suggested made my job work :) It must be some silly math algorithm issue, as you said.

Thanks

Post by kcbland »

I'm sure Ray can give you the specific reason, but the idea is that when decimal values are held as floating point, you run into the limits of floating-point precision. 1/7 is one example of a value whose decimal expansion never terminates:

0.14285714285714285714285714285714

Notice a pattern? So when you say .14, a light bulb goes off above my head. My guess is that some partitioning algorithm somehow drops these rows because they're repeating, non-terminating values rather than nice numbers.
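The 1/7 analogy carries over to binary: just as 1/7 never terminates in base 10, 1/100 never terminates in base 2, so 0.01 cannot be stored exactly in a binary floating-point number. A small Python sketch (the helper `digits_after_point` is hypothetical) prints both expansions and the value a double actually holds for 0.01. It illustrates the general floating-point effect only; whether the generator stores values this way is not known from the thread.

```python
import math
from decimal import Decimal
from fractions import Fraction

def digits_after_point(x: Fraction, base: int, n: int) -> str:
    """First n digits after the point of 0 <= x < 1 in the given base, by long division."""
    out = []
    for _ in range(n):
        x *= base
        d = math.floor(x)   # the next digit is the integer part
        out.append(str(d))
        x -= d              # carry only the fractional remainder forward
    return "".join(out)

print(digits_after_point(Fraction(1, 7), 10, 18))   # 142857142857142857
print(digits_after_point(Fraction(1, 100), 2, 42))  # 1/100 never terminates in base 2

# What a double actually stores for the literal 0.01: close to, but not exactly, 1/100.
print(Decimal(0.01))
print(Fraction(0.01) == Fraction(1, 100))           # False
```

After its first two bits, 1/100 repeats a 20-bit pattern forever, so any finite binary format must cut it off somewhere.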

If volume isn't a consideration, you could have tried single-node processing to take partitioning out of the equation.

Post by ray.wurlod »

Generate integers 1 through 8000 and divide by 100 downstream.

The problem is probably related to the internal storage of floating-point numbers, but I am unable (and, indeed, unwilling) to devote time to investigating more closely. As well, I'd probably need the source code for the generator operator, which I don't have. Why not put the question to the vendor?
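The integer-then-divide approach can be sketched outside DataStage in Python. This is an illustration under assumptions: the function name `dimension_values` is hypothetical, and in the job itself the division would happen in a downstream stage rather than in Python.

```python
from decimal import Decimal

CENT = Decimal("0.01")  # the scale of a decimal(4,2) column

def dimension_values(first: int = 1, last: int = 8000) -> list[Decimal]:
    """Generate the integers first..last, then divide by 100 downstream.
    The counter never accumulates a fractional step, so nothing can drift."""
    return [(Decimal(i) / 100).quantize(CENT) for i in range(first, last + 1)]

values = dimension_values()
print(len(values), values[0], values[-1])  # 8000 0.01 80.00
```

Every value is derived independently from its own integer, so an error in one row cannot carry into the next, which is why no values go missing or repeat.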