Generating dummy data between a range sequentially
My requirement is to populate a dimension table with the values 0.01 to 80.00.
To implement this, my job has a Column Generator stage with the column defined as decimal(4,2), and in the generator properties I have set the following:
Initial value: 0.01
Increment: 0.01
Limit: 80.00
Since the Column Generator stage needs an input link, I used a Row Generator stage with the number of records set to 1000 (an arbitrary number that I assumed was greater than the number of rows expected).
The output has values from 0.01 to 4.98 and then restarts at 0.01. I set the mode to sequential, but the output still only runs from 0.01 to 9.99 before restarting.
Is some other setup required in the Column Generator stage to make sure the numbers are generated in sequence with the right increment? I have not set any values for Level or Vector.
Or is there another way to implement this requirement?
Thanks in advance for all your suggestions.
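As a quick sanity check on the row count, here is a minimal Python sketch of the arithmetic only. The function name `rows_needed` is made up for illustration, and the sketch says nothing about how the Column Generator stage itself cycles or partitions values.

```python
from decimal import Decimal

def rows_needed(initial: Decimal, increment: Decimal, limit: Decimal) -> int:
    """Count the values initial, initial + increment, ..., limit (inclusive)."""
    return int((limit - initial) / increment) + 1

# Settings from the post: initial 0.01, increment 0.01, limit 80.00.
from_one_cent = rows_needed(Decimal("0.01"), Decimal("0.01"), Decimal("80.00"))
# Same range, but starting the sequence at 0 instead of 0.01.
from_zero = rows_needed(Decimal("0"), Decimal("0.01"), Decimal("80.00"))

print(from_one_cent)  # 8000
print(from_zero)      # 8001
```

On this arithmetic alone, 1000 input rows cannot cover the range: 8000 rows are needed when starting at 0.01, and 8001 when starting at 0.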
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Thanks Ray. I extended the number of rows to 10000, because when I used 8000 I missed the last value, 80.00. I figured that since the upper limit is set to 80.00, any extra rows generated would not be processed.
I did get all the values from 0.01 to 80.00, but I am getting duplicates for some values. I added a Sort and a Remove Duplicates stage between the Column Generator and the Data Set, but the duplicates still remain.
Any suggestions on why that might be happening?
There is a new problem with this same job. I have the Row Generator and Column Generator operating in sequential mode (initial value: 0, increment: 0.01), and then I remove the duplicates that are generated.
But now I have realised that a few values are consistently missing. This job generates data between 0 and 80, and these values are missing every time I run it:
0.14, 17.9, 72.12
I am unable to figure out why. I noticed that the Column Generator itself is not generating these three values. I am running with a 2-node configuration.
Has anyone had a similar problem, and does anyone have a suggestion on how to make sure all values are generated?
Thanks in advance.
It's probably a silly math issue with the internal algorithms. Maybe the partitioning logic is doing something stupid like using floating point.
Have you considered generating integer values and then dividing by 100 afterwards to get back to the scale you want?
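To illustrate the general floating point concern (not the actual internals of the generator operator, which nobody in this thread has seen), here is a small Python sketch. It assumes a generator that keeps a binary floating point running total and adds 0.01 each row; that assumption and the name `accumulate` are purely illustrative.

```python
from decimal import Decimal

def accumulate(step: float, count: int) -> list:
    """Add `step` to a running float total `count` times, keeping every total."""
    total = 0.0
    totals = []
    for _ in range(count):
        total += step
        totals.append(total)
    return totals

totals = accumulate(0.01, 8000)

# The float nearest to each intended value 0.01, 0.02, ..., 80.00.
intended = [n / 100 for n in range(1, 8001)]

# Count the running totals that have drifted away from the intended value.
drifted = sum(1 for got, want in zip(totals, intended) if got != want)
print(f"{drifted} of {len(totals)} running totals differ from the intended value")

# 0.14 has no exact binary representation; this prints what is really stored.
print(Decimal(0.14))
```

If values like these are later rounded or truncated back to two decimal places, a drifted total can land on a neighbouring value, which would show up as a duplicate in one place and a gap in another. Whether that is what happens inside the stage is a guess.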
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
I'm sure Ray can give you the specific reason, but the idea is that when dealing with decimal values you run into floating point precision. 1/7 is one example of a value whose expansion repeats forever:
0.14285714285714285714285714285714
Notice a pattern? So when you say .14, a light bulb goes off above my head. My guess is that some partitioning algorithm somehow drops these rows because they're infinite series and not nice numbers.
If volume isn't a consideration, you could try one-node processing to remove the partitioning from the equation.
Kenneth Bland
Generate integers 1 through 8000 and divide by 100 downstream.
The problem probably is related to internal storage of floating point numbers but I am unable (and, indeed, unwilling) to devote time to investigating more closely. As well, I'd probably need source code for the generator operator, which I don't have. Why not ask the question of the vendor?
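The integer approach can be sketched outside DataStage as follows. This is a Python illustration of the idea only, not stage configuration; the function name `generate_range` is hypothetical, and the values are held as whole hundredths until the final scaling step.

```python
from decimal import Decimal

def generate_range(first: int, last: int) -> list:
    """Generate integers first..last and shift each down by two decimal places."""
    # scaleb(-2) moves the decimal point two places left, so 1 -> 0.01
    # and 8000 -> 80.00, with no binary floating point involved.
    return [Decimal(n).scaleb(-2) for n in range(first, last + 1)]

values = generate_range(1, 8000)

print(values[0], values[-1])   # 0.01 80.00
print(len(set(values)))        # 8000

# The three values reported missing in this thread are all present.
for wanted in ("0.14", "17.9", "72.12"):
    print(wanted, Decimal(wanted) in values)
```

Because every intermediate value is an exact integer, there is nothing to drift, so the sequence has no duplicates and no gaps by construction.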
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.