Problem with Sample stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

legendkiller
Participant
Posts: 60
Joined: Sun Nov 21, 2004 2:24 am

Post by legendkiller

The Sample stage, when running in percent mode, is not giving the correct percentages in the output files. What is the significance of the random number generator concept and the 'Seed' value? I cannot see what difference it makes to the output whether the seed value is specified or not.
E.g.: My input file contains 21 records, and I am sampling it to 3 output files at 10, 25 and 40 percent respectively. But the output is not as expected, i.e. 10% of 21, 25% of 21 and 40% of 21. Why is that?
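
A quick check of the arithmetic in the question: none of the requested percentages of 21 rows is a whole number, so no sampler can hit them exactly, and each output can at best receive a nearby integer count. A minimal Python sketch:

```python
# Exact row counts the requested percentages would imply for 21 input rows.
num_rows = 21
percents = [10, 25, 40]

targets = [num_rows * pct / 100 for pct in percents]
print(targets)        # [2.1, 5.25, 8.4]
print(sum(percents))  # 75 -> the three outputs together cover only 75% of the rows
```
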
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW

Your 10, 25 and 40 percent add up to only 75%, not 100% of the records, so you will get fewer rows than 10%, 25% and 40% of the total.
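
To illustrate the percentages not reaching 100%, here is a sketch under a stated assumption: each row gets one pseudo-random draw in [0, 100), the percentages are laid end to end as bands (0-10, 10-35, 35-75), and a row whose draw lands past 75 goes to no output. `sample_percent` is a hypothetical name and this is an illustrative model, not DataStage's documented internals.

```python
import random

def sample_percent(num_rows, percents, seed=None):
    """Hypothetical model of percent-mode sampling; not DataStage's actual code.

    Each row gets one pseudo-random draw in [0, 100). The row goes to the output
    whose cumulative-percentage band contains the draw, or to no output if the
    draw is beyond the total of all percentages.
    """
    # Assumption: totals over 100 are rejected.
    if sum(percents) > 100:
        raise ValueError("percentages must not total more than 100")
    rng = random.Random(seed)  # same seed -> same draws -> same split
    counts = [0] * len(percents)
    unassigned = 0
    for _ in range(num_rows):
        draw = rng.random() * 100
        upper = 0
        for i, pct in enumerate(percents):
            upper += pct
            if draw < upper:
                counts[i] += 1
                break
        else:
            # Draw fell beyond the total of all percentages: row reaches no output.
            unassigned += 1
    return counts, unassigned


counts, unassigned = sample_percent(21, [10, 25, 40], seed=1)
print(counts, unassigned)
```

Rerunning with the same `seed` reproduces the same split. With only 21 rows, the per-output counts scatter around 10%, 25% and 40% of 21 rather than matching them, and roughly a quarter of the rows reach no output.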

Random number generators are not truly random; the algorithms are designed so that, given the same "seed" number, they always generate the same sequence of pseudo-random numbers. Using a different seed generates different results. The random number sequence decides which rows are randomly taken into each sample.
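
The seed behaviour described above can be seen with Python's `random.Random`, used here only as a stand-in; DataStage's own generator algorithm is not stated in this thread. In Python, omitting the seed makes the generator seed itself from the operating system's randomness source (or the system time), so every run differs; whether the Sample stage chooses its default seed the same way is an assumption. `draws` is a hypothetical helper.

```python
import random

def draws(seed=None, n=5):
    """Return n pseudo-random numbers in [0, 1) from a generator with the given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Same seed -> identical sequence on every run.
print(draws(seed=42) == draws(seed=42))  # True

# Different seeds -> (almost certainly) different sequences.
print(draws(seed=42) == draws(seed=7))

# No seed -> seeded from OS randomness or system time, so this changes between runs.
print(draws())
```
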