Interesting Sample stage problem

Posted: Wed Nov 08, 2006 10:03 am
by abc123
I have 2 jobs.
Job1: dataset1 -> copy -> sample -> dataset2
Job2: dataset1 -> sample -> dataset2

Both jobs run fine. However, in Job1 the link from dataset1 to the Copy stage shows 100 rows, whereas in Job2 the link from dataset1 to the Sample stage shows 20 rows. The source file is the same for both jobs.

Any ideas about this discrepancy? Thanks.

Posted: Wed Nov 08, 2006 1:14 pm
by ray.wurlod
Are any rejects being reported in the job log of Job 2?
Do you have reject links to handle these?

Posted: Wed Nov 08, 2006 1:39 pm
by samsuf2002
How many rows are you getting in the target dataset?

Posted: Wed Nov 08, 2006 3:11 pm
by abc123
There are no reject links in either job. The setup is exactly as I described in the OP. I am getting 20 rows in the target in both jobs.

Posted: Wed Nov 08, 2006 4:03 pm
by talk2shaanc
The link count also depends on the stage the link is feeding.
What count does it show for records moving from the Copy stage to the Sample stage? I am sure it would be 20, NOT 100.

I have never noticed this in PX, but I have seen it many times in server jobs.

     hash File (having 500 records)
                      |
                     10
                      |
seq---1000--> transformer ---> o/p
In the server job design above, the hash file had 500 records, but the link count of 10 was the number of hash file records that found a match with a seq file record.
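
To make the lookup case concrete, here is a rough Python sketch of how such a reference link count could come out as 10. It is only a model, not DataStage code: the function name `matched_reference_count` and the key values are made up for illustration, and it assumes the link counter only counts reference rows that are actually matched.

```python
def matched_reference_count(stream_keys, reference_keys):
    """Count stream rows whose key is found in the reference (hash file) keys.

    Hypothetical model: assumes the reference link counter only ticks
    when a lookup succeeds, as described in the post above.
    """
    reference = set(reference_keys)
    return sum(1 for key in stream_keys if key in reference)


# 1000 stream rows from the sequential file (keys 0..999, made up).
stream_keys = list(range(1000))
# 500 hash file rows, of which only keys 990..999 overlap the stream.
reference_keys = list(range(990, 1490))

link_count = matched_reference_count(stream_keys, reference_keys)
print(len(reference_keys), link_count)  # 500 10
```

So the hash file can hold 500 rows while the link shows 10, because the counter in this model reflects matches rather than the size of the file.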

Posted: Wed Nov 08, 2006 4:15 pm
by ray.wurlod
Are the sampling characteristics set identically?

Posted: Wed Nov 08, 2006 4:45 pm
by abc123
The link from the Copy stage to the Sample stage shows 100 rows.
Here are the Sample stage properties:

Max Rows (Per Partition) = 10
Period (Per Partition) = 1
Sample Mode = Period

Posted: Wed Nov 08, 2006 5:13 pm
by ray.wurlod
Max rows per partition = 10 means 20 rows on two partitions.

It's your design - specifically this property - that is limiting the number of rows output from the stage.

You have taken a sample!
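
The arithmetic can be checked with a small Python model. This is a sketch under assumptions, not the engine's actual behaviour: `sample_period` is a hypothetical helper, the 100 rows are assumed to be split round-robin over two partitions, and period mode is assumed to keep every Nth row of a partition until the per-partition maximum is reached.

```python
def sample_period(partitions, period, max_rows):
    """Keep every `period`-th row of each partition, capped at `max_rows`.

    Hypothetical model of period-mode sampling; which row the period
    starts counting from is an assumption.
    """
    return [rows[period - 1::period][:max_rows] for rows in partitions]


# 100 source rows, assumed to be spread round-robin over 2 partitions.
rows = list(range(100))
partitions = [rows[i::2] for i in range(2)]

sampled = sample_period(partitions, period=1, max_rows=10)
total = sum(len(part) for part in sampled)
print([len(part) for part in sampled], total)  # [10, 10] 20
```

With Period = 1 every row qualifies, so the cap of 10 rows per partition is what produces 20 output rows in this model.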

Posted: Thu Nov 09, 2006 7:54 am
by tagnihotri
Is the partitioning right?
ray.wurlod wrote:Max rows per partition = 10 means 20 rows on two partitions.

It's your design - specifically this property - that is limiting the number of rows output from the stage.

You have taken a sample! ...

Posted: Thu Nov 09, 2006 9:04 am
by splayer
Ray, thank you for your response. I still don't understand why the number of rows coming into the Sample stage varies. I can understand that the number going out might differ because of the settings on the Sample stage. The Copy stage has its partitioning set to Auto.