Interesting Sample stage problem

abc123 · Post by **abc123** » Wed Nov 08, 2006 10:03 am

I have 2 jobs.
Job1: dataset1 -> copy -> sample -> dataset2
Job2: dataset1 -> sample -> dataset2

Both jobs run fine. However, in Job1, the number of rows moving from dataset1 to the Copy stage shows 100, whereas in Job2 the number of rows moving from dataset1 to the Sample stage shows 20. The source file for both jobs is the same.

Any ideas about this discrepancy? Thanks.

ray.wurlod · Post by **ray.wurlod** » Wed Nov 08, 2006 1:14 pm

Are any rejects being reported in the job log of Job 2?
Do you have reject links to handle these?

samsuf2002 · Post by **samsuf2002** » Wed Nov 08, 2006 1:39 pm

How many rows are you getting in the target dataset?

abc123 · Post by **abc123** » Wed Nov 08, 2006 3:11 pm

There are no reject links in either of the jobs. The setup is exactly as I described in the original post. I am getting 20 rows in the target in both jobs.

talk2shaanc · Post by **talk2shaanc** » Wed Nov 08, 2006 4:03 pm

The link count also depends on the stage the link is going to.
What count does it show when records move from the Copy stage to the Sample stage? I am sure it would be 20, NOT 100.

I have never noticed this in PX, but in server jobs I have seen it many times.

     hash File (having 500 records)
                      |
                     10
                      |
seq---1000--> transformer ---> o/p

In the server job design above, the hash file had 500 records, but the link count of 10 was the number of hash file records that matched a record from the seq file.
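
To illustrate the point about reference link counts, here is a minimal Python sketch (not DataStage code). It assumes the reference link count increments once per stream row that finds a match in the hash file. The names `count_reference_hits`, `hash_file` and `stream_keys`, and the key values, are hypothetical and chosen only to reproduce the 500 / 1000 / 10 figures in the diagram.

```python
def count_reference_hits(stream_keys, hash_file):
    """Count stream rows whose key is found in the hash file (assumed link-count behaviour)."""
    hits = 0
    for key in stream_keys:
        if key in hash_file:
            hits += 1
    return hits


# Hash file with 500 records, keyed 0..499 (illustrative data only).
hash_file = {key: f"value-{key}" for key in range(500)}

# 1000 stream rows from the sequential file; only keys 490..499 exist in the hash file.
stream_keys = list(range(490, 1490))

hits = count_reference_hits(stream_keys, hash_file)
print(len(hash_file), len(stream_keys), hits)
```

Under that assumption the count reported on the reference link tracks matches, not the size of the hash file.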

ray.wurlod · Post by **ray.wurlod** » Wed Nov 08, 2006 4:15 pm

Are the sampling characteristics set identically?

abc123 · Post by **abc123** » Wed Nov 08, 2006 4:45 pm

The number of rows going from the Copy stage to the Sample stage shows 100.
Here are the Sample stage properties:

Max Rows (Per Partition) = 10
Period (Per Partition) = 1
Sample Mode = Period

ray.wurlod · Post by **ray.wurlod** » Wed Nov 08, 2006 5:13 pm

Max rows per partition = 10 means 20 rows on two partitions.

It's your design - specifically this property - that is limiting the number of rows output from the stage.

You have taken a sample!
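
The arithmetic can be sketched in Python. This is only a model of the properties quoted above, not the Sample stage's actual implementation: it assumes 100 input rows split round-robin across two partitions, and that Period mode keeps every `period`-th row of a partition until `max_rows` is reached. The function names `period_sample` and `run_sample` are hypothetical.

```python
def period_sample(partition_rows, period, max_rows):
    """Keep every `period`-th row of one partition, stopping once `max_rows` are kept."""
    sampled = []
    for index, row in enumerate(partition_rows):
        if len(sampled) >= max_rows:
            break
        if index % period == 0:
            sampled.append(row)
    return sampled


def run_sample(rows, partitions, period, max_rows):
    """Split rows round-robin (an assumption) and sample each partition independently."""
    parts = [rows[i::partitions] for i in range(partitions)]
    return [period_sample(part, period, max_rows) for part in parts]


rows = list(range(100))  # 100 source rows, as in the thread
outputs = run_sample(rows, partitions=2, period=1, max_rows=10)
total = sum(len(part) for part in outputs)
print([len(part) for part in outputs], total)
```

With Period = 1 every row qualifies, so the Max Rows cap is what limits each partition to 10 rows, giving 20 rows in total on two partitions. The sketch models only the output of the stage; it says nothing about the row count reported on the input link.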

tagnihotri · Post by **tagnihotri** » Thu Nov 09, 2006 7:54 am

Is the partitioning right?

ray.wurlod wrote: Max rows per partition = 10 means 20 rows on two partitions.

It's your design - specifically this property - that is limiting the number of rows output from the stage.

You have taken a sample! ...

splayer · Post by **splayer** » Thu Nov 09, 2006 9:04 am

Ray, thank you for your response. I still don't understand why the number of rows coming into the Sample stage varies. I can understand that the number going out might differ because of the settings on the Sample stage. The Copy stage has its partitioning set to Auto.