Interesting Sample stage problem

abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

I have 2 jobs.
Job1: dataset1 -> copy -> sample -> dataset2
Job2: dataset1 -> sample -> dataset2

Both jobs run fine. However, in Job1 the link from dataset1 to the Copy stage shows 100 rows, whereas in Job2 the link from dataset1 to the Sample stage shows 20. The source file for both jobs is the same.

Any ideas about this discrepancy? Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are any rejects being reported in the job log of Job 2?
Do you have reject links to handle these?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
samsuf2002
Premium Member
Posts: 397
Joined: Wed Apr 12, 2006 2:28 pm
Location: Tennesse

Post by samsuf2002 »

How many rows are you getting in the target dataset?
hi sam here
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

There are no reject links in either job. The setup is exactly as I described in the original post. I am getting 20 rows in the target in both jobs.
talk2shaanc
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Post by talk2shaanc »

The link count also depends on the stage the link is going to.
What count does it show when records move from the Copy stage to the Sample stage? I am sure it would be 20, not 100.

I have never noticed such a thing in PX, but in server jobs I have seen it many times.

     hash File (having 500 records)
                      |
                     10
                      |
seq---1000--> transformer ---> o/p
In the server job design above, the hash file had 500 records, but the link count of 10 was the number of hash file records that found a match with a record from the seq file.
Shantanu Choudhary
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are the sampling characteristics set identically?
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

The number of rows going from the Copy stage to the Sample stage shows 100.
Here are the Sample stage properties:

Max Rows (Per Partition) = 10
Period (Per Partition) = 1
Sample Mode = Period
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Max rows per partition = 10 means 20 rows on two partitions.

It's your design - specifically this property - that is limiting the number of rows output from the stage.

You have taken a sample!
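
To make the arithmetic concrete, here is a rough simulation in plain Python. It is not DataStage code: the function names are made up for illustration, round-robin partitioning over two partitions is an assumption, and so is the detail that period sampling starts with the first row of each partition (with Period = 1 that detail makes no difference).

```python
def round_robin(rows, num_partitions):
    """Deal rows across partitions in turn (assumed partitioning scheme)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def period_sample(partition, period, max_rows):
    """Keep every `period`-th row of one partition, capped at `max_rows`."""
    return partition[::period][:max_rows]


rows = list(range(100))              # 100 input rows, as reported in the thread
partitions = round_robin(rows, 2)    # assumption: the job runs on two partitions
sampled = [period_sample(p, period=1, max_rows=10) for p in partitions]

rows_in = sum(len(p) for p in partitions)
rows_out = sum(len(s) for s in sampled)
print(rows_in, rows_out)             # prints: 100 20
```

This only illustrates why 20 rows come out of the stage. It says nothing about what the monitor chooses to display on the input link.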
tagnihotri
Participant
Posts: 83
Joined: Sat Oct 28, 2006 6:25 am

Post by tagnihotri »

Is the partitioning right?
ray.wurlod wrote: Max rows per partition = 10 means 20 rows on two partitions.

It's your design - specifically this property - that is limiting the number of rows output from the stage.

You have taken a sample! ...
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

ray, thank you for your response. I still don't understand why the number of rows coming into the Sample stage varies. I can understand that the number going out might differ because of the settings on the Sample stage. The Copy stage has its partitioning set to Auto.