Discrepancy in the link count

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

Hi all,

I have a job as follows:


                                 SeqFileStg1 (100 rows)
                                      | 
                                      |
                                 CopyStg
                                      |
                                      | LinkName1 (100 rows)
                                      |
SeqFileStg2  -------->   LookupStg -------->     SeqFileStg

I am generating the 'Detail' job report with the "dsjob -report" command, and it shows the following counts:

Stage: CopyStg
Link: LinkName1, 100 rows
:
:
Stage: LookupStg
Link: LinkName1, 90 rows


The row count for the link "LinkName1" is reported differently by the two stages, i.e. Count(CopyStg.LinkName1) is not equal to Count(LookupStg.LinkName1), even though both refer to the same link.
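
For longer reports, a mismatch like this can be spotted with a small script. The following is a minimal sketch in Python that assumes the abbreviated "Stage: / Link: name, N rows" layout quoted above; the real 'Detail' report contains more fields, so the patterns would need adjusting. The names `REPORT`, `link_counts` and `mismatched_links` are hypothetical and exist only for this illustration.

```python
import re
from collections import defaultdict

# Assumption: report text in the abbreviated layout quoted in this post,
# not the full layout that dsjob actually prints.
REPORT = """\
Stage: CopyStg
Link: LinkName1, 100 rows
Stage: LookupStg
Link: LinkName1, 90 rows
"""

def link_counts(report_text):
    """Return {link name: {stage name: row count}} for the simplified layout."""
    counts = defaultdict(dict)
    stage = None
    for line in report_text.splitlines():
        line = line.strip()
        stage_match = re.match(r"Stage:\s*(\S+)", line)
        if stage_match:
            stage = stage_match.group(1)
            continue
        link_match = re.match(r"Link:\s*(\S+),\s*(\d+)\s+rows", line)
        if link_match and stage is not None:
            counts[link_match.group(1)][stage] = int(link_match.group(2))
    return counts

def mismatched_links(counts):
    """Keep only the links whose row count differs between stages."""
    return {
        link: per_stage
        for link, per_stage in counts.items()
        if len(set(per_stage.values())) > 1
    }

mismatches = mismatched_links(link_counts(REPORT))
print(mismatches)
```

With the sample text above, the only link flagged is LinkName1, with 100 rows on the CopyStg side and 90 rows on the LookupStg side.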

Can someone explain why this discrepancy is occurring?

Thanks and regards
Senthil
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

How many records are being fetched from the source sequential file, and how many are written to the target?
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

The input file has 120 records, all of which have matching records in the reference, and 120 records are written to the output.

The counts for these links are correct.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You need to understand how the Lookup stage works, possibly by inspecting the score (specify APT_DUMP_SCORE as True). A Lookup stage generates a composite operator; it performs two operations.

The first is to load the reference source into a virtual Data Set (100 rows in your case, loaded from the Copy stage, although the Copy stage may be optimized out if it does nothing).

The second is to perform the actual lookups. In your job, it appears that only 90 distinct keys were looked up. Hence only 90 rows proceeded from the virtual Data Set into the Lookup stage.

Your assertion that the two references you gave are the same link is true only from a cursory, high-level viewpoint. There is a virtual Data Set associated with each link; the score will show this much more clearly. In some cases, such as this one, it is possible that fewer rows are consumed from the virtual Data Set than are produced into it.
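
To make the produced-versus-consumed distinction concrete, here is a toy model in plain Python. It is not DataStage code and does not claim to reflect how the parallel engine is implemented; it only illustrates one reading of the two-phase behaviour described above, under the assumption that the lookup side keeps one row per distinct key. The function names and the sample data (100 reference rows with 90 distinct keys, 120 stream rows) are hypothetical.

```python
# Toy model only: plain Python, not DataStage or its engine.

def build_lookup_table(reference_rows, key):
    """Phase 1: load reference rows into a keyed table, keeping the first row per key."""
    table = {}
    for row in reference_rows:
        table.setdefault(row[key], row)
    return table

def run_lookup(stream_rows, table, key):
    """Phase 2: join each stream row to its reference row, dropping non-matches."""
    output = []
    for row in stream_rows:
        match = table.get(row[key])
        if match is not None:
            output.append({**row, **match})
    return output

# Hypothetical data: 100 reference rows holding 90 distinct keys,
# and 120 stream rows that all find a match.
reference = [{"id": i % 90, "ref_val": f"r{i}"} for i in range(100)]
stream = [{"id": i % 90, "src_val": f"s{i}"} for i in range(120)]

table = build_lookup_table(reference, "id")
output = run_lookup(stream, table, "id")

produced = len(reference)  # rows the upstream stage wrote to the link
consumed = len(table)      # rows the lookup side actually holds
print(produced, consumed, len(output))
```

In this model the upstream side reports 100 rows, the lookup side reports 90, and all 120 stream rows still reach the output, which is the same shape of counts as in the job report. Whether the real job behaves this way is best confirmed from the dumped score.
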
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
devidotcom
Participant
Posts: 247
Joined: Thu Apr 27, 2006 6:38 am
Location: Hyderabad

Post by devidotcom »

I have a similar issue...

                       Dataset
                          |
                          | (3 records) but showing 10 records
                          |
Dataset -------------> Lookup --------------------------------> sequential file
       10 records                         10 records
  (showing 10 records)               (showing 10 records)
jenny_wang
Participant
Posts: 26
Joined: Mon Nov 19, 2007 2:55 am
Location: Hangzhou

Post by jenny_wang »

Check the constraints in the Lookup stage and try selecting Continue.
devidotcom
Participant
Posts: 247
Joined: Thu Apr 27, 2006 6:38 am
Location: Hyderabad

Post by devidotcom »

Yes, the Lookup stage has the Continue option selected.