Discrepancy in the link count

ThilSe · Post by **ThilSe** » Sun Nov 26, 2006 11:02 pm

Hi all,

I have a job as follows:

                                 SeqFileStg1 (100 rows)
                                      | 
                                      |
                                 CopyStg
                                      |
                                      | LinkName1 (100 rows)
                                      |
SeqFileStg2  -------->   LookupStg -------->     SeqFileStg

I am getting the 'Detail' job report using the "dsjob -report" command.
I am getting the count as follows:

Stage: CopyStg
Link: LinkName1, 100 rows
:
:
Stage: LookupStg
Link: LinkName1, 90 rows

The row count for the link "LinkName1" is reported differently in different stages, i.e. Count(CopyStg.LinkName1) is not equal to Count(LookupStg.LinkName1), even though both refer to the same link.
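A quick way to spot this kind of mismatch across a whole report is to group the link counts by link name and compare them per stage. The following is only a minimal sketch: it assumes the report text has exactly the "Stage:" / "Link: name, N rows" shape quoted above (the real report layout may differ), and the function names `link_counts_by_stage` and `mismatched_links` are made up for illustration.

```python
import re
from collections import defaultdict

# Sample text in the shape quoted in the post; an assumption, not the
# guaranteed layout of a real detail report.
SAMPLE_REPORT = """\
Stage: CopyStg
Link: LinkName1, 100 rows
Stage: LookupStg
Link: LinkName1, 90 rows
"""

STAGE_RE = re.compile(r"^\s*Stage:\s*(\S+)")
LINK_RE = re.compile(r"^\s*Link:\s*([^,]+),\s*(\d+)\s+rows")


def link_counts_by_stage(report_text):
    """Return {link_name: {stage_name: row_count}} parsed from the text."""
    counts = defaultdict(dict)
    stage = None
    for line in report_text.splitlines():
        stage_match = STAGE_RE.match(line)
        if stage_match:
            stage = stage_match.group(1)
            continue
        link_match = LINK_RE.match(line)
        if link_match and stage is not None:
            link_name = link_match.group(1).strip()
            counts[link_name][stage] = int(link_match.group(2))
    return dict(counts)


def mismatched_links(counts):
    """Keep only links whose row count differs between stages."""
    return {
        link: per_stage
        for link, per_stage in counts.items()
        if len(set(per_stage.values())) > 1
    }


if __name__ == "__main__":
    print(mismatched_links(link_counts_by_stage(SAMPLE_REPORT)))
```

With the sample above, LinkName1 is flagged because CopyStg reports 100 rows and LookupStg reports 90.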

Can someone explain why this discrepancy is occurring?

Thanks and regards
Senthil

aakashahuja · Post by **aakashahuja** » Mon Nov 27, 2006 2:08 am

How many records are being fetched from the source seq file, and how many are written to the target?

ThilSe · Post by **ThilSe** » Mon Nov 27, 2006 2:24 am

The input file has 120 records, all of which have matching records in the reference, and 120 records are written to the output.

The counts for these links are correct.

ray.wurlod · Post by **ray.wurlod** » Mon Nov 27, 2006 2:02 pm

You need to understand how the Lookup stage works, possibly by inspecting the score (specify APT_DUMP_SCORE as True). A Lookup stage generates a composite operator; it performs two operations.

The first is to load the reference source into a virtual Data Set (120 rows in your case, loaded from the Copy stage - although this may be optimized out if it does nothing).

The second is to perform the actual lookups. In your job, it appears that only 90 distinct keys were looked up. Hence only 90 rows proceeded from the virtual Data Set into the Lookup stage.

Your assertion that the two references you gave are the same link is true only from a cursory, high-level viewpoint. There is a virtual Data Set associated with each link; the score will show this much more clearly. In some cases, such as this one, it is possible that fewer rows are consumed from the virtual Data Set than are produced into it.
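To make the produced-versus-consumed distinction concrete, here is a toy model of the explanation above. It is not how the engine actually counts rows; it only assumes, for illustration, that the "produced" count is every reference row loaded and the "consumed" count is the number of distinct reference keys that the input stream actually hits. The function name `simulate_lookup` and the sample data are hypothetical, with the sizes borrowed from this thread (100 reference rows, 120 input rows, 90 distinct keys).

```python
def simulate_lookup(reference_rows, input_keys):
    """Toy model: return (rows_produced, rows_consumed) for one reference link.

    rows_produced: reference rows written into the in-memory table.
    rows_consumed: distinct reference keys actually hit by the input stream.
    """
    table = {}
    for key, value in reference_rows:
        table[key] = value
    rows_produced = len(reference_rows)

    # Only the keys that some input row asks for are ever touched.
    touched_keys = {key for key in input_keys if key in table}
    rows_consumed = len(touched_keys)
    return rows_produced, rows_consumed


# Hypothetical data: 100 reference rows, 120 input rows that all match,
# but spread over only 90 distinct keys.
reference = [(key, f"ref-{key}") for key in range(100)]
inputs = [row % 90 for row in range(120)]

produced, consumed = simulate_lookup(reference, inputs)
print(f"produced into the data set: {produced}, consumed by lookups: {consumed}")
```

Under these assumptions every input row finds a match, yet the two ends of the same link report different counts.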

devidotcom · Post by **devidotcom** » Wed Nov 28, 2007 2:19 am

I have a similar issue...

                           Dataset
                              |
                              | 3 records (but showing 10 records)
                              |
Dataset ------------------> Lookup ------------------> sequential file
        10 records                     10 records
   (showing 10 records)           (showing 10 records)

jenny_wang · Post by **jenny_wang** » Wed Nov 28, 2007 2:41 am

Check the constraint settings in the Lookup stage, and try selecting Continue.

devidotcom · Post by **devidotcom** » Wed Nov 28, 2007 3:12 am

Yes, the Lookup already has the Continue option selected.