erroneous link collector output

Post questions here relating to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.


deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

erroneous link collector output

Post by deswhk »

Hi,

I have the following job.


Oracle1 ---> Xfm1 ---> Link Collector ---> Sequential File
                            ^
                            |
Oracle2 ---> Xfm2 ----------+


The row count out of the Link Collector always matches the previous run's output from Xfm1, which is not what I expect.
Example:

1) Run 1
Output from Xfm1: 200 rows
Output from LC: 100 rows

2) Run 2
Output from Xfm1: 500 rows
Output from LC: 200 rows

3) Run 3
Output from Xfm1: 300 rows
Output from LC: 500 rows

Inter-process row buffering is enabled, with the default 1024 KB buffer size and a 20-second timeout.

I get this very consistently and I don't know what is causing it. I have tried rebuilding the job and deleting the scratch files, but the problem persists.

Would appreciate help from anyone.
thanks.


regards,
Desmond
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not really enough information to go on. What happens inside the Transformer stages? Do you re-compile or reset between runs? How long do you wait between runs?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

Post by deswhk »

The Transformers just pick certain fields to propagate and apply some simple transformations.
I have tried recompiling, resetting, and waiting at least 15 minutes between runs.
I am really running out of clues as to what is causing this weird behaviour.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Can you try it without a Link Collector: write all rows into a hashed file, with artificial keys generated in the Transformer stages (for example A1, A2, A3, ... and B1, B2, B3, ...), to see whether you get the correct results that way?
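The key derivation in each Transformer could be something as simple as the following (just one way to do it; any scheme that guarantees uniqueness across the two streams is fine):

   Xfm1 key derivation:  "A" : @OUTROWNUM
   Xfm2 key derivation:  "B" : @OUTROWNUM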
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

Post by deswhk »

I think I have figured out why.
The two links into the LC complete at different times; one link is faster than the other. When one of the links has completed and the other is still in progress, the output link of the LC completes regardless of the other link(s) still in progress. Therefore, the records from the link that is still in progress do not get written to the LC output until the next run. These records are cached in the scratch file in the UVTEMP directory.
My resolution is to write the two links to two separate files and then create another job to merge them into one file.
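(For the record, the merge step need not be a whole second job. An after-job server routine along these lines should also work; the routine name, the comma-separated argument format, the file paths and the error codes below are all made up for illustration, so treat it as a rough sketch rather than tested code.)

      SUBROUTINE AppendSeqFiles(InputArg, ErrorCode)
* Sketch only. InputArg is assumed to be a comma-separated list:
*   <file A path>,<file B path>,<target path>
      ErrorCode = 0
      FileA  = TRIM(FIELD(InputArg, ",", 1))
      FileB  = TRIM(FIELD(InputArg, ",", 2))
      Target = TRIM(FIELD(InputArg, ",", 3))

* Open (or create) the target file and truncate it.
      OPENSEQ Target TO FOut THEN
         WEOFSEQ FOut
      END ELSE
         CREATE FOut ELSE
            ErrorCode = 1
            RETURN
         END
      END

* Copy file A, then file B, line by line, onto the target.
      Sources = FileA : @FM : FileB
      FOR I = 1 TO DCOUNT(Sources, @FM)
         Source = Sources<I>
         OPENSEQ Source TO FIn THEN
            LOOP
               READSEQ Line FROM FIn ELSE EXIT
               WRITESEQ Line TO FOut ELSE ErrorCode = 2
            REPEAT
            CLOSESEQ FIn
         END ELSE
            ErrorCode = 1
         END
      NEXT I

      CLOSESEQ FOut
      RETURN

It would be called as an after-job subroutine with an input value such as /tmp/fileA.txt,/tmp/fileB.txt,/tmp/merged.txt.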

Thanks a lot for your help. :)

By the way, is this considered a bug in the link collector? I feel it should start the output only when all the input links have completed.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Are you using Round Robin or Sort/Merge as your collection algorithm?

If it's round robin, I believe it can start generating output as soon as it sees a row on any link BUT it should not close just because end-of-data has been processed on one input. That, I think, is a bug.

With sort/merge a similar argument should apply about seeing end-of-data on one link.
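To illustrate the behaviour I would expect, here is a throwaway BASIC fragment (purely illustrative; it is not Link Collector internals, just a round robin over two in-memory "links") showing rows continuing to flow from the longer link after the shorter one reaches end-of-data:

* Two dummy "links" held as dynamic arrays.
      LinkA = "A1" : @FM : "A2" : @FM : "A3" : @FM : "A4"
      LinkB = "B1" : @FM : "B2"
      CountA = DCOUNT(LinkA, @FM)
      CountB = DCOUNT(LinkB, @FM)
      PosA = 1
      PosB = 1
      Output = ""
      LOOP UNTIL PosA > CountA AND PosB > CountB DO
         IF PosA <= CountA THEN
            Output<-1> = LinkA<PosA>
            PosA = PosA + 1
         END
         IF PosB <= CountB THEN
            Output<-1> = LinkB<PosB>
            PosB = PosB + 1
         END
      REPEAT
* Prints A1,B1,A2,B2,A3,A4 - the longer link keeps draining after the
* shorter link has hit end-of-data.
      PRINT CHANGE(Output, @FM, ",")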
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

Post by deswhk »

Yup, I am using Round Robin. The output link turns green as soon as one of the input links turns green, while the other input links are still in progress.