erroneous link collector output

Post questions here relating to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.


deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

erroneous link collector output

Post by deswhk »

Hi,

I have the following job.


Oracle1 ---> Xfm1 ---> Link Collector ---> Sequential File
                            ^
                            |
Oracle2 ---> Xfm2 ----------+


The row count out of the Link Collector always matches the previous run's output from Xfm1, which is not what I expect.
Example:

1) Run 1
Output from Xfm1: 200 rows
Output from LC: 100 rows

2) Run 2
Output from Xfm1: 500 rows
Output from LC: 200 rows

3) Run 3
Output from Xfm1: 300 rows
Output from LC: 500 rows

Inter-process row buffering is enabled, with the default 1024 KB buffer size and a 20-second timeout.

I get this very consistently and I don't know what is causing it. I have tried rebuilding the job and deleting the scratch files, but the problem persists.

Would appreciate help from anyone.
thanks.


regards,
Desmond
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not really enough information to go on. What happens inside the Transformer stages? Do you re-compile or reset between runs? How long do you wait between runs?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

Post by deswhk »

The Transformers just pick certain fields to propagate and apply some simple transformations.
I have tried recompiling, resetting, and waiting at least 15 minutes between runs.
I am really running out of clues as to what is causing this weird behaviour.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Can you try it without a Link Collector: write all rows into a hashed file, with artificial keys generated in the Transformer stages (for example A1, A2, A3, ... and B1, B2, B3, ...), to see whether you get the correct results that way?
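The key derivation in each Transformer could be something as simple as the following (just one way to do it; any scheme that guarantees uniqueness across the two streams is fine):

   Xfm1 key derivation:  "A" : @OUTROWNUM
   Xfm2 key derivation:  "B" : @OUTROWNUM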
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

Post by deswhk »

I think I have figured out why.
The two links into the LC complete at different times; one link is faster than the other. When one of the links has completed and the other is still in progress, the output link of the LC completes regardless of the other link(s) still in progress. Therefore, the records from the link that is still in progress do not get written to the LC output until the next run. These records are cached in the scratch file in the UVTEMP directory.
My resolution is to write the two links to two separate files and then create another job to merge them into one file.
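(For the record, the merge step need not be a whole second job. An after-job server routine along these lines should also work; the routine name, the comma-separated argument format, the file paths and the error codes below are all made up for illustration, so treat it as a rough sketch rather than tested code.)

      SUBROUTINE AppendSeqFiles(InputArg, ErrorCode)
* Sketch only. InputArg is assumed to be a comma-separated list:
*   <file A path>,<file B path>,<target path>
      ErrorCode = 0
      FileA  = TRIM(FIELD(InputArg, ",", 1))
      FileB  = TRIM(FIELD(InputArg, ",", 2))
      Target = TRIM(FIELD(InputArg, ",", 3))

* Open (or create) the target file and truncate it.
      OPENSEQ Target TO FOut THEN
         WEOFSEQ FOut
      END ELSE
         CREATE FOut ELSE
            ErrorCode = 1
            RETURN
         END
      END

* Copy file A, then file B, line by line, onto the target.
      Sources = FileA : @FM : FileB
      FOR I = 1 TO DCOUNT(Sources, @FM)
         Source = Sources<I>
         OPENSEQ Source TO FIn THEN
            LOOP
               READSEQ Line FROM FIn ELSE EXIT
               WRITESEQ Line TO FOut ELSE ErrorCode = 2
            REPEAT
            CLOSESEQ FIn
         END ELSE
            ErrorCode = 1
         END
      NEXT I

      CLOSESEQ FOut
      RETURN

It would be called as an after-job subroutine with an input value such as /tmp/fileA.txt,/tmp/fileB.txt,/tmp/merged.txt.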

Thanks a lot for your help. :)

By the way, is this considered a bug in the link collector? I feel it should start the output only when all the input links have completed.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Are you using Round Robin or Sort/Merge as your collection algorithm?

If it's round robin, I believe it can start generating output as soon as it sees a row on any link BUT it should not close just because end-of-data has been processed on one input. That, I think, is a bug.

With sort/merge a similar argument should apply about seeing end-of-data on one link.
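To illustrate the behaviour I would expect, here is a throwaway BASIC fragment (purely illustrative; it is not Link Collector internals, just a round robin over two in-memory "links") showing rows continuing to flow from the longer link after the shorter one reaches end-of-data:

* Two dummy "links" held as dynamic arrays.
      LinkA = "A1" : @FM : "A2" : @FM : "A3" : @FM : "A4"
      LinkB = "B1" : @FM : "B2"
      CountA = DCOUNT(LinkA, @FM)
      CountB = DCOUNT(LinkB, @FM)
      PosA = 1
      PosB = 1
      Output = ""
      LOOP UNTIL PosA > CountA AND PosB > CountB DO
         IF PosA <= CountA THEN
            Output<-1> = LinkA<PosA>
            PosA = PosA + 1
         END
         IF PosB <= CountB THEN
            Output<-1> = LinkB<PosB>
            PosB = PosB + 1
         END
      REPEAT
* Prints A1,B1,A2,B2,A3,A4 - the longer link keeps draining after the
* shorter link has hit end-of-data.
      PRINT CHANGE(Output, @FM, ",")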
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
deswhk
Participant
Posts: 29
Joined: Mon Sep 03, 2007 7:45 pm

Post by deswhk »

Yup, I am using Round Robin. The output link turns green as soon as one of the input links turns green, while the other input links are still in progress.