
Records missing in the sequential file

Posted: Wed Nov 24, 2010 12:43 am
by kamesh
Hello Everyone,

I am trying to write two sets of data, containing 5 records each, to the same sequential file in APPEND mode after the transformations in a parallel job. But my sequential file contains only 5 records after the job runs, even though the log shows that 5 records are consumed by each of the Sequential File stages. I also tried changing the job to run on a single-node configuration and changed all the stages' execution mode to Sequential, without any success.
As far as I can understand, this may be due to a write lock on the sequential file which gets released only on completion of the job.

Please note that this scenario works perfectly in a server job when we use a Hashed File stage.

Code:

Row generator -> Transformer -> Transformer 
                        |             |
                        v             v
                     Seq File     Seq File

Re: Records missing in the sequential file

Posted: Wed Nov 24, 2010 12:50 am
by kamesh
Ray,
I was able to get the expected result by adding APT_EXECUTION_MODE=One process and APT_DISABLE_COMBINATION=TRUE, but this would make any medium-complexity job run forever. Please advise on the alternatives.

Re: Records missing in the sequential file

Posted: Wed Nov 24, 2010 2:46 am
by Sreenivasulu
Hi Kamesh,
I think the job should work like it did in the 'server' job when put in 'sequential mode'.
Please try again and confirm.

Regards
Sreeni

Posted: Wed Nov 24, 2010 3:25 am
by ArndW
Sequential files are 1 writer - N readers. You cannot have two processes write to the same sequential file at the same time. When you do, the process that closes the sequential file last will overwrite the other process's information.
This has nothing to do with server or PX technologies but is fundamental to how sequential files operate.
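
A rough shell sketch of that effect (illustrative only; the file name and rows are made up): each writer opens the file and keeps its own file offset, so the last writer's bytes win.

Code:

# Illustrative only: two writers open the same file independently,
# each keeping its own offset, so the second clobbers the first.
exec 3> /tmp/datafile             # writer A opens the file (offset 0)
exec 4> /tmp/datafile             # writer B opens the same file (offset 0 again)
for i in 1 2 3 4 5; do echo "row A$i" >&3; done
for i in 1 2 3 4 5; do echo "row B$i" >&4; done
exec 3>&- 4>&-                    # close both writers
wc -l /tmp/datafile               # 5 lines, not 10: B's rows overwrote A's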

Posted: Wed Nov 24, 2010 6:58 am
by chulett
Note that the Server job used hashed files for the target, not sequential files, a detail that makes all of the difference in the world. The former is a database table, the latter... not so much. As Arnd noted, sequential media does not support multiple writer processes, regardless of the tool used.

It may seem like it is working, but it really isn't.

Posted: Wed Nov 24, 2010 11:11 pm
by kamesh
That clarifies the missing records, thank you Arnd & Chulett!!! But now the question is: what should I use to mimic the hashed file behavior in parallel jobs, given that my requirement remains the same? I tried using a Data Set but that is not allowed either.

I can think of using a server shared container, but only if I don't find any other alternative.

Posted: Wed Nov 24, 2010 11:49 pm
by ray.wurlod
Any database table. Or, as you note, a server shared container writing to a hashed file.

Posted: Thu Nov 25, 2010 12:26 am
by kamesh
Thanks Ray! I could get the expected result using the External Target stage, with the specific program writing in the required format (awk '{print}' >> /tmp/datafile).

Would this be an issue?

Posted: Thu Nov 25, 2010 12:29 am
by ray.wurlod
Yes, the same issue as with the Sequential File stage - only one writer for each file. Another solution would be to write to separate files then cat them together in an after-job subroutine running ExecSH.
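
For example, the after-job ExecSH command could be as simple as this (the part file names here are hypothetical):

Code:

# Stitch the per-writer files into the final target once the job completes.
cat /tmp/datafile_1 /tmp/datafile_2 > /tmp/datafile
rm /tmp/datafile_1 /tmp/datafile_2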

Posted: Thu Nov 25, 2010 1:44 am
by kamesh
But I am able to get all the records in the file using the External Target stage. I think it may be because a separate Unix process redirects the data to the file in the External Target stage.

Posted: Thu Nov 25, 2010 2:03 am
by ray.wurlod
Or you might just be lucky with the timing.

Posted: Thu Nov 25, 2010 7:55 am
by chulett
Which is what I meant by "seems" to work. Timing, especially with small volumes, can make it look like it is working, but it won't work 100% of the time.

Re: Records missing in the sequential file

Posted: Fri Dec 03, 2010 9:40 am
by tminelgin
Why not funnel each link together and then write them?

Re: Records missing in the sequential file

Posted: Wed Dec 29, 2010 5:34 am
by kamesh
tminelgin wrote: Why not funnel each link together and then write them?
I can't use a Funnel because the Transformer and Sequential File stage are inside a parallel shared container which is used a couple of times in a single parallel job.

Re: Records missing in the sequential file

Posted: Sun Jan 02, 2011 1:41 am
by jwiles
kamesh wrote: I can't use a Funnel because the Transformer and Sequential File stage are inside a parallel shared container which is used a couple of times in a single parallel job.
Instead of writing to the sequential file within the shared container, send the rows to the container's output, then funnel those outputs from the multiple containers in the job into a single file. A second option would be to write to multiple files and cat them together in an After Job routine.
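
The first option would look roughly like this (container names are illustrative):

Code:

Shared Container 1 ---+
                      +--> Funnel --> Seq File
Shared Container 2 ---+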