Records missing in the sequential file

kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Records missing in the sequential file

Post by kamesh »

Hello Everyone,

I am trying to write two sets of data containing 5 records each to the same sequential file in APPEND mode, after the transformations in a parallel job. But my sequential file contains only 5 records after the job runs, even though the log shows that 5 records were consumed by each of the Sequential File stages. I also tried changing the job to run on a single-node configuration and changed every stage's execution mode to Sequential, without any success.
As far as I can tell, this may be due to a write lock on the sequential file that is released only when the job completes.

Please note that this scenario works perfectly in a server job when we use a Hashed File stage.

Code:

Row generator -> Transformer -> Transformer 
                        |             |
                        v             v
                     Seq File     Seq File
kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Re: Records missing in the sequential file

Post by kamesh »

Ray,
I was able to get the expected result by adding APT_EXECUTION_MODE=One process and APT_DISABLE_COMBINATION=TRUE, but this would make any moderately complex job run forever. Please advise on the alternatives.
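For reference, I added both as environment variable job parameters; the first forces the entire job to run in a single process (really a debugging setting) and the second stops the operators from being combined:

Code:

$APT_EXECUTION_MODE = One process
$APT_DISABLE_COMBINATION = TRUE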
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Re: Records missing in the sequential file

Post by Sreenivasulu »

Hi Kamesh,
I think the job should work as it did in the server job when it is put into sequential mode.
Please try again and confirm.

Regards
Sreeni
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Sequential files are 1 writer - N readers. You cannot have two processes write to the sequential file at the same time. When you do that, the process that closes the sequential file last will overwrite the other processes' information.
This has nothing to do with server or PX technologies but is fundamental to how sequential files operate.
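You can see the effect outside of DataStage with a trivial sketch (the file name is only an example). Each writer opens the file and starts writing at offset zero, so whichever one finishes last is the only one whose rows survive:

Code:

# two independent writers on the same file; each '>' reopens it at offset 0
seq 1 5 | sed 's/^/writer1 row /' > /tmp/one_writer_demo.txt &
seq 1 5 | sed 's/^/writer2 row /' > /tmp/one_writer_demo.txt &
wait
cat /tmp/one_writer_demo.txt    # only one writer's five rows remain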
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Note that the Server job used hashed files for the target, not sequential files, a detail that makes all of the difference in the world. The former is a database table, the latter... not so much. As Arnd noted, sequential media does not support multiple writer processes, regardless of the tool used.

It may seem like it is working, but it really isn't.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Post by kamesh »

That clarifies the missing records, thank you Arnd & Chulett!!! But now the question is: what should I use to mimic hashed file behaviour in parallel jobs, provided my requirement remains the same? I tried using a Data Set, but that is not allowed either.

I can think of using a server shared container, but only if I don't find any other alternative.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Any database table. Or, as you note, a server shared container writing to a hashed file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Post by kamesh »

Thanks Ray! I could get the expected result using an External Target stage with a specific program that writes in the required format (awk '{print}' >> /tmp/datafile).

Would this be an issue?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes, the same issue as with the Sequential File stage - only one writer for each file. Another solution would be to write to separate files then cat them together in an after-job subroutine running ExecSH.
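As a rough sketch of that second approach (the file names are only placeholders), the ExecSH command in the after-job subroutine would be something like:

Code:

# each stream writes its own file during the job; glue the pieces together afterwards
cat /tmp/datafile_part1 /tmp/datafile_part2 > /tmp/datafile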
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Post by kamesh »

But I am able to get all the records in the file using the External Target stage. I think it may be because the External Target stage uses a separate Unix process to redirect the data to the file.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Or you might just be lucky with the timing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Which is what I meant by it only "seems" to work. Timing, especially with small volumes, can make it look like it is working, but it won't work 100% of the time.
-craig

"You can never have too many knives" -- Logan Nine Fingers
tminelgin
Premium Member
Posts: 13
Joined: Tue Oct 19, 2010 12:09 pm

Re: Records missing in the sequential file

Post by tminelgin »

Why not funnel each link together and then write them?
kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Re: Records missing in the sequential file

Post by kamesh »

tminelgin wrote:Why not funnel each link together and then write them?
I can't use a Funnel because the Transformer and Sequential File stage are put in a parallel shared container which is used a couple of times in a single parallel job.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Re: Records missing in the sequential file

Post by jwiles »

kamesh wrote:I can't use a Funnel because the Transformer and Sequential File stage are put in a parallel shared container which is used a couple of times in a single parallel job.
Instead of writing to the sequential file within the shared container, send the rows to the container's output, then funnel those outputs from the multiple containers in the job into a single file. A second option would be to write to multiple files and cat them together in an After Job routine.
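Roughly, the first option looks like this (the container and stage names are only placeholders):

Code:

Shared Container 1 output --+
                            +--> Funnel --> Seq File
Shared Container 2 output --+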