Records missing in the sequential file

kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Records missing in the sequential file

Post by kamesh »

Hello Everyone,

I am trying to write two sets of data containing 5 records each to the same sequential file in APPEND mode, after the transformations in a parallel job. But my sequential file contains only 5 records after the job runs, even though the log shows that 5 records were consumed by each of the Sequential File stages. I also tried changing the job to run on a single-node configuration and changed every stage's execution mode to Sequential, without any success.
As far as I can tell, this may be due to a write lock on the sequential file that is released only when the job completes.

Please note that this scenario works perfectly in a server job when we use a Hashed File stage.

Code:

Row generator -> Transformer -> Transformer 
                        |             |
                        v             v
                     Seq File     Seq File
kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Re: Records missing in the sequential file

Post by kamesh »

Ray,
I was able to get the expected result by adding APT_EXECUTION_MODE=One process and APT_DISABLE_COMBINATION=TRUE, but this would make any moderately complex job run forever. Please advise on the alternatives.
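For reference, I added both as environment variable job parameters; the first forces the entire job to run in a single process (really a debugging setting) and the second stops the operators from being combined:

Code:

$APT_EXECUTION_MODE = One process
$APT_DISABLE_COMBINATION = TRUE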
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Re: Records missing in the sequential file

Post by Sreenivasulu »

Hi Kamesh,
I think the job should work as it did in the server job when it is put into sequential mode.
Please try again and confirm.

Regards
Sreeni
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Sequential files are 1 writer - N readers. You cannot have two processes write to the sequential file at the same time. When you do that, the process that closes the sequential file last will overwrite the other processes' information.
This has nothing to do with server or PX technologies but is fundamental to how sequential files operate.
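You can see the effect outside of DataStage with a trivial sketch (the file name is only an example). Each writer opens the file and starts writing at offset zero, so whichever one finishes last is the only one whose rows survive:

Code:

# two independent writers on the same file; each '>' reopens it at offset 0
seq 1 5 | sed 's/^/writer1 row /' > /tmp/one_writer_demo.txt &
seq 1 5 | sed 's/^/writer2 row /' > /tmp/one_writer_demo.txt &
wait
cat /tmp/one_writer_demo.txt    # only one writer's five rows remain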
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Note that the Server job used hashed files for the target, not sequential files, a detail that makes all of the difference in the world. The former is a database table, the latter... not so much. As Arnd noted, sequential media does not support multiple writer processes, regardless of the tool used.

It may seem like it is working, but it really isn't.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Post by kamesh »

That clarifies the missing records, thank you Arnd & Chulett!!! But now the question is: what should I use to mimic hashed file behaviour in parallel jobs, provided my requirement remains the same? I tried using a Data Set, but that is not allowed either.

I can think of using a server shared container, but only if I don't find any other alternative.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Any database table. Or, as you note, a server shared container writing to a hashed file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Post by kamesh »

Thanks Ray! I could get the expected result using an External Target stage with a specific program that writes in the required format (awk '{print}' >> /tmp/datafile).

Would this be an issue?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes, the same issue as with the Sequential File stage - only one writer for each file. Another solution would be to write to separate files then cat them together in an after-job subroutine running ExecSH.
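As a rough sketch of that second approach (the file names are only placeholders), the ExecSH command in the after-job subroutine would be something like:

Code:

# each stream writes its own file during the job; glue the pieces together afterwards
cat /tmp/datafile_part1 /tmp/datafile_part2 > /tmp/datafile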
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Post by kamesh »

But I am able to get all the records in the file using the External Target stage. I think it may be because the External Target stage uses a separate Unix process to redirect the data to the file.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Or you might just be lucky with the timing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Which is what I meant by it only "seems" to work. Timing, especially with small volumes, can make it look like it is working, but it won't work 100% of the time.
-craig

"You can never have too many knives" -- Logan Nine Fingers
tminelgin
Premium Member
Posts: 13
Joined: Tue Oct 19, 2010 12:09 pm

Re: Records missing in the sequential file

Post by tminelgin »

Why not funnel each link together and then write them?
kamesh
Participant
Posts: 72
Joined: Tue May 27, 2003 1:47 am

Re: Records missing in the sequential file

Post by kamesh »

tminelgin wrote:Why not funnel each link together and then write them?
I can't use a Funnel because the Transformer and Sequential File stage are put in a parallel shared container which is used a couple of times in a single parallel job.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Re: Records missing in the sequential file

Post by jwiles »

kamesh wrote:I can't use a Funnel because the Transformer and Sequential File stage are put in a parallel shared container which is used a couple of times in a single parallel job.
Instead of writing to the sequential file within the shared container, send the rows to the container's output, then funnel those outputs from the multiple containers in the job into a single file. A second option would be to write to multiple files and cat them together in an After Job routine.
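Roughly, the first option looks like this (the container and stage names are only placeholders):

Code:

Shared Container 1 output --+
                            +--> Funnel --> Seq File
Shared Container 2 output --+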