Writing to the same file in the same job
I would like to append lines to the same sequential file from two different stages in the same job. In my current job, the second stage does not write even though I have set its update mode to append. If the first stage writes 5 lines, I would like the second one to append, for example, another 3 lines so the file ends up with 8. I already know of two techniques:
1) Write to two files and create another job to union the files.
2) Run a Unix command as an after-job ExecSH routine.
I was wondering if it is possible to do it in DataStage in one job.
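For what it's worth, technique 2 can be a one-liner. A minimal sketch, assuming the two stages wrote to hypothetical files part1.txt and part2.txt (substitute your real paths):

```shell
# Simulate the two stages' outputs (file names are placeholders):
printf 'r1\nr2\nr3\nr4\nr5\n' > part1.txt   # first stage: 5 lines
printf 'r6\nr7\nr8\n' > part2.txt           # second stage: 3 lines

# After-job ExecSH step: append the second file to the first, then clean up.
cat part2.txt >> part1.txt
rm part2.txt
wc -l part1.txt                             # the file now holds 8 lines
```

This runs safely after the job because both stages have finished and closed their files by the time the after-job routine fires.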
Multiple writers to the same file are not allowed by the OS. You will get misaligned data.
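As an illustration of the "misaligned data" point (a hypothetical shell sketch, not DataStage): two file descriptors opened independently on the same file each track their own offset, so the second writer lands on top of the first writer's bytes.

```shell
# Hypothetical demo: two independent writers on one file.
exec 3> demo.txt          # writer 1 opens the file, offset 0
exec 4> demo.txt          # writer 2 opens the same file, its own offset 0
echo "AAAAAAAAAA" >&3     # writer 1 writes 11 bytes
echo "BB" >&4             # writer 2 writes at ITS offset 0, clobbering writer 1
exec 3>&- 4>&-
cat demo.txt              # the first record is now mangled: "BB" then "AAAAAAA"
```

The file ends up containing neither record intact, which is exactly the misalignment described above.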
If this were a server job and you had two links going in then, theoretically, there would be a round-robin process of sending records, and in both Sequential File stages the option should be append. I guess you can try the same in a PX job by running on a single node.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
DSguru2B wrote: Multiple writes to the same file is not allowed by the OS. You will get misaligned data. If this were a server job and you had two links going in then, theoretically, there would be a round-robin process of sending records, and in both Sequential File stages the option should be append. I guess you can try the same in a PX job by running on a single node.
You've just contradicted yourself. As noted, multiple writers are not supported - period. This is the nature of sequential media - no Server or Parallel job or OS process can break that rule. It's just the way it works. There is no 'round robin' process.
The only way one job could write to a file twice is to ensure the first process completes in its entirety before the second process ever starts. Then the second can append data to the end of the first process's work. Otherwise, write to two files and concatenate post-job.
-craig
"You can never have too many knives" -- Logan Nine Fingers
I know I contradicted myself as the second thought came up.
When you have two links going out to the same file and you have the links ordered, a single row will go through the first link first and then to the second link (round robin). This way each record will be appended to the file one at a time. It will be two separate operations for the OS but, to the naked eye, a single process.
I have not tried it, but theoretically it should work as they are considered two separate processes by the OS.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
DSguru2B wrote: When you have two links going out to the same file and you have the links ordered, a single row will go through the first link first and then to the second link (round robin). This way each record will be appended to the file one at a time. It will be two separate operations for the OS but, to the naked eye, a single process. I have not tried it, but theoretically it should work as they are considered two separate processes by the OS.
No.
-craig
"You can never have too many knives" -- Logan Nine Fingers
crouse wrote: If this were a server job, just use the Link Collector stage to let 1 or more Transformer stages write to the same seq file.
...
But this will not guarantee that the second link will be an append, i.e., the records from the two links will be mixed. I guess the same will occur with the Funnel stage in PX.
Thanks,
Chad
__________________________________________________________________
"There are three kinds of people in this world; Ones who know how to count and the others who don't know how to count !"
ady wrote: Wouldn't a job write to the same sequential file twice if there is a delay between the two write operations? If the file is not written to in the same active stage?
As Craig said, the update/read/write operations on a sequential file are governed by the OS. If you open a sequential file for read/write/update, you cannot reopen it until you have closed it, no matter what application you are using, whether DataStage or a programming language.
Thanks,
Chad
__________________________________________________________________
"There are three kinds of people in this world; Ones who know how to count and the others who don't know how to count !"
ady wrote: Wouldn't a job write to the same sequential file twice if there is a delay between the two write operations? If the file is not written to in the same active stage?
Yes, as I explained earlier, the two writer processes must run in a serial fashion, one after the other. Once the first completes and closes the file, the second can open it, seek to the end, and append.
Multiple readers, single writer.
Your job design is exactly what I meant.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Use a Funnel Stage
Use a Funnel Stage prior to your output file. On Stage Properties select "Funnel Type = Sequence".
Per the help text: Sequence copies all records from the first input data set to the output data set, then all the records from the second input data set, etc.
This would allow you to "drain" the first input, then append the second input.
I believe that is what you wanted, correct?
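In plain file terms, Sequence mode behaves like an ordered concatenation. A minimal sketch outside DataStage, with hypothetical file names standing in for the two input links:

```shell
printf 'r1\nr2\nr3\nr4\nr5\n' > first.txt    # records from the first input link
printf 'r6\nr7\nr8\n' > second.txt           # records from the second input link

# Sequence funnel: drain the whole first input, then the whole second.
cat first.txt second.txt > output.txt
head -n 1 output.txt                         # prints "r1": first input leads
```

The point is the ordering guarantee: unlike a round-robin collector, nothing from the second input appears until the first is fully drained.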