Using named pipes

gpbarsky
Participant
Posts: 160
Joined: Tue May 06, 2003 8:20 pm
Location: Argentina

Post by gpbarsky »

Hi.

In the Sequential File stage, there is a check box called "Stage uses named pipes".

What does this check box really mean? Does the file behave like a queue, with FIFO and LIFO options?

How does it work?

I need to know this because it is very important in collaborative processing, and I didn't understand the explanation in the manual.

Thanks in advance.


Guillermo P. Barsky
Buenos Aires - Argentina
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Guillermo

A pipe is an old UNIX trick. It lets one or more processes write data that another process reads. It is mostly used for real-time processing. If you have a web page that needs to update a database, it can append to a flat file or write to a named pipe. A DataStage job reads this named pipe and processes the data. There is some overhead to named pipes. They can hang. They can be broken. They have to be created before your web page starts writing to them. The listener needs to be running before any data gets written to the pipe.
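
As a minimal sketch of the setup described above (plain Python on a UNIX-like system, not DataStage code), the following creates a named pipe, starts a writer, and reads the rows back. The file name `orders.fifo` and the function name are hypothetical, chosen only for illustration.

```python
import os
import tempfile
import threading


def demo_named_pipe(lines):
    """Send lines through a named pipe (FIFO) from a writer thread to a reader."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "orders.fifo")  # hypothetical name
    os.mkfifo(fifo_path)  # the pipe must exist before anyone opens it

    def writer():
        # Opening for writing blocks until a reader has opened the other end.
        with open(fifo_path, "w") as out:
            for line in lines:
                out.write(line + "\n")

    t = threading.Thread(target=writer)
    t.start()

    received = []
    # Opening for reading blocks until a writer has opened the pipe.
    with open(fifo_path) as src:
        for line in src:  # the loop ends when the writer closes its end
            received.append(line.rstrip("\n"))

    t.join()
    os.remove(fifo_path)
    os.rmdir(workdir)
    return received


if __name__ == "__main__":
    print(demo_named_pipe(["row 1", "row 2", "row 3"]))
```

Note that each side waits in `open()` until the other side shows up, which is one way a job built on pipes can appear to hang.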

Let's say you are a securities firm like A.G. Edwards or J.P. Morgan, where one transaction may be a million dollars. I would not want to bet my job on all these pieces working together. Kind of an ugly solution.

Kim.


Kim Duke
DsWebMon - Monitor DataStage over the web
www.Duke-Consulting.com
gpbarsky
Participant
Posts: 160
Joined: Tue May 06, 2003 8:20 pm
Location: Argentina

Post by gpbarsky »

Kim:

Thank you.

As far as I understand, a named pipe is like a sequential file: one process writes it and another process reads it. What is the difference?

Is a named pipe like a queue? How do you use it, and how would you use a sequential file instead?


Guillermo P. Barsky
Buenos Aires - Argentina
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

A pipe is a little bit like a queue, but reading from it is destructive. It is used to connect one or more upstream processes to a downstream process. Your upstream processes execute independently of the downstream process. The downstream process does "eager" reads from the pipe, meaning it is waiting to process data the moment it arrives. If there is no data, it will sit there patiently until it times out.
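
The destructive read can be seen in a few lines of Python. This is only a sketch using an anonymous OS pipe (`os.pipe`) rather than a DataStage stage, and the function name is made up for the example: once bytes have been read, they are gone from the pipe, and a second read returns only what is left.

```python
import os


def destructive_read_demo():
    """Show that data read from a pipe is removed from it."""
    r, w = os.pipe()
    os.write(w, b"row1\nrow2\n")
    os.close(w)                # no writers left, so the reader can reach end-of-file

    first = os.read(r, 5)      # consumes the first five bytes
    second = os.read(r, 1024)  # only the remaining bytes are still in the pipe
    third = os.read(r, 1024)   # nothing left and no writer: empty result (EOF)
    os.close(r)
    return first, second, third


if __name__ == "__main__":
    print(destructive_read_demo())
```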

It can also be used to split a single process into multiple processes so that, if source data streams into the pipe irregularly, you get a leveling effect: downstream processing no longer holds up upstream processing. You can demonstrate this with a job like OCI -> XFM -> OCI. If the source OCI stage delivers data to the transformer irregularly, the OCI load may sit waiting. Conversely, delays during OCI loading can ripple upstream and stall the source. If you inserted an XFM with a pipe-style capability, the buffering would dampen this elastic effect. You decouple the tightly coupled processing so that the independent elements can each run at their highest efficiency.
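
The decoupling idea can be sketched with an in-process analogy: a bounded buffer between a producer and a consumer. This is not how DataStage implements it and it is not a real pipe; `queue.Queue` simply stands in for the pipe's buffer, and the function name and `buffer_size` parameter are assumptions made for the example.

```python
import queue
import threading


def run_buffered(rows, buffer_size=4):
    """Pass rows from a producer to a consumer through a bounded buffer."""
    buf = queue.Queue(maxsize=buffer_size)  # stands in for the pipe's buffer
    done = object()                         # sentinel marking the end of the data
    loaded = []

    def producer():
        for row in rows:
            buf.put(row)      # blocks only while the buffer is full
        buf.put(done)

    def consumer():
        while True:
            row = buf.get()   # blocks only while the buffer is empty
            if row is done:
                break
            loaded.append(row)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loaded


if __name__ == "__main__":
    print(run_buffered(list(range(10))))
```

As long as the buffer is neither full nor empty, a short stall on one side does not stop the other side, which is the leveling effect described above.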

You'll see that DS Server 6.0 has introduced improvements and stages that have this functionality.

The downside of using pipes instead of physical files is that you have no image after processing. For example, if you write your output to a pipe and then load that pipe into a table, you lose restart capability because of the destructive read. You also lose the ability to restart from milestones in your job stream, because you have no physical files and the pipes may have been partially exhausted. You also have to deal with timeouts and the like, as well as the administrative overhead of creating and deleting the pipes.



Kenneth Bland
gpbarsky
Participant
Posts: 160
Joined: Tue May 06, 2003 8:20 pm
Location: Argentina

Post by gpbarsky »

Ken:

Thanks for your answer.

I have already worked with queues on the AS/400, with a FIFO (First In First Out) implementation. When I wanted to read from a queue, I had a parameter to set the time to wait; if this parameter was -1, the program would wait, doing nothing, until data appeared in the queue.

Once the data appears in the queue, it is read and removed from the queue.

If I understood correctly, this is how pipes work.

My only question is how to wait an unlimited amount of time until data appears. Is this possible in DataStage?

Thanks again.


Guillermo P. Barsky
Buenos Aires - Argentina
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Guillermo,

It is reported that DataStage 7 will allow you to build a job that will sit and run and process data as it comes in. If I understood it correctly, this is implemented as a "web service".

I am anxious to see DS v7 to see other new features.

Tony
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Guillermo

You understood correctly. Most developers implement it in a FIFO manner; I have never seen otherwise. In UNIX a pipe is created with the mknod command. I am not sure how to create a pipe in Windows. In UNIX, if you do an ls -l, you will see a file type of "p". As far as I know, the wait time is not a property of the pipe itself: a reader simply blocks until data arrives, and any timeout is up to the reading program. It has been a long time since I needed a pipe.
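
On a UNIX-like system, creating a pipe and checking its "p" file type can also be done from Python, as in this sketch. `os.mkfifo` plays the role of the mkfifo/mknod shell commands; the file name `demo.fifo` and the function name are hypothetical.

```python
import os
import stat
import tempfile


def make_fifo_and_inspect():
    """Create a named pipe and report how the file system describes it."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "demo.fifo")  # hypothetical name
    os.mkfifo(path)                            # creates the named pipe

    mode = os.stat(path).st_mode
    is_fifo = stat.S_ISFIFO(mode)    # True for a named pipe
    listing = stat.filemode(mode)    # ls -l style string; begins with "p" for a pipe

    os.remove(path)
    os.rmdir(workdir)
    return is_fifo, listing


if __name__ == "__main__":
    print(make_fifo_and_inspect())
```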

Kim.



Kim Duke
DsWebMon - Monitor DataStage over the web
www.Duke-Consulting.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In fact, the command on some operating systems to create a named pipe is mkfifo.
On others it's mknod, which isn't nearly as self-explanatory - but that maintains the UNIX mystique!

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
gpbarsky
Participant
Posts: 160
Joined: Tue May 06, 2003 8:20 pm
Location: Argentina

Post by gpbarsky »

Ray, Ken, Kim:

And the million-dollar question is: how are "pipes" implemented in Windows 2000 and DataStage 5.2?

Thanks in advance.


Guillermo P. Barsky
Buenos Aires - Argentina
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Guillermo

I have never used pipes in Windows.

Ray, there are a lot of commands like mkfifo which Sun and other vendors give out that do a mknod for you. The UNIX command mkfs (make filesystem) does a mknod. A lot of times these have the same inode at the UNIX level which means they are the same exact program. The program figures out which name it is called with and functions accordingly.

Kim.

Kim Duke
DsWebMon - Monitor DataStage over the web
www.Duke-Consulting.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Ah, you've got me there! I know that it can be done, but I don't remember how offhand. As is usual on weekends, I am sitting in an airport lounge awaiting a flight to my next destination, so I don't have access to my archives.
Might be worth asking Ascential support!

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

I would think that, since a pipe is nothing more than an interprocess communication channel in Unix, Windows does not have the faintest notion of what a Unix pipe is. The option is available in Windows, but I don't believe the behind-the-scenes code is similar in the slightest.

Kim, you are correct. Creating a pipe on various Unix systems ends up calling the same program - pipe.c. This program is part of every type of Unix and does the same thing. If Sun chooses to wrap this in mkfifo, mknod, etc., it does not matter, since the underlying concept and code have not changed since the first versions of Unix.

When a pipe is created all that really happens is the kernel allocates an i-node for the pipe and returns two file descriptors - one for reading and one for writing. The important point to remember is that when a writer attempts a write to a full pipe the writer will suspend and when a reader tries to read from an empty pipe the reader will suspend.
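
Both limits can be observed from Python on a UNIX-like system. This is a sketch under stated assumptions: it uses `select` with a short timeout to show that an empty pipe has nothing to read, and switches the write end to non-blocking mode so that a full pipe raises an error instead of suspending the writer. The function name is made up, and the pipe's capacity varies by operating system, so no particular byte count is assumed.

```python
import os
import select


def pipe_limits():
    """Probe the empty and full conditions of an anonymous pipe."""
    r, w = os.pipe()   # two file descriptors: one for reading, one for writing

    # Empty pipe: nothing becomes readable within the 0.1 second timeout.
    # A plain blocking read here would suspend the reader instead.
    readable, _, _ = select.select([r], [], [], 0.1)
    empty_has_no_data = (readable == [])

    # Full pipe: in non-blocking mode the write fails where a blocking
    # writer would have been suspended.
    os.set_blocking(w, False)
    written = 0
    try:
        while True:
            written += os.write(w, b"x" * 4096)
    except BlockingIOError:
        pass  # the pipe's buffer is full

    os.close(w)
    os.close(r)
    return empty_has_no_data, written


if __name__ == "__main__":
    print(pipe_limits())
```

Passing `None` as the timeout to `select.select` makes the call wait indefinitely until data arrives, which is the "wait forever" behavior asked about earlier in the thread.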

Regards,

Michael Hester
ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

Guillermo,

You can use named pipes on W2K without any UNIX-style command line (mknod, mkfifo) to create the pipe. DataStage will create it automatically, probably using the "HANDLE CreateNamedPipe(...)" function. The pipe name should be just a name, because Windows names its pipes like \\.\pipe\YourName. (By the way, in the same manner I use the Windows equivalent of /dev/null, \\.\NUL, with the append option in Sequential File stages. It works!)
You must be aware that:
1. If you want to use it inside a job, versions 6 and 7 give you row buffering/IPC, which has almost the same functionality.
2. If you're on DS 5, you won't be able to use debug/validate on jobs that use NP (if your breakpoint is before the reader part of the pipe, you're stuck!).
3. Implementing NP between jobs works fine.


ArieAR