
Posted: Tue Aug 01, 2006 4:00 pm
by urshit_1983
Try checking the access rights on the file to which you are writing the output.

Change them with:

chmod 777 filename

It should be fine after that.
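
For reference, you can inspect the current rights before changing them (filename here is just a placeholder):

ls -l filename      # first column shows the mode, e.g. prw-r--r-- for a named pipe
chmod 777 filename  # grant read/write/execute to owner, group, and others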

Thanks

Posted: Tue Aug 01, 2006 5:01 pm
by katz
Hi urshit_1983,

Thanks for your quick reply. I cannot see why I need execute permission on the named pipe file (for owner, group, and all other users) but since I'm shooting in the dark, I can give it a shot.
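
For what it's worth, a quick sanity check from the shell suggests read/write alone is enough for a FIFO (testpipe is a throwaway name):

mkfifo testpipe
chmod 660 testpipe       # rw for owner and group, no execute bit
echo hello > testpipe &  # writer blocks until a reader attaches
cat testpipe             # prints "hello"
rm testpipe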

katz

Posted: Tue Aug 01, 2006 8:51 pm
by kcbland
It seems to me your job designs do nothing special. You should consider merging them all together and turning on inter-process row buffering, because if all three jobs are executing simultaneously, then that's the same effect.

If you're not executing them simultaneously, then all you're doing is avoiding writing a file; but if your data is big enough, do you really think the FIFO's contents are still in memory and haven't long since paged to disk? Writing physical files won't add load to the job if you use delimited format (fixed width adds overhead padding the fields). Since you're already doing an rm, you can simply remove the files when done. Either way you're using disk space: temp space for paging, or user space to hold a file. The point is you'll still be using disk space, except all those little problems like timeouts go away.
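
In shell terms, the land-and-remove pattern being described amounts to something like this (the job names and path are placeholders):

extract_job   > /tmp/stage1.dat   # land a delimited file
transform_job < /tmp/stage1.dat   # downstream job reads it
rm /tmp/stage1.dat                # reclaim the disk space when done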

By avoiding the pipes, you can also run multiple instances of your first two jobs if you set them up correctly. You can run as many instances as you have CPUs and probably scale linearly with no degradation. You'll vastly beat any improvement from pipes by running multiple instances of your transform job. If a named pipe cut processing time in half, then 4 job instances will cut it to 1/4 on a 4-CPU machine.
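
As a rough sketch, launching four invocations of a multi-instance transform job from the command line could look like this (PROJ and TransformJob are placeholder names, and the job must be compiled as multi-instance):

dsjob -run -wait PROJ TransformJob.1 &
dsjob -run -wait PROJ TransformJob.2 &
dsjob -run -wait PROJ TransformJob.3 &
dsjob -run -wait PROJ TransformJob.4 &
wait   # block until all four invocations finish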

Posted: Wed Aug 02, 2006 1:46 am
by ArndW
I would leave out the step that removes the pipes (as a test) and see if the problems persist. Also, did you change the timeout and size settings for the pipes in the jobs? (I recommend leaving them at the defaults unless your row size is huge and only a couple of rows would fit into the buffer space.)

Posted: Wed Aug 02, 2006 4:06 am
by ArndW
The timeout setting in the sequential file {pipe} stage should be left at the default of 60; I would avoid using 0. I don't know if that change will directly affect your job, but it should be done.
The unhandled interrupt in ds_seqopen() might have something to do with the timing, or with that pipe still being held open by some other process.
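
If you want to see whether some other process still has the pipe attached, something like this should show it (pipename is a placeholder; lsof may not be installed on every box):

fuser pipename   # lists the PIDs that have the file open
lsof pipename    # same information with more detail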

Posted: Wed Aug 02, 2006 7:30 am
by kcbland
I understand your design now. You are using the pipe functionality in the Sequential stage to act as an IPC stage/column redefiner. If you used a Transformer and did the redefinition there, you would not need to do all of the work you are doing. Inter-process row buffering gives you an inherent FIFO effect as well.

I avoid pipes in all cases.

Posted: Wed Aug 02, 2006 8:12 am
by ArndW
Ken - I like pipes for reformatting and use them often when necessary. If I have one input row that generates {n} output rows, I will use a pipe instead of landing the data to a sequential or hashed file. It is usually stable and always fast.
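
Outside of DataStage, that one-row-to-{n}-rows reformatting through a pipe can be sketched like this (the comma-delimited layout and names are illustrative only):

mkfifo reformat.pipe
# explode each delimited input row into one output row per field
awk -F',' '{ for (i = 1; i <= NF; i++) print $i }' input.dat > reformat.pipe &
# the downstream job reads the pipe as if it were an ordinary sequential file
wc -l < reformat.pipe
rm reformat.pipe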

Posted: Wed Aug 02, 2006 8:29 am
by kcbland
Arnd, within the same job I'll use the link collector if the metadata is consistent, but I understand the reformatting trick. However, it's not apparent from a design metadata perspective what is happening. It's just a difference of style. If we were all the same, life would be boring. :lol:

Posted: Wed Aug 02, 2006 8:37 am
by ArndW
The "you will conform" approach has been tried many times and has failed each time. My test for stringent design documents is that if the design were given to 5 developers and all code produced is essentially identical then you've gone too far with your specifications. What would our work be like without being able to put in our own quirks. Heck, even bartenders have "flair"; why should be be left with only our signature lines as means of expressing ourselves :D