External Source Stage

Posted: Thu Aug 03, 2006 10:02 am
by nvalia
Has anyone worked on the External Source stage?
I need to read a file from a URL path and pick the last record only. Any solution for this?

Regards,
Nirav

Posted: Thu Aug 03, 2006 10:38 am
by ray.wurlod
The External Source stage executes a command (possibly a shell script). That command's stdout becomes the output of the External Source stage. Assuming you have a script that can "read a file from a URL path" (whatever that involves), you can pipe its result through tail -1 to get the last line. If it's a regular pathname, you could apply tail -1 directly to that pathname.
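
As a rough sketch of the kind of script described above: the function name last_record is hypothetical, and using curl to fetch the URL is an assumption (the original question does not say how the file is served). It writes only the last line of its input to stdout, which is what the stage would then read.

```shell
#!/bin/sh
# last_record (hypothetical helper): print the last line of a URL or a local file.
# Assumes curl is installed on the node where the command runs.
last_record() {
    case "$1" in
        http://*|https://*|ftp://*)
            # -f: fail on HTTP errors, -s: no progress meter, -S: still report errors
            curl -fsS "$1" | tail -n 1
            ;;
        *)
            # A regular pathname: tail can read it directly
            tail -n 1 -- "$1"
            ;;
    esac
}
```

To use it as the stage's source program, the script would end with a call such as last_record "$1", with the URL or pathname passed as the argument. Note that in the URL branch the exit status is that of tail, so a failed download shows up as empty output rather than a non-zero exit code.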

Make sure that the record schema on the External Source stage matches exactly what is produced by the command that you invoke.

Beware, too, that without intervention this stage will operate on every processing node. Set its properties so that its execution mode is sequential, and/or that it executes in a node pool containing only one node. Unless, of course, you want that last line on every partition.

Posted: Wed Nov 08, 2006 11:01 am
by splayer
ray, a question about this statement of yours, "this stage will operate on every processing node". Can you tell me how you came to know this? It is not documented anywhere, at least not in the PDFs. Thanks.