Consuming split files in a job

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

saikrishna
Participant
Posts: 158
Joined: Tue Mar 15, 2005 3:16 am

Post by saikrishna »

Hi,

We have designed a job with the following structure:

SeqFile -> Tfm -> OraBulk

If I want to load more than one file (for example, 100 files) into the same database table using the same job, what is the best way to do this?

Oracle has an external table concept that lets us read data from more than one file.

Is there a similar concept in DataStage?

The option I have so far:
1. Parameterize the file name in the Sequential File stage and call this job with invocation IDs enabled.
Create a Sequence that calls this job in a loop, passing an invocation ID each time.
The problem with this is that the invocations would run sequentially. It would be great if we could run them in parallel.


Any input would be appreciated.

Thanks
Sai
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Investigate the Folder stage.

Investigate using a Filter command in your Sequential File stage that uses cat to spool all the files into the job as if they were one large data stream.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
saikrishna

Post by saikrishna »

I would like to use the first approach you suggested, i.e. the Folder stage.

How do we pass the output of the Folder stage (i.e. the list of files in the folder, returned in its "column name") to the "file name" of the Sequential File stage?


Thanks
Sai
ray.wurlod

Post by ray.wurlod »

You don't. Read the chapter in the manual about the Folder stage.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Me, I'd just cat all the files together then bulk load. Once. :wink:

Depending on your skill level, you could build a looping Sequence that doesn't wait for each job so that (eventually) they will all be running in parallel, but then you may end up with locking and/or resource problems.
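
As a rough illustration of the "don't wait for each job" idea, here is a minimal shell sketch. It is not a DataStage Sequence; load_one_file is a hypothetical stand-in for however the multi-instance job would really be started (for example a dsjob -run call, which is an assumption about your setup), and the sample files and invocation IDs are made up so the sketch runs on its own.

```shell
#!/bin/sh
# Sketch: start one load per file without waiting, then wait for all of them.
# load_one_file is a hypothetical stand-in for the real job invocation.

load_one_file() {
    # $1 = file to load, $2 = invocation id (both illustrative).
    # A real version would start the multi-instance job here, passing the
    # file name as a job parameter. This stand-in just counts the rows.
    wc -l < "$1" > "$workdir/result.$2"
}

workdir=$(mktemp -d)

# Create three small sample files so the sketch is self-contained.
for n in 1 2 3; do
    printf 'row %s a\nrow %s b\n' "$n" "$n" > "$workdir/part_$n.dat"
done

i=0
for f in "$workdir"/part_*.dat; do
    i=$((i + 1))
    load_one_file "$f" "inst$i" &   # "&" means: do not wait for this instance
done
wait                                # block until every background load has ended

started=$i
finished=$(ls "$workdir"/result.* | wc -l | tr -d ' ')
echo "started=$started finished=$finished"
rm -rf "$workdir"
```

With 100 large files, starting everything at once is exactly where the locking and resource problems could show up, so a real version would probably cap how many instances run at the same time.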
-craig

"You can never have too many knives" -- Logan Nine Fingers
saikrishna

Post by saikrishna »

Hi Chulett, Ray

I did not choose the cat filter because each file is huge. The cat operation will take a lot of resources.

Thanks
Sai
chulett

Post by chulett »

I doubt that unless 'resources' means 'disk space'... have you actually tried it? Ray's way does this in a virtual fashion, so no 'extra' resources there. You could also consider a named pipe...
saikrishna

Post by saikrishna »

Why not the Folder stage?
chulett

Post by chulett »

Why not Folder stage what? Have you read up on how it works? It was built for XML and not really appropriate here. IMHO.
ray.wurlod

Post by ray.wurlod »

The cat command will use hardly any resources at all. The files are already on disk. Output from the cat command is not written to disk; it becomes the input to the Sequential File stage. If you like, the effect is that of:

Code:

cat files* | DataStage job
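
To make the "not written to disk" point concrete, here is a small sketch that can be run in any POSIX shell. The file names are invented for the example, and wc -l is only a stand-in for the job that consumes the stream.

```shell
#!/bin/sh
# Sketch: cat streams several files to a consumer through a pipe;
# no concatenated copy is ever written to disk.
workdir=$(mktemp -d)

# Three illustrative source files (names are made up for this example).
printf 'a1\na2\n' > "$workdir/files_01.dat"
printf 'b1\n'     > "$workdir/files_02.dat"
printf 'c1\nc2\n' > "$workdir/files_03.dat"

# wc -l stands in for the job reading the combined stream.
rows=$(cat "$workdir"/files_*.dat | wc -l | tr -d ' ')
echo "rows seen by consumer: $rows"

# Only the three source files exist; cat created nothing new on disk.
file_count=$(ls "$workdir" | wc -l | tr -d ' ')
echo "files on disk: $file_count"
rm -rf "$workdir"
```

The shell expands the wildcard in sorted name order, so the consumer sees the files one after another as a single stream.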
saikrishna

Post by saikrishna »

Thanks Ray, Chulett.

What you said about the cat option is right. I will try it out in practice.

As for the Folder stage, I went through the documentation. It can have only two outputs, i.e. file name and file content. Maybe it is ideally suited to reading XML documents. I wanted to know whether the Folder stage can be used here or not.


Chulett, you said using named pipes is also possible? If you have any ideas, could you please share them?


Thanks
Sai
ray.wurlod

Post by ray.wurlod »

Folder stage is not an option when file size is too large. Don't have exact figures on that ready to hand - if, indeed, the limit is documented at all. It has to put the entire contents of the file into a single field.
chulett

Post by chulett »

You can build a process to feed the files to a named pipe and then the Sequential stage supports reading from a pipe. Click on Help in the stage to read about the 'Stage uses named pipes' option.
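
A minimal sketch of the feeding side, assuming a POSIX shell with mkfifo available. The paths and file names are illustrative, and wc -l is only a stand-in for the job that would read the pipe as if it were its source file.

```shell
#!/bin/sh
# Sketch: a writer process feeds files into a named pipe while a reader
# consumes from it. All paths and names here are illustrative.
workdir=$(mktemp -d)
printf 'a1\na2\n'     > "$workdir/part_1.dat"
printf 'b1\nb2\nb3\n' > "$workdir/part_2.dat"

pipe="$workdir/load.pipe"
mkfifo "$pipe"

# Writer: runs in the background and blocks until a reader opens the pipe.
cat "$workdir"/part_*.dat > "$pipe" &

# Reader: wc -l stands in for the job that reads the pipe as its input.
rows=$(wc -l < "$pipe" | tr -d ' ')
wait    # make sure the writer has finished before cleaning up
echo "rows read from pipe: $rows"
rm -rf "$workdir"
```

The writer opens the pipe once and sends every file through it, so the reader sees end-of-file only after the last file has been written.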