Consuming split files in a job
-
- Participant
- Posts: 158
- Joined: Tue Mar 15, 2005 3:16 am
Hi
We have designed a job which has the following structure
SeqFile -> Tfm -> OraBulk
If I want to load more than one file (for example, 100 files) into the same database table using the same job, is there a best way to do this?
In Oracle there is an external table concept with which we can get the data from more than one file.
Is there any concept like this in DataStage?
I have the following options:
1. Parameterise the file name in the Sequential File stage and call this job with the invocation id on.
2. Create a sequence and call this job in a loop with the invocation id.
The problem with this is that the invocations will run sequentially. It would be great if we could run them in parallel.
Any inputs would be great.
Thanks
Sai
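A minimal shell sketch of option 1 launched concurrently rather than in a loop. `run_invocation` here is a stand-in for something like `dsjob -run -param SourceFile=... <project> <job>.<invocation-id>`; the directory and file names are made up for illustration:

```shell
#!/bin/sh
# Sketch only: run_invocation stands in for the real per-file job invocation.
run_invocation() {
    printf 'loaded %s\n' "$1" > "$1.done"   # marker file instead of a real load
}

mkdir -p in
: > in/part_1.dat; : > in/part_2.dat; : > in/part_3.dat

for f in in/part_*.dat; do
    run_invocation "$f" &    # & launches each invocation in the background
done
wait                         # block until every background invocation finishes
ls in/*.done | wc -l         # prints 3
```

As noted later in the thread, firing everything off in parallel like this can trade the sequential-runtime problem for locking or resource contention on the target table.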
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Investigate the Folder stage.
Investigate using a Filter command in your Sequential File stage that uses cat to spool all the files into the job as if they were one large data stream.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
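At the shell level, the Filter-command suggestion amounts to the following. The file names are made up for illustration; in the actual job you would put the `cat` command in the Sequential File stage's Filter property rather than writing a combined file:

```shell
#!/bin/sh
# Create a couple of split files to stand in for the 100 real ones.
mkdir -p demo
printf 'a\nb\n' > demo/part_1.txt
printf 'c\n'    > demo/part_2.txt

# What the Filter command effectively does: spool every file into one stream.
cat demo/part_*.txt > demo/combined.txt

wc -l < demo/combined.txt    # prints 3
```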
-
- Participant
- Posts: 158
- Joined: Tue Mar 15, 2005 3:16 am
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Me, I'd just cat all the files together then bulk load. Once.
Depending on your skill level, you could build a looping Sequence that doesn't wait for each job so that (eventually) they will all be running in parallel, but then you may end up with locking and/or resource problems.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 158
- Joined: Tue Mar 15, 2005 3:16 am
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The cat command will use hardly any resources at all. The files are already on disk. Output from the cat command is not written to disk; it becomes the input to the Sequential File stage. If you like, the effect is that of
Code:
cat files* | DataStage job
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
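The same no-intermediate-file effect can be sketched with a named pipe, which also touches on the named-pipe idea raised later in the thread. All paths here are illustrative, and the second `cat` stands in for the DataStage job reading the stream:

```shell
#!/bin/sh
mkdir -p fifo_demo
printf 'x\n' > fifo_demo/f1.txt
printf 'y\n' > fifo_demo/f2.txt

mkfifo fifo_demo/stream                      # the FIFO; no combined file hits disk
cat fifo_demo/f*.txt > fifo_demo/stream &    # writer blocks until a reader opens
cat fifo_demo/stream > fifo_demo/out.txt     # stand-in for the job reading the pipe
wait

wc -l < fifo_demo/out.txt    # prints 2
```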
-
- Participant
- Posts: 158
- Joined: Tue Mar 15, 2005 3:16 am
Thanks Ray, chulett.
Whatever you said is right for the cat option; I will try it out in practice.
For the Folder stage, I went through the documentation; it can have only two output columns, i.e. file name and file content. Maybe it is ideally suited to reading XML documents. I wanted to know whether the Folder stage can be used here or not.
chulett, you said using named pipes is also possible? If you have any ideas, can you please share them?
Thanks
Sai
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The Folder stage is not an option when the file size is too large. I don't have exact figures on that ready to hand - if, indeed, the limit is documented at all. It has to put the entire contents of each file into a single field.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.