Controlling File size in a Sequence Loop

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rodre
Premium Member
Posts: 218
Joined: Wed Mar 01, 2006 1:28 pm
Location: Tennessee

Controlling File size in a Sequence Loop

Post by rodre »

Due to database load restrictions, I have to load a file of 2 million records in chunks of 5000 records at a time. My approach has been to use a Sequence loop, but the performance is not good. It seems like there has to be a better way. Here is my approach:
Using a Sequence loop, I pick 5000 records via a transformer constraint. The problem is that the input file reads all 2 million records each time first. This seems like such a waste.
My question is: is there a way to filter 5000 records at the Sequential File stage level, or perhaps approach this from a totally different angle?

Any suggestions are much appreciated.

Thank you very much in advance!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The Sequential File stage in server jobs will read the whole file; you cannot have it read just a portion. If you really need to limit yourself to a given number of rows per execution and you wish to minimize passes through your source file, then I would suggest you look into the split command to do this for you.
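
As a minimal sketch only (the file names and paths below are illustrative, not taken from this thread), splitting the source into 5000-line pieces before the job runs could look like this:

    # break the 2,000,000-row source into 5000-line pieces: chunk_aa, chunk_ab, ...
    split -l 5000 /data/source/big_file.txt /data/source/chunk_

Each chunk_* file is then a candidate for one invocation of the load job, for example by passing the chunk name in as a job parameter from the sequence.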

But your real problem is with the database. Usually the limitations are encountered around the commit frequency and size, which you can tune from within the job. If you are not having issues with commit, perhaps you can explain why you are limited to blocks of 5000 records at a time?
reypotxo
Participant
Posts: 15
Joined: Mon Jan 29, 2007 3:53 am

Re: Controlling File size in a Sequence Loop

Post by reypotxo »

rodre wrote: Due to database load restrictions, I have to load a file of 2 million records in chunks of 5000 records at a time. [...]
I think you can split your file by writing a routine containing a loop that calls OS commands (like UNIX's echo) with different output filenames. With 400 iterations of 5000 rows each you would cover the whole 2 million rows.
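
A server routine can issue such commands via DSExecute; the commands themselves might look like the sketch below (an illustration under my own assumptions only: the file names, paths, and the use of sed rather than echo are mine, not the poster's):

    #!/bin/sh
    # carve 2,000,000 rows into 400 chunks of 5000 rows each (paths are illustrative)
    i=1
    while [ $i -le 400 ]
    do
        end=`expr $i \* 5000`
        start=`expr $end - 4999`
        sed -n "${start},${end}p" /data/source/big_file.txt > /data/source/chunk_$i.txt
        i=`expr $i + 1`
    done

Note that this re-reads the source file on every iteration, so a single split run (as ArndW suggests above) would be considerably cheaper.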
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

Use a Link Partitioner stage to split your file into parts, with an appropriate splitting logic (hashed / Round Robin etc.); you may check the Link Partitioner's help.

Since you want to chunk the file into 5000 rows each, use constraints like these in the Transformer stage that is reading the file: @INROWNUM > 0 and @INROWNUM <= 5000 on the first output link, @INROWNUM > 5000 and @INROWNUM <= 10000 on the second, and so on for each subsequent output link.


Or else use this partitioning logic within the Transformer stage that is reading the file: e.g. say you want to split the file into 10 equal chunks, then put MOD(@INROWNUM,10) = 0, MOD(@INROWNUM,10) = 1, ... MOD(@INROWNUM,10) = 9 on the 10 output links from the Transformer stage.


Hope this helps.
Share to Learn, and Learn to Share.
rodre
Premium Member
Posts: 218
Joined: Wed Mar 01, 2006 1:28 pm
Location: Tennessee

Post by rodre »

ArndW,

The database restriction comes from the DBAs. I don't know the exact reason, but I think it is because of the traffic on the database at the time this process needs to run.