
Controlling File Size in a Sequence Loop

Posted: Thu Jul 26, 2007 3:20 pm
by rodre
Due to database load restrictions, I have to load a file of 2 million records in chunks of 5000 records at a time. I have been using a Sequence loop, but the performance is not good; it seems there has to be a better way. Here is my approach:
Using a Sequence loop, I pick 5000 records via a Transformer constraint. The problem is that the input file reads all 2 million records on every pass, which seems like such a waste.
My question is: is there a way to filter 5000 records at the Sequential File stage level, or perhaps approach this from a totally different angle?

Any suggestions are much appreciated. :)

Thank you much in advance!!

Posted: Thu Jul 26, 2007 5:30 pm
by ArndW
The Sequential File stage in server jobs will read the whole file; you cannot have it read just a portion. If you really need to limit yourself to a given number of rows per execution and you wish to minimize passes through your source file, then I would suggest you look into the UNIX split command to do this for you, for example as sketched below.
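Just as a rough sketch of what that could look like from the command line before the job runs (the filenames here are made up):

    # Split the source file into pieces of 5000 lines each,
    # producing chunk_aa, chunk_ab, chunk_ac, ...
    split -l 5000 source_file.txt chunk_

Each piece can then be fed to the load job in turn, so the big file is only read once.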

But your real problem is with the database. Usually the limitations are encountered around the commit frequency and size, which you can tune from within the job. If you are not having issues with commit, perhaps you can explain why you are limited to blocks of 5000 records at a time?

Re: Controlling File Size in a Sequence Loop

Posted: Fri Jul 27, 2007 5:11 am
by reypotxo
rodre wrote: Due to database load restrictions, I have to load a file of 2 million records in chunks of 5000 records at a time. [...]

I think you can split your file with a routine that loops and calls OS commands (like UNIX's echo) with a different filename on each pass. With 400 iterations you would have all your 5000-row chunks; see the sketch below.
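Something along these lines in shell, just to illustrate the idea (filenames and the 400-iteration count are assumptions based on 2 million rows divided into 5000-row chunks); note that each pass re-reads the source file, so the split command mentioned above does the same thing more cheaply in a single pass:

    # Extract 400 chunks of 5000 rows each from source_file.txt
    i=1
    start=1
    while [ $i -le 400 ]
    do
        end=$((start + 4999))
        # print only rows start..end of the source file into chunk i
        sed -n "${start},${end}p" source_file.txt > chunk_${i}.txt
        start=$((end + 1))
        i=$((i + 1))
    done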

Posted: Fri Jul 27, 2007 7:50 am
by asitagrawal
Use a Link Partitioner stage to split your file into parts, using an appropriate splitting logic (hashed / round robin, etc.); you may check the Link Partitioner's help.

Since you want to chunk the file into 5000 rows each, use constraints like this in the Transformer stage that is reading the file: @INROWNUM > 0 and @INROWNUM <= 5000 on the first output link, @INROWNUM > 5000 and @INROWNUM <= 10000 on the second, and so on, with the last output link taking everything above the final boundary (e.g. @INROWNUM > 20000 if there are five links).


Alternatively, use partitioning logic within the Transformer stage that is reading the file. For example, say you want to split the file into 10 equal chunks: put MOD(@INROWNUM,10)=0, MOD(@INROWNUM,10)=1, ... MOD(@INROWNUM,10)=9 on 10 output links from the Transformer stage (see the sketch below).
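If it helps to see that modulo idea outside DataStage, here is a rough awk sketch of the same round-robin split (filenames are made up):

    # Row number modulo 10 decides which of the 10 chunk files a row goes to.
    awk '{ print > ("chunk_" (NR % 10) ".txt") }' source_file.txt

Inside the job, the equivalent is simply the MOD(@INROWNUM,10) constraint on each of the 10 output links, as above.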


Hope this helps.

Posted: Fri Jul 27, 2007 8:13 am
by rodre
ArndW,

The database restriction comes from the DBAs. I don't know the exact reason, but I think it is because of the traffic on the database at the time this process needs to run.