
Transaction Size vs Array Size

Posted: Thu Oct 03, 2013 11:39 am
by gayathrisivakumar
Hi,

I would like to know the best values to be kept in Transaction Size and Array size for a datastage update.

I am updating 10 million rows based on one key column. Data input is coming from a sequential file and target is db2 connector.

Also, does the option 'keep conductor connections Alive' have any impact on the performance of the job?

Thanks

Posted: Thu Oct 03, 2013 11:58 am
by chulett
Short answer is - there are no "best values".

Posted: Thu Oct 03, 2013 12:29 pm
by chulett
And only because you have "vs" in your subject... are you clear on what the two settings control?

Posted: Thu Oct 03, 2013 1:36 pm
by ray.wurlod
What does "best" mean in your context?

Posted: Sun Oct 06, 2013 11:50 pm
by chandra.shekhar@tcs.com
Use DB2 Connector Partitioning in the target stage for better performance.
For Transaction Size and Array Size, give them moderate values, i.e. neither too big nor too small.
You will have to determine the best combination yourself since, as everybody has said, there is no single best value.
It depends on various factors like database load, DataStage server load, the number of jobs running, the complexity of the job, etc.

When I say those values should not be large, e.g. around 100,000-200,000 (1-2 lakhs), it is because if the job fails, all of the data that has not been committed will be rolled back.
And if the value is too low, say around 100-200, it can slow down the database.
So try combinations like 5000/10000 or 10000/20000 for array size/transaction size.
Hope it helps

Posted: Mon Oct 07, 2013 2:03 am
by ArndW
I disagree with just setting the two values without actually knowing what they entail; the recommendation above will only work for a certain combination of record size, database connection, and database and system sizing.

The "Transaction Size" is the number of records after which a commit will be done. Partly the setting is dependent upon what the goal of a job is. If the volumes are low and your job does an all-or-nothing load, then setting the transaction size to 0 would make sense. If the volume is high and/or the record sizes very large, then doing a single commit might result in the transaction log filling up. Thus you need to know your (a) goals (b) sizing (c) logging policy and use those to decide how big your transactions size should be. Generally the larger this value is, the faster your database load will run.

The "Array Size" determines how many records will be written in one "block" sent to the database. This can be both the network transport layer block size and also the database block size. I recall in the early days of DataStage this setting really made a big difference, but recently I seem to see little difference on this setting. The goal is to fit as many records as possible into a block as possible with little wastage.

If we take a 4 KB block and 1 KB records, an array size of 1 would leave each transport block only 25% utilized. The optimal array size in this case would be 4.
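
Working that arithmetic through (the 4 KB block and 1 KB record sizes are just the assumed figures from the example above):

```python
block_size_kb = 4    # assumed transport/database block size
record_size_kb = 1   # assumed record size

for array_size in (1, 4):
    used = array_size * record_size_kb
    print(f"array size {array_size}: {used} of {block_size_kb} KB used ({used / block_size_kb:.0%})")
# array size 1 -> 25% of the block used; array size 4 -> 100%
```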