
ArraySize and Transaction Size

Posted: Thu Aug 31, 2006 4:49 pm
by vsi
Hi,

I am loading data into an Oracle table every night. The job uses the DRS stage.

The daily maximum can be 50,000 records.

What values should I use for Array Size and Transaction Size?

Thanks
vsi

Posted: Thu Aug 31, 2006 4:59 pm
by chulett
There's no rule for how they 'need' to be set; it's up to you and how you want them set. Array Size controls how many records at a time are written to the database and Transaction Size controls how many records are committed at a time. IMO, TS should be a multiple of AS, but other than that...
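
If it helps to picture the relationship, here's a rough sketch in Python - nothing to do with DataStage internals, just a hypothetical batched-insert loop with sqlite3 standing in for Oracle and made-up names for the two settings:

Code:
# Minimal sketch only: ARRAY_SIZE / TRANSACTION_SIZE are hypothetical names,
# and sqlite3 stands in for the real target database.
import sqlite3

ARRAY_SIZE = 20         # rows sent to the database per write
TRANSACTION_SIZE = 100  # rows per commit; a multiple of ARRAY_SIZE

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, val TEXT)")

rows = [(i, "value %d" % i) for i in range(50000)]
pending = 0

for start in range(0, len(rows), ARRAY_SIZE):
    batch = rows[start:start + ARRAY_SIZE]               # one "array" of rows
    conn.executemany("INSERT INTO target VALUES (?, ?)", batch)
    pending += len(batch)
    if pending >= TRANSACTION_SIZE:                       # commit every TRANSACTION_SIZE rows
        conn.commit()
        pending = 0

conn.commit()   # final commit for any leftover rows
conn.close()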

Posted: Thu Aug 31, 2006 5:11 pm
by vsi
If you were to use it, what would you use for Array Size and Transaction Size?

Thanks
vsi

Posted: Thu Aug 31, 2006 5:22 pm
by kris007
The answer to that depends on how much data (number of rows and average record length) you are loading into the table, what speed you want to achieve, and how long your job takes to finish with different values of those settings. Usually it's a trial-and-error process (at least initially) until you are satisfied with the speed you achieve. IMO

Posted: Thu Aug 31, 2006 5:28 pm
by chulett
There are way too many variables for anyone to just give you numbers. For array size, play around, run some tests. There is a point of diminishing returns for it... processing speed will increase and then eventually slow back down again as that number goes up. You need to find that sweet spot for each particular data set.

Transaction Size just controls commits. Use 0 for an 'all or nothing' load if you can. Otherwise recognize the fact that a failure part way through a job will complicate the recovery / restart process if you've been doing intermediate commits every X records. Do you need to delete previous work before the restart? Have it start where it left off?
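
If you do go with intermediate commits, one illustrative way to handle that restart question - purely a sketch, not a DRS feature, and the table and column names are made up - is to tag each run's rows with a batch id so a failed run's partial work can be deleted before the job is rerun from the top:

Code:
# Purely illustrative; 'target' and 'batch_id' are hypothetical names.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (batch_id TEXT, id INTEGER, val TEXT)")

def clean_up_failed_run(conn, batch_id):
    # Throw away whatever the failed run managed to commit, then the same
    # batch can simply be reloaded without double-counting rows.
    conn.execute("DELETE FROM target WHERE batch_id = ?", (batch_id,))
    conn.commit()

clean_up_failed_run(conn, "2006-08-31")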

Posted: Thu Aug 31, 2006 6:09 pm
by dsdoubt
By default, Array Size is 1.
May I know in what scenario we should leave it at that?
And in what scenario can we increase it?
Is it the memory bandwidth of DataStage that limits performance beyond a certain point as we increase the Array Size?

Posted: Thu Aug 31, 2006 6:32 pm
by ray.wurlod
(Array size * row size) should be close to, but not over, network packet size (or a multiple thereof) for optimum transmission.

(Rows per commit) - or transaction size - is mediated by a number of factors, but should always be a multiple of array size.

Array size of 1 with rows per commit of 1 guarantees accurate error reporting, at the cost of some diminution in throughput.

Too large a transaction size may result in fatal failure in the database.

Posted: Thu Aug 31, 2006 7:54 pm
by dsdoubt
Though the Array Size depends entirely on the environment, may I know the maximum value you guys have used, so that I can visualize the practical upper limit for efficiency?

Posted: Thu Aug 31, 2006 8:17 pm
by ray.wurlod
1024 / rowsize

Posted: Thu Aug 31, 2006 11:29 pm
by dsdoubt
Thanks Ray,
So does that mean that DataStage will read and process 1024 rows from the database in parallel?

Posted: Fri Sep 01, 2006 7:17 am
by chulett
No, not 1024 and not in parallel. He meant he divides 1024 by the row size and uses that number. So, for a 50 byte row size, the Array Size would be set to 20. Which means 20 rows would be sent across the network at a time where they would be processed in a serial fashion there.

Posted: Fri Sep 01, 2006 7:46 am
by ray.wurlod
And the 1024 is a typical network packet size.
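
Putting the two posts above together, the arithmetic is no more than this:

Code:
# Back-of-the-envelope helper for the rule of thumb above; 1024 bytes is just
# the typical packet size mentioned here, so substitute your own network's value.
def suggested_array_size(row_size_bytes, packet_size_bytes=1024):
    # Largest whole number of rows that still fits within one packet.
    return max(1, packet_size_bytes // row_size_bytes)

print(suggested_array_size(50))   # 20, matching the 50-byte example above
print(suggested_array_size(200))  # 5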

Posted: Tue Nov 28, 2006 10:18 pm
by nkln@you
chulett wrote: There's no rule for how they 'need' to be set, it's up to you and how you want them set. Array Size controls how many records at a time are written to the database and Transaction Size controls ...
I am not a premium member and so am unable to see your full reply. Can you please post that again so that I can see it? And I used to think that Array Size is the number of records that DS will fetch from the DB and Transaction Size is the number of records after which DS will send a commit command to the DB. Please correct me if I am wrong.
Also, going through the posts below, I got the way to calculate the Array Size for optimum speed. Is there any rule for Transaction Size too? If so, please post it.

Posted: Tue Nov 28, 2006 11:47 pm
by chulett
nkln@you wrote: I am not a premium member and so am unable to see your full reply. Can you please post that again so that I can see it?
No, I'm afraid not. Sign up! Pennies a day, ya know.
nkln@you wrote: And I used to think that Array Size is the number of records that DS will fetch from the DB and Transaction Size is the number of records after which DS will send a commit command to the DB. Please correct me if I am wrong.
Why 'used to think'? That's basically correct.
nkln@you wrote: Also, going through the posts below, I got the way to calculate the Array Size for optimum speed. Is there any rule for Transaction Size too? If so, please post it.
There's no rule, per se. Zero is 'all or nothing', any other value is the number of rows to commit and could vary based on input volumes, rollback / redo sizing, all kinds of things.

The 'rule' to keep in mind is that Transaction Size should be an even multiple of whatever Array Size you use... unless you use zero, of course. :wink:
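
If you want that 'rule' in code terms, it amounts to no more than this (the function name is mine, just a sketch of the arithmetic):

Code:
# Sketch of the 'even multiple' rule: round a desired commit interval down to
# a multiple of the Array Size, with 0 meaning one all-or-nothing commit.
def transaction_size_for(array_size, desired_commit_interval):
    if desired_commit_interval == 0:
        return 0                                    # all or nothing
    multiples = max(1, desired_commit_interval // array_size)
    return multiples * array_size

print(transaction_size_for(20, 0))     # 0
print(transaction_size_for(20, 1000))  # 1000
print(transaction_size_for(20, 990))   # 980 (rounded down to a multiple of 20)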

Posted: Wed Nov 29, 2006 12:58 am
by nkln@you
Regarding the Array Size, it was calculated based on the network packet size. But what if DS and the DB reside on the same machine? What would be the way to calculate the optimum Array Size in that case?