ArraySize and Transaction Size

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

vsi
Premium Member
Premium Member
Posts: 507
Joined: Wed Mar 15, 2006 1:44 pm

ArraySize and Transaction Size

Post by vsi »

Hi,

I am loading data into Oracle Table every night. The job uses DRS Stage.

daily maximum records can be 50,000

what values i need to give, for ArraySize and Transaction Size?

Thanks
vsi
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no rule for how they 'need' to be set, it's up to you and how you want them set. Array Size controls how many records at a time are written to the database and Transaction Size controls how many records are commited at a time. IMO, TS should be a multiple of AS but other than that...
-craig

"You can never have too many knives" -- Logan Nine Fingers
vsi
Premium Member
Premium Member
Posts: 507
Joined: Wed Mar 15, 2006 1:44 pm

Post by vsi »

If you were to use ,
what you will use for AS, and Transaction Size?

Thanks
vsi
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

The answer to that depends on how much data(number of rows and average record length) you are loading into the table and how much speed will want to achieve in your job and how much time your job takes to finish with different values of those. Usually its a trial and error method(atleast initially) until you are satisfied with the speed you achieve. IMO
Kris

Where's the "Any" key?-Homer Simpson
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There are way too many variables for anyone to just give you numbers. For array size, play around, run some tests. There is a point of diminishing returns for it... processing speed will increase and then eventually slow back down again as that number goes up. You need to find that sweet spot for each particular data set.

Transaction Size just controls commits. Use 0 for an 'all or nothing' load if you can. Otherwise recognize the fact that a failure part way through a job will complicate the recovery / restart process if you've been doing intermediate commits every X records. Do you need to delete previous work before the restart? Have it start where it left off?
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsdoubt
Participant
Posts: 106
Joined: Sat Jul 15, 2006 12:17 am

Post by dsdoubt »

By default Array size is 1.
May I know, for what scenario we need to set it as it is.
And what scenario we can increase?
Is it the memory bandwidth of datastage, which limits the performance at a point when we increase the Array size to a level?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

(Array size * row size) should be close to, but not over, network packet size (or a multiple thereof) for optimum transmission.

(Rows per commit) - or transaction size - is mediated by a number of factors, but should always be a multiple of array size.

Array size of 1 with rows per commit of 1 guarantees accurate error reporting, at the cost of some diminution in throughput.

Too large a transaction size may result in fatal failure in the database.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsdoubt
Participant
Posts: 106
Joined: Sat Jul 15, 2006 12:17 am

Post by dsdoubt »

Though the Array size totally depend upon their environment, may I know the maximum value been used by your guys. So that I can visuvalize the maximum possible limits for more efficiency.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

1024 / rowsize
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsdoubt
Participant
Posts: 106
Joined: Sat Jul 15, 2006 12:17 am

Post by dsdoubt »

Thanks Ray,
So is that ment that, datastage will read and process 1024 rows from Database parallely.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, not 1024 and not in parallel. He meant he divides 1024 by the row size and uses that number. So, for a 50 byte row size, the Array Size would be set to 20. Which means 20 rows would be sent across the network at a time where they would be processed in a serial fashion there.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

And the 1024 is a typical network packet size.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nkln@you
Premium Member
Premium Member
Posts: 271
Joined: Wed Nov 17, 2004 5:15 am
Location: US

Post by nkln@you »

chulett wrote:There's no rule for how they 'need' to be set, it's up to you and how you want them set. Array Size controls how many records at a time are written to the database and Transaction Size controls ...
I am not a premium member and so unable to see your full reply. Can you Please post that again so that i can see it. And i used to think that Array size is the no. of records that DS will fetch from a DB and Transaction Size is the no. of records after which DS will send a commit command to DB. Please correct me if i am wrong.
Also going through the below posts, i got the way to calculate the array size for optimum speed. Is there any rule for Transaction size too? If yes Please post it.
Aim high
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

nkln@you wrote:I am not a premium member and so unable to see your full reply. Can you Please post that again so that i can see it.
No, I'm afraid not. Sign up! Pennies a day, ya know.
nkln@you wrote:And i used to think that Array size is the no. of records that DS will fetch from a DB and Transaction Size is the no. of records after which DS will send a commit command to DB. Please correct me if i am wrong.
Why 'used to think'? That's basically correct.
nkln@you wrote:Also going through the below posts, i got the way to calculate the array size for optimum speed. Is there any rule for Transaction size too? If yes Please post it.
There's no rule, per se. Zero is 'all or nothing', any other value is the number of rows to commit and could vary based on input volumes, rollback / redo sizing, all kinds of things.

The 'rule' to keep in mind is that Transaction Size should be an even multiple of whatever Array Size you use... unless you use zero, of course. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
nkln@you
Premium Member
Premium Member
Posts: 271
Joined: Wed Nov 17, 2004 5:15 am
Location: US

Post by nkln@you »

Regarding the array size, it was calculated based on network packet size. But what is the DS and DB resides on the same machine? What will be the way to calculate optimum array size for this?
Aim high
Post Reply