Delete

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Delete

Post by asitagrawal »

Hi,

I need to delete some rows from the tables,
for which I have the keys in an input file.

With the help of one key set, the corresponding data needs
to be deleted from 3 tables (1 parent and 2 child tables).

How should I determine the Array Size and Transaction Size?
The help guide does not cover the Delete case...

Please advise.
Share to Learn, and Learn to Share.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If DB2 UDB server is on the same machine as DataStage server, set Array Size to 1. Otherwise set Array Size to some multiple of (packet_size / row_size). Is there a large number of deletes? If not, set rows per transaction to 0, so that all the deletes form one transaction.
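A worked example, using made-up figures (neither the packet size nor the row size below comes from this thread, so substitute your own values):

Code: Select all

packet_size = 4096 bytes        (hypothetical network packet size)
row_size    =   40 bytes        (hypothetical size of one key row)
packet_size / row_size = 4096 / 40 ≈ 102
=> start with an Array Size around 100 and benchmark small multiples of it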
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

Hi Ray,

The DB is on a separate machine.
Please advise me on how to determine packet_size and row_size.

The deletes may scale up to 6 million rows a day!

In my last run, deleting 6 million rows took 45 minutes!

Please advise,
Asit
Share to Learn, and Learn to Share.
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

Hi Ray,

Can you please guide me on determining the packet_size and row_size?

Warm Regards,
Asit
Share to Learn, and Learn to Share.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Ask your network administrator about packet size. It's usually some small multiple of 1KB.
Row size must be calculated by summing the size of each field and adding one byte for each inter-field delimiter.
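For example (with hypothetical column sizes, purely to show the arithmetic): a key row made up of an INTEGER (4 bytes), a CHAR(10) (10 bytes) and a DATE held as a 10-character string would give 4 + 10 + 10 = 24 bytes, plus 2 bytes for the two inter-field delimiters, so roughly 26 bytes per row.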
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

Hi,

I have asked the network team for the packet_size value.
Meanwhile, can you guide me a little on this:

As my input, I have a set of keys which uniquely identifies a row in table 1 but multiple rows in tables 2 & 3.

Based on this input key set, deletes have to happen on all 3 tables,
where tables 2 & 3 are child tables of table 1.

For each input key set there will be one row to be deleted from table 1, and from tables 2 & 3 (the child tables) there could be as many as 8 million rows and as few as 2 million rows to be deleted.

Now, for example:
The input number of keys is just 600, yet it might delete 6 million rows from tables 2 & 3, so what should my Transaction Size be in order to issue commits?
Currently I am setting Transaction Size to 1000 and Array Size to 1000.
So I am unable to understand the behaviour of DataStage with respect to issuing commits, as I have only 600 keys as input but my Array Size and Transaction Size are set to 1000!

Please help !
Share to Learn, and Learn to Share.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What's not to understand? You told it to commit every 1000 records and only sent it 600. In that case it only commits once, at the end. It will always 'commit once at the end' regardless of the setting; the setting is what drives intermediate commits, and in this particular case you had none.
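To illustrate with the numbers from this thread: with 600 input rows and a Transaction Size of 1000, the only commit is the final one after row 600. If the Transaction Size were 250 instead, commits would fire after rows 250 and 500, plus the final commit at row 600. Array Size does not change where commits happen, only how many rows travel to the database per round trip.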
-craig

"You can never have too many knives" -- Logan Nine Fingers
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Post by sachin1 »

Array Size: specifies the number of rows to be transferred in one call between the DataStage server and the Oracle server before they are written. Setting a large number uses more memory on the client machine, but it minimizes round trips and maximizes performance by executing fewer statements.

Transaction Size: specifies the number of rows written before the data is committed to the table.

It is difficult to monitor this at run time.

:)
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

Let me try to explain again.

What I don't understand is whether the Array Size and Transaction Size must be determined considering the rows in the input to the stage, OR the number of rows affected as a result of executing the SQL corresponding to each input key set?
Share to Learn, and Learn to Share.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Array Size governs transmission of data between DataStage and the database server.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The former. It has no clue how many rows any given SQL statement has affected.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

Hi All,

The delete performance is becoming a big question for me...

I have set the Array Size and Transaction Size based
on the packet_size and row_size... I suppose, as a result, the
time taken to delete the set of data is following a linear
curve...

But the time taken is still pretty high!

For a given set of keys, the number of rows affected is drastically different, e.g.:

Key-Set-1: 2 rows to be deleted
Key-Set-2: 5,189,142 rows to be deleted
Key-Set-3: 87,425 rows to be deleted
Key-Set-4: 235,244 rows to be deleted
Key-Set-5: 1,523,532 rows to be deleted
Key-Set-6: 1 row to be deleted
Key-Set-7: 2,152,345 rows to be deleted
Key-Set-8: 2,125 rows to be deleted
Key-Set-9: 122 rows to be deleted


The total time elapsed for deleting various numbers of rows is:

Code: Select all

Rows deleted    Elapsed time
3,996,662       0:32:38
5,089,642       0:44:12
  234,567       0:01:02
2,464,859       0:22:07
What's your advice?
Share to Learn, and Learn to Share.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Manage your expectations better. A minute to delete 2 lakh (200,000) rows isn't too bad, remembering that it's a transactional operation, so there will be some overhead in managing the transaction(s).

Or invest in more powerful hardware. I'm sure your vendor would be happy to take your inquiry.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

ray.wurlod wrote: I'm sure your vendor would be happy to take your inquiry.
:D Yes Ray, I understand that... but the thing is that now I must convey to the client that, on the Ascential side, these are the best-tuned parameters for the deletes and that nothing more can be done there.

What do you say?
Share to Learn, and Learn to Share.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What is your 'set of keys' - one record? If so, then futzing with the Array Size won't help at all if all DataStage sees is 9 records. About all you could control from a Transaction Size standpoint would be whether each set of records from any given 'key set' were committed per key change. That would take a value of 1.

And there really isn't anything about the 'Ascential side' going on here. You are doing transactional deletes against a relational database where a small number of driving keys generate a large number of related deletes. Something like this would take that long in any tool.

Without changing the core design of your job, improvements would have to come on the database side. Things like dropping unnecessary indexes and disabling constraints, for example, could help with the overall processing time. It also depends on the keys used to control your deletes - are they primary keys, indexed in some other fashion, or deleted via full table scan? Check with your DBA or Architect to see if they have other options you could explore.

You could rethink your design and build something more suited to 'high-speed' deletes. Does DB2 offer anything in that regard to help here? Could you, for example, build a sproc that does bulk collecting of rowids or PK values and then 'block' deletes of some nature? I do this in Oracle when the traditional transactional method is deemed to be taking 'too long' for the process's SLA. I don't know squat about DB2 so can't offer anything in the way of specifics. However, I'm sure others can.
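To make that 'block delete' idea a little more concrete, here is a rough sketch in DB2-style SQL PL. Treat it purely as an illustration: the procedure, table and column names are invented, the chunk size is arbitrary, and the syntax has not been verified against any particular DB2 version, so check it with your DBA before using anything like it.

Code: Select all

-- Hypothetical sketch of a chunked delete for one child table.
-- PURGE_CHILD2, CHILD_TABLE_2 and PARENT_KEY are made-up names.
CREATE PROCEDURE PURGE_CHILD2 (IN P_KEY INTEGER)
LANGUAGE SQL
BEGIN
   DECLARE V_DELETED INTEGER DEFAULT 1;
   WHILE V_DELETED > 0 DO
      -- delete at most 10,000 matching rows per pass
      DELETE FROM (SELECT 1 FROM CHILD_TABLE_2
                   WHERE PARENT_KEY = P_KEY
                   FETCH FIRST 10000 ROWS ONLY);
      -- capture how many rows the last statement removed
      GET DIAGNOSTICS V_DELETED = ROW_COUNT;
      COMMIT;   -- keep each transaction, and its log usage, bounded
   END WHILE;
END

The DataStage job (or a plain DBA script) would then call something like this once per key, instead of issuing one enormous transactional DELETE per key.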

Or, as Ray notes, manage your expectations better. You deleted what... around 9M records in about 1.5 hours? For that design that's nothing to sneeze at. How often does this need to run? That time for those volumes might be great in a weekly or monthly process, less so for a daily one.

So you've got some other options to explore. Now it's between you and your client. Are they willing to pay you to take the time needed for this, to achieve an X percent reduction in the runtime? Especially considering X (as the name implies) is an unknown value? :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers