How to prevent truncating before data extraction?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus
Contact:

How to prevent truncating before data extraction?

Post by Palermo »

Hi,

I have an issue. The SQL query in the source DB2 Connector stage runs for a long time, and the target table is truncated before data extraction begins, so the target table sits empty during that time. How can I prevent the target table from being truncated before the data has been extracted from the source table, using only one parallel job?

The job:
Source_DB2_Connector----->Target_DB2_Connector

Thanks!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... you are looking for a way to truncate the target table only once the data is ready to be loaded, so it doesn't spend so long empty? There really isn't a way to do that 'using only one parallel job', as you'd need to extract your source data elsewhere first, say to a dataset, and then, once that completes, kick off the truncate and load. But that means two jobs.
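The timing of that two-phase pattern can be sketched outside of DataStage. This is a minimal Python/sqlite3 sketch, not DataStage code, and the table names are made up; it only shows why staging first shrinks the empty window:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE target (id INTEGER, val TEXT)")
cur.execute("INSERT INTO target VALUES (1, 'old')")
cur.execute("CREATE TABLE staging (id INTEGER, val TEXT)")

# Phase 1 (job 1): the slow extract lands in staging; the target is
# untouched and still holds yesterday's rows while this runs.
cur.executemany("INSERT INTO staging VALUES (?, ?)", [(1, "new"), (2, "new")])

# Phase 2 (job 2): only now truncate and reload, in one transaction,
# so the window where target is empty shrinks to the copy itself.
with con:
    cur.execute("DELETE FROM target")
    cur.execute("INSERT INTO target SELECT * FROM staging")

print(cur.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 2
```

In DataStage terms, phase 1 is the extract job writing a dataset (or staging table) and phase 2 is the truncate-and-load job.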
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can perform these "blocking operations" in server jobs, but not in a single parallel job. And even there, persistent storage is involved.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

True. Took advantage of that fact many times. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus
Contact:

Post by Palermo »

I think it is impossible too. I know one way, but it impacts performance: use a Change Capture stage with four additional DB2 Connector stages (READ, INSERT, UPDATE, DELETE) and a Transformer stage. I will be forced to use them. Maybe you know another way?
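The delta logic behind that design can be sketched in plain Python (the row data and key scheme here are hypothetical, with dict keys standing in for the change key columns):

```python
def change_capture(before, after):
    """Classify rows by key into inserts, updates, and deletes,
    the way a Change Capture stage splits its output."""
    inserts = {k: v for k, v in after.items() if k not in before}
    deletes = [k for k in before if k not in after]
    updates = {k: v for k, v in after.items()
               if k in before and before[k] != v}
    return inserts, updates, deletes

target = {1: "a", 2: "b", 3: "c"}   # current target rows (READ link)
source = {1: "a", 2: "B", 4: "d"}   # freshly extracted source rows

ins, upd, dele = change_capture(target, source)

# Apply the delta in place: the target is never emptied, only adjusted,
# which is why this avoids the long empty-table window.
for k in dele:
    del target[k]
target.update(upd)
target.update(ins)
print(target == source)  # True
```

The trade-off is exactly the one mentioned above: instead of one truncate and one bulk load, every run pays for a full read of the target plus keyed compares.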
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Use two jobs.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Once the query has executed - how long is it taking for data transfer?

If all you are after is not having the table empty for so long, is it silly to ask if your query is optimised?

Something I have also wanted to try: if you have an output reject link from a DB stage feeding a reference link into a Lookup with Entire partitioning, will a parallel job 'wait' for the data before outputting anything? If that worked, you could write the data to a temp table, then write a routine that calls the table truncation from the OS only once the data has got to that point. I'm not convinced it would work, but you could try.

EDIT:
The more I think about it the question arises: why can you not have 2 jobs?
Can you have a temporary table in your target DB? If so, run your job into the temporary table and then, once the job has finished, execute a stored procedure, either as After SQL or as an after-job routine, to copy the data from the temp table to your target table. That would get it done in one job.
Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus
Contact:

Post by Palermo »

ShaneMuir wrote:Once the query has executed - how long is it taking for data transfer?

If all you are after is not having the table empty for so long, is it silly to ask if your query is optimised?
The original SQL ran for at least 20 minutes; after I optimized it, it now runs in at least 10 minutes. But that is the minimum time, and it varies according to workload.
ShaneMuir wrote:The more I think about it the question arises: why can you not have 2 jobs?
Can you have a temporary table in your target DB? If so, run your job into the temporary table and then, once the job has finished, execute a stored procedure, either as After SQL or as an after-job routine, to copy the data from the temp table to your target table. That would get it done in one job.
Yes, of course I know that it is easy to solve the issue by using 2 jobs or a temporary staging table, but I can change only one parallel job. I've implemented the job using the Change Capture stage, and it seems that the job works fine.

Thanks to everyone for the tips.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Palermo wrote:I can change only one parallel job
That seems like a very... odd... restriction. And the mechanism you were 'forced' to use (Change Capture with INSERT/UPDATE/DELETE) is pretty much the polar opposite of a simple truncate and load. But if that is working for you without 'impacts on performance' then yay.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus
Contact:

Post by Palermo »

chulett wrote:But if that is working for you without 'impacts on performance' then yay.
The SQL query returns about 60,000 rows, and the Change Capture stage doesn't impact performance in this case.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Your proof of this assertion?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus
Contact:

Post by Palermo »

[screenshot: truncate]

[screenshot: change capture stage (read, insert and delete)]
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It would be interesting to see the %CPU for these operators.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus
Contact:

Post by Palermo »

You are right, %CPU increases from 1% to 28%, but the loading time is not long.

[screenshot: running]
[screenshot: finished]
[screenshot: truncate]
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ah... I can see why your performance seems to be fine with this change: you are working with what is basically an insignificant number of rows. I was assuming the number was... much larger.
-craig

"You can never have too many knives" -- Logan Nine Fingers