How to prevent truncating before data extraction?

Posted: Thu May 07, 2015 3:25 pm
by Palermo
Hi,

I have an issue. The SQL query in the source DB2 Connector stage runs for a long time, and the target table is truncated before data extraction even starts, so the target table sits empty the whole time. How can I prevent the target table from being truncated before data is extracted from the source table, using only one parallel job?

The job:
Source_DB2_Connector----->Target_DB2_Connector
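
For context: with a truncate-style table action, the target connector effectively issues the truncate (or a DELETE, depending on the connector settings) when the job starts, before the slow source query has returned a single row. A minimal sketch of the sequence in DB2 SQL, where the table and column names are assumptions:

    TRUNCATE TABLE TGT.MY_TABLE IMMEDIATE;  -- runs at job start; target is empty from here
    -- ... the source query then runs for many minutes before any row arrives ...
    INSERT INTO TGT.MY_TABLE (KEY_COL, VAL_COL) VALUES (?, ?);  -- only now does loading begin

(In DB2, TRUNCATE must be the first statement of its unit of work.)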

Thanks!

Posted: Thu May 07, 2015 5:42 pm
by chulett
So... you are looking for a way to truncate the target table only once the data is ready to be loaded, so it doesn't spend so long empty? There really isn't a way to do that 'using only one parallel job', as you'd need to extract your source data elsewhere, say to a dataset, and then, once complete, kick off the truncate and load. But that means two jobs.

Posted: Thu May 07, 2015 6:54 pm
by ray.wurlod
You can perform these "blocking operations" in server jobs, but not in single parallel jobs. But even there persistent storage is involved.

Posted: Fri May 08, 2015 7:12 am
by chulett
True. Took advantage of that fact many times. :wink:

Posted: Fri May 08, 2015 9:54 am
by Palermo
I think it is impossible too. I do know one way, but it impacts performance: use a Change Capture stage, four additional DB2 Connector stages (READ, INSERT, UPDATE, DELETE) and a Transformer stage. I will be forced to use them. Maybe you know another way?
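
In plain DB2 SQL, the net effect of that design is to apply only the differences instead of truncating and reloading, so the target is never emptied. A hedged sketch of the equivalent statements, where all table and column names are assumptions, the sketch assumes both tables are reachable from one connection, and the real work is done by the stages rather than by this SQL:

    -- insert new rows and update changed ones in a single pass:
    MERGE INTO TGT.MY_TABLE t
    USING (SELECT KEY_COL, VAL_COL FROM SRC.MY_SOURCE) s
       ON t.KEY_COL = s.KEY_COL
    WHEN MATCHED AND t.VAL_COL <> s.VAL_COL THEN
         UPDATE SET t.VAL_COL = s.VAL_COL
    WHEN NOT MATCHED THEN
         INSERT (KEY_COL, VAL_COL) VALUES (s.KEY_COL, s.VAL_COL);

    -- DB2's MERGE cannot act on target rows that have no source match,
    -- so the deletes need their own statement:
    DELETE FROM TGT.MY_TABLE t
     WHERE NOT EXISTS
           (SELECT 1 FROM SRC.MY_SOURCE s WHERE s.KEY_COL = t.KEY_COL);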

Posted: Fri May 08, 2015 10:23 am
by chulett
Use two jobs.

Posted: Fri May 08, 2015 11:19 am
by ShaneMuir
Once the query has executed - how long is it taking for data transfer?

If all you are after is not having the table empty for so long, is it silly to ask if your query is optimised?

Something else I have wanted to try: if you have an output reject link from a DB stage which feeds as a reference link into a Lookup with Entire partitioning, will a parallel job 'wait' for the data before outputting anything? If that were to work, you could write the data to a temp table, then maybe write a routine that calls the table truncation from the OS only once the data has got to that point. I'm not convinced it would work, but you could try.

EDIT:
The more I think about it, the question arises: why can you not have 2 jobs?
Can you have a temporary table in your target DB? If so, run your job into the temporary table and then, once the job is finished, execute a stored procedure as either after SQL or as an after-job routine to copy the data from the temp table to your target table. That would get it done in one job, as sketched below.
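
If a temporary table is available, the copy step itself is small. A minimal sketch of what the after SQL (or the stored procedure body) could look like, assuming hypothetical table names and that the job has just finished loading the temporary table:

    -- the target is now only empty for the duration of this copy:
    TRUNCATE TABLE TGT.MY_TABLE IMMEDIATE;  -- DB2: must be first in its unit of work
    INSERT INTO TGT.MY_TABLE
    SELECT * FROM STG.MY_TABLE_TMP;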

Posted: Fri May 08, 2015 12:21 pm
by Palermo
ShaneMuir wrote:Once the query has executed - how long is it taking for data transfer?

If all you are after is not having the table empty for so long, is it silly to ask if your query is optimised?
The original SQL took at least 20 minutes to execute; I optimized it and now it takes at least 10 minutes. But that is the minimum, and it varies according to workload.
ShaneMuir wrote:The more I think about it, the question arises: why can you not have 2 jobs?

Can you have a temporary table in your target DB? If so, run your job into the temporary table and then, once the job is finished, execute a stored procedure as either after SQL or as an after-job routine to copy the data from the temp table to your target table. That would get it done in one job.
Yes, of course I know it is easy to solve the issue by using 2 jobs or a temporary staging table, but I can change only one parallel job. I've implemented the job using the Change Capture stage and it seems to work fine.

Thanks to everyone for the tips.

Posted: Fri May 08, 2015 12:27 pm
by chulett
Palermo wrote:I can change only one parallel job
That seems like a very... odd... restriction. And the mechanism you were 'forced' to use (Change Capture with INSERT/UPDATE/DELETE) is pretty much the polar opposite of a simple truncate and load. But if that is working for you without 'impacts on performance' then yay.

Posted: Fri May 08, 2015 12:54 pm
by Palermo
chulett wrote:But if that is working for you without 'impacts on performance' then yay.
The SQL query returns about 60,000 rows, and the Change Capture stage doesn't impact performance in this case.

Posted: Fri May 08, 2015 4:01 pm
by ray.wurlod
Your proof of this assertion?

Posted: Sat May 09, 2015 1:42 am
by Palermo
truncate: [job monitor screenshot]

change capture stage (read, insert and delete): [job monitor screenshot]

Posted: Sat May 09, 2015 2:01 am
by ray.wurlod
It would be interesting to see the %CPU for these operators.

Posted: Sat May 09, 2015 2:43 am
by Palermo
You are right, %CPU increases from 1% to 28%, but the loading time is not long.
running: [job monitor screenshot]
finished: [job monitor screenshot]

truncate: [job monitor screenshot]

Posted: Sat May 09, 2015 7:56 am
by chulett
Ah... I can see why your performance seems to be fine with this change: you are working with what is basically an insignificant number of rows. I was assuming the number was... much larger.