Target table truncated if source is empty

bmarko22
Participant
Posts: 21
Joined: Mon Nov 16, 2009 7:03 am
Location: MD

Post by bmarko22 »

I have a simple job that uses the DB2 stage to select data, sends it through a Transformer, and inserts into a table using "clear table then insert".

My question: when the source table is unavailable due to a lock, or while it's being refreshed, the job seems to treat the table as empty and truncates my target table. Or maybe what's happening is that the parallel job will do whatever steps it can get away with in any order it deems appropriate, so it might be deleting all the records while the source data is still being fetched, in order to save time.

I must be doing something wrong, because I can't see an ETL tool saying "hmm... I can't access the source table, let's delete all the target data."

I welcome any suggestions, and thank you all for such a wonderful forum. And please be kind if I'm missing some obvious function.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Lost on what your problem is. You've told it to "clear the table" every time the job runs, and it will do that regardless of the state of your source, zero records or not: first empty the table, then start loading records. Is that really what you want to be doing? :?

Are you trying to find a way to "clear" and reload the table only if the source is not empty?
-craig

"You can never have too many knives" -- Logan Nine Fingers
bmarko22
Participant
Posts: 21
Joined: Mon Nov 16, 2009 7:03 am
Location: MD

Post by bmarko22 »

I am trying to do a full refresh of the target table. If the source is unavailable or empty, I don't want to lose the current data; better to have day-old data than nothing. My source is a very busy warehouse that often has tables that are unavailable.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So, yes to the last question. Then you'll need a Sequence to check the source and then conditionally run the load job only when there is actually data to load.
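Roughly, the Sequence logic amounts to the sketch below; in the Sequence itself that would be an Execute Command activity feeding a Nested Condition. All database, schema, and job names here are made up:

Code: Select all
#!/bin/sh
# Count the source rows, then kick off the load job only if any exist.
db2 connect to SRCDB > /dev/null
db2 -x "SELECT COUNT(*) FROM MYSCHEMA.SOURCE_TABLE" > /tmp/src_count.txt
db2 connect reset > /dev/null
rows=$(tr -d ' ' < /tmp/src_count.txt)
if [ "${rows:-0}" -gt 0 ]; then
    # -jobstatus makes dsjob wait and return an exit code based on job status
    dsjob -run -jobstatus MYPROJECT LoadTargetTable
fi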
-craig

"You can never have too many knives" -- Logan Nine Fingers
bmarko22
Participant
Posts: 21
Joined: Mon Nov 16, 2009 7:03 am
Location: MD

Post by bmarko22 »

The issue with checking the source is that I'm not using just one table as a source but a query joining upwards of 5 tables, one that can run for 90 minutes before it returns the first result row. Each table might have data, but the query result might be empty. I don't want to wait almost 2 hours to check whether there will be data and then another 2 hours to pull that same data.

So there is no way to do an insert statement in DS that says if there's nothing to insert, don't delete all the target data? There's gotta be. In PL/I or other application environments we would look for an SQLCODE +100, meaning the query returned no data, and decide what to do from that point on. It's a pretty basic condition to have to handle.

In a server job you can check the row count on the link and build a condition on that count; this is not possible in parallel jobs. I don't want to move to server jobs and lose the advantage of parallelism just to avoid truncating on an empty result set.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Nobody said you would have to pull the data twice. Can you not land the result set and then check that the result set is not empty? It doesn't have to be a flat file, and there is (off the top of my head) an orchadmin command to get the count from a Data Set.
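If memory serves, the companion dsrecords utility reports a Data Set's record count directly; orchadmin dump piped through wc -l would be the long way round. Purely a sketch, so check the syntax on your release:

Code: Select all
# dsrecords prints something like "42 records" for the landed data set
count=$(dsrecords /data/land/refresh.ds | cut -d' ' -f1)
if [ "${count:-0}" -gt 0 ]; then
    dsjob -run -jobstatus MYPROJECT LoadFromDataSet
fi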

I honestly don't know of any way in a single job to conditionally truncate the target table only if there is data to load, but others might.
-craig

"You can never have too many knives" -- Logan Nine Fingers
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

I guess you have to change your job design...

Load the data to a temp table --> Target_TEMP
Then you can use a simple PL/SQL script or another DS job to check:

If Target_TEMP is empty, do nothing.
Else, drop ORIG_TARGET and rename Target_TEMP to ORIG_TARGET.
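If the target is DB2, the check-and-swap might look roughly like this; all names are placeholders, and bear in mind the drop loses grants, indexes, and dependent views, which would have to be re-created:

Code: Select all
db2 connect to TGTDB > /dev/null
db2 -x "SELECT COUNT(*) FROM MYSCHEMA.TARGET_TEMP" > /tmp/tmp_count.txt
if [ "$(tr -d ' ' < /tmp/tmp_count.txt)" -gt 0 ]; then
    db2 "DROP TABLE MYSCHEMA.ORIG_TARGET"
    db2 "RENAME TABLE MYSCHEMA.TARGET_TEMP TO ORIG_TARGET"
fi
db2 connect reset > /dev/null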


Alternate approach
------------------------------------------------------------------------

You can use a rotating Load / Current / Prev schema setup...

Load the table into the Load schema and then swap the schemas:

Current ---> Prev
Load ---> Current
Prev ---> Load

Now on any given day you have two copies of the data...
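DB2 has no single command to rename a schema, so one way to approximate the rotation is to keep readers on a fixed alias and simply re-point it after each load; this needs CREATE OR REPLACE ALIAS (DB2 9.7 or later), and every name below is invented:

Code: Select all
# Readers always query CUR.MYTABLE; the job loads into LOAD_A or LOAD_B
# alternately, and the alias is flipped only after a successful load.
db2 connect to TGTDB > /dev/null
db2 "CREATE OR REPLACE ALIAS CUR.MYTABLE FOR LOAD_B.MYTABLE"
db2 connect reset > /dev/null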




Hope it helps
Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Land it to a Data Set. That's what they're for.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bmarko22
Participant
Posts: 21
Joined: Mon Nov 16, 2009 7:03 am
Location: MD

Post by bmarko22 »

ray.wurlod wrote:Land it to a Data Set. That's what they're for. ...
Sounds like the approach I was planning on. I'll need two jobs: one to extract (and land the data) and another that gets called only if the file has any records in it. While I really appreciate all the help, I'm still surprised this isn't something I can do in one job (without having to land the data). Technically, if I'm landing the data and then loading, that's two copies of the data, and we have space considerations.

Is there no other approach that would leave us with one copy? I guess, as far as space is concerned, we could always delete the file when the job is done.

Thanks Everyone!!
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Use the overwrite feature in file handling; then you need not delete the file.

Regards
Sreeni
bmarko22
Participant
Posts: 21
Joined: Mon Nov 16, 2009 7:03 am
Location: MD

Post by bmarko22 »

Thank you. The point of deleting the files is that we pay for the space we use.
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Then, if possible, create a small file using the count function.
That way you need not keep the entire source data.
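For instance, the extract job could funnel its output through an Aggregator that writes a single row (the record count) to a one-line file, and a wrapper script or Sequence decides from that. Paths and names here are invented:

Code: Select all
# Read the one-line count file written by the extract job
count=$(tr -d ' \r' < /data/land/refresh_count.txt)
if [ "${count:-0}" -gt 0 ]; then
    dsjob -run -jobstatus MYPROJECT LoadTargetTable
fi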

Regards
Sreeni