Target table truncated if source is empty

bmarko22
Participant
Posts: 21
Joined: Mon Nov 16, 2009 7:03 am
Location: MD

Post by bmarko22 »

I have a simple job that uses the DB2 stage to select data, sends it through a Transformer, and inserts into a table using "clear table then insert".

My question: when the source table is unavailable due to a lock, or while it's being refreshed, the job seems to treat the table as empty and truncates my target table. Or maybe what's happening is that the parallel job will do whatever steps it can get away with in any order it deems appropriate, so it might be deleting all the records while the source data is still being fetched, in order to save time.

I must be doing something wrong, because I can't see an ETL tool saying "hmm... I can't access the source table, let's delete all the target data."

I welcome any suggestions, and thank you all for such a wonderful forum. And please be kind if I'm missing some obvious function.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Lost on what your problem is. You've told it to "clear the table" every time the job runs, and it will do that regardless of the state of your source, zero records or not: first empty the table, then start loading records. Is that really what you want to be doing? :?

Are you trying to find a way to "clear" and reload the table only if the source is not empty?
-craig

"You can never have too many knives" -- Logan Nine Fingers
bmarko22
Participant
Posts: 21
Joined: Mon Nov 16, 2009 7:03 am
Location: MD

Post by bmarko22 »

I am trying to do a full refresh of the target table. If the source is unavailable or empty, I don't want to lose the current data; better to have day-old data than nothing. My source is a very busy warehouse that often has tables that are unavailable.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So, yes to the last question. Then you'll need a Sequence to check the source and then conditionally run the load job only when there is actually data to load.
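Roughly, the Sequence logic amounts to the sketch below; in the Sequence itself that would be an Execute Command activity feeding a Nested Condition. All database, schema, and job names here are made up:

Code: Select all
#!/bin/sh
# Count the source rows, then kick off the load job only if any exist.
db2 connect to SRCDB > /dev/null
db2 -x "SELECT COUNT(*) FROM MYSCHEMA.SOURCE_TABLE" > /tmp/src_count.txt
db2 connect reset > /dev/null
rows=$(tr -d ' ' < /tmp/src_count.txt)
if [ "${rows:-0}" -gt 0 ]; then
    # -jobstatus makes dsjob wait and return an exit code based on job status
    dsjob -run -jobstatus MYPROJECT LoadTargetTable
fi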
-craig

"You can never have too many knives" -- Logan Nine Fingers
bmarko22
Participant
Posts: 21
Joined: Mon Nov 16, 2009 7:03 am
Location: MD

Post by bmarko22 »

The issue with checking the source is that I'm not using just one table as a source but a query joining upwards of 5 tables, one that can run for 90 minutes before it returns the first result row. Each table might have data, but the query result might be empty. I don't want to wait almost 2 hours to check whether there will be data and then another 2 hours to pull that same data.

So there is no way to do an insert statement in DS that says if there's nothing to insert, don't delete all the target data? There's gotta be. In PL/I or other application environments we would look for an SQLCODE +100, meaning the query returned no data, and decide what to do from that point on. It's a pretty basic condition to have to handle.

In a server job you can check the row count on the link and build a condition on that count; this is not possible in parallel jobs. I don't want to move to server jobs and lose the advantage of parallelism just to avoid truncating on an empty result set.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Nobody said you would have to pull the data twice. Can you not land the result set and then check that the result set is not empty? It doesn't have to be a flat file, and there is (off the top of my head) an orchadmin command to get the count from a Data Set.
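If memory serves, the companion dsrecords utility reports a Data Set's record count directly; orchadmin dump piped through wc -l would be the long way round. Purely a sketch, so check the syntax on your release:

Code: Select all
# dsrecords prints something like "42 records" for the landed data set
count=$(dsrecords /data/land/refresh.ds | cut -d' ' -f1)
if [ "${count:-0}" -gt 0 ]; then
    dsjob -run -jobstatus MYPROJECT LoadFromDataSet
fi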

I honestly don't know of any way in a single job to conditionally truncate the target table only if there is data to load, but others might.
-craig

"You can never have too many knives" -- Logan Nine Fingers
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

I guess you have to change your job design...

Load the data to a temp table --> Target_TEMP
Then you can use a simple PL/SQL script or another DS job to check:

If Target_TEMP is empty, do nothing.
Else, drop ORIG_TARGET and rename Target_TEMP to ORIG_TARGET.
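If the target is DB2, the check-and-swap might look roughly like this; all names are placeholders, and bear in mind the drop loses grants, indexes, and dependent views, which would have to be re-created:

Code: Select all
db2 connect to TGTDB > /dev/null
db2 -x "SELECT COUNT(*) FROM MYSCHEMA.TARGET_TEMP" > /tmp/tmp_count.txt
if [ "$(tr -d ' ' < /tmp/tmp_count.txt)" -gt 0 ]; then
    db2 "DROP TABLE MYSCHEMA.ORIG_TARGET"
    db2 "RENAME TABLE MYSCHEMA.TARGET_TEMP TO ORIG_TARGET"
fi
db2 connect reset > /dev/null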


Alternate approach
------------------------------------------------------------------------

You can use a rotating Load / Current / Prev schema setup...

Load the table into the Load schema and then swap the schemas:

Current ---> Prev
Load ---> Current
Prev ---> Load

Now on any given day you have two copies of the data...
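DB2 has no single command to rename a schema, so one way to approximate the rotation is to keep readers on a fixed alias and simply re-point it after each load; this needs CREATE OR REPLACE ALIAS (DB2 9.7 or later), and every name below is invented:

Code: Select all
# Readers always query CUR.MYTABLE; the job loads into LOAD_A or LOAD_B
# alternately, and the alias is flipped only after a successful load.
db2 connect to TGTDB > /dev/null
db2 "CREATE OR REPLACE ALIAS CUR.MYTABLE FOR LOAD_B.MYTABLE"
db2 connect reset > /dev/null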




Hope it helps
Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Land it to a Data Set. That's what they're for.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bmarko22
Participant
Posts: 21
Joined: Mon Nov 16, 2009 7:03 am
Location: MD

Post by bmarko22 »

ray.wurlod wrote:Land it to a Data Set. That's what they're for. ...
Sounds like the approach I was planning on. I'll need two jobs: one to extract (and land the data) and another that gets called only if the file has any records in it. While I really appreciate all the help, I'm still surprised this isn't something I can do in one job (without having to land the data). Technically, if I'm landing the data and then loading, that's two copies of the data, and we have space considerations.

Is there no other approach that would leave us with one copy? I guess, as far as space is concerned, we could always delete the file when the job is done.

Thanks Everyone!!
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Use the overwrite feature in file handling; then you need not delete the file.

Regards
Sreeni
bmarko22
Participant
Posts: 21
Joined: Mon Nov 16, 2009 7:03 am
Location: MD

Post by bmarko22 »

Thank you. The point of deleting the files is that we pay for the space we use.
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Then, if possible, create a small file using the count function.
That way you need not keep the entire source data.
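For instance, the extract job could funnel its output through an Aggregator that writes a single row (the record count) to a one-line file, and a wrapper script or Sequence decides from that. Paths and names here are invented:

Code: Select all
# Read the one-line count file written by the extract job
count=$(tr -d ' \r' < /data/land/refresh_count.txt)
if [ "${count:-0}" -gt 0 ]; then
    dsjob -run -jobstatus MYPROJECT LoadTargetTable
fi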

Regards
Sreeni