Target table truncated if source is empty
Moderators: chulett, rschirm, roy
I have a simple job that uses the DB2 stage to select data, sends it through a Transformer, and inserts into a table using "clear table then insert".
My question is: when the source table is unavailable due to a lock, or while it's being refreshed, does the job see this as the table being empty and truncate my target table? Or is what's happening that the parallel job will do whatever steps it can get away with, in any order it deems appropriate? So it might be deleting all the records while the source data is still being fetched, in order to save time.
I must be doing something wrong, because I can't see an ETL tool saying "hmm, I can't access the source table, let's delete all the target data."
I welcome any suggestions, and thank you all for such a wonderful forum. Please be kind if I'm missing some obvious function.
I'm lost on what your problem is. You've told it to "clear the table" every time the job runs, and it will do that regardless of the state of your source, zero records or not: first it empties the table, then it starts loading records. Is that really what you want to be doing?
Are you trying to find a way to "clear" and reload the table only if the source is not empty?
-craig
"You can never have too many knives" -- Logan Nine Fingers
The issue with checking the source is that I'm not using just one table as a source, but a query over upwards of 5 tables that can run for 90 minutes before it returns the first result row. Each table might have data, but the query result might be 0 rows. I don't want to wait almost 2 hours to check whether there will be data and then another 2 hours to pull that same data.
So there is no way to do an insert in DataStage that says "if there's nothing to insert, don't delete all the target data"? There's gotta be. In PL/I or other application environments we would look for an SQLCODE -100, meaning the query returned no data, and decide what to do from that point on. It's a pretty basic condition to have to handle.
In a server job you can check the row count on the link and create a condition on the count. This is not possible in parallel jobs, and I don't want to move to server jobs and lose the advantage of parallelism just to avoid truncating on an empty result set.
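For what it's worth, the guard described here can be sketched outside DataStage. Below is a minimal, illustrative Python sketch using sqlite3 as a stand-in for DB2 (the table names `src` and `tgt` are hypothetical): fetch first, and only clear the target when the fetch actually returned rows, which is the same decision an SQLCODE -100 check gives you in PL/I.

```python
import sqlite3

# Stand-in for DB2: an in-memory SQLite database with a source and a target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER)")
conn.execute("CREATE TABLE tgt (id INTEGER)")
conn.execute("INSERT INTO tgt VALUES (1), (2)")   # existing target data

def refresh_target(conn):
    """Clear-and-load tgt from src, but only when src returns rows.

    Same idea as testing for SQLCODE -100: detect "no rows" before
    destroying the existing target data.
    """
    rows = conn.execute("SELECT id FROM src").fetchall()
    if not rows:                      # empty result set: leave tgt alone
        return False
    conn.execute("DELETE FROM tgt")   # only now is it safe to clear
    conn.executemany("INSERT INTO tgt VALUES (?)", rows)
    conn.commit()
    return True

# src is empty (or "unavailable"), so the target keeps its two rows.
refresh_target(conn)
print(conn.execute("SELECT COUNT(*) FROM tgt").fetchone()[0])  # prints 2
```

The key design point is that the destructive step comes after, and conditional on, the fetch; "clear table then insert" in the target stage does it the other way around.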
Nobody said you would have to pull the data twice. Can you not land the result set and then check that it's not empty? It doesn't have to be a flat file, and there is (off the top of my head) an orchadmin command to get the count from a Data Set.
I honestly don't know of any way, in a single job, to conditionally truncate the target table only if there is data to load, but others might.
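The "land it, then check" idea can be sketched as a small pre-flight step in a job sequence. This is an illustrative Python sketch, not DataStage code; the file name `extract.out` is hypothetical, and for a real Data Set you would get the record count from orchadmin rather than reading the file yourself.

```python
import os

def landed_extract_has_rows(path):
    """True if the landed extract file contains at least one non-blank record.

    A flat-file check; for a DataStage Data Set you would ask orchadmin
    for the record count instead of reading the file directly.
    """
    if not os.path.exists(path):
        return False
    with open(path) as f:
        return any(line.strip() for line in f)

# Hypothetical sequence logic: the extract job landed "extract.out";
# run the clear-and-load job only when there is data to load.
with open("extract.out", "w") as f:
    f.write("")                       # simulate an empty extract

if landed_extract_has_rows("extract.out"):
    print("run the clear-and-load job")
else:
    print("skip the load, keep the target as-is")

os.remove("extract.out")
```

In a real sequence this check would be the condition on the trigger between the extract job and the load job, so the extract runs once and is reused by the load.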
-craig
"You can never have too many knives" -- Logan Nine Fingers
I guess you have to change your job design...
Load the data to a temp table --> TARGET_TEMP
Then you can use a simple PL/SQL script, or another DS job, to check:
If TARGET_TEMP is empty, do nothing.
Else, drop ORIG_TARGET and rename TARGET_TEMP to ORIG_TARGET.
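The TARGET_TEMP check just described can be sketched as follows. This is an illustrative Python sketch with sqlite3 standing in for the real database; the table names follow the post, but the rename syntax and privileges may differ on DB2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orig_target (id INTEGER)")
conn.execute("INSERT INTO orig_target VALUES (1), (2)")   # current good data
conn.execute("CREATE TABLE target_temp (id INTEGER)")     # loaded by the DS job

def swap_if_loaded(conn):
    """If target_temp received rows, promote it; otherwise keep orig_target."""
    n = conn.execute("SELECT COUNT(*) FROM target_temp").fetchone()[0]
    if n == 0:
        return False                  # empty load: do nothing
    conn.execute("DROP TABLE orig_target")
    conn.execute("ALTER TABLE target_temp RENAME TO orig_target")
    conn.commit()
    return True

# The load produced nothing, so orig_target is untouched.
swap_if_loaded(conn)
print(conn.execute("SELECT COUNT(*) FROM orig_target").fetchone()[0])  # prints 2
```

Because the rename only happens after the count check, an empty (or failed) load never touches the table that readers query.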
Alternate approach
------------------------------------------------------------------------
You can use two schemas, a Load schema and a Current schema.
Load the table in the Load schema and do the schema swap:
Current ---> Prev
Load ---> Current
Prev ---> Load
Now on any given day you have two copies of the data.
Hope it helps
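The schema-swap rotation above can be sketched the same way. SQLite has no schemas, so this illustrative sketch rotates table-name suffixes instead (all names are hypothetical); the three renames mirror the Current/Load/Prev cycle.

```python
import sqlite3

# Table-name suffixes stand in for the Load / Current / Prev schemas.
conn = sqlite3.connect(":memory:")
for name, day in [("target_prev", 1), ("target_current", 2), ("target_load", 3)]:
    conn.execute(f"CREATE TABLE {name} (loaded_day INTEGER)")
    conn.execute(f"INSERT INTO {name} VALUES ({day})")

def rotate(conn):
    """Current -> Prev, Load -> Current, Prev -> Load (via a scratch name)."""
    conn.execute("ALTER TABLE target_prev RENAME TO target_scratch")
    conn.execute("ALTER TABLE target_current RENAME TO target_prev")
    conn.execute("ALTER TABLE target_load RENAME TO target_current")
    conn.execute("ALTER TABLE target_scratch RENAME TO target_load")
    conn.commit()

rotate(conn)
# Readers always query target_current; yesterday's copy survives in target_prev.
print(conn.execute("SELECT loaded_day FROM target_current").fetchone()[0])  # prints 3
```

The cost is keeping two full copies of the data at all times, which is the trade-off the original poster raises below.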
Thanks
Swapnil
"Whenever you find whole world against you just turn around and Lead the world"
ray.wurlod wrote: Land it to a Data Set. That's what they're for. ...

Sounds like the approach I was planning on. I'll need two jobs: one to extract (and land the data) and another that gets called only if the file has any records in it. While I really appreciate all the help, I'm still surprised this is not something I can do in one job (without having to land the data). Technically, if I'm landing the data and then loading it, that is two copies of the data, and we have space considerations.
Is there no other approach that would leave us with one copy? I guess we could always delete the file when the job is done, as far as space is concerned.
Thanks Everyone!!