Target Load strategy
Hi,
I have a job which picks up 1 million records and loads them into the target.
Basically, I want to insert only the new records. Is there functionality in DataStage similar to DD_INSERT in Informatica? If so, I can do a target lookup and, if the record already exists, I won't process it; otherwise I will insert it.
Thanks in advance.
Cheers
Wasim
--If necessity is the mother of invention, then the need to simplify is the father of it.
Re: Target Load strategy
Also, I see that there is a Filter stage. Is it available in the Server edition? I tried adding it to the palette but couldn't find one available. Is there something I am missing?
Cheers
Wasim
--If necessity is the mother of invention, then the need to simplify is the father of it.
If you are doing a normal insert (versus a bulk load) into your database, you can stick with a simple INSERT and just not look for errors (which would happen when you attempted to insert an existing key). This is probably easier than doing a lookup, although it depends on the relative percentage of inserts: if only a small percentage of rows are new, then loading your list of keys into a hashed file and checking against that would be more efficient.
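The "just insert and ignore duplicate-key errors" approach above can be sketched outside DataStage. This is a minimal illustration in Python with an in-memory SQLite table standing in for the real target; the table and row values are made up for the example.

```python
import sqlite3

# In-memory table standing in for the real target (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (col1 INTEGER PRIMARY KEY, col2 TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'ASDFA')")  # a pre-existing row

rows = [(1, "ASDFA"), (2, "ASDF")]  # incoming batch: one existing key, one new

inserted = 0
for row in rows:
    try:
        conn.execute("INSERT INTO target VALUES (?, ?)", row)
        inserted += 1
    except sqlite3.IntegrityError:
        pass  # existing key: swallow the duplicate-key error, as suggested above
conn.commit()

print(inserted)  # -> 1
```

Note the trade-off described above: every row still makes a round trip to the database, which is why a key lookup up front wins when most rows already exist.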
There is no filter stage in Server, although the functionality is easily added into a transformer stage.
You can try this, sir.
If your source has a row-last-updated timestamp column, then you can try the following:
Store the job's last run time in a shared container.
In the job, use the selection tab to query row.LASTUPD_DTTM > %DateTimeIn('#LastModifiedDateTime#').
This would pick up only the rows that are new or updated since the last run
(and would save you from processing all one million rows further in the job).
Then do a target lookup through a transformer (use a hash file which stores all the keys).
Put a constraint that passes only those rows which fail the lookup (because the new rows will not have a match in the existing target hash file).
Also update the target hash file and the shared container after the target load.
(Senior members, please correct me if anything is wrong.)
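The steps above can be sketched in Python. The run time, key set, column names, and row values are all hypothetical stand-ins for the shared container, target hash file, and source table described in the post.

```python
from datetime import datetime

# Stand-ins for the job's state (hypothetical values).
last_run = datetime(2024, 1, 1)   # as if read from the shared container
existing_keys = {1, 2, 3}         # as if read from the target hash file

source_rows = [
    {"COL1": 3, "COL2": "SFDSF", "LASTUPD_DTTM": datetime(2023, 12, 1)},  # old
    {"COL1": 4, "COL2": "NEW",   "LASTUPD_DTTM": datetime(2024, 2, 1)},   # new
]

# Step 1: the source selection -- only rows updated since the last run.
changed = [r for r in source_rows if r["LASTUPD_DTTM"] > last_run]

# Step 2: the transformer constraint -- pass only rows whose key fails the lookup.
to_insert = [r for r in changed if r["COL1"] not in existing_keys]

# Step 3: after the load, refresh the key set and the stored run time.
existing_keys.update(r["COL1"] for r in to_insert)
last_run = max((r["LASTUPD_DTTM"] for r in changed), default=last_run)

print([r["COL1"] for r in to_insert])  # -> [4]
```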
regards,
satish.
Satish - I wouldn't say that your approach is wrong, but it might be over-engineered for what the original poster is looking for. We will need to hear back to see what it is they wish to do.
Re: Target Load strategy
Hello. If you want to do a conditional insert:
What we do is select records from the database and create a hash file, then look up this hash file against the input source containing the huge record set. If the record matches, there is no upload; otherwise it is inserted.
Put proper constraints in the transformer.
Re: Target Load strategy
Look up the destination table using a hashed file.
Regards,
Akumar1
Hi all,
Thanks for your ideas.
I do not have a timestamp in the source.
Let's say my source looks something like this:
COL1 COL2
------ ------
1 ASDFA
2 ASDF
3 SFDSF
Now, I have loaded all the 1 million records from the source.
The next day, as there is no update timestamp, I will pick up all the 1 million records plus, say, 10 new records.
Now, I can do a target lookup (using a hash file) and see whether the record coming in from the source exists or not. But how do I restrict DataStage to insert only the 10 new records and stop it from updating the existing records?
I see that we need to use a constraint in the transformer. It would be great if someone could tell me the actual constraint itself.
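The effect of the constraint being asked about here (pass a row only when the lookup finds nothing) can be illustrated in Python. The key set and row values below are made up to match the example data in the post.

```python
# The hashed-file lookup keyed on COL1; only keys already in the target appear here.
target_keys = {1: "ASDFA", 2: "ASDF", 3: "SFDSF"}

source = [(1, "ASDFA"), (2, "ASDF"), (3, "SFDSF"), (4, "NEW1"), (5, "NEW2")]

# The constraint in spirit: pass the row only when the lookup fails
# (the equivalent of a NOTFOUND link variable or a null lookup key).
new_rows = [(k, v) for k, v in source if k not in target_keys]

print(new_rows)  # -> [(4, 'NEW1'), (5, 'NEW2')]
```

Rows that fail the membership test are the only ones sent to the insert link, so existing records are never touched.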
Cheers
Wasim
--If necessity is the mother of invention, then the need to simplify is the father of it.
Something like
Code:
IsNull(InLink.TheTimestamp) Or (InLink.TheTimestamp = "")
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Old style. You could also use an Input Link Variable boolean:
Code:
LookupLinkname.NOTFOUND
-craig
"You can never have too many knives" -- Logan Nine Fingers
Wasim was not using a hashed file, the only known mechanism for which the NOTFOUND link variable is reliable.
A hash file is just not the same (since it doesn't exist!)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Agreed, the link variables are only reliable for a hashed file lookup. For any other stage, fall back on the original mechanism of checking for a null key field after the lookup.
ray.wurlod wrote: Wasim was not using a hashed file, the only known mechanism for which the NOTFOUND link variable is reliable.
However, I based my answer on this statement, but without your anal-retentive 'hash versus hashed' filter enabled:
wasimraja wrote: Now, I can do a target lookup (using hash file) and see if the record coming in from source exists or not. But how do I restrict DataStage to only insert the 10 new records and stop it from updating the existing records?
-craig
"You can never have too many knives" -- Logan Nine Fingers