Target Load strategy

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

wasimraja
Participant
Posts: 61
Joined: Fri Jul 06, 2007 4:23 pm
Location: USA

Target Load strategy

Post by wasimraja »

Hi,

I have a job which picks up 1 million records and loads them into the target.
Basically, I want to insert only the new records. Is there any functionality in DataStage similar to DD_INSERT in Informatica? If so, I can do a target lookup and, if the record already exists, I won't process it; otherwise I will insert it.

Thanks in advance.
Cheers
Wasim

--If necessity is the mother of invention, then the need to simplify is the father of it.
wasimraja
Participant
Posts: 61
Joined: Fri Jul 06, 2007 4:23 pm
Location: USA

Re: Target Load strategy

Post by wasimraja »

Also, I see that there is a Filter stage. Is it available in the Server edition? I tried adding it to the palette but couldn't find one available. Is there something I am missing?
Cheers
Wasim

--If necessity is the mother of invention, then the need to simplify is the father of it.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you are doing a normal insert (versus a load) into your database, you can stick with a simple INSERT and just not look for errors (which would happen if you attempted to insert an existing key). This is probably easier than doing a lookup, although it depends upon the relative percentage of inserts: if you only have a small percentage, then loading your list of keys into a hashed file and checking against that would be more efficient.

There is no Filter stage in Server, although the functionality is easily added in a Transformer stage.
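The hashed-file check described above can be sketched outside DataStage. The following is a minimal Python analogy, not DataStage code: a set stands in for the hashed file of existing target keys, and the column names are illustrative assumptions.

```python
# Python analogy (not DataStage syntax) of the "insert only new keys" pattern:
# a set stands in for the hashed file holding the target's existing keys,
# and only rows whose key misses the lookup go on to the insert link.

def rows_to_insert(source_rows, existing_keys):
    """Return only the rows whose key is not already in the target."""
    new_rows = []
    for row in source_rows:
        if row["COL1"] not in existing_keys:   # lookup miss -> a genuinely new record
            new_rows.append(row)
    return new_rows

existing_keys = {1, 2, 3}                      # keys already loaded into the target
source = [{"COL1": 2, "COL2": "ASDF"},
          {"COL1": 4, "COL2": "NEW"}]
print(rows_to_insert(source, existing_keys))   # only the key-4 row survives
```

In a real job the reject/insert split is done by a Transformer constraint rather than Python, but the decision per row is the same.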
satyasur
Participant
Posts: 15
Joined: Fri Apr 07, 2006 12:11 am

You can try this, sir

Post by satyasur »

If your source has a ROWlastupdated timestamp column then you can try the following:
store the job's last run time in a shared container.
In the job, use the selection tab to query row.LASTUPD_DTTM > %DateTimeIn('#LastModifiedDateTime#')
This would only pick up rows that are new or updated since the last run
(and would save you from processing all the million rows further in the job).

Then do a target lookup through a transformer (use a hash file which stores all the keys)
and put a constraint that passes only those rows which fail the lookup (because the new rows will not have a match in the existing target hash).
Also update the target hash file and the shared container after the target load.

(Senior members, please correct me if anything is wrong.)
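The steps above can be sketched as a Python analogy, not DataStage code. The persisted state, column names, and the set standing in for the target hash file are illustrative assumptions.

```python
# Python analogy (not DataStage syntax) of the incremental pattern above:
# persist the last-run time and the set of loaded keys between runs, filter
# the source on the timestamp, then reject rows whose key already exists.
from datetime import datetime

def incremental_load(source_rows, state):
    """Filter on last-run time, pass only lookup misses, then update state."""
    inserted = []
    for row in source_rows:
        if row["LASTUPD_DTTM"] <= state["last_run"]:
            continue                          # unchanged since last run: skip early
        if row["COL1"] in state["keys"]:
            continue                          # lookup hit: already in target
        inserted.append(row)
        state["keys"].add(row["COL1"])        # keep the key store current
    if source_rows:
        state["last_run"] = max(r["LASTUPD_DTTM"] for r in source_rows)
    return inserted

state = {"last_run": datetime(2024, 1, 1), "keys": {1, 2}}  # persisted between runs
rows = [{"COL1": 1, "LASTUPD_DTTM": datetime(2024, 6, 1)},  # updated, key exists
        {"COL1": 3, "LASTUPD_DTTM": datetime(2024, 6, 2)}]  # genuinely new
print(incremental_load(rows, state))          # only the key-3 row is inserted
```

Note that the updated row (key 1) is dropped at the lookup, which matches an insert-only load; an upsert job would route it to an update link instead.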
regards,
satish.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Satish - I wouldn't say that your approach is wrong, but it might be over-engineered for what the original poster is looking for. We will need to hear back to see what it is they wish to do.
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Re: Target Load strategy

Post by sachin1 »

Hello, if you want to do a conditional insert:

What we do is select records from the database and create a hash file, then have a lookup of this hash file against the input source containing the huge number of records. If a record matches, there is no upload; otherwise it is inserted.

Put proper constraints in the transformer.
Akumar1
Participant
Posts: 48
Joined: Tue May 22, 2007 3:38 am
Location: bangalore
Contact:

Re: Target Load strategy

Post by Akumar1 »

Look up the destination table using a hash file.

Regards,
Akumar1
wasimraja
Participant
Posts: 61
Joined: Fri Jul 06, 2007 4:23 pm
Location: USA

Post by wasimraja »

Hi all,

Thanks for your ideas.

I do not have a timestamp in the source.
Let's say my source looks something like this.

COL1   COL2
----   -----
1      ASDFA
2      ASDF
3      SFDSF

Now, I have loaded all the 1 million records from the source.
The next day, as there is no update timestamp, I will pick up all the 1 million records + 10 new records (say).

Now, I can do a target lookup (using a hash file) and see if the record coming in from the source exists or not. But how do I restrict DataStage to only insert the 10 new records and stop it from updating the existing records?

I see that we need to use a constraint in the transformer. It would be great if someone could tell me the actual constraint itself.
Cheers
Wasim

--If necessity is the mother of invention, then the need to simplify is the father of it.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Something like

Code:

IsNull(InLink.TheTimestamp) Or (InLink.TheTimestamp = "")
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
wasimraja
Participant
Posts: 61
Joined: Fri Jul 06, 2007 4:23 pm
Location: USA

Post by wasimraja »

Thanks Ray.

That answers my need.
Cheers
Wasim

--If necessity is the mother of invention, then the need to simplify is the father of it.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Old style. You could also use an Input Link Variable boolean:

Code:

LookupLinkname.NOTFOUND
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Wasim was not using a hashed file, the only known mechanism for which the NOTFOUND link variable is reliable.

A hash file is just not the same (since it doesn't exist!) :twisted:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:Wasim was not using a hashed file, the only known mechanism for which the NOTFOUND link variable is reliable.
Agreed, the link variables are only reliable for a hashed file lookup. For any other stage, fall back on the original mechanism of checking for a null Key field post lookup.

However, I based my answer on this statement, but without your anal-retentive 'hash versus hashed' filter enabled:
wasimraja wrote:Now, I can do a target lookup(using hash file) and see if the record coming in from source exists or not. But, how do I restrict datastage to only insert the 10 new records and stop it from updating the existing records.
:wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers