Target Load strategy
Hi,
I have a job which picks up 1 million records and loads them into the target.
Basically, I want to insert only the new records. Is there functionality in DataStage similar to DD_INSERT in Informatica? If so, I can do a target lookup and, if the record already exists, I won't process it; otherwise I will insert it.
Thanks in advance.
Cheers
Wasim
--If necessity is the mother of invention, then the need to simplify is the father of it.
Re: Target Load strategy
Also, I see that there is a Filter stage. Is it available in the Server edition? I tried adding it to the palette but couldn't find one available. Is there something I am missing?
Cheers
Wasim
--If necessity is the mother of invention, then the need to simplify is the father of it.
If you are doing a normal insert (versus a bulk load) into your database, you can stick with a simple INSERT and just not look for errors (which would happen when you attempted to insert an existing key). This is probably easier than doing a lookup, although it depends on the relative percentage of inserts: if only a small percentage of rows are new, then loading your list of keys into a hashed file and checking against that would be more efficient.
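The "just insert and ignore duplicate-key errors" approach above can be sketched outside DataStage. This is a minimal illustration in Python with an in-memory SQLite table standing in for the real target; the table and row values are made up for the example.

```python
import sqlite3

# In-memory table standing in for the real target (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (col1 INTEGER PRIMARY KEY, col2 TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'ASDFA')")  # a pre-existing row

rows = [(1, "ASDFA"), (2, "ASDF")]  # incoming batch: one existing key, one new

inserted = 0
for row in rows:
    try:
        conn.execute("INSERT INTO target VALUES (?, ?)", row)
        inserted += 1
    except sqlite3.IntegrityError:
        pass  # existing key: swallow the duplicate-key error, as suggested above
conn.commit()

print(inserted)  # -> 1
```

Note the trade-off described above: every row still makes a round trip to the database, which is why a key lookup up front wins when most rows already exist.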
There is no filter stage in Server, although the functionality is easily added into a transformer stage.
You can try this, sir.
If your source has a row-last-updated timestamp column, then you can try the following:
Store the job's last run time in a shared container.
In the job, use the selection tab to query row.LASTUPD_DTTM > %DateTimeIn('#LastModifiedDateTime#').
This would pick up only the rows that are new or updated since the last run
(and would save you from processing all one million rows further in the job).
Then do a target lookup through a transformer (use a hash file which stores all the keys).
Put a constraint that passes only those rows which fail the lookup (because the new rows will not have a match in the existing target hash file).
Also update the target hash file and the shared container after the target load.
(Senior members, please correct me if anything is wrong.)
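The steps above can be sketched in Python. The run time, key set, column names, and row values are all hypothetical stand-ins for the shared container, target hash file, and source table described in the post.

```python
from datetime import datetime

# Stand-ins for the job's state (hypothetical values).
last_run = datetime(2024, 1, 1)   # as if read from the shared container
existing_keys = {1, 2, 3}         # as if read from the target hash file

source_rows = [
    {"COL1": 3, "COL2": "SFDSF", "LASTUPD_DTTM": datetime(2023, 12, 1)},  # old
    {"COL1": 4, "COL2": "NEW",   "LASTUPD_DTTM": datetime(2024, 2, 1)},   # new
]

# Step 1: the source selection -- only rows updated since the last run.
changed = [r for r in source_rows if r["LASTUPD_DTTM"] > last_run]

# Step 2: the transformer constraint -- pass only rows whose key fails the lookup.
to_insert = [r for r in changed if r["COL1"] not in existing_keys]

# Step 3: after the load, refresh the key set and the stored run time.
existing_keys.update(r["COL1"] for r in to_insert)
last_run = max((r["LASTUPD_DTTM"] for r in changed), default=last_run)

print([r["COL1"] for r in to_insert])  # -> [4]
```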
regards,
satish.
Satish - I wouldn't say that your approach is wrong, but it might be over-engineered for what the original poster is looking for. We will need to hear back to see what it is they wish to do.
Re: Target Load strategy
Hello. If you want to do a conditional insert:
What we do is select records from the database and create a hash file, then look up this hash file against the input source containing the huge record set. If the record matches, there is no upload; otherwise it is inserted.
Put proper constraints in the transformer.
Re: Target Load strategy
Look up the destination table using a hashed file.
Regards,
Akumar1
Hi all,
Thanks for your ideas.
I do not have a timestamp in the source.
Let's say my source looks something like this:
COL1 COL2
------ ------
1 ASDFA
2 ASDF
3 SFDSF
Now, I have loaded all the 1 million records from the source.
The next day, as there is no update timestamp, I will pick up all the 1 million records plus, say, 10 new records.
Now, I can do a target lookup (using a hash file) and see whether the record coming in from the source exists or not. But how do I restrict DataStage to insert only the 10 new records and stop it from updating the existing records?
I see that we need to use a constraint in the transformer. It would be great if someone could tell me the actual constraint itself.
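The effect of the constraint being asked about here (pass a row only when the lookup finds nothing) can be illustrated in Python. The key set and row values below are made up to match the example data in the post.

```python
# The hashed-file lookup keyed on COL1; only keys already in the target appear here.
target_keys = {1: "ASDFA", 2: "ASDF", 3: "SFDSF"}

source = [(1, "ASDFA"), (2, "ASDF"), (3, "SFDSF"), (4, "NEW1"), (5, "NEW2")]

# The constraint in spirit: pass the row only when the lookup fails
# (the equivalent of a NOTFOUND link variable or a null lookup key).
new_rows = [(k, v) for k, v in source if k not in target_keys]

print(new_rows)  # -> [(4, 'NEW1'), (5, 'NEW2')]
```

Rows that fail the membership test are the only ones sent to the insert link, so existing records are never touched.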
Cheers
Wasim
--If necessity is the mother of invention, then the need to simplify is the father of it.
Something like
Code:
IsNull(InLink.TheTimestamp) Or (InLink.TheTimestamp = "")
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Old style. You could also use an Input Link Variable boolean:
Code:
LookupLinkname.NOTFOUND
-craig
"You can never have too many knives" -- Logan Nine Fingers
Wasim was not using a hashed file, the only known mechanism for which the NOTFOUND link variable is reliable.
A hash file is just not the same (since it doesn't exist!)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Agreed, the link variables are only reliable for a hashed file lookup. For any other stage, fall back on the original mechanism of checking for a null key field after the lookup.
ray.wurlod wrote: Wasim was not using a hashed file, the only known mechanism for which the NOTFOUND link variable is reliable.
However, I based my answer on this statement, but without your anal-retentive 'hash versus hashed' filter enabled:
wasimraja wrote: Now, I can do a target lookup (using hash file) and see if the record coming in from source exists or not. But how do I restrict DataStage to only insert the 10 new records and stop it from updating the existing records?
-craig
"You can never have too many knives" -- Logan Nine Fingers