Page 1 of 1

Incremental load with no key field at source

Posted: Wed May 02, 2007 4:15 pm
by rafik2k
This is one of the interveiw question.

Source tables/files does not contain any key field.

How to implement incremental load in the target tables?

I was guesing using unix shell script commands like compare and other commmands, it be can be possible.

Any other tips/solution would be a great help.

Thanks in advance

Regards,
Rafiq

Re: Incremental load with no key field at source

Posted: Wed May 02, 2007 4:22 pm
by sud
rafik2k wrote:This is one of the interveiw question.

Source tables/files does not contain any key field.

How to implement incremental load in the target tables?

I was guesing using unix shell script commands like compare and other commmands, it be can be possible.

Any other tips/solution would be a great help.

Thanks in advance

Regards,
Rafiq
The "key" to understanding whether a incoming record is already present or not are the "key columns". When you don't have any, what it necessarily means is that 1>no primary keys are specified in the table or 2>there are actually no keys. Option 1 gives you the alternate or natural keys based on which you can do a change capture in datastage. Option 2 tells you that all the columns are a key, so you do a change capture on all columns as keys.

In a practical situation, I cannot think of a Logical Datamodel where you have a table which does not have any keys and for which you want to write incremental load ETL. Probably when you find one, you should bust your data modeller.

Posted: Wed May 02, 2007 4:26 pm
by nick.bond
Without a key what identifies the records to compare from one run to the next?

If there is no natural key in the source field then all you can test for is whether their is already a record in the target that is exactly the same, in which case all fields are key fields???

Or is this a trick question? was there a modified time field in the source?

Posted: Wed May 02, 2007 5:17 pm
by chucksmith
If you trust your load portion of ETL, you can compare yesterday's full load file with today's full load file to identify inserts, updates, and deletes.

Look at the unix "comm" command to compare your load files.

Posted: Wed May 02, 2007 7:55 pm
by kduke
It is a trick question. An incremental load is based on only selecting records in the source which have changed. You can do that without keys. All you need is last updated field in the source.

What to update in the target may require keys.

Posted: Thu May 03, 2007 2:13 am
by katz
There might be a source system log/journal file from which you can extract the incremental set of source data. Updating the target without key definitions is a bit harder to do.

Posted: Fri May 04, 2007 1:02 am
by rafik2k
chucksmith wrote:If you trust your load portion of ETL, you can compare yesterday's full load file with today's full load file to identify inserts, updates, and deletes.

Look at the unix "comm" command to compare your load files.
Thanks for your tips
Though I can indentify changed records using unix utilities, but how can I update or deletes these recorods at target database without any key :?:

Posted: Fri May 04, 2007 1:49 am
by rafik2k
kduke wrote:It is a trick question. An incremental load is based on only selecting records in the source which have changed. You can do that without keys. All you need is last updated field in the source.

What to update in the target may require keys.
I also believe that it is trick question.
That means if we have last updated field in the source then we can insert fresh records only.

We can't assume all fields are key fields as some field even may contain null values.

So updating existing records in target not possible without key.