Hi all,
I've got SCD Type 1 working correctly within DataStage, so I can now figure out which records I have previously processed. However, I'm now struggling to figure out which records I have previously delivered.
I don't know if that really makes sense, but here is the current solution:
1. I use a lookup table to check whether the current record has been previously processed.
2. If it hasn't been previously processed, generate a key and insert the record. Otherwise, use a Change Capture stage to check whether the record has changed.
3. If the data has changed, update the database.
This works fine as long as the target database and lookup table are in sync. However, synchronisation is easily broken when I start to receive the same new record multiple times. In this situation, the record gets inserted into the database the first time, but once it is inserted the lookup table is out of date. Any time I get that same record again, the lookup table will not recognise that it has already been inserted and therefore now requires an update instead of another insert.
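To make the problem concrete, here is a minimal Python sketch (not DataStage itself, just a model of the routing logic): a lookup built once at job start goes stale as soon as a new row is inserted, and the fix is to refresh the lookup immediately after each insert.

```python
# Simplified model of insert/update routing against a pre-built lookup.
# All names here are illustrative, not DataStage stage names.

def process(records, lookup_keys):
    """Route each record to insert or update using the lookup key set."""
    inserts, updates = [], []
    for key, value in records:
        if key in lookup_keys:
            updates.append((key, value))
        else:
            inserts.append((key, value))
            # The fix: refresh the lookup as soon as the row is inserted,
            # so a second arrival of the same key becomes an update.
            lookup_keys.add(key)
    return inserts, updates

# The same new key "A" arrives twice in one feed.
feed = [("A", 1), ("B", 2), ("A", 3)]
existing = {"B"}                    # keys already in the target table
ins, upd = process(feed, set(existing))
print(ins)   # [('A', 1)]            -- only the first arrival is inserted
print(upd)   # [('B', 2), ('A', 3)]  -- the repeat is routed to update
```

Without the `lookup_keys.add(key)` line, the second arrival of "A" would be inserted again, which is exactly the out-of-sync behaviour described above.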
Can I use the target database as the reference link for the lookup? When using a sparse lookup, will the lookup recognise the newly inserted records? Does this cause performance issues (reading from and writing to the same database table)?
Are there any alternative ways of looking at this? I would prefer to avoid staging followed by post-processing of the incremental updates before finally flushing the changes to production. A sort-and-remove-duplicates approach is also out of the question due to the nature of our data source.
Any input would be highly valued,
Barry
SCD Type 1 - Removing duplicate deliveries
Yes, you can use the target database as the reference for the lookup.
If you use a sparse lookup, the reference will definitely recognise the newly inserted records.
Reading from and writing to the same database table won't be a problem.
But with a sparse lookup you get performance issues if the ratio of driving to reference data is more than 1:100.
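The reason a sparse lookup sees its own inserts is that every incoming row is looked up directly against the live target table rather than an in-memory copy. A hedged sketch of that behaviour, using `sqlite3` purely as a stand-in for the target database:

```python
# Sketch: per-row lookup against the target table itself (sparse-lookup style).
# A key inserted moments ago is found on its next arrival.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (key TEXT PRIMARY KEY, value INTEGER)")

def upsert_sparse(key, value):
    """Query the live target table for every incoming row, then route."""
    row = con.execute("SELECT 1 FROM target WHERE key = ?", (key,)).fetchone()
    if row is None:
        con.execute("INSERT INTO target VALUES (?, ?)", (key, value))
        return "insert"
    con.execute("UPDATE target SET value = ? WHERE key = ?", (value, key))
    return "update"

# The same key arrives three times in one run.
actions = [upsert_sparse(k, v) for k, v in [("A", 1), ("A", 2), ("A", 3)]]
print(actions)  # ['insert', 'update', 'update']
```

Because the lookup and the writes hit the same table, the second and third arrivals are correctly routed to update, with no separate lookup table to fall out of sync.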
Thanks for your reply, that helps ALOT! :D
What do you mean by this ratio? Are you saying that if the size of the reference data is large compared to the driving data then there will be performance issues? Or are you saying the opposite? In my case, the driving data is in the millions, whereas the reference data is fewer than 2,000 rows and rarely needs updating.
Hi,
For scenarios where the number of input rows is significantly smaller (1:100 or more) than the number of reference rows in a DB2 or Oracle table, a Sparse Lookup may be appropriate.
I think your situation is the opposite: your driving rows are huge and your reference rows are few, so a sparse lookup will definitely hurt performance.
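With millions of driving rows and only ~2,000 reference rows, the usual choice is a normal lookup: load the small reference set into memory once and probe it per row, rather than issuing one database query per driving row. A rough sketch of the difference, again with `sqlite3` standing in for the real database:

```python
# Normal lookup: one scan of the small reference table, then in-memory probes.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ref (key INTEGER PRIMARY KEY, value TEXT)")
con.executemany("INSERT INTO ref VALUES (?, ?)",
                [(i, f"v{i}") for i in range(2000)])

# Load the ~2,000 reference rows into memory once.
ref = dict(con.execute("SELECT key, value FROM ref"))

driving = range(1_000_000)
matched = sum(1 for k in driving if k in ref)   # dict probe, no DB round-trip
print(matched)  # 2000

# A sparse lookup would instead run one SELECT per driving row:
#   con.execute("SELECT value FROM ref WHERE key = ?", (k,)).fetchone()
# i.e. a million database round-trips to find the same 2,000 matches.
```

The trade-off is memory: the normal lookup only works well when the reference set comfortably fits in memory, which a 2,000-row table clearly does.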
NageshSunkoji
If you know anything SHARE it.............
If you Don't know anything LEARN it...............