Design for real time reference data update.

Infosphere's Quality Product

Moderators: chulett, rschirm

raghav_ds
Premium Member
Posts: 40
Joined: Wed May 04, 2011 2:21 am

Design for real time reference data update.

Post by raghav_ds »

We have created a web service for customer search (both deterministic and probabilistic matches). Each customer is searched against reference data of 200 million records.

Currently the web service is designed for static reference data; it now needs to be updated to handle a real-time scenario.

Current design:
Static data is loaded into a lookup file set, and a frequency dataset is also created, at the start of the day. We have created one job and exposed it as a web service. It expects an input message for a customer, and the customer details in that message are used to search the lookup file set. This is working as expected.

Next steps:
Now we want to update the reference data in real time. The reference data, which is stored in a database, will be provided in a file at the start of the day. After that, real-time changes to the database (inserts and updates) are delivered through MQ.

Now we need to make the real-time updates available to the web service. We tried to append the real-time data to the lookup file set, but there is no append mode. And even if we could append, we could not update existing records in the lookup file set.

The second option we tried was a dataset. We were able to append data in real time but could not update existing records in the dataset.

The third option we considered is a database. But using a database will affect the performance of the web service, as we need to read the entire data into DataStage for probabilistic matching (match specifications in QualityStage).

Could you please provide your inputs on solving this real-time update issue?

Please let me know if you need any further information.

Thanks
Raghav
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Sparse lookup is probably the only easy way (assuming you aren't crafting your own in-memory objects using Java or some other technology)....so that each time you commit a new entry to the database, it will be found by your lookup stage... of course, that means you have to return to the database for each lookup, but that's the goal, right?...
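A minimal sketch of what a sparse lookup amounts to, for illustration only: one indexed query per request, so a row committed a moment ago is visible to the very next search. Python's built-in sqlite3 is used purely as a stand-in for the real reference database. The table `customer_ref`, its columns, and the function names are hypothetical, and this is not how the DataStage Lookup stage is implemented internally.

```python
import sqlite3

def open_reference_db():
    """Create an in-memory stand-in for the reference database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_ref ("
        " customer_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " postcode TEXT NOT NULL)"
    )
    # Index the search key so each per-request lookup avoids a full scan.
    conn.execute("CREATE INDEX idx_ref_postcode ON customer_ref (postcode)")
    return conn

def apply_update(conn, customer_id, name, postcode):
    """Insert or update one reference row, as a real-time feed would."""
    conn.execute(
        "INSERT OR REPLACE INTO customer_ref (customer_id, name, postcode)"
        " VALUES (?, ?, ?)",
        (customer_id, name, postcode),
    )
    conn.commit()

def sparse_lookup(conn, postcode):
    """Run one query per request; only the matching candidates come back."""
    cur = conn.execute(
        "SELECT customer_id, name FROM customer_ref"
        " WHERE postcode = ? ORDER BY customer_id",
        (postcode,),
    )
    return cur.fetchall()

conn = open_reference_db()
apply_update(conn, 1, "A. Kumar", "2000")
first = sparse_lookup(conn, "2000")
apply_update(conn, 1, "Anil Kumar", "2000")  # update to an existing row
apply_update(conn, 2, "B. Singh", "2000")    # newly inserted row
second = sparse_lookup(conn, "2000")         # sees both changes immediately
```

The second lookup sees both the update and the insert without any reload step. The price is one database round trip per request.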

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
raghav_ds
Premium Member
Posts: 40
Joined: Wed May 04, 2011 2:21 am

Post by raghav_ds »

Ernie,

Sparse lookup may not be a good option for us, as we would need to hit the database for each web service invocation.

Could you please provide some input on "crafting your own in-memory objects using Java or some other technology"?

Thanks
Raghav
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Design for real time reference data update.

Post by SURA »

sorry craig
Last edited by SURA on Wed May 25, 2011 10:39 pm, edited 1 time in total.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

SURA, why in the blue blazes do you continue to feel the need to quote an entire post just to reply? Learn to use the perfectly lovely Reply to topic link please. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
raghav_ds
Premium Member
Posts: 40
Joined: Wed May 04, 2011 2:21 am

Post by raghav_ds »

SURA,
I am not able to see your reply. If you added some information to the thread, could you please post it again?

Thanks
Raghav
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

SURA, I appreciate you cleaning up your post as I don't have admin privs in this forum but you should have left your actual reply in place. Please edit your post again and add that back in.
-craig
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

What was meant by that is "use your creative juices"... If you have your own methods of providing in-memory tables and lookups, and are an expert in using Java or C++ and in building, manipulating, and managing such structures, go for it. DataStage is very extensible.

But the code will probably be difficult and need lots of expertise and lots of testing and lots of justification. Probably not worth it in most cases, but I don't know the specifics of your case.
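For a sense of what "your own in-memory objects" might look like in the simplest case, here is a rough sketch in Python (the same idea carries over to Java or C++). It is a keyed in-memory table that accepts inserts and updates while lookups keep running, guarded by a lock. The class `ReferenceTable`, its methods, and the record fields are all hypothetical. Nothing here is a DataStage API, and a real version would still need bulk loading of the 200 million rows, persistence, and wiring to the MQ feed.

```python
import threading

class ReferenceTable:
    """Hypothetical in-memory reference table keyed by customer id."""

    def __init__(self, initial_rows=()):
        self._lock = threading.Lock()
        self._rows = {}    # customer_id -> record dict
        self._by_key = {}  # search key (postcode) -> set of customer_ids
        for row in initial_rows:
            self.upsert(row)

    def upsert(self, row):
        """Insert a new record or replace an existing one."""
        with self._lock:
            old = self._rows.get(row["customer_id"])
            if old is not None:
                # Remove the stale index entry before re-indexing.
                ids = self._by_key[old["postcode"]]
                ids.discard(old["customer_id"])
                if not ids:
                    del self._by_key[old["postcode"]]
            self._rows[row["customer_id"]] = dict(row)
            self._by_key.setdefault(row["postcode"], set()).add(row["customer_id"])

    def lookup(self, postcode):
        """Return copies of all records currently filed under the key."""
        with self._lock:
            ids = sorted(self._by_key.get(postcode, ()))
            return [dict(self._rows[i]) for i in ids]

table = ReferenceTable([{"customer_id": 1, "name": "A. Kumar", "postcode": "2000"}])
table.upsert({"customer_id": 1, "name": "A. Kumar", "postcode": "3000"})  # update
table.upsert({"customer_id": 2, "name": "B. Singh", "postcode": "2000"})  # insert
old_hits = table.lookup("2000")  # only customer 2 remains under this key
new_hits = table.lookup("3000")  # customer 1 has moved here
```

Even this toy version has to keep a secondary index consistent on every update, which hints at why the real thing needs the expertise and testing mentioned above.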

In most situations, fine-tune your RDBMS, use sparse lookups, and measure your performance impact.

Ernie
Ernie Ostic

SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Design for real time reference data update.

Post by SURA »

Hi Raghav

A couple of questions.

1. How many changes do you expect per day?

1a. How frequently?

2. What will the hit rate be (to query the data)?

3. What is the growth in data volume per day?

4. Which DB are you using?

DS User
raghav_ds
Premium Member
Posts: 40
Joined: Wed May 04, 2011 2:21 am

Post by raghav_ds »

Ernie,

Thank you for your explanation. I will try a database with sparse lookups.

SURA,

Please find the answers to your questions below.

1. How many changes do you expect per day?

The web service will be used at enterprise level, and a large number of users will be invoking it.

1a. How frequently?

The web service should be available for 20 hours a day, and multiple users will be invoking it from different places. For example, consider 10 invocations per minute.

2. What will the hit rate be (to query the data)?

Same as 1a: 10 per minute.

3. What is the growth in data volume per day?

Not sure exactly, but consider about 2000 new records per day as an example.

4. Which DB are you using?

DB2
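
Taking those example figures at face value, a quick back-of-the-envelope calculation shows the load a sparse lookup design would put on the database. The inputs are only the illustrative numbers quoted above (10 invocations per minute, 20 hours a day, about 2000 new records a day), not measurements, and the calculation assumes the load is spread evenly over the service window.

```python
HOURS_AVAILABLE = 20        # service window per day, from the answers above
LOOKUPS_PER_MINUTE = 10     # example invocation rate
NEW_RECORDS_PER_DAY = 2000  # example growth figure

minutes_per_day = HOURS_AVAILABLE * 60
lookups_per_day = LOOKUPS_PER_MINUTE * minutes_per_day
lookups_per_second = LOOKUPS_PER_MINUTE / 60
updates_per_minute = NEW_RECORDS_PER_DAY / minutes_per_day

print(f"lookups per day:    {lookups_per_day}")          # 12000
print(f"lookups per second: {lookups_per_second:.2f}")   # 0.17
print(f"updates per minute: {updates_per_minute:.2f}")   # 1.67
```

That works out to roughly one lookup every six seconds and fewer than two new records per minute. The 2000 figure covers new records only; updates to existing records were not quantified in the thread.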