Design for real time reference data update.

Posted: Wed May 25, 2011 1:15 pm
by raghav_ds
We have created a web service for customer search (both deterministic and probabilistic matching). Customers are searched against reference data of 200 million records.

The web service is currently designed for static reference data. It now needs to handle reference data that is updated in real time.

Current design:
At the start of the day, the static data is loaded into a lookup file set and a frequency dataset is created. We have created one job and exposed it as a web service. It expects an input message for a customer and uses the customer details in that message to search the lookup file set. This is working as expected.

Next steps:
Now we want to update the reference data in real time. The reference data, which is stored in a database, will be provided in a file at the start of the day. After that, the real-time changes to the database (inserts and updates) are provided through MQ.

Now we need to make the real-time updates available to the web service. We tried to append the real-time data to the lookup file set, but there is no append mode. And even if we could append, we cannot update existing records in the lookup file set.

The second option we tried is a dataset. We were able to append data in real time, but we cannot update existing records in the dataset.

The third option we considered is a database. But a database will affect the performance of the web service, as we need to read the entire data into DataStage for probabilistic matching (match specifications in QualityStage).

Could you please provide your inputs on solving this real-time update issue?

Please let me know if you need any further information.

Thanks
Raghav

Posted: Wed May 25, 2011 4:51 pm
by eostic
Sparse lookup is probably the only easy way (assuming you aren't crafting your own in-memory objects using Java or some other technology), so that each time you commit a new entry to the database, it will be found by your lookup stage. Of course, that means you have to return to the database for each lookup, but that's the goal, right?
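
As a rough illustration of that behaviour outside DataStage, here is a minimal sketch in which Python's sqlite3 module stands in for the reference database. The table name, columns, sample rows and both helper functions are hypothetical. In a real job the sparse lookup is a setting on the Lookup stage, not hand-written code. The point is only that a query issued per request sees whatever was committed just before it.

```python
import sqlite3

# sqlite3 is only a stand-in for the real reference database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'A SMITH', 'LEEDS')")
conn.commit()


def sparse_lookup(connection, name):
    """Run one keyed query per request instead of reading a preloaded copy."""
    cur = connection.execute(
        "SELECT cust_id, name, city FROM customer WHERE name = ?", (name,)
    )
    return cur.fetchall()


def apply_update(connection, cust_id, name, city):
    """Insert or update one row, as a real-time feed might (hypothetical helper)."""
    connection.execute(
        "INSERT OR REPLACE INTO customer (cust_id, name, city) VALUES (?, ?, ?)",
        (cust_id, name, city),
    )
    connection.commit()


before = sparse_lookup(conn, "B JONES")    # not yet in the table
apply_update(conn, 2, "B JONES", "YORK")   # new record committed
after = sparse_lookup(conn, "B JONES")     # found by the very next lookup
apply_update(conn, 2, "B JONES", "HULL")   # update to the same key
updated = sparse_lookup(conn, "B JONES")   # the lookup now returns the new city
```

The cost is the one already mentioned: every lookup is a round trip to the database.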

Ernie

Posted: Wed May 25, 2011 9:35 pm
by raghav_ds
Ernie,

Sparse lookup may not be a good option for us, as we would need to hit the database for each web service invocation.

Could you please provide some input on "crafting your own in-memory objects using Java or some other technology"?

Thanks
Raghav

Posted: Wed May 25, 2011 10:08 pm
by SURA
sorry craig

Posted: Wed May 25, 2011 10:33 pm
by chulett
SURA, why in the blue blazes do you continue to feel the need to quote an entire post just to reply? Learn to use the perfectly lovely Reply to topic link please.

Posted: Thu May 26, 2011 3:49 am
by raghav_ds
SURA,
I am not able to see your reply. If you added some information to the thread, could you please post it again?

Thanks
Raghav

Posted: Thu May 26, 2011 7:15 am
by chulett
SURA, I appreciate you cleaning up your post as I don't have admin privs in this forum but you should have left your actual reply in place. Please edit your post again and add that back in.

Posted: Thu May 26, 2011 12:30 pm
by eostic
What was meant by that is "use your creative juices". If you have your own methods of providing in-memory tables and lookups, and are an expert in using Java or C++ and in building, manipulating and managing such structures, go for it. DataStage is very extensible.

But the code will probably be difficult and need lots of expertise and lots of testing and lots of justification. Probably not worth it in most cases, but I don't know the specifics of your case.
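
To make the in-memory idea a little more concrete, here is a minimal sketch of a table that is loaded once and then accepts inserts and updates while it is being searched. It is written in Python for brevity. The class name, field names and sample messages are all hypothetical. It ignores probabilistic matching, DataStage integration, and the memory needed for 200 million records, which is where most of the real effort would go.

```python
import threading


class ReferenceStore:
    """Hypothetical in-memory reference table keyed by customer id."""

    def __init__(self, initial_records):
        self._lock = threading.Lock()
        self._by_id = {rec["cust_id"]: dict(rec) for rec in initial_records}

    def upsert(self, record):
        """Apply one insert or update from the real-time feed."""
        with self._lock:
            self._by_id[record["cust_id"]] = dict(record)

    def find_by_name(self, name):
        """Deterministic match only; a linear scan keeps the sketch short."""
        with self._lock:
            return [dict(r) for r in self._by_id.values() if r["name"] == name]

    def __len__(self):
        with self._lock:
            return len(self._by_id)


# Start-of-day file load (made-up sample rows).
store = ReferenceStore([
    {"cust_id": 1, "name": "A SMITH", "city": "LEEDS"},
    {"cust_id": 2, "name": "B JONES", "city": "YORK"},
])

# Messages as they might arrive from the queue: one update, one insert.
for message in [
    {"cust_id": 2, "name": "B JONES", "city": "HULL"},
    {"cust_id": 3, "name": "C BROWN", "city": "BATH"},
]:
    store.upsert(message)

print(store.find_by_name("B JONES"))
```

The lock is there because the queue listener and the lookup requests would run concurrently. Keying by id is what lets an update replace a record in place, which the lookup file set and the dataset could not do.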

In most situations, fine-tune your RDBMS, use sparse lookups, and measure your performance impact.

Ernie

Posted: Thu May 26, 2011 8:26 pm
by SURA
Hi Raghav

A couple of questions.

1. How many changes do you expect per day?

1a. How frequent?

2. What will be the hit rate (queries against the data)?

3. Growth in data volume per day?

4. Which DB are you using?

DS User

Posted: Fri May 27, 2011 3:23 am
by raghav_ds
Ernie,

Thank you for your explanation. I will try a database with sparse lookups.

SURA,

Please find the answers to your questions below.

1. How many changes do you expect per day?

The web service will be used at enterprise level, and a large number of users will be invoking it.

1a. How frequent?

The web service should be available for 20 hours a day, and multiple users will be invoking it from different places. For example, consider 10 invocations per minute.

2. What will be the hit rate (queries against the data)?

Same as 1a: 10 per minute.

3. Growth in data volume per day?

Not sure exactly, but consider some 2000 new records per day as an example.

4. Which DB are you using?

DB2
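
Taking the example figures in this thread at face value (10 invocations per minute, 20 service hours a day, 2000 new records per day, 200 million reference records), the load works out roughly as below. This is back-of-envelope arithmetic only. Spreading the new records evenly over the service hours is an assumption, and updates to existing records are not counted because no figure was given for them.

```python
# Figures quoted in this thread; all were given as examples, not measurements.
INVOCATIONS_PER_MINUTE = 10
SERVICE_HOURS_PER_DAY = 20
NEW_RECORDS_PER_DAY = 2000
REFERENCE_RECORDS = 200_000_000

# Lookups the web service would perform in one service day.
lookups_per_day = INVOCATIONS_PER_MINUTE * 60 * SERVICE_HOURS_PER_DAY

# Average gap between two lookups, in seconds.
seconds_between_lookups = 60 / INVOCATIONS_PER_MINUTE

# New records per service hour, assuming they arrive evenly (assumption).
new_records_per_hour = NEW_RECORDS_PER_DAY / SERVICE_HOURS_PER_DAY

# Daily growth as a fraction of the existing reference data.
daily_growth_fraction = NEW_RECORDS_PER_DAY / REFERENCE_RECORDS

print(lookups_per_day)          # 12000
print(seconds_between_lookups)  # 6.0
print(new_records_per_hour)     # 100.0
print(daily_growth_fraction)    # 1e-05
```

At these rates a sparse lookup would mean about one database query every six seconds on average, which is the number to hold up against the measured query time.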