Hashed File vs Routine... Performance Question?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Hashed File vs Routine... Performance Question?

Post by kaps »

Guys

I would like to know which one will perform better for the scenario below.

Say I have a file with 500,000 records, and each record has 10 columns.
Out of the 10 columns, 4 need to be validated.
I have a hashed file with one record for each column validation rule.

Scenario 1
----------

My DS job is like this... I have four reference links from the hashed file, as there are 4 fields that need to be validated.

Code:

	         Hashed File
	             | | | |	
	             | | | |
	             | | | |
seq file------->Transformer--------->Seq file
Scenario 2
----------

Code:

seq file------->Transformer--------->Seq file
Here, instead of doing a lookup against a hashed file in the Transformer, I use a routine to do the same thing (passing the value to the routine and getting the value back).
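For illustration, a minimal sketch of what such a routine body could look like in DataStage BASIC. It assumes a hashed file named VALRULES keyed on column name plus input value, and arguments ColumnName and InputValue; all of these names are assumptions for the sketch, not from the actual job. The file handle is cached in named COMMON so the OPEN happens only once per job run:

Code:

* Sketch only: VALRULES, the key layout and the routine name are
* assumptions for illustration.
* Routine arguments: ColumnName, InputValue

Common /ValRulesCache/ FileIsOpen, RulesF

If UnAssigned(FileIsOpen) Then FileIsOpen = @FALSE

If Not(FileIsOpen) Then
   * Open the hashed file once per job run; the handle survives
   * across calls in the named COMMON block
   Open "VALRULES" To RulesF Else
      Call DSLogFatal("Cannot open VALRULES", "ValidateViaHash")
   End
   FileIsOpen = @TRUE
End

* Return the translated value when a rule record exists,
* otherwise pass the input value through unchanged
Read Rec From RulesF, ColumnName : @TM : InputValue Then
   Ans = Rec<1>
End Else
   Ans = InputValue
End

The Transformer derivation for each validated column would then be something like ValidateViaHash("CUST_ID", InLink.CUST_ID), with CUST_ID again just a placeholder name.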

My question is: which approach will be faster?

Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The routine will be faster.
but
How are you planning on getting your 500,000 x 11 (10 columns + 1 key) pieces of information into your routine? If your routine does a hashed file lookup to get this information, then your scenario 1 is suddenly the better one, since you can pre-load the data into memory and even use just one memory image with system-level caching if you enable that.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why not build both and publish your results?

How good are you at writing efficient DataStage BASIC routines?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

This query has a certain academic interest, but surely coding your own lookups in BASIC defeats the purpose of spending AUD$100K on an ETL tool.
Ross Leishman
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Post by kaps »

Thanks for the replies.
Actually, we are trying to come up with the best approach to cleanse or translate our data. I was suggesting the first approach mentioned in my first post, but there was an argument for sending the whole record to a routine: the routine would look up the hashed file to get the translated value for each field that needs to be validated, when a matching record exists in the hashed file, or just pass the input value through when it does not. This can be dynamic, meaning that if we add or change a rule in the rule table it will be picked up automatically! But the process is going to spend a lot of time doing lookups for the columns which do not need to be validated.
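To make that trade-off concrete, a hedged sketch of the whole-record variant in DataStage BASIC. The pipe delimiter, the VALRULES hashed file and the ValidateCols argument (a comma-separated list of field positions to check) are all assumptions for illustration:

Code:

* Sketch only: the pipe delimiter, VALRULES and the argument names
* are assumptions for illustration.
* Routine arguments: InRecord     (pipe-delimited record)
*                    ValidateCols (field positions, e.g. "2,5,7,9")

Common /RecValCache/ FileIsOpen, RulesF

If UnAssigned(FileIsOpen) Then FileIsOpen = @FALSE

If Not(FileIsOpen) Then
   Open "VALRULES" To RulesF Else
      Call DSLogFatal("Cannot open VALRULES", "ValidateRecord")
   End
   FileIsOpen = @TRUE
End

OutRecord = InRecord

* Only the listed positions hit the hashed file; every other
* column passes through untouched
For I = 1 To Dcount(ValidateCols, ",")
   Pos = Field(ValidateCols, ",", I)
   InVal = Field(InRecord, "|", Pos)
   Read Rec From RulesF, Pos : @TM : InVal Then
      OutRecord = Fieldstore(OutRecord, "|", Pos, 1, Rec<1>)
   End
Next I

Ans = OutRecord

Even with the file handle cached in COMMON, every validated column still costs a Read per row, and every routine call crosses the BASIC function-call boundary, which is the overhead ArndW's scenario 1 avoids by letting the Transformer use a pre-loaded hashed file.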

Of course, as rleishman said, ETL tools are meant to reduce procedural-language coding, which hides the metadata of the process!

Thanks