UV secondary indexes

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

tonystark622
Premium Member
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Emma,

When I ran into this need, I created an extra field on the hash file that was a "composite" index, i.e. it contained the contents of both "key" fields, concatenated together and separated by some character (I used a tilde). So creating the field looked something like this:

Field1:"~":Field2

Then I created the secondary index on this new field (I called it Key1).

Then I would do the same thing when composing the key for this field in the lookup.

Hope this helps,
Tony
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Emma

What are your trying to do? You can always combine two columns in an I-descriptor and create an index off of that. You need to use this new field in your lookup and not the two separate fields. You can add this new field by using the editor. Here is how it would look. You need to compile the dictionary item afterwards.

At TCL:

ED DICT MyHashFile CombinedFieldName
1: I
2: Field1 : @TM : Field2
3:
4: CombinedFieldName
5: 40L
6: S

Save it and:

CD MyHashFile CombinedFieldName

This will compile it. Then CREATE.INDEX. I put a @TM between the 2 fields. It does not have to be there.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

CREATE INDEX (the SQL variant) will create an index on multiple fields, but you have to name them both in the statement. You can also specify the sort order for each of the indexed columns' values. For example:

CREATE INDEX indexname ON tablename (colname1 ASC, colname2 DESC);

However, for this to work, the hashed file will need to have been created as a table (using a CREATE TABLE statement rather than CREATE.FILE or mkdbfile). That is, it can not have been created using a Hashed File stage; it would need to have been created using a UV stage.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
datastagedummy
Participant
Posts: 56
Joined: Thu Feb 13, 2003 6:08 pm
Location: USA

Post by datastagedummy »

This is not a reply but a follow up question.

I have a job in which I look up a UniverseStage which is being referenced with multi row result set and this job takes forever to run so what I did was I added an INDEX on the table at the before job subroutine
ExecTCL=CREATE INDEX TMPINDEX ON SAP_SALES_BOM_OUTPUT (MATERIAL_NBR, PLANT_ID) ;

This did improve the performance to 1 row/second [:(] from 0 rows/sec

Do you guys have any suggestions on how I can improve the performance by using indexes or anything else.

Thanks
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Dummy

I sure would not build the index in the before routine. I would do it in a separate job or in the batch job or sequence job. The before routines are not really run before the job starts. They are sort of run at the same time as all the files or tables are opened. I am not sure that your index is ever used. Try creating it before the job and let us know what happens. A one to many lookup is a complicated thing to pull off. This is a new feature. I expect it to improve performance in new releases. Call it into support. You may cause it to be faster in new releases.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
datastagedummy
Participant
Posts: 56
Joined: Thu Feb 13, 2003 6:08 pm
Location: USA

Post by datastagedummy »

Tried to change from Pre Job to a new job that runs before the job in question as suggested but it still is one row per second.

[V]
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Yes, stop doing relational work on a storage structure optimized for 1-to-1 reference lookups.

Why are you doing multi-row lookups? Please state your functional requirements. If you're doing a ranging (BETWEEN) style select, there are ways to construct your lookup so as to give you the optimized reference lookup design for your hash file.

Thanks

Kenneth Bland
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Ken is right. Hash files are not the best choice here. Explain why you need this functionality and we can offer a better solution.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Sorry Emma, I was responding to "datastagedummy", who was usurping your message forum.

Kenneth Bland
datastagedummy
Participant
Posts: 56
Joined: Thu Feb 13, 2003 6:08 pm
Location: USA

Post by datastagedummy »

I am very Sorry Emma for using your forum.

Do I have your permission or should I open a new one for myself as my problem was INDEX related.

Sorry once again
datastagedummy
Participant
Posts: 56
Joined: Thu Feb 13, 2003 6:08 pm
Location: USA

Post by datastagedummy »

My situation is that I have a COMPANY_CODE field(input) which returns me multiple PLANT_ID (from hash lookup and UVStage) and then I have to create an output of n rows for that particular COMPANY_CODE is there any better way of doing this.

eg :
Input :
CO_CODE = 110

Hash File/UV Lookup

Reference:
CO_CODE PLANT_ID
110 1030
110 1045
110 3090

Output :
MATERIAL_NO CO_CODE PLANT_ID
ABCDEF 110 1030
ABCDEF 110 1045
ABCDEF 110 3090

Any help that would get me up from the present speed of 6 rows/sec would be appreciated.

Thanks
spracht
Participant
Posts: 105
Joined: Tue Apr 15, 2003 11:30 pm
Location: Germany

Post by spracht »

Hi

is the job logic and the structure of the hash file the same as originally posted? If so, you would need a secondary index on CO_CODE rather than MATERIAL_NO, PLANT_ID? Where is your stream input coming from? If it's a hash file or uv table as well, you could interchange stream and reference. Or join the two tables within a uv stage.

Stephan
spracht
Participant
Posts: 105
Joined: Tue Apr 15, 2003 11:30 pm
Location: Germany

Post by spracht »

Dummy, (I really hesitated to call you like that), I hope you don't mind that I ask Ken on the range (between) style select within this topic?

Ken

I had a couple of situations when I had to do such a range style lookup and couldn't help but to do a 'multi row result set' reference and subsequenly excluding the records not wanted using a constraint. I would be very glad to hear how an optimized lookup design could look like.

Thanks in advance

Stephan
Post Reply