Multi-valued Column in Hash File
-
- Participant
- Posts: 6
- Joined: Tue Jul 13, 2004 7:40 am
Hi,
Can anyone suggest how to create a multi-valued column in a hash file? The column should display all the values, separated by commas, for a particular key.
For example, an input file with three columns c1, c2 and c3:

c1 -- c2 -- c3
NE -- 20 -- 1
EX -- 30 -- 2
NE -- 40 -- 3
EX -- 45 -- 4
EX -- 50 -- 5

And the hash file should show the result as (c1 is my key):

c1 -- c2 ---------- c3
NE -- 20, 40 ------ 1, 3
EX -- 30, 45, 50 -- 2, 4, 5
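Outside DataStage, the rollup being asked for is just a group-by-key with a comma join. A minimal Python sketch of that logic, using the example data above (names are illustrative, not part of any DataStage API):

```python
from collections import OrderedDict

# Input rows: (c1, c2, c3) as in the example above.
rows = [
    ("NE", "20", "1"),
    ("EX", "30", "2"),
    ("NE", "40", "3"),
    ("EX", "45", "4"),
    ("EX", "50", "5"),
]

# Group c2 and c3 by the key c1, preserving first-seen key order.
grouped = OrderedDict()
for c1, c2, c3 in rows:
    entry = grouped.setdefault(c1, {"c2": [], "c3": []})
    entry["c2"].append(c2)
    entry["c3"].append(c3)

for key, cols in grouped.items():
    print(key, ", ".join(cols["c2"]), ", ".join(cols["c3"]))
# NE 20, 40 1, 3
# EX 30, 45, 50 2, 4, 5
```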
Thanks in advance,
Somesh
Hash file writes are destructive; bearing that in mind, I have approached this issue in a variety of ways. One method I have employed uses the same type of logic used for SCDs, with a twist.
Use your hash file as both a target and a reference. Create stage variables to hold the contents of the columns (c2 and c3 in your example). Perform a lookup on each incoming row of data and use logic within the stage variables either to append the incoming values to an already-existing value or, if no lookup row is found, to set the stage variables to the incoming values.
This still performs destructive writes, but each write builds up the record you want.
Another way to solve this, without a reference lookup, is to use a DS routine in the transform that takes the key and whatever fields you are building and returns strings suitable for writing to the hash file as value mark (char 253) delimited strings.
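The append-or-initialize logic described here can be sketched outside DataStage. In UniVerse hashed files the value mark @VM is character 253; this Python fragment mimics a routine that appends an incoming value to the string already stored for the key, or starts a new string when no record exists yet (the `store` dict and `upsert` name are illustrative stand-ins, not DataStage calls):

```python
VM = chr(253)  # UniVerse value mark (@VM)

# 'store' stands in for the hashed file, keyed on c1.
store = {}

def upsert(store, key, value):
    """Append value to the record's @VM-delimited string, or create it."""
    existing = store.get(key)
    store[key] = value if existing is None else existing + VM + value

for key, value in [("NE", "20"), ("EX", "30"), ("NE", "40")]:
    upsert(store, key, value)

print(store["NE"].split(VM))  # ['20', '40']
```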
Good luck
Regards,
Mike Hester
mhester@petra-ps.com
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Welcome aboard! :D
How you create multi-valued columns depends on how you create the hashed file.
If you're using a Hashed File or UV stage to create the hashed file, specify M in the Type column in the Columns grid.
If you're using a CREATE TABLE statement, you specify that the column is MULTIVALUED within its column definition.
If you're using a CREATE.FILE or mkdbfile command, you specify when building the file dictionary.
Populate the hashed file normalized on the multi-valued column.
Last edited by ray.wurlod on Tue Aug 31, 2004 9:35 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
My original post was slanted more toward the comma-separated values in their post, hence the reply that I gave.
Regards,
Mike Hester
mhester@petra-ps.com
I want to create a Hash File stage for multivalued columns, but I am not able to see the option for specifying the "M" or MULTIVALUED type that was described: "If you're using a Hashed File or UV stage to create the hashed file, specify M in the Type column in the Columns grid."
In the Hash File stage there is no Type column; there is only an SQL Type column available in the grid for specifying the type of the column.
Please suggest how to create a multi-valued hash file using the Hash File stage in DataStage version 7.x.
----------------
Rgds,
Anupam
----------------
The future is not something we enter. The future is something we create.
Anupam,
I do not believe DataStage has the ability (natively) to write multi-valued fields to a UV file - that's why I gave you my suggestion. I believe that in order to do this, DataStage would have to read and write every output record so that a destructive write does not take place.
If the ability is there, I have never used it, nor am I aware of it.
Regards,
Mike Hester
mhester@petra-ps.com
Michael
It can be done. I just wrote a job that takes a source, writes to a hash file, and does a lookup on the same hash file. If the lookup finds the key, the output column is lookup.col1 : @VM : In.col1. You can aggregate multivalues into a hash file this way.

Code:

JobsByCatHash
     |
     v
UvStage -> Trans -> JobsByCatHash

UvStage:
SELECT DS_JOBS.NAME, DS_JOBS.CATEGORY FROM DS_JOBS WHERE NAME NOT LIKE '\%' ORDER BY CATEGORY, NAME;

Trans:
if IsNull(LkpEtlJobCatHash.JOB_NAME) then DsJobLinks.JOB_NAME else LkpEtlJobCatHash.JOB_NAME : @VM : DsJobLinks.JOB_NAME

Try it. It works great. This would be awesome on dates.
Now you need to create new dictionary items to associate the multivalues on the hash file just created.
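The self-referencing job described here can be simulated in a few lines. Each incoming row looks up its key in the same hash file and writes back either the bare value or the looked-up value with the new one appended after a value mark. This is a Python sketch of the control flow, not DataStage code, and the job/category names are made up:

```python
VM = chr(253)  # @VM, the UniVerse value mark

# jobs_by_cat plays the role of the JobsByCatHash hash file, keyed on CATEGORY.
jobs_by_cat = {}

# (CATEGORY, NAME) pairs, as the UV-stage SELECT would return them.
source = [("Load", "LoadDim"), ("Load", "LoadFact"), ("Extract", "PullSrc")]

for category, name in source:
    looked_up = jobs_by_cat.get(category)  # lookup on the same hash file
    # if IsNull(lookup) then name else lookup : @VM : name
    jobs_by_cat[category] = name if looked_up is None else looked_up + VM + name

print(jobs_by_cat["Load"].split(VM))  # ['LoadDim', 'LoadFact']
```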
Mamu Kim
-
- Premium Member
- Posts: 483
- Joined: Thu Jun 12, 2003 4:47 pm
- Location: St. Louis, Missouri USA
If you're using a Hashed File or UV stage to create the hashed file, specify M in the Type column in the Columns grid.
----------------
Rgds,
Anupam
----------------
The future is not something we enter. The future is something we create.
That is the trick. In the hash file you do not treat this field as multivalued; you only do so in a UV stage. This allows one job to create the multivalues and another job to use them.
Most of the time I would only use this in a lookup, and I would write a routine to do it. Look at Ken Bland's SCD Type 2 routine that locates a value by date: with this approach you no longer need the rollup routine, because you can do the rollup in a job, and the convert from '|' to @VM goes away in the other routine since the @VMs are already there. A routine like his is what I would create to use this new hash file.
Mamu Kim
Thanks for all of the great info. We are just starting the construction stage of an Enterprise DW and these posts are invaluable.
I am testing this as a solution for Type II Slowly Changing Dimension surrogate key lookups (in conjunction with Mr. Bland's routine) and have one question...
Is there a more efficient way of doing this? My test seems to be quite slow. I have the lookup hash configured with caching "Enabled, Lock For Updates".
My dimension has about 3M distinct natural key values, with about 9M records overall (average of three Type II changes per entity).
I am getting about 180 rows/sec. As you can imagine, this would take several hours to complete for an initial load.
One option I've researched is creating the multivalue strings in the database. I created a table in the dimension table's database using PL/SQL that accomplishes the same result and is MUCH faster (3000 rows/sec) which I subsequently dumped to a hash file for lookup (using the routine Mr. Bland supplied).
I'd like to keep it all in DataStage if possible.
Any ideas or configuration suggestions to speed this up?
Thanks!
I think it is valid to do some of this in PL/SQL if that speeds things up. Doing it in BEFORE SQL may make it look like it is all done in DataStage. Most of us have used temp tables created with SQL to get around some performance issue. Maybe even Ray has used this trick.
I do not like to use aggregator stages. I usually do this in SQL or one of those sorting tools like CoSort. Sometimes going around DataStage is worth it in performance. I try to limit this type of thing.
Mamu Kim