Converting one row into multiple rows

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
casedwgroup
Premium Member
Premium Member
Posts: 24
Joined: Mon Aug 20, 2007 1:17 pm

Converting one row into multiple rows

Post by casedwgroup »

It seems to me like this is probably a common problem, so if this has already been discussed here before, please point me in that direction. I spent some time looking around and couldn't find what I was looking for.

I'm working on populating some row level security tables that have 2 key columns: OPRID and EMPLID. Setting the table up for people who exist in the source system is relatively easy. However, I need to be able to add rows for the people on my team who do not exist in the source system. I have a list of IDs that I need to add a row for every EMPLID.

I couldn't figure out a way to do this simply with a pivot stage. My next thought was to just write a query with no join and add a distinct clause to it:

Code: Select all

SELECT DISTINCT B.OPRID,A.EMPLID,TO_CHAR(A.EFFDT, 'YYYY-MM-DD HH24:MI:SS')
FROM TBL_A A, TBL_B B
This works fine for small amounts of data, but TBL_A has over 1 million rows. (TBL_B probably won't grow much larger than 10 rows.) This is obviously a terrible query for large amounts of data, and I need a better solution. Any suggestions would be greatly appreciated.
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Re: Converting one row into multiple rows

Post by gateleys »

casedwgroup wrote:I spent some time looking around and couldn't find what I was looking for.
Really!! How about this? Its actually in the FAQ.

viewtopic.php?t=88639
gateleys
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

What about

Code: Select all

SELECT B.OPRID as MY_ID FROM TABLE_B as B
UNION
SELECT A.EMPLID as MY_ID FROM TABLE_A as A
?
Wolfgang Hürter
Amsterdam
casedwgroup
Premium Member
Premium Member
Posts: 24
Joined: Mon Aug 20, 2007 1:17 pm

Re: Converting one row into multiple rows

Post by casedwgroup »

gateleys wrote: Really!! How about this? Its actually in the FAQ.

viewtopic.php?t=88639
If I understand correctly (and it's entirely possible that I don't), that post was about splitting one row into multiple rows. I'm trying to do something more along the lines of repeating rows and just changing a key column. Let me give a data example of what I am trying to do, where TBL_C is the target table:

Code: Select all

*TBL_A
OPRID	EMPLID     EFFDT
-----	------     -----
ABC     1          01/01
ABC     2          01/01
XYZ     1          01/01
XYZ     2          01/01
XYZ     3          03/01

*TBL_B
OPRID
-----
JKL
MNO

*TBL_C
OPRID	EMPLID     EFFDT
-----	------     -----
ABC     1          01/01
ABC     2          01/01
XYZ     1          01/01
XYZ     2          01/01
XYZ     3          03/01
JKL     1          01/01
JKL     2          01/01
JKL     3          03/01
MNO     1          01/01
MNO     2          01/01
MNO     3          03/01
Hopefully that makes more sense. If there is any additional detail I need to provide, let me know.
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Re: Converting one row into multiple rows

Post by gateleys »

It seems that you want to do a Cartesian join between TBL_B and TBL_A. Assuming that the EFFDT is dependent on EMPLID alone (that is, OPRID has no effect on it, whatsoever), you need to do the following -

1. Create a hashed file with the columns, DummyID, EMPLID and EFFDT, with keys on the first 2 ID columns.
2. Load TBL_A into this hashed file, with the value 1 assigned to DummyID. This eliminates all duplicates of EMPLID.
3. Load this hashed file into a sequential file, say File_1, with DummyID (only) as the key.
4. Load TBL_B into a sequential file, say File_2, with columns DummyID2 and OPRID2. Just as in step 2, assign a value 1 to DummyID2.
5. Use a Merge Stage with these 2 sequential files as input, and with Join Type set to INNER JOIN. In the Mapping tab, make sure you specify the key as DummyID for File_1 and DummyID2 for File_2 so that a Cartesian Join will be performed since all key values are 1.
5. The end result is you have a file -

Code: Select all

DummyID  EMPID  EFFDT  DummyID2  OPRID2
------   ----   -----   -------  -------
1        1       01/01      1      JKL
1        2       01/01      1      JKL
1        3       03/01      1      JKL
1        1       01/01      1      MNO
1        2       01/01      1      MNO
1        3       03/01      1      MNO
You can now, extract the columns OPRID2, EMPID and EFFDT into another file and append it to your TBL_A rows. Voila!!
gateleys
casedwgroup
Premium Member
Premium Member
Posts: 24
Joined: Mon Aug 20, 2007 1:17 pm

Post by casedwgroup »

I'll try that out and get back to you. Thanks for the suggestion!
casedwgroup
Premium Member
Premium Member
Posts: 24
Joined: Mon Aug 20, 2007 1:17 pm

Post by casedwgroup »

That appears to have worked. Thank you very much for the assistance!
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

casedwgroup wrote:That appears to have worked. Thank you very much for the assistance!
Great!

Now, can you share your performance stats with us, with over a million rows in your TAB_A performing a cross join with TAB_B?
gateleys
casedwgroup
Premium Member
Premium Member
Posts: 24
Joined: Mon Aug 20, 2007 1:17 pm

Post by casedwgroup »

Not all of the rows from table A are joined to table B -- just the unique ones, which is less than 12,000. So the performance is pretty good. It took less than 3 minutes to read the 1.2 million rows from the database and write to a hashed file, then a couple seconds to read 12,000 rows from the hashed file and write to a sequential file. The merging of that sequential file with another one with 4 rows in it, and then appending the results to the database, took 13 seconds. Assuming A and B are already loaded, the whole process took less than 4 minutes. The end result only added 48,000 rows though.
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Oh! I forgot about the duplicates. Of course. :wink:
gateleys
Post Reply