Denormalizing Data

phillip.small · Post by **phillip.small** » Thu Sep 30, 2004 7:46 am

Hello

I have two input tables with 3 keys. I am taking some fields from both tables and putting them into another table. Based on the two input tables, I have a phone type field with values either "BUSINESS" or "HOME" and this indicates which type if the phone number. The tables also have a status column with either "A" or "I" (Active or Inactive) as the statuses. Therefore, the error correctly states that there are two rows per PK. In the target, I changed the phone column and recreated the table with BUSN_PHONE and HOME_PHONE and did not create a phone_type column. If I run the load and populate the BUSN_PHONE, how do I loop to load the HOME_PHONE column on the same record?

input table1 ---- input table2 ---->Transformer ----> target table

kcbland · Post by **kcbland** » Thu Sep 30, 2004 7:53 am

Your solution is not a loop, it is a UNION in your SQL. You need to collapse two rows into one. Imagine SQL like this:

Code: Select all

select key, max(bus_phone), max(home_phone) from
   (
   select key, NULL "BUS_PHONE", home_phone from table A
   union all
   select key, bus_phone, NULL "HOME_PHONE" from table B
   )
group by key

This solution provides a FULL OUTER JOIN, meaning all rows from table A and all rows from table B will make it into your target. Using one table as a reference DOESN'T WORK unless you do a pass thru both tables as a source, using the other as a reference each time. This SQL is very elegant.

chulett · Post by **chulett** » Thu Sep 30, 2004 8:17 am

Just as an FYI for those in our studio audience using Oracle as their database of choice, there is actual "Full Outer Join" syntax available in 9i:

Code: Select all

SELECT a.x, b.x
FROM a FULL OUTER JOIN b
ON (a.y = b.y)

phillip.small · Post by **phillip.small** » Thu Sep 30, 2004 9:12 am

chulett wrote:Just as an FYI for those in our studio audience using Oracle as their database of choice, there is actual "Full Outer Join" syntax available in 9i:
Code: Select all
SELECT a.x, b.x
FROM a FULL OUTER JOIN b
ON (a.y = b.y)

DB2 is my DB env. in DataStage. How do you implement this in DataStage?

kcbland · Post by **kcbland** » Thu Sep 30, 2004 9:39 am

Doing it in DataStage is ugly, that's why I gave you an "elegant" SQL solution. Write source table "A" into a hash file, leaving the B phone number column blank. Read source table "B" and reference the hash file, if the row is there, add the B phone number to the B phone number column and write out the complete row. If the row is not in the hash file, create it but leave the "A" phone number blank. Then, dump out all of the rows from the hash file.

phillip.small · Post by **phillip.small** » Thu Sep 30, 2004 11:21 am

kcbland wrote:Doing it in DataStage is ugly, that's why I gave you an "elegant" SQL solution. Write source table "A" into a hash file, leaving the B phone number column blank. Read source table "B" and reference the hash file, if the row is there, add the B phone number to the B phone number column and write out the complete row. If the row is not in the hash file, create it but leave the "A" phone number blank. Then, dump out all of the rows from the hash file.

When you say Hash file? I thought you meant doing in DataStage? Outside of Ascential, what do you mean by Hash file? Are you referring to DB2 command line or another tool that also references Hash files? Sorry but I am new to this and I am only familar with that term for Ascential.

kcbland · Post by **kcbland** » Thu Sep 30, 2004 12:02 pm

I'm talking about a DataStage Server job utilizing DataStage Server hash files. You are posting questions in the Server forum, so I'm giving you a Server solution.

If you want a Parallel solution, I'd tell you to a different solution.

ray.wurlod · Post by **ray.wurlod** » Thu Sep 30, 2004 4:56 pm

DB2 is my DB env. in DataStage. How do you implement this in DataStage?

You use user-defined SQL in the stage type that is extracting the data from DB2.