Another PIVOT/TRANSPOSE question

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Offshored2002
Participant
Posts: 17
Joined: Wed Apr 16, 2008 6:39 am
Location: Arlington, VA

Another PIVOT/TRANSPOSE question

Post by Offshored2002 »

I have an XML file that needs to be transposed:
Source:
Same Date; an unknown number of Group_Type/Group_Num combinations, with approximately 2280 records per combination, as shown below. There will always be 13 "stat" columns that will either have values or be null (Field_Name is a character field):

Code:

Date Group_Type Group_Num Field_Name Stat1  Stat2 .. Stat13
9/07    A             1       Field1  100     350     9999
9/07    A             1       Field2  200     500     8888
.
.
.
9/07    A             1       Field2280 45   6000    19999
9/07    A             2       Field1      .............................
Our DB2 table, however, wants the layout as follows, where each column name is a value of Field_Name and the value of the derived field Stat_Type (a character field) is the name of one of the statistic columns:

Code:

Date Group_Type Group_Num Stat_Type Field1   Field2 ... Field2280
9/07    A             1      Stat1     100     200        45
9/07    A             1      Stat2     350     500      6000
9/07    A             1      Stat13   9999    8888     19999
So for a given set of date, group type, and group number, 2280 rows x 17 columns would become 13 rows x 2284 columns (2280 field columns plus 4 key columns). Nothing is to be summed, averaged, and so on; it is a pure transposition of data. I have looked through some of the other posts on vertical pivots and such and can't seem to find an answer. I realize there may be performance issues as well.
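To make the target shape concrete, here is a minimal Python sketch of the pure transposition I am after (outside DataStage); the column names come from the samples above, and the function name is just illustrative:

Code:

# Pure transposition per (Date, Group_Type, Group_Num) group: the 13 Stat
# columns become 13 rows, and the ~2280 Field_Name values become columns.
from collections import defaultdict

STATS = ["Stat" + str(i) for i in range(1, 14)]      # Stat1 .. Stat13

def transpose(rows):
    """rows: dicts with Date, Group_Type, Group_Num, Field_Name, Stat1..Stat13."""
    groups = defaultdict(dict)
    for r in rows:
        key = (r["Date"], r["Group_Type"], r["Group_Num"])
        groups[key][r["Field_Name"]] = [r[s] for s in STATS]
    out = []
    for (date, gtype, gnum), fields in groups.items():
        for i, stat in enumerate(STATS):             # one output row per stat
            row = {"Date": date, "Group_Type": gtype,
                   "Group_Num": gnum, "Stat_Type": stat}
            row.update({f: vals[i] for f, vals in fields.items()})
            out.append(row)
    return out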

Can an Aggregator stage do this? The Pivot stage always seems to create even more columns, so I don't think this is an option.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

You should be able to get what you are trying to achieve by doing a horizontal pivot followed by a vertical pivot.

Once your XML is in the format you describe, you would do the following:
1. Insert a transformer where you pass all the original columns, but change the derivation for the Stat1...Stat13 values to:

Code:

DSLink.FieldName:',Stat1,':DSLink.Stat1
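For Stat2 the literal would be ',Stat2,' and so on. As a sketch of the effect (Python, with sample values assumed from the original post), each stat column now carries a self-describing string:

Code:

# Effect of the step-1 derivation for one input row (illustrative values):
# each StatN column becomes a "FieldName,StatN,value" string.
field_name, stat1 = "Field1", 100
tagged = field_name + ",Stat1," + str(stat1)   # DSLink.FieldName:',Stat1,':DSLink.Stat1
print(tagged)                                  # -> Field1,Stat1,100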
2. Put in a pivot stage. You would have only 3 output columns:

Code:

Column Name   Derivation
Group_Type    
Group_Num
Field         Stat1, Stat2, etc ... Stat13
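A sketch of what this horizontal pivot does (illustrative Python; the function name is mine): each input row fans out into 13 rows, one per tagged stat string:

Code:

# Horizontal pivot: one row with 13 tagged stat columns becomes 13 rows,
# each carrying one tagged string in the single "Field" column.
def pivot_row(group_type, group_num, tagged_stats):
    return [(group_type, group_num, t) for t in tagged_stats]

pivot_row("A", 1, ["Field1,Stat1,100", "Field1,Stat2,350"])
# -> [("A", 1, "Field1,Stat1,100"), ("A", 1, "Field1,Stat2,350")]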
3. Put in a column import stage

Output columns:
Group_Type
Group_Num
FieldName
Stat_Type
FieldNum

Format: Field Defaults:
Delimiter = comma
Quote = none

Stage Properties:
Import Input Column = Field
Column Method = Explicit
Column to Import = FieldName
Column to Import = Stat_Type
Column to Import = FieldNum

Input Tab:

Code:

Partition Type = Hash; Perform Sort = Yes
Keys         Usage 
Group_Type   Sorting, Partitioning
Group_Num    Sorting, Partitioning
Field        Sorting
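A sketch of what the Column Import stage does with each tagged string (illustrative Python; column names as configured above):

Code:

# Column Import: split the tagged "FieldName,Stat_Type,FieldNum" string
# on commas into three separate columns. Note FieldNum holds the stat
# value itself (e.g. 100), not a field number.
def import_columns(group_type, group_num, field):
    field_name, stat_type, field_num = field.split(",")
    return {"Group_Type": group_type, "Group_Num": group_num,
            "FieldName": field_name, "Stat_Type": stat_type,
            "FieldNum": field_num}

import_columns("A", 1, "Field1,Stat1,100")
# -> {'Group_Type': 'A', 'Group_Num': 1, 'FieldName': 'Field1',
#     'Stat_Type': 'Stat1', 'FieldNum': '100'}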
4. Put in a Sort stage

Add output column keyChange

Code:

Sorting Keys  Mode
Group_Type    Don't Sort
Group_Num     Don't Sort
Stat_Type     Sort
Create Key Change Column = True
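The key-change flag that step 5 relies on can be sketched like this (illustrative Python):

Code:

# Create Key Change Column: emit 1 on the first row of each
# (Group_Type, Group_Num, Stat_Type) key in the sorted stream, else 0.
def add_key_change(sorted_rows, key_cols=("Group_Type", "Group_Num", "Stat_Type")):
    prev = None
    for r in sorted_rows:
        key = tuple(r[c] for c in key_cols)
        r["keyChange"] = 1 if key != prev else 0
        prev = key
        yield r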
5. Add a transformer with the following stage variables:
svMaxColumns = the maximum number of field columns (2280 in this case)
svCount = If lnkSort.keyChange then 1 else svCount+1
svBuildColumns = If lnkSort.keyChange then lnkSort.FieldNum else svBuildColumns :',': lnkSort.FieldNum
svOutput = svBuildColumns : str(',', svMaxColumns - svCount)

Output columns:

Code:

Column Name     Derivation
Group_Type      lnkSort.Group_Type
Group_Num       lnkSort.Group_Num
Stat_Type       lnkSort.Stat_Type
Value           svOutput
This will give a comma-delimited output of all the columns, padded out to svMaxColumns values.
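A sketch of that stage-variable logic in Python (illustrative; svMaxColumns would be 2280 here):

Code:

# Stage variables: restart the running list on a key change, otherwise
# append; then pad with commas so every row carries max_columns values.
# The last row of each key carries the complete list.
def build_columns(rows, max_columns=2280):
    build, count = "", 0
    for r in rows:
        if r["keyChange"]:                                  # svCount, svBuildColumns
            build, count = r["FieldNum"], 1
        else:
            build, count = build + "," + r["FieldNum"], count + 1
        sv_output = build + "," * (max_columns - count)     # svOutput
        yield {"Group_Type": r["Group_Type"], "Group_Num": r["Group_Num"],
               "Stat_Type": r["Stat_Type"], "Value": sv_output}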

6. Add Remove Duplicates
Key by Group_Type, Group_Num, Stat_Type
Duplicate to Retain = Last
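A sketch of why "retain last" completes the pivot (illustrative Python): the last accumulated row per key is the only complete one:

Code:

# Remove Duplicates, retain last: the input is key-sorted, so the last
# row per (Group_Type, Group_Num, Stat_Type) key holds the full list.
def retain_last(rows, key_cols=("Group_Type", "Group_Num", "Stat_Type")):
    last = {}
    for r in rows:
        last[tuple(r[c] for c in key_cols)] = r
    return list(last.values())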

7. You can then output to a sequential file which will be comma delimited, no quotes. This should give you a file in the format you want. You can then create another job to read that file with the correct metadata for load into your database.
It might be possible to use a transformer here and replace the comma delimiter with @FM (Char(254)) to keep it all within the same job, but I am not sure what would be required.

There is probably an easier way to implement this, maybe using arrays, but that would take somebody smarter than me.
Offshored2002
Participant
Posts: 17
Joined: Wed Apr 16, 2008 6:39 am
Location: Arlington, VA

Post by Offshored2002 »

I will look into this. Too bad DS doesn't have a transpose stage; this is a one-step process in SAS with PROC TRANSPOSE (as of now, that plug-in is not installed here, though I am considering asking for it).

If any of the other DS gurus have any ideas, they would be greatly appreciated.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

"Roll your own" transpose functionality as a Build stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.