Another PIVOT/TRANSPOSE question

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Offshored2002
Participant
Posts: 17
Joined: Wed Apr 16, 2008 6:39 am
Location: Arlington, VA

Another PIVOT/TRANSPOSE question

Post by Offshored2002 »

I have an XML file that needs to be transposed:
Source:
Same Date; an unknown number of Group_Type/Group_Num combinations, with approximately 2280 records per combination, as shown below. There will always be 13 "stat" columns that will either have values or be null (Field_Name is a character field):

Code:

Date Group_Type Group_Num Field_Name Stat1  Stat2 .. Stat13
9/07    A             1       Field1  100     350     9999
9/07    A             1       Field2  200     500     8888
.
.
.
9/07    A             1       Field2280 45   6000    19999
9/07    A             2       Field1      .............................
Our DB2 table, however, wants the layout as follows, where each column name is a value of Field_Name and the value of the derived field Stat_Type (a character field) is the name of one of the statistic columns:

Code:

Date Group_Type Group_Num Stat_Type Field1   Field2 ... Field2280
9/07    A             1      Stat1     100     200        45
9/07    A             1      Stat2     350     500      6000
9/07    A             1      Stat13   9999    8888     19999
So for a given set of date, group type, and group number, 2280 rows x 17 columns would become 13 rows x 2284 columns (2280 field columns plus 4 key columns). Nothing is to be summed, averaged, and so on; it is a pure transposition of data. I have looked through some of the other posts on vertical pivots and such and can't seem to find an answer. I realize there may be performance issues as well.
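To make the target shape concrete, here is a minimal Python sketch of the pure transposition I am after (outside DataStage); the column names come from the samples above, and the function name is just illustrative:

Code:

# Pure transposition per (Date, Group_Type, Group_Num) group: the 13 Stat
# columns become 13 rows, and the ~2280 Field_Name values become columns.
from collections import defaultdict

STATS = ["Stat" + str(i) for i in range(1, 14)]      # Stat1 .. Stat13

def transpose(rows):
    """rows: dicts with Date, Group_Type, Group_Num, Field_Name, Stat1..Stat13."""
    groups = defaultdict(dict)
    for r in rows:
        key = (r["Date"], r["Group_Type"], r["Group_Num"])
        groups[key][r["Field_Name"]] = [r[s] for s in STATS]
    out = []
    for (date, gtype, gnum), fields in groups.items():
        for i, stat in enumerate(STATS):             # one output row per stat
            row = {"Date": date, "Group_Type": gtype,
                   "Group_Num": gnum, "Stat_Type": stat}
            row.update({f: vals[i] for f, vals in fields.items()})
            out.append(row)
    return out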

Can an Aggregator stage do this? The Pivot stage always seems to create even more columns, so I don't think this is an option.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

You should be able to get what you are trying to achieve by doing a horizontal pivot followed by a vertical pivot.

Once your XML is in the format you describe, you would do the following:
1. Insert a transformer where you pass all the original columns, but change the derivation for the Stat1...Stat13 values to:

Code:

DSLink.FieldName:',Stat1,':DSLink.Stat1
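For Stat2 the literal would be ',Stat2,' and so on. As a sketch of the effect (Python, with sample values assumed from the original post), each stat column now carries a self-describing string:

Code:

# Effect of the step-1 derivation for one input row (illustrative values):
# each StatN column becomes a "FieldName,StatN,value" string.
field_name, stat1 = "Field1", 100
tagged = field_name + ",Stat1," + str(stat1)   # DSLink.FieldName:',Stat1,':DSLink.Stat1
print(tagged)                                  # -> Field1,Stat1,100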
2. Put in a pivot stage. You would have only 3 output columns:

Code:

Column Name   Derivation
Group_Type    
Group_Num
Field         Stat1, Stat2, etc ... Stat13
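A sketch of what this horizontal pivot does (illustrative Python; the function name is mine): each input row fans out into 13 rows, one per tagged stat string:

Code:

# Horizontal pivot: one row with 13 tagged stat columns becomes 13 rows,
# each carrying one tagged string in the single "Field" column.
def pivot_row(group_type, group_num, tagged_stats):
    return [(group_type, group_num, t) for t in tagged_stats]

pivot_row("A", 1, ["Field1,Stat1,100", "Field1,Stat2,350"])
# -> [("A", 1, "Field1,Stat1,100"), ("A", 1, "Field1,Stat2,350")]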
3. Put in a column import stage

Output columns:
Group_Type
Group_Num
FieldName
Stat_Type
FieldNum

Format: Field Defaults:
Delimiter = comma
Quote = none

Stage Properties:
Import Input Column = Field
Column Method = Explicit
Column to Import = FieldName
Column to Import = Stat_Type
Column to Import = FieldNum

Input Tab:

Code:

Partition Type = Hash; Perform Sort = Yes
Keys         Usage 
Group_Type   Sorting, Partitioning
Group_Num    Sorting, Partitioning
Field        Sorting
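A sketch of what the Column Import stage does with each tagged string (illustrative Python; column names as configured above):

Code:

# Column Import: split the tagged "FieldName,Stat_Type,FieldNum" string
# on commas into three separate columns. Note FieldNum holds the stat
# value itself (e.g. 100), not a field number.
def import_columns(group_type, group_num, field):
    field_name, stat_type, field_num = field.split(",")
    return {"Group_Type": group_type, "Group_Num": group_num,
            "FieldName": field_name, "Stat_Type": stat_type,
            "FieldNum": field_num}

import_columns("A", 1, "Field1,Stat1,100")
# -> {'Group_Type': 'A', 'Group_Num': 1, 'FieldName': 'Field1',
#     'Stat_Type': 'Stat1', 'FieldNum': '100'}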
4. Put in a Sort stage

Add output column keyChange

Code:

Sorting Keys  Mode
Group_Type    Don't Sort
Group_Num     Don't Sort
Stat_Type     Sort
Create Key Change Column = True
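The key-change flag that step 5 relies on can be sketched like this (illustrative Python):

Code:

# Create Key Change Column: emit 1 on the first row of each
# (Group_Type, Group_Num, Stat_Type) key in the sorted stream, else 0.
def add_key_change(sorted_rows, key_cols=("Group_Type", "Group_Num", "Stat_Type")):
    prev = None
    for r in sorted_rows:
        key = tuple(r[c] for c in key_cols)
        r["keyChange"] = 1 if key != prev else 0
        prev = key
        yield r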
5. Add a transformer with the following stage variables:
svMaxColumns = the maximum number of field columns (2280 in this case)
svCount = If lnkSort.keyChange then 1 else svCount+1
svBuildColumns = If lnkSort.keyChange then lnkSort.FieldNum else svBuildColumns :',': lnkSort.FieldNum
svOutput = svBuildColumns : str(',', svMaxColumns - svCount)

Output columns:

Code:

Column Name     Derivation
Group_Type      lnkSort.Group_Type
Group_Num       lnkSort.Group_Num
Stat_Type       lnkSort.Stat_Type
Value           svOutput
This will give a comma-delimited output of all the columns, padded out to svMaxColumns values.
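A sketch of that stage-variable logic in Python (illustrative; svMaxColumns would be 2280 here):

Code:

# Stage variables: restart the running list on a key change, otherwise
# append; then pad with commas so every row carries max_columns values.
# The last row of each key carries the complete list.
def build_columns(rows, max_columns=2280):
    build, count = "", 0
    for r in rows:
        if r["keyChange"]:                                  # svCount, svBuildColumns
            build, count = r["FieldNum"], 1
        else:
            build, count = build + "," + r["FieldNum"], count + 1
        sv_output = build + "," * (max_columns - count)     # svOutput
        yield {"Group_Type": r["Group_Type"], "Group_Num": r["Group_Num"],
               "Stat_Type": r["Stat_Type"], "Value": sv_output}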

6. Add Remove Duplicates
Key by Group_Type, Group_Num, Stat_Type
Duplicate to Retain = Last
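A sketch of why "retain last" completes the pivot (illustrative Python): the last accumulated row per key is the only complete one:

Code:

# Remove Duplicates, retain last: the input is key-sorted, so the last
# row per (Group_Type, Group_Num, Stat_Type) key holds the full list.
def retain_last(rows, key_cols=("Group_Type", "Group_Num", "Stat_Type")):
    last = {}
    for r in rows:
        last[tuple(r[c] for c in key_cols)] = r
    return list(last.values())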

7. You can then output to a sequential file which will be comma delimited, no quotes. This should give you a file in the format you want. You can then create another job to read that file with the correct metadata for load into your database.
It might be possible to use a transformer here and replace the comma delimiter with @FM (Char(254)) to keep it all within the same job, but I am not sure what would be required.

There is probably an easier way to implement this, maybe using arrays, but that would take somebody smarter than me.
Offshored2002
Participant
Posts: 17
Joined: Wed Apr 16, 2008 6:39 am
Location: Arlington, VA

Post by Offshored2002 »

I will look into this. Too bad DS doesn't have a transpose stage; this is a one-step process in SAS with PROC TRANSPOSE (as of now, that plug-in is not installed here, though I am considering asking for it).

If any of the other DS gurus have any ideas, they would be greatly appreciated.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

"Roll your own" transpose functionality as a Build stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.