Transposing rows to columns
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 20
- Joined: Thu Nov 27, 2003 3:45 am
- Location: New Delhi, India
Hello All,
I have a problem where I need to transpose a set of rows into columns, collapsing multiple source rows into a single row in the target.
My source is an Informix table and I am using the Informix CLI plugin to get the data out of the Informix DB. My target has to be a hash file.
The source data looks something like this:
Col1 Col2 Col3
1 A ValueA
1 B ValueB
1 C ValueC
2 A ValueA
2 B ValueB
3 A ValueA
4 A ValueA
4 B ValueB
4 C ValueC
The target should be something like this:
Col1 Col2 Col3 Col4
1 ValueA ValueB ValueC
2 ValueA ValueB ""
3 ValueA "" ""
4 ValueA ValueB ValueC
I have been scratching my head trying to use some stage variables and the RowProcGetPreviousValue routine, but somehow I'm not getting a breakthrough.
Any insights??
Thanks,
Siddharth
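For what it's worth, the group-change logic being asked about can be sketched outside DataStage. This is a minimal Python illustration of the vertical pivot, assuming the input arrives sorted by key; the data is taken from the sample above. It is not DataStage code, just the idea behind a stage-variable design: accumulate values for the current key, and emit a finished row whenever the key changes.

```python
def vertical_pivot(rows, positions=("A", "B", "C")):
    """Collapse sorted (key, pos, value) rows into one output row per key,
    padding missing positions with an empty string."""
    out = []
    current_key = None
    current = {}

    def flush():
        # Emit the accumulated group, if any, as [key, v_A, v_B, v_C].
        if current_key is not None:
            out.append([current_key] + [current.get(p, "") for p in positions])

    for key, pos, value in rows:
        if key != current_key:
            flush()                 # key changed: emit the finished group
            current_key, current = key, {}
        current[pos] = value
    flush()                         # emit the last group -- the step a
                                    # stage-variable design tends to miss
    return out

rows = [
    (1, "A", "ValueA"), (1, "B", "ValueB"), (1, "C", "ValueC"),
    (2, "A", "ValueA"), (2, "B", "ValueB"),
    (3, "A", "ValueA"),
    (4, "A", "ValueA"), (4, "B", "ValueB"), (4, "C", "ValueC"),
]
for row in vertical_pivot(rows):
    print(row)   # e.g. [2, 'ValueA', 'ValueB', '']
```

Note that the final `flush()` after the loop is exactly the "last set of rows" problem discussed later in this thread: without it, the last group is never emitted.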
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Try searching the forum for VERTICAL PIVOT, as this is what you are trying to accomplish.
It's not a job for the Pivot stage, which performs horizontal pivots.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi Ray,
I had already done that search before posting this message. The closest I could get was this.
viewtopic.php?t=85702&highlight=vertica ... 7cc107e5dd
but this did not solve my problem, as my input has a variable number of rows for a particular key.
Thanks
Siddharth
Okay, by reordering some stage variables I am through with the above-mentioned problem, but I have run into another one: the last set of rows is not being processed.
Reading the previous posts, I know the solution to this problem, i.e. adding a dummy last row to the source. The problem is how to do that, as my source is a transformer and so is my target. The job design has changed since my first post.
So how do I add this dummy row?
Thanks,
Siddharth
siddharthkaushik wrote: Reading the previous posts, I know the solution to this problem; i.e. adding a dummy last row to the source, but the problem is how do I do that as my source is a transformer and so is my target.
Glad to see you are making some progress.
I think you are going to need to expand on this just a little. I know how to set up a Transformer as a 'source' in a job, but I'd be interested in knowing what you are doing that requires both the source and the target to be Transformers, and what's going on in the middle there. It will also help ensure people can help you properly, I would think.
So, there's no other place to introduce this dummy row? Before this job, perhaps? Unless you are truly generating data on the fly, much like in Markov chaining...
Throwing another log on the fire of thought: sometimes problems like this can be solved by sorting in reverse order and triggering your 'group by' logic on the first appearance of a key rather than the last. This eliminates the need for a dummy trailer row. I don't know if that is possible or even applicable in your case, but it's something to keep in mind.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
chulett wrote: I think you are going to need to expand on this just a little. I know how to setup a Transformer as a 'source' in a job, but I'd be interested in knowing what you are doing that requires both the source and the target to be Transformers and what's going on in the middle there.
The data is coming from four Informix tables, with one as the driver and the other three as lookups; that's how the data lands in a transformer. When I said my source is a transformer, I meant the source of the final generated data after the necessary lookups. After the transpose I have to put this data into another transformer, which then puts the rows into a hash file with some more transformations and timestamps.
I hope this information helps you... of course, in order to help me out here!
Thanks,
Siddharth
OK, if your source is a relational database, then here's what I've done in the past to generate this dummy row: switch to Custom SQL and use a union to join one 'extra' row to your dataset - one that you can easily recognize and that will sort last in your input stream.
In Oracle I'd simply select a hard-coded something from DUAL, generally with key values of all 9s or Zs or something else you "know" will be unique and will end up last. You'll need to be able to recognize it when you see it in your job so you can push your last group out but skip any processing of this particular record.
Does that help?
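If it helps to see that union trick concretely, here is a hedged sketch using an in-memory SQLite table (the databases in this thread are Informix and Oracle, whose syntax differs slightly; the table and column names below are made up for illustration):

```python
import sqlite3

# Build a tiny stand-in for the source table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (col1 TEXT, col2 TEXT, col3 TEXT)")
con.executemany("INSERT INTO src VALUES (?, ?, ?)",
                [("1", "A", "ValueA"), ("2", "A", "ValueA")])

# UNION ALL in one hard-coded trailer row with a key that sorts last
# and is easy to recognize downstream.
rows = con.execute("""
    SELECT col1, col2, col3 FROM src
    UNION ALL
    SELECT '9999999', 'Z', 'DUMMY'
    ORDER BY col1
""").fetchall()

print(rows[-1])   # ('9999999', 'Z', 'DUMMY') -- the recognizable trailer
```

The job then watches for the '9999999' key: it triggers the flush of the final group but is itself skipped from normal processing.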
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Well, I am open to trying this out (I can only do that on Monday, when I get to the office), but I suppose that if I use the union to get that extra row into the dataset from the first table, the row would be rejected in the first lookup itself, as the join has to be an inner join. I did wonder about this approach, but did not try it because of the join problem.
I am trying to get something out of the tools4datastage archive, but no headway as yet.
Thanks,
Siddharth
siddharthkaushik wrote: Well I am open to try this out (I can do that on Monday only, when I get to the office) but I suppose that if I use the union to get that extra row in the dataset from the first table, the row would be rejected in the first lookup itself as the join has to be an inner join.
Don't let it. You control what gets rejected. Like I said, make sure you can recognize the extra row and then allow it to break the rules - pass it on through.
Good luck!
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Volume is irrelevant.
You control what the DataStage job does by designing it.
Direct the rejected rows to a text file for maximum "efficiency" if this is a concern.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
siddharthkaushik wrote: Hmm... Craig, it's not possible not to reject the rows as the volume is huge. So I will have to insert the dummy row somewhere in the "source" transformer itself.
Not talking about rejecting any rows. You're not getting what I'm trying to say; let me see if I can rephrase it. I don't know all of the work that goes on in between the original source query and the lookups that join it to the reference tables, so it may be harder to do than I think it will be. But what I'm saying is this:
A) Add a recognizable dummy row (via a union) to your initial dataset.
B) Account for the dummy row when you do your lookups.
Yes, reference lookups are inner joins and you won't get a hit on it. Fine. "Break the rules and let it through" What I mean by that is I'm assuming you have a constraint that doesn't allow lookups that fail to pass - you said "the row would be rejected in the first lookup itself as the join has to be an inner join". You enforce this by using a constraint that says something like "Not(IsNull(Key_Field_In_Lookup))" or perhaps "Not(NOTFOUND)", right? To allow this dummy record to pass your constraint in spite of the fact that the reference lookup failed, just add something like " OR KEY = '9999999' " to the constraint - whatever it takes to recognize the dummy record is the one breaking the rules and to allow it to pass.
If this still doesn't make sense, or it doesn't seem like an approach you want to take, how about a variation of the Ray Wurlod method from your other thread? Break this up into two jobs. The first one creates the initial dataset, does the lookups, and writes the data you need for the grouping out to a flat file. The second job reads the flat file, does the group-processing logic using the stage variables, and then populates your target table. In between the two jobs - perhaps as an "after job" routine in the first job - run "ExecSH" and cat the dummy row onto the end of the flat file you just created. That way it will be in place when the second 'half' of the job starts up and finishes the processing.
One of those two approaches should work for you, I would think!
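As an illustration of the "let the dummy row break the rules" idea above, here is a Python stand-in for a Transformer output constraint along the lines of `Not(NOTFOUND) OR KEY = '9999999'` (the sentinel key value is just the example used above, not anything fixed by DataStage):

```python
# Stand-in for a Transformer output constraint such as
#   Not(NOTFOUND) OR KEY = '9999999'
# Normal rows must have a lookup hit; the dummy trailer row is allowed
# through even though its lookup fails, so it can flush the last group.

SENTINEL = "9999999"   # example sentinel key from the post above

def passes_constraint(key, lookup_hit):
    """True if the row should be passed downstream."""
    return lookup_hit or key == SENTINEL

print(passes_constraint("1234", False))    # normal row, failed lookup: rejected
print(passes_constraint(SENTINEL, False))  # dummy row passes despite no hit
```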
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
If you really think about it, reference lookups per se are left outer joins.
You only make them inner joins if you constrain the output to those rows for which the lookup succeeded!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett wrote: If this still doesn't make sense or it doesn't seem like an approach you want to take, how about a variation of the Ray Wurlod method in your other thread? Break this up into two jobs. The first one creates the initial dataset, does the lookups and writes the data you need to do the grouping on out to a flat file. The second job reads the flat file and does the group processing logic using the Stage Variables and then populates your target table. In between the two jobs - perhaps as an "after job" routine in the first job - run "ExecSH" and in there cat the dummy row on to the end of the flat file you just created.
You don't even need two jobs. Consider this design:
---> Transformer1 ---> SeqFile ---> Transformer2 --->
An after-stage subroutine in Transformer1 invoking ExecSH can happily append the dummy row.
The output side of the SeqFile stage cannot be opened until its input has been closed.
Or this could be done in a before-stage subroutine in Transformer2. A before-stage subroutine is executed before the file is opened. In fact, this would be my preferred approach of the two.
Tip: whenever you use a before-stage or after-stage subroutine in a design, put an annotation on the canvas warning future developers that this hidden functionality is part of the design.
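To make the append step concrete, here is a small Python stand-in for what the ExecSH subroutine would do with `cat`: tack one recognizable dummy trailer row onto the intermediate sequential file before the downstream Transformer reads it. The file name, delimiter, column count, and sentinel key are all assumptions for illustration.

```python
import os
import tempfile

def append_dummy_row(path, key="9999999", ncols=3, delim=","):
    """Append a trailer row whose key sorts last and is easy to recognize."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(delim.join([key] + [""] * (ncols - 1)) + "\n")

# Usage: write a small intermediate file, then tack on the trailer.
path = os.path.join(tempfile.mkdtemp(), "pivot_input.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("1,A,ValueA\n1,B,ValueB\n")

append_dummy_row(path)

with open(path, encoding="utf-8") as f:
    last_line = f.read().splitlines()[-1]
print(last_line)   # 9999999,,
```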
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.