Transposing rows to columns

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


siddharthkaushik
Participant
Posts: 20
Joined: Thu Nov 27, 2003 3:45 am
Location: New Delhi, India

Transposing rows to columns

Post by siddharthkaushik »

Hello All,

I have a problem wherein I need to transpose a set of rows into columns, so that multiple rows in the source become one single row in the target.

My source is an Informix table and I am using the Informix CLI plugin to get the data out of the Informix DB. My target has to be a hash file.

The source data looks something like this:

Col1  Col2  Col3
1     A     ValueA
1     B     ValueB
1     C     ValueC
2     A     ValueA
2     B     ValueB
3     A     ValueA
4     A     ValueA
4     B     ValueB
4     C     ValueC

The target should be something like this:

Col1  Col2    Col3    Col4
1     ValueA  ValueB  ValueC
2     ValueA  ValueB  ""
3     ValueA  ""      ""
4     ValueA  ValueB  ValueC

I have been scratching my head trying to use some stage variables and the RowProcGetPreviousValue routine, but somehow I am not getting any breakthrough.

Any insights??

Thanks,
Siddharth
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Try searching the forum for VERTICAL PIVOT, as this is what you are trying to accomplish.
It's not a job for the Pivot stage, which performs horizontal pivots.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
siddharthkaushik
Participant
Posts: 20
Joined: Thu Nov 27, 2003 3:45 am
Location: New Delhi, India

Post by siddharthkaushik »

Hi Ray,

I had already done that search before posting this message. The closest I could get was this.

viewtopic.php?t=85702&highlight=vertica ... 7cc107e5dd

but this did not solve my problem, as my input has a variable number of rows for a particular key.

Thanks
Siddharth
siddharthkaushik
Participant
Posts: 20
Joined: Thu Nov 27, 2003 3:45 am
Location: New Delhi, India

Post by siddharthkaushik »

The problem is that by the time the decision is taken that a particular group has ended, the values associated with that group key have already vanished. Everything is offset by exactly one row, and I am not getting any ideas on how to shake off this offset.

Thanks,
Siddharth
siddharthkaushik
Participant
Posts: 20
Joined: Thu Nov 27, 2003 3:45 am
Location: New Delhi, India

Post by siddharthkaushik »

Okay, by reordering the stage variables I am through with the problem mentioned above (a sketch of the ordering is below), but I have run into another one: the last set of rows is not being processed.
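For anyone who hits the same one-row offset, the ordering that did the trick looks roughly like this. It is only a sketch based on the three-column source in my first post - the link and variable names are illustrative, not my exact derivations:

Stage variables, evaluated top to bottom in this order:

KeyChange : in.Col1 <> SaveKey
OutKey    : If KeyChange Then SaveKey Else OutKey
OutA      : If KeyChange Then AccA Else OutA
OutB      : If KeyChange Then AccB Else OutB
OutC      : If KeyChange Then AccC Else OutC
AccA      : If in.Col2 = "A" Then in.Col3 Else If KeyChange Then "" Else AccA
AccB      : If in.Col2 = "B" Then in.Col3 Else If KeyChange Then "" Else AccB
AccC      : If in.Col2 = "C" Then in.Col3 Else If KeyChange Then "" Else AccC
SaveKey   : in.Col1

Because the Out* variables sit above the Acc* variables, they pick up the accumulators' values from the previous row - that is, from the group that has just finished - before the current row overwrites them, which is what removes the offset. The output link constraint is then KeyChange And OutKey <> "" and the output columns are derived from OutKey, OutA, OutB and OutC.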

From the previous posts I know the solution to this: add a dummy last row to the source. But how do I do that when my source is a Transformer and so is my target? The job design has changed since my first post. :?

So how do I add this dummy row?

Thanks,
Siddharth
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Glad to see you are making some progress. :)
siddharthkaushik wrote: From the previous posts I know the solution to this: add a dummy last row to the source. But how do I do that when my source is a Transformer and so is my target?
I think you are going to need to expand on this just a little. :wink: I know how to set up a Transformer as a 'source' in a job, but I'd be interested in knowing what you are doing that requires both the source and the target to be Transformers, and what's going on in the middle there. It will also help ensure people can help you properly, I would think.

So, there's no other place to introduce this dummy row? Before this job, perhaps? Unless you are truly generating data on the fly, much like in Markov Chaining... :?

Throwing another log on the fire of thought: sometimes problems like this can be solved by sorting in reverse order and triggering your 'group by' logic on the first appearance of a key rather than the last. This eliminates the need for a dummy trailer row. I don't know if that is possible or even applicable in your case, but it is something to keep in mind.
-craig

"You can never have too many knives" -- Logan Nine Fingers
siddharthkaushik
Participant
Posts: 20
Joined: Thu Nov 27, 2003 3:45 am
Location: New Delhi, India

Post by siddharthkaushik »

chulett wrote: I think you are going to need to expand on this just a little. :wink: I know how to set up a Transformer as a 'source' in a job, but I'd be interested in knowing what you are doing that requires both the source and the target to be Transformers, and what's going on in the middle there. It will also help ensure people can help you properly, I would think.
The data comes from four Informix tables, with one as the driver and the other three as lookups, so that's how the data ends up in a Transformer. When I said that my source is a Transformer, I meant that it is the source of the final generated data after the necessary lookups. After the transpose I have to put this data into another Transformer, which then writes the rows into a hash file with some more transformations and timestamps.

I hope this information helps you... of course, in order to help me out here!!!

Thanks,
Siddharth
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ok, if your source is a relational database then here's what I've done in the past to generate this dummy row: switch to Custom SQL and use a union to join one 'extra' row to your dataset - one that you can easily recognize and that will sort last into your input stream.

In Oracle I'd simply select a hard-coded something from DUAL, generally with key values of all 9s or Zs or something else you "know" will be unique and will end up last. You'll need to be able to recognize it when you see it in your job so you can push your last group out but skip any processing of this particular record.
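For instance, something along these lines - the table name, column names and dummy values are all placeholders, and you would swap DUAL for whatever one-row system table your Informix version offers:

select col1, col2, col3
from   your_source_table
union all
select 9999999, 'Z', 'ZZZZZZ'
from   dual
order by 1, 2

The only requirements are that the dummy key sorts after every real key and that the data types on both sides of the union match.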

Does that help?
-craig

"You can never have too many knives" -- Logan Nine Fingers
siddharthkaushik
Participant
Posts: 20
Joined: Thu Nov 27, 2003 3:45 am
Location: New Delhi, India

Post by siddharthkaushik »

Well, I am open to trying this out (I can only do that on Monday, when I get to the office), but I suspect that if I use the union to get that extra row into the dataset from the first table, the row will be rejected at the first lookup itself, as the join has to be an inner join. I did wonder about this approach, but did not try it because of the join problem.

I am trying to get something out of the tools4datastage archive, but no headway as yet :(

Thanks,
Siddharth
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

siddharthkaushik wrote: Well, I am open to trying this out (I can only do that on Monday, when I get to the office), but I suspect that if I use the union to get that extra row into the dataset from the first table, the row will be rejected at the first lookup itself, as the join has to be an inner join.
Don't let it. :wink: You control what gets rejected. Like I said, make sure you can recognize the extra row and then allow it to break the rules - pass it on through.

Good luck!
-craig

"You can never have too many knives" -- Logan Nine Fingers
siddharthkaushik
Participant
Posts: 20
Joined: Thu Nov 27, 2003 3:45 am
Location: New Delhi, India

Post by siddharthkaushik »

Hmm...
Craig, it's not possible to avoid rejecting the rows, as the volume is huge. So I will have to insert the dummy row somewhere in the "source" Transformer itself.

Any insights on how to do it?

Thanks,
Siddharth
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Volume is irrelevant.

You control what the DataStage job does by designing it.
Direct the rejected rows to a text file for maximum "efficiency" if this is a concern.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

siddharthkaushik wrote:Hmm...
Craig, it's not possible to avoid rejecting the rows, as the volume is huge. So I will have to insert the dummy row somewhere in the "source" Transformer itself.
Not talking about rejecting any rows. You're not getting what I'm trying to say, so let me see if I can rephrase it. :? I don't know all of the work that goes on in between the original source query and the lookups that join it to the reference tables, so it may be harder to do than I think it will be. But what I'm saying is this:

A) Add a recognizable dummy row (via a union) to your initial dataset.
B) Account for the dummy row when you do your lookups.

Yes, reference lookups are inner joins and you won't get a hit on it. Fine. "Break the rules and let it through" What I mean by that is I'm assuming you have a constraint that doesn't allow lookups that fail to pass - you said "the row would be rejected in the first lookup itself as the join has to be an inner join". You enforce this by using a constraint that says something like "Not(IsNull(Key_Field_In_Lookup))" or perhaps "Not(NOTFOUND)", right? To allow this dummy record to pass your constraint in spite of the fact that the reference lookup failed, just add something like " OR KEY = '9999999' " to the constraint - whatever it takes to recognize the dummy record is the one breaking the rules and to allow it to pass.
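Purely as an illustration - the link and column names here are made up - the constraint goes from something like the first line to something like the second:

Not(IsNull(lkpRef.Key_Field_In_Lookup))

Not(IsNull(lkpRef.Key_Field_In_Lookup)) Or inDriver.KEY = '9999999'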

If this still doesn't make sense or it doesn't seem like an approach you want to take, how about a variation of the Ray Wurlod method in your other thread? :wink: Break this up into two jobs. The first one creates the initial dataset, does the lookups and writes the data you need to do the grouping on out to a flat file. The second job reads the flat file and does the group processing logic using the Stage Variables and then populates your target table. In between the two jobs - perhaps as an "after job" routine in the first job - run "ExecSH" and in there cat the dummy row on to the end of the flat file you just created. That way it will be in place when the second 'half' of the job starts up and finishes the processing.
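The ExecSH command itself is just a one-liner along these lines - the path, delimiter and dummy values are placeholders for whatever your flat file actually uses:

echo "9999999|Z|ZZZZZZ" >> /your/path/pivot_work_file.txt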

One of those two approaches should work for you, I would think!
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

:idea:

If you really think about it, reference lookups per se are left outer joins.

You only make them inner joins if you constrain the output to those rows for which the lookup succeeded!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If this still doesn't make sense or it doesn't seem like an approach you want to take, how about a variation of the Ray Wurlod method in your other thread? Break this up into two jobs. The first one creates the initial dataset, does the lookups and writes the data you need to do the grouping on out to a flat file. The second job reads the flat file and does the group processing logic using the Stage Variables and then populates your target table. In between the two jobs - perhaps as an "after job" routine in the first job - run "ExecSH" and in there cat the dummy row on to the end of the flat file you just created. That way it will be in place when the second 'half' of the job starts up and finishes the processing.
You don't even need two jobs. Consider this design.

---> Transformer1 ---> SeqFile ---> Transformer2 --->

An after-stage subroutine in Transformer1 invoking ExecSH can happily append the dummy row.
The output from the SeqFile stage cannot be opened until its input has been closed.

Or this could be done in a before-stage subroutine in Transformer2. A before-stage subroutine is executed before the file is opened. In fact, this would be my preferred approach of the two.

:idea: Tip: whenever you use a before-stage or after-stage subroutine in a design, put an annotation on the canvas warning future developers that this hidden functionality is part of the design.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.