query on vertical pivot

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


manish1005
Participant
Posts: 39
Joined: Thu Nov 23, 2006 11:23 pm

query on vertical pivot

Post by manish1005 »

Hi, I am doing a vertical pivot along the lines of what's described at http://dsxchange.com/viewtopic.php?t=97 ... d58c385d2b

Now, in the above solution the final data is written to a hash file, which overwrites values for the same key with no extra effort. Since my data runs into millions of rows, I was thinking of saving some of the hash file writes by using an Aggregator stage with a 'Last' derivation. My data is sorted on the key. I am then thinking of putting a sequential file after the aggregator, to be used by Oracle's bulk loading utility to load the data.

old: seq. file 1 --> transformer --> hash file
new: seq. file 1 --> transformer --> aggregator --> seq. file 2
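
For illustration only, a minimal Python sketch (not DataStage code; the names are made up) of what both designs boil down to: keeping the last row written for each key. The hash file gets there by destructive overwrite on the key; the Aggregator gets there explicitly with a 'Last' derivation.

def last_per_key(rows):
    """rows: iterable of (key, record) pairs."""
    result = {}
    for key, record in rows:
        result[key] = record  # a later row for the same key overwrites the earlier one
    return result

rows = [("0", "partial"), ("0", "complete"), ("1", "partial"), ("1", "complete")]
print(last_per_key(rows))  # {'0': 'complete', '1': 'complete'}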

My question is: can this approach put too much load on main memory? Will the Aggregator stage use too much memory, since the data volume is so large?
Or is there another way that would be fast for large volumes of data?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If the data are sorted on the grouping keys, and the Aggregator stage is informed of this fact, then the volumes are irrelevant. The stage can output a row (and free the memory associated with it) each time a sorted grouping key changes value, because the sorting means that value will never be seen again. With sorted input, the Aggregator stage becomes very lightweight in its use of memory.
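
A minimal Python sketch of that streaming behaviour (for illustration only, not DataStage internals): with sorted input, only the current group is ever held in memory.

def last_per_group_sorted(rows):
    """rows: iterable of (key, record) pairs, pre-sorted on key.
    Yields one (key, last record) pair per group, holding only
    the current group in memory."""
    prev_key = prev_record = None
    for key, record in rows:
        if prev_key is not None and key != prev_key:
            yield prev_key, prev_record  # group finished: this key never recurs
        prev_key, prev_record = key, record
    if prev_key is not None:
        yield prev_key, prev_record  # flush the final group
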
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
manish1005
Participant
Posts: 39
Joined: Thu Nov 23, 2006 11:23 pm

Post by manish1005 »

I tried using an aggregator in a sample job, replacing one of the final hash files with an Aggregator stage.
But in DS Designer, after I run the job, I see twice the number of rows passing between the Transformer and Aggregator stages. How is that possible?
source --32 rows--> transformer --20 rows--> transformer --40 rows--> aggregator --5 rows--> seq. file
                                                 |
                                                 +--12 rows--> transformer --4 rows--> hash file

It's the 40-row link that shows double the rows that the previous hash file design was showing.

In the aggregator I have simply grouped by the index field, which is sorted, and used 'Last' on the record field, which is a comma-separated record:
INDEX,"Record"
0,"0,3434,001,asdsa,22"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Unless you tell us precisely how you are processing in the Transformer stage, we cannot answer your question. It IS possible for more rows to come out of a Transformer than go in, but you have to program this; it cannot happen of its own accord.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
manish1005
Participant
Posts: 39
Joined: Thu Nov 23, 2006 11:23 pm

Post by manish1005 »

The requirement is to combine these multiple rows with the same index value (the integer in the third field) into one row.
sample input:
Field001,Field001_1,Field001_2,Field001_3,Value
"wfappwols","hierarchies","0","cisid","3434"
"wfappwols","hierarchies","0","cisbankid","001"
"wfappwols","hierarchies","0","ciscompanyname","asdsa"
"wfappwols","hierarchies","0","cislevel","22"
"wfappwols","hierarchies","1","cisid","3435"
"wfappwols","hierarchies","1","cisbankid","002"
"wfappwols","hierarchies","1","ciscompanyname","asdsadsad"
"wfappwols","hierarchies","1","cislevel","22"

sample output:
INDEX,"Record-(cisid,cisbankid,ciscompanyname,cislevel)"
0,"0,3434,001,asdsa,22"
1,"1,3435,002,asdsadsad,22"

In the Transformer I am doing the vertical pivot the following way. The input link consists of five fields:
Field001, Field001_1, Field001_2, Field001_3 and Value.

Stage Variables:

currentKey:
Default = ""
Derivation = InputLink.Field001_2

newRecord:
Default = ""
Derivation = If currentKey <> lastKey Then InputLink.Field001_2 : "," : InputLink.Value Else newRecord : "," : InputLink.Value

lastKey:
Default = ""
Derivation = currentKey

Output Link:
INDEX = InputLink.Field001_2
Record = newRecord
So now I can't understand why the number of output rows from the Transformer gets doubled when I use the Aggregator stage.
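
For illustration, a Python sketch of the stage-variable logic above (assuming the variables are evaluated top to bottom once per input row, as DataStage stage variables are). Note that the output link carries no constraint, so one row is emitted per input row, including rows that hold only partially built records:

def pivot_unconstrained(rows):
    """rows: iterable of (key, value) pairs, sorted on key,
    mimicking the stage variables currentKey / newRecord / lastKey."""
    last_key = ""
    new_record = ""
    out = []
    for key, value in rows:
        current_key = key
        if current_key != last_key:
            new_record = current_key + "," + value  # key changed: start a new record
        else:
            new_record = new_record + "," + value   # same key: keep accumulating
        last_key = current_key
        out.append((current_key, new_record))       # no constraint: one output row per input row
    return out

With the eight sample rows above this returns eight rows, and only the fourth and eighth hold complete records; a keyed hash file write silently keeps just the last row per key, whereas every row arriving on an Aggregator link is counted.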
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

How many output rows, how many input rows?

You probably need to constrain the output to pass only those rows that are complete.

Take a look at the actual rows being output.
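
For illustration, a Python sketch of one way such a constraint could work (an assumption about the approach, not taken from the thread): defer output by one group, so a record is emitted only when the key changes and the previous group is known to be complete.

def pivot_constrained(rows):
    """rows: iterable of (key, value) pairs, sorted on key.
    Emits each accumulated record only once its group is complete,
    i.e. when the key changes (plus a flush at end of input)."""
    prev_key = None
    record = ""
    for key, value in rows:
        if key != prev_key:
            if prev_key is not None:
                yield prev_key, record  # previous group is now complete
            record = key + "," + value
        else:
            record = record + "," + value
        prev_key = key
    if prev_key is not None:
        yield prev_key, record  # flush the last group
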
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
manish1005
Participant
Posts: 39
Joined: Thu Nov 23, 2006 11:23 pm

Post by manish1005 »

"You probably need to constrain the output to pass only those rows that are complete."

That's precisely why I am using an Aggregator stage (with a 'Last' derivation) or a hash file stage, which gives me the last write.

I will try putting a constraint in the transformer itself, but the question still remains unanswered: where are these extra rows coming from with the Aggregator when they did not appear with the hash file?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Please answer my questions about your Transformer stage job.

Help us to help you.

An extra question, then. What are the row counts when you use an Aggregator stage - inputs and outputs to that stage?

Can you please post the job design in each case? But use Code tags (which give a fixed-width typeface) rather than what you attempted earlier, which was not totally clear.

What are you doing (precisely) in the second Transformer stage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.