Hi, I am doing a vertical pivot along the lines of what's mentioned at http://dsxchange.com/viewtopic.php?t=97 ... d58c385d2b
Now, in that solution the final data is written to a hash file, since it overwrites values for the same key with no extra effort. Because my data runs to millions of rows, I was thinking of saving some of those hash file writes by using an Aggregator stage with a 'Last' derivation. By the way, my data is sorted on the key. I am then thinking of putting a sequential file after the aggregator, which would be used by Oracle's bulk loading utility to load the data.
old: seq. file1 --> transformer --> hash file
new: seq. file1 --> transformer --> aggregator --> seq. file2
My question is: can this approach put too much load on main memory? Will the aggregator use too much memory, since the data size is so large? Or is there another way that could be faster for large volumes of data?
query on vertical pivot
- Participant (Posts: 54607, Joined: Wed Oct 23, 2002 10:52 pm, Location: Sydney, Australia)
If the data are sorted on the grouping keys, and the Aggregator stage is informed of this fact, then the volumes are irrelevant - the stage can output a row (and free the memory associated with it) each time a sorted grouping key changes value - the sorting means that that value will never be seen again. It (the Aggregator stage) becomes very light-weight in its use of memory with sorted input.
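To illustrate that point, here is a rough Python sketch (my own illustration, not DataStage code) of a 'Last' aggregation over sorted input: the routine only ever holds the current group's row in memory, releasing a result as soon as the grouping key changes, so memory use is constant regardless of volume.

```python
def last_per_group(sorted_rows):
    """Yield the last row of each group.

    `sorted_rows` is an iterable of (key, value) pairs already sorted
    on key, so once the key changes, that key will never be seen again
    and the held row can be emitted and forgotten.
    """
    sentinel = object()
    prev_key, prev_value = sentinel, None
    for key, value in sorted_rows:
        if prev_key is not sentinel and key != prev_key:
            yield prev_key, prev_value   # key changed: group is complete
        prev_key, prev_value = key, value
    if prev_key is not sentinel:
        yield prev_key, prev_value       # flush the final group

rows = [(0, "a"), (0, "b"), (1, "c"), (1, "d"), (1, "e")]
print(list(last_per_group(rows)))  # [(0, 'b'), (1, 'e')]
```

With unsorted input the same result would require holding every group in memory until end-of-data, which is where large volumes would hurt.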
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
- Participant (Posts: 39, Joined: Thu Nov 23, 2006 11:23 pm)
I tried using an aggregator in a sample job, by replacing one of the end hash files with an Aggregator stage. But in DS Designer, after I run the job, I see twice the number of rows passed between the Transformer and Aggregator stages. How is that possible?

source --32 rows--> transformer --20 rows--> transformer --40 rows--> aggregator --5 rows--> seq. file
                                                 |
                                                 +--12 rows--> transformer --4 rows--> hash file

It's the 40-rows link which shows double the rows that the previous hash file design was showing. In the aggregator I have simply grouped by the index field, which is sorted, and used 'Last' on the record field, which is a comma-separated record. Sample output record:
INDEX,"Record"
0,"0,3434,001,asdsa,22"
- Participant (Posts: 54607, Joined: Wed Oct 23, 2002 10:52 pm, Location: Sydney, Australia)
Unless you tell us precisely how you are processing in the Transformer, we cannot answer your question. It IS possible for more rows to come out of a Transformer than go in, but you have to program this - it cannot happen of its own accord.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
- Participant (Posts: 39, Joined: Thu Nov 23, 2006 11:23 pm)
The requirement is to club these multiple rows with the same index value (the integer in the third field) into one row. In the transformer I am doing the vertical pivot the following way.

Sample input:
Field001,Field001_1,Field001_2,Field001_3,Value
"wfappwols","hierarchies","0","cisid","3434"
"wfappwols","hierarchies","0","cisbankid","001"
"wfappwols","hierarchies","0","ciscompanyname","asdsa"
"wfappwols","hierarchies","0","cislevel","22"
"wfappwols","hierarchies","1","cisid","3435"
"wfappwols","hierarchies","1","cisbankid","002"
"wfappwols","hierarchies","1","ciscompanyname","asdsadsad"
"wfappwols","hierarchies","1","cislevel","22"
sample output:
INDEX,"Record-(cisid,cisbankid,ciscompanyname,cislevel)"
0,"0,3434,001,asdsa,22"
1,"1,3435,002,asdsadsad,22"
So now I can't understand why, when I use the Aggregator stage, the number of output rows from the transformer gets doubled.

The input link consists of 5 fields: Field001, Field001_1, Field001_2, Field001_3, Value.
Stage Variables:
currentKey:
  Default = ""
  Derivation = InputLink.Field001_2
newRecord:
  Default = ""
  Derivation = If currentKey <> lastKey Then InputLink.Field001_2 : "," : InputLink.Value Else newRecord : "," : InputLink.Value
lastKey:
  Default = ""
  Derivation = currentKey

Output Link:
INDEX = InputLink.Field001_2
Record = newRecord
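To make the row counts concrete, here is a rough Python re-creation (my own sketch, not DataStage itself) of the stage-variable logic above. Stage variables evaluate top to bottom on each row, with `lastKey` still holding the previous row's key when `newRecord` is derived; with no constraint on the output link, every input row produces one output row, most of them carrying a partially built record.

```python
def transformer(rows):
    """Emulate the stage variables; `rows` is a list of
    (field001_2, value) pairs sorted on field001_2."""
    current_key, new_record, last_key = "", "", ""
    out = []
    for key, value in rows:
        current_key = key                        # currentKey derivation
        if current_key != last_key:              # newRecord derivation
            new_record = current_key + "," + value   # key changed: start a new record
        else:
            new_record = new_record + "," + value    # same key: keep appending
        last_key = current_key                   # lastKey derivation (evaluated last)
        out.append((current_key, new_record))    # one output row per input row
    return out

sample = [("0", "3434"), ("0", "001"), ("0", "asdsa"), ("0", "22")]
for index, record in transformer(sample):
    print(index, record)
# Four rows come out; only the final one ("0,3434,001,asdsa,22") is complete,
# which is why a last-write sink (hash file or 'Last' aggregation) is needed.
```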
- Participant (Posts: 54607, Joined: Wed Oct 23, 2002 10:52 pm, Location: Sydney, Australia)
How many output rows, how many input rows?
You probably need to constrain the output to pass only those rows that are complete.
Take a look at the actual rows being output.
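A rough Python stand-in (an illustration, not DataStage) for that constraint: hold the record back while the grouping key stays the same, release it only when the key changes, and flush the last one at end-of-data. Only complete rows pass downstream.

```python
def pivot_complete_only(rows):
    """Return one complete (key, record) pair per group.

    `rows` is a list of (key, value) pairs sorted on key."""
    out, record, prev_key = [], None, None
    for key, value in rows:
        if key != prev_key:
            if record is not None:
                out.append((prev_key, record))  # previous group is complete
            record = key + "," + value          # start the new group's record
            prev_key = key
        else:
            record += "," + value               # same group: keep building
    if record is not None:
        out.append((prev_key, record))          # flush the final group
    return out

sample = [("0", "3434"), ("0", "001"), ("1", "3435"), ("1", "002")]
print(pivot_complete_only(sample))
# [('0', '0,3434,001'), ('1', '1,3435,002')]
```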
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
- Participant (Posts: 39, Joined: Thu Nov 23, 2006 11:23 pm)
"You probably need to constrain the output to pass only those rows that are complete."
That's precisely why I am using an Aggregator stage (with a 'Last' derivation) or a hash file stage, which gives me the last write. I will try putting a constraint in the transformer itself, but the question still remains unanswered: where are these extra rows coming from with the Aggregator, when they don't appear with the hash files?
- Participant (Posts: 54607, Joined: Wed Oct 23, 2002 10:52 pm, Location: Sydney, Australia)
Please answer my questions about your Transformer stage job.
Help us to help you.
An extra question, then. What are the row counts when you use an Aggregator stage - inputs and outputs to that stage?
Can you please post the job design in each case? But use Code tags (which gives fixed-width typeface) rather than what you attempted earlier and which was not totally clear.
What are you doing (precisely) in the second Transformer stage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.