Combine source rows into single target row?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
starlyt
Participant
Posts: 2
Joined: Tue Feb 24, 2004 2:24 pm
Location: Phoenix

Combine source rows into single target row?

Post by starlyt »

I need to combine multiple source rows into one target database row. Each target row will have the key fields plus a title, amount, project, etc. field. Each source row has the key fields plus an indicator of the data type and the data field itself. For example the first source row would have the title data, the second row would have the amount, the third row would have the project, etc. It sounds like the RowMerger stage should do this but we do not have the documentation for it. Can anyone help?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Combine source rows into single target row?

Post by chulett »

starlyt wrote:It sounds like the RowMerger stage should do this but we do not have the documentation for it.
Sure you do! Under your client install path is a 'Docs' directory with .pdf manuals for all sorts of things, including the new Row Splitter and Row Merger stages.

You can also add one to a job and then click on the 'Help' button to access the online version of the help for these stages.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Unless you have obscene volumes or an obscene number of columns, you could probably just use your source file and put it into hash lookups. It sounds like each row has a type. Simply make the primary key on the hash file the natural key of the data plus the column type. Load the whole source file into the hash. This should be a really fast job: seq -->xfm-->hash.

Then, write a job that re-reads that file for just the driving record type and has that hash file as a reference link for every column in the target. So if you have 5 columns, you have five reference lookups to the hash file. Use the natural key and the column you need, and map it to the appropriate target column. The job will look like: seq --> xfm w/N hash lookups --> seq

If you find that this job is slow, then use multiple job instances to improve your net throughput. If you have 10 cpus, then 10 job instances, each processing 1/10 the data should make the net runtime 10X shorter. Then, load your results or do further processing.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
starlyt
Participant
Posts: 2
Joined: Tue Feb 24, 2004 2:24 pm
Location: Phoenix

Post by starlyt »

No, when I click on the Help button I get the message, "Cannot find MergGUI.hlp file." When I look through the *.pdf files in the DOC directory there are none there for either row splitter or row merger.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That's... interesting. :? Sounds like an incomplete install. Did you upgrade from a previous version or install what you've got from scratch?

We found, during our upgrade testing, that it was easy for the person installing the server to 'miss' the new plugins and not get them installed in the project. I had to go back and do it by hand. Do you have any idea if they were actually installed on your server? Once you are sure they are out there, I'd suggest reinstalling your Client software.
-craig

"You can never have too many knives" -- Logan Nine Fingers
holgi02
Participant
Posts: 20
Joined: Tue Apr 22, 2003 3:17 am
Location: UK

Post by holgi02 »

Sounds like the client side install did not install everything. If you have access to the Ascential DataStage install CD there is a whole load of stuff on there under \packages\doc\enu
raju_chvr
Premium Member
Premium Member
Posts: 165
Joined: Sat Sep 27, 2003 9:19 am
Location: USA

Post by raju_chvr »

I have the same error. I have installed the client properly and I used Custom option while installing the Client and made sure everything is installed and the
holgi02 wrote:\packages\doc\enu
doesn't have the file for Row Splitter or Row Merger.

and no .pdf file is present for either Row Splitter or Row Merger.
holgi02
Participant
Posts: 20
Joined: Tue Apr 22, 2003 3:17 am
Location: UK

Post by holgi02 »

Yeah,

I was refering to the directory structure on the "Install CD" if it is available. Check this out first and if the files are missing from here then you must have a bad one - talk to Ascential and get a new "Install CD"

IHTH

Gil.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Some version 7 installs don't have row splitter and row merger documentation. I have a version 7 Windows install and I cannot locate any help files or PDF files for these stages. It was added to the release at the last moment. This is what I got from Ascential:
The properties of the Row Splitter and Row Merger pretty much mimics of the sequential stage ... and have the benefit of doing "on the fly" redefinition of meta data, without having to land to disk.

The old technique of using a sequential stage to read a row as a big block
of a thousand bytes, and then have the output link redefine the meta data
(without having to use derivations) is the primary purpose of the
RowSplitter, and RowMerger does the same thing in reverse.

In effect, it's splitting or merging the "meta data" not the row itself.

These new active stages were put in place primarily to ease the transition
for "classic" ds jobs that use this technique, into RTI (RTI cannot have a
passive stage with input "AND" output links).
Anyone out there on version 7.0.1? Can you confirm whether there are help or PDF files for these two stages? Can someone upload them onto Developernet.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

With some help from Craig I have uploaded the help and PDF files for these two stages to Developnet. Have a look in the Product Support category.

http://developernet.ascential.com/forum ... category=2
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Anyone even read my post? :( There was a working solution posted irrespective of this wild goose chase on the row splitter/merge.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I was just telling people where to get the row splitter and row merger documentation. Once he had it he could pretty quickly work out that it wasn't going to do what he wanted! Row merger is primarily a field concatenator, it can do row concatenation but it's an all or nothing approach. With the Multiple Lines turned on it concatenates all rows in the source until it reaches an end-of-data.

To merge collections of rows he is better off going with Ken's hash file lookup approach or using stage variables. You would have a stage variable for each data type, each variable checks the data type field and saves the value if it is the right type. Each stage variable is then written to an output field. A constraint on the transformer only outputs a row when the final data type is detected.

If I'd spent a bit more time reading the original question I might have mentioned this earlier. :oops:
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ken, you posted in this thread? Oh yah, there it is waaay up there. :lol:

My contribution to this so called 'wild goose chase' was to point out the OP certainly did (or at least should) have the documentation for the new stages. It wouldn't have taken long after that to discover it wasn't the Silver Bullet here. And in the process of goose chasing, we uncovered the fact that some bad install cdroms went out and not everyone got all the tasty new bits. Lastly, we got them uploaded on ADN in case anyone else fell victim to this particular malady.

With all that out of the way, now we can all go back, re-read the original post and your solution and put this problem to bed. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

vmcburney wrote:You would have a stage variable for each data type, each variable checks the data type field and saves the value if it is the right type. Each stage variable is then written to an output field. A constraint on the transformer only outputs a row when the final data type is detected.
Make sure the source data is sorted by each grouping natural key so that the final row for each grouping natural key triggers the output.

My recommended solution takes two passes thru the data, once to put all reference rows into a single hash file, the second pulls out just the driver row and references the hashes to merge the reference columns. Vincents solution requires two passes also, one to sort the file and the second to parse and rollup. Both solutions work, if you have the time, try both. It should be very enlightening!
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Looks like it is already sorted:
example the first source row would have the title data, the second row would have the amount, the third row would have the project, etc
You just need to make sure the trigger row, the one that the constraint outputs on, is of the same type for all collections.
Post Reply