
Combine source rows into single target row?

Posted: Wed Feb 25, 2004 9:07 am
by starlyt
I need to combine multiple source rows into one target database row. Each target row will have the key fields plus title, amount, project, etc. fields. Each source row has the key fields plus an indicator of the data type and the data value itself. For example the first source row would have the title data, the second row would have the amount, the third row would have the project, etc. It sounds like the RowMerger stage should do this but we do not have the documentation for it. Can anyone help?
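
For illustration only (the key, type names, and values here are invented), the shape of the data is something like this, one row per attribute:

    KEY001 | TITLE   | Project Alpha
    KEY001 | AMOUNT  | 1500.00
    KEY001 | PROJECT | P-100

which needs to become one target row per key:

    KEY001 | Project Alpha | 1500.00 | P-100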

Re: Combine source rows into single target row?

Posted: Wed Feb 25, 2004 9:12 am
by chulett
starlyt wrote:It sounds like the RowMerger stage should do this but we do not have the documentation for it.
Sure you do! Under your client install path is a 'Docs' directory with .pdf manuals for all sorts of things, including the new Row Splitter and Row Merger stages.

You can also add one to a job and then click on the 'Help' button to access the online version of the help for these stages.

Posted: Wed Feb 25, 2004 9:29 am
by kcbland
Unless you have obscene volumes or an obscene number of columns, you could probably just use your source file and put it into hash lookups. It sounds like each row has a type. Simply make the primary key on the hash file the natural key of the data plus the column type. Load the whole source file into the hash. This should be a really fast job: seq --> xfm --> hash.

Then, write a job that re-reads that file for just the driving record type and has that hash file as a reference link for every column in the target. So if you have 5 columns, you have five reference lookups to the hash file. Use the natural key and the column you need, and map it to the appropriate target column. The job will look like: seq --> xfm w/N hash lookups --> seq

If you find that this job is slow, then use multiple job instances to improve your net throughput. If you have 10 CPUs, then 10 job instances, each processing 1/10 of the data, should cut the net runtime to roughly a tenth. Then, load your results or do further processing.
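
For anyone who wants to see the mechanics outside of DataStage, here is a rough Python sketch of the same two-pass logic, with a dict standing in for the hash file (the keys, column types, and values are made up for illustration):

    # Pass 1: load every source row into a "hash file" keyed on
    # (natural key, column type), one entry per attribute row.
    source_rows = [
        ("KEY001", "TITLE",   "Project Alpha"),
        ("KEY001", "AMOUNT",  "1500.00"),
        ("KEY001", "PROJECT", "P-100"),
    ]
    hash_file = {(key, col_type): value
                 for key, col_type, value in source_rows}

    # Pass 2: re-read just the driving record type (here TITLE) and do
    # one reference lookup per target column, exactly like N hash
    # reference links in a transformer.
    target_columns = ["TITLE", "AMOUNT", "PROJECT"]
    for key, col_type, _ in source_rows:
        if col_type != "TITLE":          # only the driver rows
            continue
        merged = [key] + [hash_file.get((key, c)) for c in target_columns]
        print(merged)   # ['KEY001', 'Project Alpha', '1500.00', 'P-100']

The point is that pass 1 is a single fast load, and pass 2 does one keyed lookup per target column.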

Posted: Wed Feb 25, 2004 9:33 am
by starlyt
No, when I click on the Help button I get the message, "Cannot find MergGUI.hlp file." And when I look through the *.pdf files in the Docs directory, there are none for either the Row Splitter or the Row Merger.

Posted: Wed Feb 25, 2004 9:39 am
by chulett
That's... interesting. :? Sounds like an incomplete install. Did you upgrade from a previous version or install what you've got from scratch?

We found, during our upgrade testing, that it was easy for the person installing the server to 'miss' the new plugins and not get them installed in the project. I had to go back and do it by hand. Do you have any idea if they were actually installed on your server? Once you are sure they are out there, I'd suggest reinstalling your Client software.

Posted: Wed Feb 25, 2004 9:56 am
by holgi02
Sounds like the client-side install did not install everything. If you have access to the Ascential DataStage install CD, there is a whole load of stuff on there under \packages\doc\enu.

Posted: Wed Feb 25, 2004 10:12 am
by raju_chvr
I have the same error. I installed the client properly, using the Custom option while installing, and made sure everything was installed. The
holgi02 wrote:\packages\doc\enu
directory doesn't have the help file for the Row Splitter or Row Merger, and no .pdf file is present for either stage.

Posted: Wed Feb 25, 2004 10:28 am
by holgi02
Yeah,

I was referring to the directory structure on the "Install CD", if it is available. Check this out first, and if the files are missing from there, then you must have a bad one - talk to Ascential and get a new "Install CD".

IHTH

Gil.

Posted: Wed Feb 25, 2004 6:12 pm
by vmcburney
Some version 7 installs don't have the Row Splitter and Row Merger documentation. I have a version 7 Windows install and I cannot locate any help files or PDF files for these stages. They were added to the release at the last moment. This is what I got from Ascential:
The properties of the Row Splitter and Row Merger pretty much mimic those of the Sequential stage ... and have the benefit of doing "on the fly" redefinition of meta data, without having to land to disk.

The old technique of using a Sequential stage to read a row as a big block of a thousand bytes, and then having the output link redefine the meta data (without having to use derivations), is the primary purpose of the Row Splitter; the Row Merger does the same thing in reverse.

In effect, it's splitting or merging the "meta data", not the row itself.

These new active stages were put in place primarily to ease the transition of "classic" DS jobs that use this technique into RTI (RTI cannot have a passive stage with input AND output links).

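A minimal sketch of the technique being described, in Python rather than a DataStage stage (the column layout is invented), showing that only the description of the bytes changes, not the bytes themselves:

    # One incoming row treated as a single big block of bytes.
    block = "Project Alpha       0001500.00P-100     "

    # The "output link metadata": field name, start offset, length.
    layout = [("title", 0, 20), ("amount", 20, 10), ("project", 30, 10)]

    # Splitting the metadata, not the row: the same bytes, re-described
    # as fields, with no derivations and nothing landed to disk.
    fields = {name: block[start:start + length].strip()
              for name, start, length in layout}
    print(fields)
    # {'title': 'Project Alpha', 'amount': '0001500.00', 'project': 'P-100'}

    # A Row Merger would do the reverse: pad each field to its length
    # and concatenate everything back into one block.
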
Anyone out there on version 7.0.1? Can you confirm whether there are help or PDF files for these two stages? Can someone upload them onto Developernet?

Posted: Wed Feb 25, 2004 6:46 pm
by vmcburney
With some help from Craig I have uploaded the help and PDF files for these two stages to Developernet. Have a look in the Product Support category.

http://developernet.ascential.com/forum ... category=2

Posted: Wed Feb 25, 2004 8:52 pm
by kcbland
Did anyone even read my post? :( There was a working solution posted, irrespective of this wild goose chase on the Row Splitter/Merger.

Posted: Wed Feb 25, 2004 9:35 pm
by vmcburney
I was just telling people where to get the Row Splitter and Row Merger documentation. Once he had it, he could pretty quickly work out that it wasn't going to do what he wanted! The Row Merger is primarily a field concatenator; it can do row concatenation, but it's an all-or-nothing approach. With the Multiple Lines option turned on, it concatenates all rows in the source until it reaches an end-of-data.

To merge collections of rows he is better off going with Ken's hash file lookup approach or using stage variables. You would have a stage variable for each data type; each variable checks the data type field and saves the value if it is the right type. Each stage variable is then written to an output field. A constraint on the transformer only outputs a row when the final data type is detected.
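
Sketched in Python rather than transformer stage variables (the row data is invented, and PROJECT is assumed to be the final, trigger type), the rollup works like this, provided the input arrives grouped by key:

    rows = [
        ("KEY001", "TITLE",   "Project Alpha"),
        ("KEY001", "AMOUNT",  "1500.00"),
        ("KEY001", "PROJECT", "P-100"),   # PROJECT is the trigger type
    ]

    # One "stage variable" per data type, reset after each key group.
    stage_vars = {"TITLE": None, "AMOUNT": None, "PROJECT": None}
    for key, col_type, value in rows:
        stage_vars[col_type] = value      # save the value if it is the right type
        if col_type == "PROJECT":         # constraint: output only on the final type
            print((key, stage_vars["TITLE"], stage_vars["AMOUNT"],
                   stage_vars["PROJECT"]))
            stage_vars = dict.fromkeys(stage_vars)   # reset for the next group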

If I'd spent a bit more time reading the original question I might have mentioned this earlier. :oops:

Posted: Wed Feb 25, 2004 10:39 pm
by chulett
Ken, you posted in this thread? Oh yah, there it is waaay up there. :lol:

My contribution to this so-called 'wild goose chase' was to point out that the OP certainly did (or at least should) have the documentation for the new stages. It wouldn't have taken long after that to discover it wasn't the Silver Bullet here. And in the process of goose chasing, we uncovered the fact that some bad install CD-ROMs went out and not everyone got all the tasty new bits. Lastly, we got them uploaded on ADN in case anyone else fell victim to this particular malady.

With all that out of the way, now we can all go back, re-read the original post and your solution and put this problem to bed. :wink:

Posted: Wed Feb 25, 2004 10:41 pm
by kcbland
vmcburney wrote:You would have a stage variable for each data type; each variable checks the data type field and saves the value if it is the right type. Each stage variable is then written to an output field. A constraint on the transformer only outputs a row when the final data type is detected.
Make sure the source data is sorted by each grouping natural key so that the final row for each grouping natural key triggers the output.

My recommended solution takes two passes through the data: once to put all reference rows into a single hash file, and a second to pull out just the driver rows and reference the hashes to merge the reference columns. Vincent's solution requires two passes also, one to sort the file and a second to parse and roll up. Both solutions work; if you have the time, try both. It should be very enlightening!

Posted: Wed Feb 25, 2004 10:56 pm
by vmcburney
Looks like it is already sorted:
starlyt wrote:For example the first source row would have the title data, the second row would have the amount, the third row would have the project, etc.
You just need to make sure the trigger row, the one that the constraint outputs on, is of the same type for all collections.