Combine source rows into single target row?
I need to combine multiple source rows into one target database row. Each target row will have the key fields plus a title, amount, project, etc. field. Each source row has the key fields plus an indicator of the data type and the data field itself. For example the first source row would have the title data, the second row would have the amount, the third row would have the project, etc. It sounds like the RowMerger stage should do this but we do not have the documentation for it. Can anyone help?
Re: Combine source rows into single target row?
starlyt wrote: It sounds like the RowMerger stage should do this but we do not have the documentation for it.

Sure you do! Under your client install path is a 'Docs' directory with .pdf manuals for all sorts of things, including the new Row Splitter and Row Merger stages.
You can also add one to a job and then click on the 'Help' button to access the online version of the help for these stages.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Unless you have obscene volumes or an obscene number of columns, you could probably just use your source file and put it into hash lookups. It sounds like each row has a type. Simply make the primary key on the hash file the natural key of the data plus the column type. Load the whole source file into the hash. This should be a really fast job: seq -->xfm-->hash.
Then, write a job that re-reads that file for just the driving record type and has that hash file as a reference link for every column in the target. So if you have 5 columns, you have five reference lookups to the hash file. Use the natural key and the column you need, and map it to the appropriate target column. The job will look like: seq --> xfm w/N hash lookups --> seq
If you find that this job is slow, then use multiple job instances to improve your net throughput. If you have 10 cpus, then 10 job instances, each processing 1/10 the data should make the net runtime 10X shorter. Then, load your results or do further processing.
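Ken's hash-file technique can be sketched outside DataStage with a plain dictionary standing in for the hash file. This is only an illustration of the two-pass flow he describes; the key, type, and column names (TITLE, AMOUNT, PROJECT) are invented for the example:

```python
# Pass 1 (seq --> xfm --> hash): load every source row into a "hash file"
# keyed on natural key + column type.
source_rows = [
    ("K1", "TITLE", "Widget rollout"),
    ("K1", "AMOUNT", "1500"),
    ("K1", "PROJECT", "P-77"),
    ("K2", "TITLE", "Server refresh"),
    ("K2", "AMOUNT", "900"),
    ("K2", "PROJECT", "P-12"),
]
hash_file = {(key, col_type): value for key, col_type, value in source_rows}

# Pass 2 (seq --> xfm w/N hash lookups --> seq): re-read just the driving
# record type (TITLE here) and do one "reference lookup" per target column.
target = []
for key, col_type, value in source_rows:
    if col_type != "TITLE":          # only the driving record type
        continue
    target.append({
        "key": key,
        "title": value,
        "amount": hash_file.get((key, "AMOUNT")),
        "project": hash_file.get((key, "PROJECT")),
    })
```

In the real job each `hash_file.get(...)` is a separate reference link on the transformer, so five target columns means five lookups against the same hash file.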
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
That's... interesting. Sounds like an incomplete install. Did you upgrade from a previous version or install what you've got from scratch?
We found, during our upgrade testing, that it was easy for the person installing the server to 'miss' the new plugins and not get them installed in the project. I had to go back and do it by hand. Do you have any idea if they were actually installed on your server? Once you are sure they are out there, I'd suggest reinstalling your Client software.
-craig
"You can never have too many knives" -- Logan Nine Fingers
I have the same error. I installed the Client properly, used the Custom option during the install, and made sure everything was installed, yet \packages\doc\enu doesn't have the file for Row Splitter or Row Merger; no .pdf file is present for either stage.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
Some version 7 installs don't have row splitter and row merger documentation. I have a version 7 Windows install and I cannot locate any help files or PDF files for these stages. It was added to the release at the last moment. This is what I got from Ascential:
The properties of the Row Splitter and Row Merger pretty much mimic those of the Sequential stage ... and have the benefit of doing "on the fly" redefinition of meta data, without having to land to disk. The old technique of using a Sequential stage to read a row as one big block of a thousand bytes, and then having the output link redefine the meta data (without having to use derivations), is the primary purpose of the Row Splitter; the Row Merger does the same thing in reverse. In effect, it's splitting or merging the "meta data", not the row itself. These new active stages were put in place primarily to ease the transition of "classic" DS jobs that use this technique into RTI (RTI cannot have a passive stage with input AND output links).

Anyone out there on version 7.0.1? Can you confirm whether there are help or PDF files for these two stages? Can someone upload them onto Developernet?
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
With some help from Craig I have uploaded the help and PDF files for these two stages to Developnet. Have a look in the Product Support category.
http://developernet.ascential.com/forum ... category=2
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
Did anyone even read my post? There was a working solution posted, irrespective of this wild goose chase on the Row Splitter/Merger.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
I was just telling people where to get the Row Splitter and Row Merger documentation. Once he had it he could pretty quickly work out that it wasn't going to do what he wanted! Row Merger is primarily a field concatenator; it can do row concatenation, but it's an all-or-nothing approach. With Multiple Lines turned on, it concatenates all rows in the source until it reaches an end-of-data.
To merge collections of rows he is better off going with Ken's hash file lookup approach or using stage variables. You would have a stage variable for each data type, each variable checks the data type field and saves the value if it is the right type. Each stage variable is then written to an output field. A constraint on the transformer only outputs a row when the final data type is detected.
If I'd spent a bit more time reading the original question I might have mentioned this earlier.
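The stage-variable approach amounts to carrying partial values across rows and emitting a merged row only on the trigger type. A rough Python sketch of that logic, assuming the input is sorted by key and PROJECT is the final data type for every key (the type names are invented):

```python
# Stage-variable rollup sketch: walk the sorted input, save each value into
# a per-type "stage variable", and output a merged row only when the final
# data type (the transformer constraint's trigger) is seen.
rows = [
    ("K1", "TITLE", "Widget rollout"),
    ("K1", "AMOUNT", "1500"),
    ("K1", "PROJECT", "P-77"),   # trigger row for K1
    ("K2", "TITLE", "Server refresh"),
    ("K2", "AMOUNT", "900"),
    ("K2", "PROJECT", "P-12"),   # trigger row for K2
]

TRIGGER = "PROJECT"              # must be the last type in every collection
svars = {}                       # "stage variables": one slot per data type
merged = []
for key, col_type, value in rows:
    svars[col_type] = value      # each variable saves its own type's value
    if col_type == TRIGGER:      # constraint: only output on the final type
        merged.append((key, svars.get("TITLE"), svars.get("AMOUNT"), value))
        svars = {}               # reset for the next key's rows
```

Note the reset after each output: without it, a key that is missing a row would silently inherit the previous key's value, which is why the sorted order and a consistent trigger type matter.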
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
Ken, you posted in this thread? Oh yah, there it is waaay up there.
My contribution to this so called 'wild goose chase' was to point out the OP certainly did (or at least should) have the documentation for the new stages. It wouldn't have taken long after that to discover it wasn't the Silver Bullet here. And in the process of goose chasing, we uncovered the fact that some bad install cdroms went out and not everyone got all the tasty new bits. Lastly, we got them uploaded on ADN in case anyone else fell victim to this particular malady.
With all that out of the way, now we can all go back, re-read the original post and your solution and put this problem to bed.
-craig
"You can never have too many knives" -- Logan Nine Fingers
vmcburney wrote: You would have a stage variable for each data type, each variable checks the data type field and saves the value if it is the right type. Each stage variable is then written to an output field. A constraint on the transformer only outputs a row when the final data type is detected.

Make sure the source data is sorted by the grouping natural key so that the final row for each grouping natural key triggers the output.
My recommended solution takes two passes through the data: one to put all reference rows into a single hash file, and a second that pulls out just the driver rows and references the hashes to merge the reference columns. Vincent's solution also requires two passes: one to sort the file and a second to parse and roll up. Both solutions work; if you have the time, try both. It should be very enlightening!
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Looks like it is already sorted:

starlyt wrote: ...the first source row would have the title data, the second row would have the amount, the third row would have the project, etc.

You just need to make sure the trigger row, the one that the constraint outputs on, is of the same type for all collections.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney