Can the Row Merger stage concatenate a subset of the columns

sbass1 · Post by **sbass1** » Tue Mar 17, 2009 12:27 am

If my input file has:

Key1 Key2 Var1 Var2 Key3 Var4 Var5 ... VarN

I want the output of the Row Merger stage to be:

Key1 Key2 Key3 Var4 ... VarN

In other words, I want to output my Key fields (properly delimited), then a merged column with a subset of my input columns.

I know the help says "It merges all the columns into a single string of a specified format", but I find it hard to believe that the DS developers could be so shortsighted as to not let me choose which columns I want to merge.

Please confirm what I already suspect so I can start rolling my own function using FIELD, FIELDSTORE, and fiddling with the input seq file delimiters.

eostic · Post by **eostic** » Tue Mar 17, 2009 6:41 am

I haven't used the RowMerger as much as the RowSplitter, but it might be worth looking at the history. These stages were not designed to perform any meaningful manipulation of the row, but to dynamically re-assign metadata (in memory) to an existing row that is coming downstream. If it doesn't do partial row merging, it wasn't short-sighted; it simply wasn't the goal of the Stage. These were designed to provide an in-memory solution to a technique that is often used: writing a file to disk on an input link so that it could be read via an output link with entirely alternative metadata. They were written as part of the distribution of RTI, because the Web Services architecture prevents a job from having a passive stage mid-stream dropping data to disk.

That being said, a partial merge would be useful....

...on the actual topic, I'm not entirely understanding your requirement. Couldn't you just re-order the columns as necessary in a Transformer immediately before the RowMerger, and then use RowMerger as designed?

Ernie

sureshreddy2009 · Post by **sureshreddy2009** » Tue Mar 17, 2009 7:14 am

:D Hi, I am Suresh.
I know parallel jobs but I don't know server jobs; these days I am learning server as well. Can you please explain what the Row Merger and Row Splitter do? I know that is not the topic here, but could you please explain these two stages?

eostic · Post by **eostic** » Tue Mar 17, 2009 7:51 am

They are special tools: not needed that often, but great when necessary. I had a recent scenario where someone had "captured" a long repeating group from a mainframe file into an XML document. Strange indeed; they probably were rushed and ran out of time to design a more formal model for their XML document. So in one element there was a whole row. Easy to capture from XMLInput, but to break it up manually would have meant a lot of substring calls. Far simpler to just push it into the RowSplitter, and pop out 30 columns. Simplifying the example, consider a long string of repeating 5-character fields:

1111122222333334444455555666667777788888

Send that into the RowSplitter as one column called "myGroup"....the output link simply has 8 columns, Col1 thru Col8, each with Character 5 as their length.....

No transforms to manage or painfully execute at runtime...just metadata.
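For readers who have not seen the stage, the effect described above can be imitated outside DataStage. This is only a Python sketch of the idea (slicing one string by column widths), not how the RowSplitter is implemented; the function name `split_fixed_width` is made up for illustration.

```python
def split_fixed_width(row, widths):
    """Slice one string into consecutive fields of the given widths."""
    fields = []
    pos = 0
    for width in widths:
        fields.append(row[pos:pos + width])
        pos += width
    return fields


# The single input column "myGroup" from the example above.
my_group = "1111122222333334444455555666667777788888"

# Eight output columns (Col1 thru Col8), each Character 5.
columns = split_fixed_width(my_group, [5] * 8)
print(columns)
```

The point of the stage is that the list of widths comes from the output link's column metadata, so nothing like this has to be coded by hand.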

Of course, if it was just "two", I'd use a Transformer.....but in my case it was close to 50.

RowMerger largely does the opposite. Both are patterned off of the Sequential Stage. Ugly interfaces, but you'll recognize all of the properties as those that exist for the standard Server side Seq Stage.

Ernie

sbass1 · Post by **sbass1** » Tue Mar 17, 2009 8:25 am

eostic wrote:I haven't used the RowMerger as much as the RowSplitter, but it might be worth looking at the history....these stages were not designed to perform any meaningful manipulation of the row (emphasis added) --- but to dynamically re-assign metadata (in memory) to an existing row that is coming down-stream. If it doesn't do partial row merging, it wasn't short-sighted -- it simply wasn't the goal of the Stage ---- these were designed to provide an in-memory solution to a technique that is often used --- writing a file to disk on an input link so that it could be read via output link with entirely alternative metadata (yes this is the technique I'm currently using since I can't get Row Merger to do what I want), and written as part of the distribution of RTI because the Web Services architecture prevents a job from having a passive stage mid-stream dropping data to disk.

That being said, a partial merge would be useful....

...on the actual topic, I'm not entirely understanding your requirement. Couldn't you just re-order the columns as necessary in a Transformer immediately before the RowMerger, and then use RowMerger as designed? (No because the RowMerger includes 1) the key fields and 2) "insignificant variables" in the merged column. If I then need to parse out the undesired variables, at this point I might as well roll my own concatenation code)

Ernie

Hi Ernie,

Re: the origin of the stage design, I still find it shortsighted that I can't select the columns to merge. The DS developers would still have met their design goal (as you stated it above) by making the stage more flexible.

Re: the context of what I'm trying to do, see:

viewtopic.php?t=125895&highlight=server%20merge, scroll down to vmcburney's post

viewtopic.php?t=125989&highlight=

viewtopic.php?p=320343#320343

In summary, I'm trying to build a concatenated list of variables, delimited by some delimiter, with nulls converted to empty, to feed to CRC32. This is all to implement Changed Data Capture (CDC) and SCD Type 2. My input data does not have the keys all at the front of the line, and the variables of interest are non-contiguous.
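To make the requirement concrete, here is a rough Python sketch of the logic, not DataStage code. The column names follow the example at the top of the thread; the helper names, the `|` delimiter, and the sample values are assumptions for illustration. Python's `zlib.crc32` may not produce the same number as the DataStage CRC32 routine.

```python
import zlib


def merge_subset(row, key_cols, merge_cols, delim="|"):
    """Return the key values and one merged string built from merge_cols.

    Nulls (None) in the merged columns are converted to empty strings.
    """
    keys = [row[c] for c in key_cols]
    merged = delim.join(
        "" if row.get(c) is None else str(row[c]) for c in merge_cols
    )
    return keys, merged


def crc32_of(text):
    """Unsigned 32-bit CRC of the merged string."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


# Keys are not all at the front, and the variables of interest are non-contiguous.
row = {"Key1": "A", "Key2": "B", "Var1": "x", "Var2": "y",
       "Key3": "C", "Var4": None, "Var5": "z"}

keys, merged = merge_subset(row, ["Key1", "Key2", "Key3"], ["Var4", "Var5"])
checksum = crc32_of(merged)
print(keys, repr(merged), checksum)
```

Comparing `checksum` against the value stored for the same keys is what flags a changed row.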

And perhaps all of this goes away if we upgrade to 8.x and have the SCD stage, but that's moot at present.

Thanks...

chulett · Post by **chulett** » Tue Mar 17, 2009 8:49 am

Actually you're doing CDD or Change Data Detection, not CDC.

Why not just do as Vince posted? Read your landed data as one long delimited string, use calls to Field() to pull out the individual key fields and then one or more Field() calls can pull out everything else (whichever are contiguous) and lastly Convert() can flip all the internal delimiters in that 'one field' to whatever value you desire.
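As a rough illustration of that approach, the sketch below imitates the behaviour of Field() and Convert() in Python. The `field` and `convert` functions here are simplified stand-ins written for this example, not the DataStage BASIC functions themselves, and the sample line and delimiters are made up. In particular, this `convert` assumes the two character lists are the same length.

```python
def field(string, delim, occurrence, count=1):
    """Return `count` delimited fields starting at the 1-based `occurrence`."""
    parts = string.split(delim)
    start = occurrence - 1
    return delim.join(parts[start:start + count])


def convert(from_chars, to_chars, string):
    """Replace each character in from_chars with its partner in to_chars."""
    return string.translate(str.maketrans(from_chars, to_chars))


# The landed data read back as one long delimited string.
line = "K1,K2,v1,v2,K3,v4,v5,v6"

key1 = field(line, ",", 1)
key2 = field(line, ",", 2)
key3 = field(line, ",", 5)

# One call pulls out the contiguous run of remaining fields...
rest = field(line, ",", 6, 3)

# ...and the internal delimiters are flipped to the desired value.
merged = convert(",", "|", rest)
print(key1, key2, key3, merged)
```

If the variables of interest are not contiguous, several `field` calls can be joined together before the delimiter conversion.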

And it would seem to be fairly trivial to re-arrange the fields into the order you'd need to support this. Yes, that could mean another copy of the file and yes, it will probably generate another 'short-sighted' knock on the product comment, but what the heck.

ray.wurlod · Post by **ray.wurlod** » Tue Mar 17, 2009 3:26 pm

Looks like we need that RMM stage back on the enhancement wish list...

DSXchange
