Compare stage, output columns

cviewjones · Post by **cviewjones** » Mon Aug 01, 2005 11:01 am

I am playing with the compare stage for the first time, and can't get the ouput columns (or find any notes on it online).

I am trying to peek the results of a compare, but am getting either non-compilations (I have a peek output and don't declare the output columns), or runtime errors where the first.rec schema is dropped for whatever reason.

I am interested in doing something similar to what is listed in the parallel job devleopers manual, looking like (I apologize for the formatting, see the related table on page 563 of the manual):

Code: Select all

Result First Second
bcol0 bcol1 bcol2 bcol3 bcol4 bcol0 bcol1 bcol2 bcol3 bcol4
0 0 0 0 0 a 0 0 0 0 a
2 1 7 1 1 b 1 1 1 1 b
0 2 2 2 2 c 2 2 2 2 c
0 3 3 3 3 d 3 3 3 3 d
2 4 5 4 4 e 4 4 4 4 e
-1 5 2 5 5 f 5 5 5 5 f
0 6 6 6 6 g 6 6 6 6 g
0 7 7 7 7 h 7 7 7 7 h
0 8 8 8 8 i 8 8 8 8 i
0 9 9 9 9 j 9 9 9 9 j
http://www.ascentialsoftware.com/service/products/techLibrary/books/ds70/enu/parjdev.pdf

2 part question:
A) how do I label the output columns
B) and how do I determine what of the input columns are going to be compared (I only want to compare a subset).

Thanks

ukyrvd · Post by **ukyrvd** » Mon Aug 01, 2005 11:08 am

Columns getting dropped issue..
Runtime column propagation might be the culprit. Uncheck RCP on atleast one of the inputs and compile. OR try renaming one set of columns .. say add "_sec" to each column name

A. Define them in output columns grid
B. Use KEY option on properties tab of the stage to specify the columns to compare.

cviewjones · Post by **cviewjones** » Mon Aug 01, 2005 11:21 am

A) turns out that runtime column propogation was NOT checked, so I was not getting any help (or interference) from that. (With that said, I experimented with checking it, and it still gave me compile time error on the compare stage. Do I need to recreate the stage from scratch or add another option in order for RCP to work? Ideally, I like the idea of not having to note my output columns if it would work correctly....)

B) Key option is what I was looking for. Thanks

C) About the output of columns. Since many of the input columns have the same name in both links, when I specify the OUTPUT column name, how do I specify which input link it was defined from? There seems to be no mapping option/tab. Or does the the compare stage's output tab not work like the other stages I am thinking of.

kris007 · Post by **kris007** » Wed Jul 05, 2006 10:17 am

I am facing a similar issue, I would like to compare all columns in the 2 datasets and if there is any change in any of the columns I want them to updated and if they are new those records needs to be inserted.
When I turn on the RCP I am not able to see any column names and also even view data disappears. After CMP stage I would want to access the first column and based on that I would want to filter the records. Since I donot see any columns I am not sure if I can put the constraint. and when I define my own columns with different names for 2 datasets and a result column I am getting the following error:

main_program: Fatal Error: Attempt to drop schema variable: "first.rec"; this may be due to dropping a subrec which contains a schema variable; this operation is not currently supported; to work around this limitation, insert a copy operator between the modify and the upstream operator.

. I am getting this error regardless of RCP turned on or not. Also Can I acheive the similar functionality using DIFF stage?I think it only outputs one of the 2 datasets. I want everything from the new dataset and the diff/result column which says if the record is changed or not.
Thanks in advance.

ray.wurlod · Post by **ray.wurlod** » Wed Jul 05, 2006 7:15 pm

Difference stage outputs all columns (individually) from both input Data Sets, plus the result of the comparison as a coded value (codes are set up as properties) as one additional column.

kris007 · Post by **kris007** » Thu Jul 06, 2006 2:33 pm

ray.wurlod wrote:Difference stage outputs all columns (individually) from both input Data Sets, plus the result of the comparison as a coded value (codes are set up as properties) as one additional column.

I am trying to use the difference stage, but if I want to get all columns from both datasets in one row, like

Code: Select all

a.col1,a.col2,b.col1,b.col2,diff.

If I the column names for both datasets are same then it outputs only one record at a time like

Code: Select all

a.col1,a.col2,diff
b.col1,b.col2,diff

The problem with the above method is I only want the columns from the after dataset 'a' so how can I identify what records are from after dataset becomes one it comes out of diff stage both datasets are mixed.
Manual says if I rename the column names then I can get all the columns from both datasets but then it complains about renamed columns not being found as they have different name now.Is there a way to identify what records come from what dataset once they come out of the diff stage?

michaeld · Post by **michaeld** » Fri Feb 02, 2007 3:01 pm

For the compare stage, all you have to do is define the following output columns:

result : tinyint
first : unknown (these are subrecords)
second: unknonw

If you don't want to use the data because all you want is the result code then you can add a MODIFY stage and specify "KEEP result".

I hope that helps.

nkln@you · Post by **nkln@you** » Mon Feb 12, 2007 2:32 am

I am new to PX and I am facing "first.rec schema is dropped " error.

1.how to uncheck Run time column propagation .

2.Also how to define the columns in Output Tab.

"For the compare stage, all you have to do is define the following output columns:

result : tinyint
first : unknown (these are subrecords)
second: unknonw "

michaeld · Post by **michaeld** » Mon Feb 12, 2007 7:43 am

I just wrote that in the message above yours:

Output schema:

result : tinyint
first : unknown (these are subrecords)
second: unknonw

Runtime column propegation is checked off in the DS admin and also in the the Advanced tab of the stage.

ray.wurlod · Post by **ray.wurlod** » Mon Feb 12, 2007 2:42 pm

Try defining the subrecords so that the individual column names within them do not clash. The following is conceptual, may not be syntactically accurate.

Code: Select all

record schema (
   result: int8;
   oldrecord: subrecord {
                      oldfirstcol: int32;
                      oldsecondcol: string[max=32];
                      oldthirdcol: nullable date;
                  }
   newrecord: subrecord {
                      newfirstcol: int32;
                      newsecondcol: string[max=32];
                      newthirdcol: nullable date;
                  }
)

vijayrc · Post by **vijayrc** » Mon Feb 12, 2007 6:51 pm

ray.wurlod wrote:Difference stage outputs all columns (individually) from both input Data Sets, plus the result of the comparison as a coded value (codes are set up as properties) as one additional column. ...

Ray,
Can you slightly elaborate on this...
plus the result of the comparison as a coded value (codes are set up as properties) as one additional column.
I'm getting a colum called diff which gives a coded value [0/1] etc and trying to decipher this...!![/b]

DSguru2B · Post by **DSguru2B** » Mon Feb 12, 2007 7:01 pm

Read about the codes in Parallel Developer's Guide. Its all there in the Change Capture chapter.

ray.wurlod · Post by **ray.wurlod** » Mon Feb 12, 2007 8:40 pm

diff is simply the name of the output column.

There are five properties in the stage that let you specify the output value when the result of the comparison between (Key columns in) the first input and those in the second input is one of:

second equals first (option 'Equals' Value default 0)

first is empty (option 'First Is Empty' Value default 1)

second is empty (option 'Second Is Empty' Value default -2)

first is greater than second (option 'Greater Than' Value default 2)

first is less than second (option 'Less Than' Value default -1)

splayer · Post by **splayer** » Wed Jun 20, 2007 4:07 pm

Ray, according to the Developers Guide, page 33-3, in the example, if there is no difference between any of the columns then diff gets a value of 2. This is for the Difference stage.

ray.wurlod · Post by **ray.wurlod** » Wed Jun 20, 2007 6:01 pm

YOU can define the encoding using stage properties. The defaults are different for different stage types (Compare, Difference, Change Capture).