I am playing with the compare stage for the first time, and can't get the ouput columns (or find any notes on it online).
I am trying to peek the results of a compare, but am getting either non-compilations (I have a peek output and don't declare the output columns), or runtime errors where the first.rec schema is dropped for whatever reason.
I am interested in doing something similar to what is listed in the parallel job devleopers manual, looking like (I apologize for the formatting, see the related table on page 563 of the manual):
Result First Second
bcol0 bcol1 bcol2 bcol3 bcol4 bcol0 bcol1 bcol2 bcol3 bcol4
0 0 0 0 0 a 0 0 0 0 a
2 1 7 1 1 b 1 1 1 1 b
0 2 2 2 2 c 2 2 2 2 c
0 3 3 3 3 d 3 3 3 3 d
2 4 5 4 4 e 4 4 4 4 e
-1 5 2 5 5 f 5 5 5 5 f
0 6 6 6 6 g 6 6 6 6 g
0 7 7 7 7 h 7 7 7 7 h
0 8 8 8 8 i 8 8 8 8 i
0 9 9 9 9 j 9 9 9 9 j
http://www.ascentialsoftware.com/service/products/techLibrary/books/ds70/enu/parjdev.pdf
2 part question:
A) how do I label the output columns
B) and how do I determine what of the input columns are going to be compared (I only want to compare a subset).
Columns getting dropped issue..
Runtime column propagation might be the culprit. Uncheck RCP on atleast one of the inputs and compile. OR try renaming one set of columns .. say add "_sec" to each column name
A. Define them in output columns grid
B. Use KEY option on properties tab of the stage to specify the columns to compare.
A) turns out that runtime column propogation was NOT checked, so I was not getting any help (or interference) from that. (With that said, I experimented with checking it, and it still gave me compile time error on the compare stage. Do I need to recreate the stage from scratch or add another option in order for RCP to work? Ideally, I like the idea of not having to note my output columns if it would work correctly....)
B) Key option is what I was looking for. Thanks
C) About the output of columns. Since many of the input columns have the same name in both links, when I specify the OUTPUT column name, how do I specify which input link it was defined from? There seems to be no mapping option/tab. Or does the the compare stage's output tab not work like the other stages I am thinking of.
I am facing a similar issue, I would like to compare all columns in the 2 datasets and if there is any change in any of the columns I want them to updated and if they are new those records needs to be inserted.
When I turn on the RCP I am not able to see any column names and also even view data disappears. After CMP stage I would want to access the first column and based on that I would want to filter the records. Since I donot see any columns I am not sure if I can put the constraint. and when I define my own columns with different names for 2 datasets and a result column I am getting the following error:
main_program: Fatal Error: Attempt to drop schema variable: "first.rec"; this may be due to dropping a subrec which contains a schema variable; this operation is not currently supported; to work around this limitation, insert a copy operator between the modify and the upstream operator.
. I am getting this error regardless of RCP turned on or not. Also Can I acheive the similar functionality using DIFF stage?I think it only outputs one of the 2 datasets. I want everything from the new dataset and the diff/result column which says if the record is changed or not.
Thanks in advance.
Difference stage outputs all columns (individually) from both input Data Sets, plus the result of the comparison as a coded value (codes are set up as properties) as one additional column.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote:Difference stage outputs all columns (individually) from both input Data Sets, plus the result of the comparison as a coded value (codes are set up as properties) as one additional column.
I am trying to use the difference stage, but if I want to get all columns from both datasets in one row, like
The problem with the above method is I only want the columns from the after dataset 'a' so how can I identify what records are from after dataset becomes one it comes out of diff stage both datasets are mixed.
Manual says if I rename the column names then I can get all the columns from both datasets but then it complains about renamed columns not being found as they have different name now.Is there a way to identify what records come from what dataset once they come out of the diff stage?
Try defining the subrecords so that the individual column names within them do not clash. The following is conceptual, may not be syntactically accurate.
ray.wurlod wrote:Difference stage outputs all columns (individually) from both input Data Sets, plus the result of the comparison as a coded value (codes are set up as properties) as one additional column. ...
Ray,
Can you slightly elaborate on this... plus the result of the comparison as a coded value (codes are set up as properties) as one additional column.
I'm getting a colum called diff which gives a coded value [0/1] etc and trying to decipher this...!![/b]
There are five properties in the stage that let you specify the output value when the result of the comparison between (Key columns in) the first input and those in the second input is one of:
second equals first (option 'Equals' Value default 0)
first is empty (option 'First Is Empty' Value default 1)
second is empty (option 'Second Is Empty' Value default -2)
first is greater than second (option 'Greater Than' Value default 2)
first is less than second (option 'Less Than' Value default -1)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray, according to the Developers Guide, page 33-3, in the example, if there is no difference between any of the columns then diff gets a value of 2. This is for the Difference stage.