Compare stage, output columns

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

cviewjones
Participant
Posts: 4
Joined: Thu Jul 14, 2005 12:11 pm

Compare stage, output columns

Post by cviewjones »

I am playing with the compare stage for the first time, and can't get the ouput columns (or find any notes on it online).

I am trying to peek the results of a compare, but am getting either non-compilations (I have a peek output and don't declare the output columns), or runtime errors where the first.rec schema is dropped for whatever reason.

I am interested in doing something similar to what is listed in the parallel job devleopers manual, looking like (I apologize for the formatting, see the related table on page 563 of the manual):

Code: Select all

Result First Second
bcol0 bcol1 bcol2 bcol3 bcol4 bcol0 bcol1 bcol2 bcol3 bcol4
0 0 0 0 0 a 0 0 0 0 a
2 1 7 1 1 b 1 1 1 1 b
0 2 2 2 2 c 2 2 2 2 c
0 3 3 3 3 d 3 3 3 3 d
2 4 5 4 4 e 4 4 4 4 e
-1 5 2 5 5 f 5 5 5 5 f
0 6 6 6 6 g 6 6 6 6 g
0 7 7 7 7 h 7 7 7 7 h
0 8 8 8 8 i 8 8 8 8 i
0 9 9 9 9 j 9 9 9 9 j
http://www.ascentialsoftware.com/service/products/techLibrary/books/ds70/enu/parjdev.pdf
2 part question:
A) how do I label the output columns
B) and how do I determine what of the input columns are going to be compared (I only want to compare a subset).

Thanks
ukyrvd
Premium Member
Premium Member
Posts: 73
Joined: Thu Feb 10, 2005 10:59 am

Post by ukyrvd »

Columns getting dropped issue..
Runtime column propagation might be the culprit. Uncheck RCP on atleast one of the inputs and compile. OR try renaming one set of columns .. say add "_sec" to each column name

A. Define them in output columns grid
B. Use KEY option on properties tab of the stage to specify the columns to compare.
thank you
- prasad
cviewjones
Participant
Posts: 4
Joined: Thu Jul 14, 2005 12:11 pm

Post by cviewjones »

A) turns out that runtime column propogation was NOT checked, so I was not getting any help (or interference) from that. (With that said, I experimented with checking it, and it still gave me compile time error on the compare stage. Do I need to recreate the stage from scratch or add another option in order for RCP to work? Ideally, I like the idea of not having to note my output columns if it would work correctly....)

B) Key option is what I was looking for. Thanks

C) About the output of columns. Since many of the input columns have the same name in both links, when I specify the OUTPUT column name, how do I specify which input link it was defined from? There seems to be no mapping option/tab. Or does the the compare stage's output tab not work like the other stages I am thinking of.
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

I am facing a similar issue, I would like to compare all columns in the 2 datasets and if there is any change in any of the columns I want them to updated and if they are new those records needs to be inserted.
When I turn on the RCP I am not able to see any column names and also even view data disappears. After CMP stage I would want to access the first column and based on that I would want to filter the records. Since I donot see any columns I am not sure if I can put the constraint. and when I define my own columns with different names for 2 datasets and a result column I am getting the following error:
main_program: Fatal Error: Attempt to drop schema variable: "first.rec"; this may be due to dropping a subrec which contains a schema variable; this operation is not currently supported; to work around this limitation, insert a copy operator between the modify and the upstream operator.
. I am getting this error regardless of RCP turned on or not. Also Can I acheive the similar functionality using DIFF stage?I think it only outputs one of the 2 datasets. I want everything from the new dataset and the diff/result column which says if the record is changed or not.
Thanks in advance.
Kris

Where's the "Any" key?-Homer Simpson
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Difference stage outputs all columns (individually) from both input Data Sets, plus the result of the comparison as a coded value (codes are set up as properties) as one additional column.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

ray.wurlod wrote:Difference stage outputs all columns (individually) from both input Data Sets, plus the result of the comparison as a coded value (codes are set up as properties) as one additional column.
I am trying to use the difference stage, but if I want to get all columns from both datasets in one row, like

Code: Select all

a.col1,a.col2,b.col1,b.col2,diff.
If I the column names for both datasets are same then it outputs only one record at a time like

Code: Select all

a.col1,a.col2,diff
b.col1,b.col2,diff
The problem with the above method is I only want the columns from the after dataset 'a' so how can I identify what records are from after dataset becomes one it comes out of diff stage both datasets are mixed.
Manual says if I rename the column names then I can get all the columns from both datasets but then it complains about renamed columns not being found as they have different name now.Is there a way to identify what records come from what dataset once they come out of the diff stage?
Kris

Where's the "Any" key?-Homer Simpson
michaeld
Premium Member
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Compare stage, output columns

Post by michaeld »

For the compare stage, all you have to do is define the following output columns:

result : tinyint
first : unknown (these are subrecords)
second: unknonw

If you don't want to use the data because all you want is the result code then you can add a MODIFY stage and specify "KEEP result".

I hope that helps.
nkln@you
Premium Member
Premium Member
Posts: 271
Joined: Wed Nov 17, 2004 5:15 am
Location: US

Post by nkln@you »

I am new to PX and I am facing "first.rec schema is dropped " error.

1.how to uncheck Run time column propagation .

2.Also how to define the columns in Output Tab.

"For the compare stage, all you have to do is define the following output columns:

result : tinyint
first : unknown (these are subrecords)
second: unknonw "
Aim high
michaeld
Premium Member
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

I just wrote that in the message above yours:

Output schema:

result : tinyint
first : unknown (these are subrecords)
second: unknonw


Runtime column propegation is checked off in the DS admin and also in the the Advanced tab of the stage.
Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Try defining the subrecords so that the individual column names within them do not clash. The following is conceptual, may not be syntactically accurate.

Code: Select all

record schema (
   result: int8;
   oldrecord: subrecord {
                      oldfirstcol: int32;
                      oldsecondcol: string[max=32];
                      oldthirdcol: nullable date;
                  }
   newrecord: subrecord {
                      newfirstcol: int32;
                      newsecondcol: string[max=32];
                      newthirdcol: nullable date;
                  }
)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

ray.wurlod wrote:Difference stage outputs all columns (individually) from both input Data Sets, plus the result of the comparison as a coded value (codes are set up as properties) as one additional column. ...
Ray,
Can you slightly elaborate on this...
plus the result of the comparison as a coded value (codes are set up as properties) as one additional column.
I'm getting a colum called diff which gives a coded value [0/1] etc and trying to decipher this...!![/b]
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Read about the codes in Parallel Developer's Guide. Its all there in the Change Capture chapter.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

diff is simply the name of the output column.

There are five properties in the stage that let you specify the output value when the result of the comparison between (Key columns in) the first input and those in the second input is one of:
  • second equals first (option 'Equals' Value default 0)

    first is empty (option 'First Is Empty' Value default 1)

    second is empty (option 'Second Is Empty' Value default -2)

    first is greater than second (option 'Greater Than' Value default 2)

    first is less than second (option 'Less Than' Value default -1)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
splayer
Charter Member
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

Ray, according to the Developers Guide, page 33-3, in the example, if there is no difference between any of the columns then diff gets a value of 2. This is for the Difference stage.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

YOU can define the encoding using stage properties. The defaults are different for different stage types (Compare, Difference, Change Capture).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply