Page 1 of 2

RCP - CDC Stage - Partitoning and Sorting

Posted: Thu Jul 02, 2015 12:13 pm
by koti9
Hello all,

We are trying to do one single job with CDC Stage(comparing file and db) in it and also RCP Enabled. Did anyone try that, if yes, how did you handle the Hash Partitioning and sorting in it? Any help would be greatly appreciated.

Thanks & Regards
Koti

Posted: Thu Jul 02, 2015 2:03 pm
by Mike
That most likely depends on your plan for handling a variable number of key fields.

I opted for using a single concatenated key called KEY, so the partitioning and sorting was simply by KEY.

I found a generic RCP-enabled shared container to be more useful than a generic RCP-enabled job. Gives you more flexibility on prepping the new incoming data, especially if you need to do complex transformations.

Mike

Posted: Sun Jul 05, 2015 4:28 am
by priyadarshikunal
I second the thought of having single concatanated field for key. You can even have the key field location or start end position parameterized.

Posted: Sun Jul 05, 2015 4:35 am
by priyadarshikunal
viewtopic.php?t=153932

This post should give you little more detail on how to do it for files.

Posted: Tue Jul 07, 2015 1:37 pm
by koti9
Hi Mike,

Thanks for your input, I too have dynamic number of keys based on table, and i want to follow your approach, exactly how did you generate the concatenated key (Through Parameter Set Values???), if yes , can you give me more details. Just to make things spicy, we want it be NLS Compatiable as we get different languages..So all three in one job(RCP,NLS and using CDC Stage)


Thanks & Regards
Koti

Posted: Wed Jul 08, 2015 3:01 am
by ShaneMuir
If you have a dynamic list of keys, then you need to be able to identify those key fields. Using your DBs system tables should allow you to do that.

Using the system tables to identify the key values you can then generate an SQL statement which concatenates those values into a single output column as well as naming any other fields that you require. You then pass that SQL as either a parameter, or write it to a file and pass the file name to your DB stage.

As long as your job is set up to receive those foreign language chars (ie NLS is enabled, you're using NVarChar or Varchar (Unicode), using the correct character set mapping etc) then NLS should not be an issue.

Posted: Thu Jul 09, 2015 12:52 pm
by koti9
Hi Shane/Mike,

Once again Thanks for your responses,
I got a break through with CDC and RCP.But it is working only with one key field(that i pass through parameter set). Here is my design.

Code: Select all

SeqFile----Sort-----CDC----Transformer----Modify-----Oralce
                     |
                     |
                     |
                   Oracle
I was able to pass the sort keys, CDC Keys via (parameter set value files),
But if i am trying for multiple use multiple keys(with nulls in some of them), Its failing.As long as the parameter set value file has some column in varaible name, its working. Did you try this method of RCP??

Let me know your thoughts...I can provide the error message also...i already searched error the forum for the same, with zero hits...

Thanks & Regards
Koti

Posted: Thu Jul 09, 2015 1:13 pm
by chulett
<edited your post to add code tags to preserve whitespace>

Posted: Thu Jul 09, 2015 2:06 pm
by Mike
Nice job so far Koti.

I tend to like using parameter set values files to support my generic multi-instance RCP jobs since it provides a nice container for organizing everything that needs parameterizing.

Now you can appreciate the challenge of handling a variable number of keys. And you also have the additional requirement of handling potential NULL key values.

Let me ask a question to start the thought process for handling your keys:
What is the maximum number of keys that you need to be able to support?

Mike

Posted: Thu Jul 09, 2015 3:09 pm
by koti9
Thanks Mike,

We are planning max of 5 keys as of now...


Thanks
Koti

Posted: Fri Jul 10, 2015 2:28 am
by ShaneMuir
Can you just quickly explain how you are setting your key value field for both the DB and the Seq file?

Are you passing a set of column names from a parameter set?

Posted: Fri Jul 10, 2015 7:13 am
by chulett
And Koti, when you do - please stop quoting everything you are replying to. Save that for when it makes sense, otherwise learn to love the Reply to topic link. :wink:

Posted: Fri Jul 10, 2015 10:58 am
by koti9
Sorry Craig, Will follow that...


Shane,

Thats right i am sending the Key columns via parameter set value file.

I was able get to a temporary work around, i am passing separate schema file just with keys and using ColumnExport Stage to concatenate dynamically.....As i can't mention the key in my original schema file...
So my Sequential File have original schema file and Column Export stage have another schema file with keys.

Posted: Fri Jul 10, 2015 11:09 am
by chulett
koti9 wrote:As i can't mention the key in my original schema file
Sorry, but... why can you not?

Posted: Fri Jul 10, 2015 12:16 pm
by koti9
Hi Craig,

There is no provision to mention the key in the schema file.
I think it is like that for a while....


Thanks & Regards
Koti