Join stage not working in CDC parallel job

mayurkb
Participant
Posts: 11
Joined: Thu Jan 14, 2016 3:53 pm

Post by mayurkb »

We have a CDC parallel job running in continuous mirroring mode. We use a Join stage to look up extra information. The Join stage pulls the data from the reference table correctly the first time, but for subsequent records it does not pull anything from the reference table. With an inner join, nothing reaches the next stage. With a left join, I get the data coming from the CDC stage, but every column coming from the reference table is NULL.
What could be the solution here? I tried setting the buffer value to No Buffer, but that did not work either.
We also tried a Lookup stage instead of the Join stage, but we found that it does not refresh its data when new rows are added to the reference table.
Any help is appreciated.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

By "continuous mirroring mode", I'm going to assume you're saying this is an ISD job. If so, you'll need to find the documentation on the data source restrictions for designing ISD jobs. I think this was in an IBM Redbook that's been out there for years.

Mike
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And from what little I know, you really shouldn't be doing more than moving the data straight to a staging area... continuously. Some other process should be doing all the other work, I do believe.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Do some research into "units of work" and "Waves" in DataStage. CDC does not naturally split what it retrieves into units of work.
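As a rough analogy only (plain Python, not DataStage code or its internals), the symptom described at the top of the thread is what you would expect if the reference input is read to its end once while the change stream keeps arriving in separate waves. The names `join_wave`, `reference_rows` and `waves` are made up for this sketch.

```python
from typing import Iterator, List, Optional, Tuple

Row = Tuple[int, str]  # (key, payload); a hypothetical row shape


def join_wave(changes: List[Row], reference: Iterator[Row]) -> List[Tuple[int, str, Optional[str]]]:
    """Left-join one wave of change rows against whatever the reference still yields."""
    ref = dict(reference)  # drains the iterator; an exhausted iterator gives {}
    return [(key, op, ref.get(key)) for key, op in changes]


reference_rows = [(1, "alpha"), (2, "beta")]
waves = [[(1, "insert")], [(2, "update")]]

# Case 1: the reference input is opened once for the whole run.
ref_once = iter(reference_rows)
read_once = [join_wave(wave, ref_once) for wave in waves]

# Case 2: the reference input is re-read for every wave.
re_read = [join_wave(wave, iter(reference_rows)) for wave in waves]

print(read_once)  # second wave has None in the reference column
print(re_read)    # both waves are matched
```

In the first case only the first wave finds a match and later waves get `None`, which mirrors the NULL columns seen with a left join. Whether your job behaves this way for exactly this reason is something to confirm against the documentation on waves.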
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yeah, or that. Have fun. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
mayurkb
Participant
Posts: 11
Joined: Thu Jan 14, 2016 3:53 pm

Post by mayurkb »

Thank you all for the input. I'll see if I can find any books that lay out the restrictions.
I did look into EOW (end-of-wave) markers. I'm using an ODBC stage, which has an option to emit those markers or not, and I've disabled it so that it does not interfere with the main data and its EOW markers.
I do not know what ISD is. I have a CDC job where the first stage is CDC, which gives me changed data for one or more tables based on my subscription.
I need real-time ETL. Even if I sent this to some kind of staging database and then read and transformed that data, wouldn't I need another CDC + ETL job to read and transform it?
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

ISD is the product that lets you deploy DataStage and QualityStage jobs as real-time web services. These are "always-on" jobs, just like some CDC (data replication) jobs or MQ jobs may be set to always be running.

Read through "Chapter 16. Realtime data flow design" in this IBM Redbook.

InfoSphere DataStage Parallel Framework Standard Practices
Choose a job you love, and you will never have to work a day in your life. - Confucius