
Join stage not working in CDC parallel job

Posted: Fri May 20, 2016 2:01 pm
by mayurkb
We have a CDC parallel job running in continuous mirroring mode. We are using a Join stage to look up extra information. The Join stage pulls the data from the reference table correctly the first time, but for subsequent records it does not pull anything from the reference table. So with an inner join, nothing gets to the next stage. With a left join, I get the data coming from the CDC stage, but all the columns coming from the reference table are NULL.
What could be the solution here? I tried setting the buffering mode to No Buffer, but that did not work either.
We also tried a Lookup stage instead of the Join stage, but we found that it does not refresh its data when new rows are added to the reference table.
Any help is appreciated.

Posted: Fri May 20, 2016 4:54 pm
by Mike
By "continuous mirroring mode", I'm going to assume you're saying this is an ISD job. If so, you'll need to find the documentation on the data source restrictions for designing ISD jobs. I think this was in an IBM Redbook that's been out there for years.

Mike

Posted: Fri May 20, 2016 8:50 pm
by chulett
And from what little I know, you really shouldn't be doing more than moving the data straight to a staging area... continuously. Some other process should be doing all the other work, I do believe.

Posted: Mon May 23, 2016 4:34 am
by ray.wurlod
Do some research into "units of work" and "Waves" in DataStage. CDC does not naturally split what it retrieves into units of work.
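
As a rough illustration of why waves matter here, the sketch below is a toy Python model, not DataStage code and not a description of its internals. It assumes the reference input behaves like a stream that is read once, while the CDC input arrives in separate waves. Under that assumption a left join produces the symptom described at the top of the thread: matches in the first wave, NULLs afterwards. The names left_join_wave and reference_stream are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

RefRow = Tuple[int, str]
Joined = Tuple[int, Optional[str]]


def left_join_wave(wave: List[int], reference: Iterator[RefRow]) -> List[Joined]:
    """Left-join one wave of keys against whatever is left of the reference stream."""
    # Assumption: the reference side is a one-shot stream, so draining it here
    # leaves nothing for later waves.
    ref = dict(reference)
    return [(key, ref.get(key)) for key in wave]


# The reference input is opened once, when the always-on job starts.
reference_stream = iter([(1, "alpha"), (2, "beta")])

first = left_join_wave([1, 2], reference_stream)   # reference rows are available
second = left_join_wave([1, 2], reference_stream)  # stream is already exhausted

print(first)   # [(1, 'alpha'), (2, 'beta')]
print(second)  # [(1, None), (2, None)]
```

In this model the fix is to re-read the reference data for every wave rather than once at startup. Whether and how a given stage can do that is what the documentation on waves and real-time job design covers.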

Posted: Mon May 23, 2016 6:35 am
by chulett
Yeah, or that. Have fun. :wink:

Posted: Mon May 23, 2016 10:33 am
by mayurkb
Thank you all for the input. I'll see if I can find any documentation that lists the restrictions.
I did look into EOW (end-of-wave) markers. I'm using an ODBC stage, which has an option to emit those markers or not, and I've disabled it so that it does not interfere with the main data and its EOW markers.
I do not know what ISD is. I have a CDC job whose first stage is the CDC stage, which gives me changed data for one or more tables based on my subscription.
I need real-time ETL. Even if I sent this data to some kind of staging database, wouldn't I need another CDC + ETL job to read and transform it from there?

Posted: Mon May 23, 2016 2:10 pm
by qt_ky
ISD is the product that lets you deploy DataStage and QualityStage jobs as real-time web services. These are "always-on" jobs, just like some CDC (data replication) jobs or MQ jobs may be set to always be running.

Read through "Chapter 16. Realtime data flow design" in this IBM Redbook.

InfoSphere DataStage Parallel Framework Standard Practices