Px design approach question - fishing for better ideas

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Px design approach question - fishing for better ideas

Post by ArndW »

I have a requirement that has me taking about 400 DB/2 tables and loading them into Oracle tables with the same table names and column names. The datatypes are being changed (different date handling, some trims, nulls modified, etc.) according to rules based on the datatypes and not on the column names.

My current approach is that I am going to write some code (in DataStage BASIC) to auto-generate .dsx files from the DB/2 and Oracle metadata along with the conversion rules - this is easier than hand-coding 400 jobs.
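Roughly, the skeleton of the generator is just a loop that reads a one-job template export and stamps out one .dsx per table. In sketch form (the template name, the %%TABLE%% token, and the table list are invented for illustration - the real lists come out of the metadata):

* Sketch only: stamp out one .dsx per table from a template export.
* 'TemplateJob.dsx', the %%TABLE%% token, and the hard-coded table
* list are illustrative; the real input comes from the metadata.
OPENSEQ 'TemplateJob.dsx' TO TemplateFile ELSE STOP 'No template file'
Template = ''
LOOP
   READSEQ Line FROM TemplateFile ELSE EXIT
   Template := Line : CHAR(10)
REPEAT
CLOSESEQ TemplateFile

TableList = 'ORDER_DIM' : @FM : 'CUSTOMER_DIM'
FOR i = 1 TO DCOUNT(TableList, @FM)
   TableName = TableList<i>
   JobText = CHANGE(Template, '%%TABLE%%', TableName)
   * ...the per-column derivations built from the conversion rules
   * get substituted here as well...
   OutName = 'Copy_' : TableName : '.dsx'
   OPENSEQ OutName TO OutFile THEN WEOFSEQ OutFile ELSE NULL
   WRITESEQ JobText TO OutFile ELSE STOP 'Cannot write ' : OutName
   CLOSESEQ OutFile
NEXT i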

I have just started the coding, and the thought came to me that someone here might have a better approach or suggestion for this set of jobs - I might have fixated on one solution and be ignoring some simpler way of doing this (can't see the schema for all the columns, or something along those lines).

Any and all suggestions accepted :roll:
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Why can't you just put an Oracle stage on the canvas, drag the link over from DB2 to Oracle, and keep everything but the connection information? As long as they do not change the column names, just import the new tables and overwrite the column definitions to change the datatypes. I would think you could do this very fast.

Changing a stage type in a DSX is not something I have done. I would think that might be difficult. I have changed the column names in a DSX with BASIC. Send me what you have in a private email and I might be able to modify what I have to meet your needs.
Mamu Kim
logic
Participant
Posts: 115
Joined: Thu Feb 24, 2005 10:48 am

Post by logic »

Small fish... big mouth... sorry.
Kim's approach, though simple, would require doing that for all 400 tables that Arnd is getting... right?
Ash.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Borrowing Kim's assumption that all names remain the same, and adding the assumption that column order remains intact:

Just create one parameterized job and pass the table name in as a parameter.

1) DB2Read (select * from #table#)

link

2) OracleWrite (insert *)

If you make it a multi-instance job, you can run tables concurrently.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

bcarlson

I am not sure I follow what you are trying to do.

logic

I think each job just takes a few minutes. Ever try to switch stage types in a DSX? If so, post your code.
Mamu Kim
logic
Participant
Posts: 115
Joined: Thu Feb 24, 2005 10:48 am

Post by logic »

Sorry Kim,
Never tried it... so I was unable to recognise the tradeoff between (400 × a few minutes) vs. stage-type switching in a DSX... my bad.
Thanks.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

kduke wrote: bcarlson - I am not sure I follow what you are trying to do.

In your DB2 stage doing the read, set the Read Method to Table. If you do not specify a select list or where clause, this simply runs a 'select *' against the table you specify. Then parameterize the table name. I believe Oracle uses the same schemaname.tablename concept for tables as DB2? So create 3 parameters: ds_db2schema, ds_oraschema, and ds_tablename. Then, in your DB2 read, reference the source table as #ds_db2schema#.#ds_tablename#.

Link this to your Oracle stage. Now, I have not used Oracle in DS/PX, but I assume its stage works similarly to DB2's. If so, supply the target table name as #ds_oraschema#.#ds_tablename#. Don't supply a select list, so that it defaults to all columns.

This method will, effectively, be the same as

insert into myOracleTargetTable
select * from myDB2SourceTable
;

We don't use the Sequencer, so I don't know how you would iterate through all 400 tables that way. We do everything in Unix, so you can use the dsjob program that comes with PX at the Unix command line (probably on Windows, too). In Unix it would be very easy to run through all your jobs. Better yet, split your 400 tables into 4 sets of 100 and run 4 streams concurrently.
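From the shell, each run boils down to something like dsjob -run -param ds_tablename=ORDER_DIM YourProject copyTable.ORDER_DIM, wrapped in a loop over the table list. If you would rather drive it from inside DataStage instead, the BASIC job-control API can do the same thing; a rough sketch, with the job name, schema values, and table list invented for illustration:

* Rough sketch: drive one multi-instance job once per table via the
* DataStage BASIC job-control API.  The job name 'copyTable', the
* schema values, and the table list are invented for illustration.
$INCLUDE DSINCLUDE JOBCONTROL.H
TableList = 'ORDER_DIM' : @FM : 'CUSTOMER_DIM' : @FM : 'ITEM_FACT'
FOR i = 1 TO DCOUNT(TableList, @FM)
   TableName = TableList<i>
   * One invocation of the multi-instance job per table
   hJob = DSAttachJob('copyTable.' : TableName, DSJ.ERRFATAL)
   ErrCode = DSSetParam(hJob, 'ds_db2schema', 'DB2PROD')
   ErrCode = DSSetParam(hJob, 'ds_oraschema', 'ORAPROD')
   ErrCode = DSSetParam(hJob, 'ds_tablename', TableName)
   ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob)  ;* drop this wait to let invocations overlap
   ErrCode = DSDetachJob(hJob)
NEXT i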
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

I misread the original post. These are straight table copies. I was thinking this was 400 jobs that needed to be converted from DB2 jobs to Oracle jobs. This is much easier.

Do one job exactly like you want. Then import all the tables. Export the one job and all the tables and send it to me. I can generate all 400 jobs and send you the DSX back.

Make your link names like OrderDimIn and OrderDimOut for ORDER_DIM. Do your comments the same way. Keep them generic. My job will parse the DSX, change ORDER_DIM into the new table to process, and do string replaces of OrderDim to CustomerDim or whatever the new table is. I have already done this a couple of times; the jobs even compile when done. I always have to fine-tune it, because DB2 stages are not something I have dealt with before. It should save you hours, but do it the hard way if you want. The metadata is easily read from DS_METADATA - Ray posted code to show how to do that. Hacking a DSX was a little tougher.
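To give you an idea, the heart of it is a couple of CHANGE() calls over the exported job text. A sketch (table names invented, and JobText standing in for the template DSX read from disk):

* Sketch of the replacements: retarget a template DSX built for
* ORDER_DIM (links OrderDimIn/OrderDimOut) at another table.
JobText = "...exported template DSX text goes here..."
OldTable = 'ORDER_DIM'    ;* table the template was built for
NewTable = 'CUSTOMER_DIM' ;* table to generate

* Derive the mixed-case form used in link names: CUSTOMER_DIM -> CustomerDim
NewMixed = ''
FOR w = 1 TO DCOUNT(NewTable, '_')
   Word = FIELD(NewTable, '_', w)
   NewMixed := UPCASE(Word[1,1]) : DOWNCASE(Word[2, LEN(Word)])
NEXT w

JobText = CHANGE(JobText, OldTable, NewTable)    ;* table references
JobText = CHANGE(JobText, 'OrderDim', NewMixed)  ;* link names and comments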
Mamu Kim
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

That is a good choice. Oracle Migration Workbench, I think, is the name.

If you need to create the tables with DataStage, there is a routine called KgdCreateDdlMetadata that creates the DDL from records in DS_METADATA. It is on my tips page, and I think it has been posted on this site too.
Mamu Kim
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Wow - many more responses than I had expected!

I think one part that I didn't really stress is important - otherwise I could have used PX's facility of not having to declare the DB columns at all (runtime column propagation): each column needs to have a transform applied to it. The tables & columns are the same, but the datatypes are different. CHAR goes to VARCHAR (needs to be trimmed and the CHAR(000) characters removed), all date and timestamp values need to be converted, a lot of decimal datatypes have different precisions, etc.

There are 21,077 columns in total, but I can distill the conversion rules down into a distinct list - hence my approach. I actually get all the table & column data from DS_METADATA already.
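To show the shape of what the generator will splice in (the datatypes and expressions below are examples only, not the final rules):

* Example of the rule lookup in the generator: map a source datatype
* to the derivation expression spliced into the generated job.
DataType = 'CHAR' ; ColName = 'CUST_NAME'   ;* sample inputs
BEGIN CASE
   CASE DataType = 'CHAR'
      Derivation = 'Trim(In.' : ColName : ')'
   CASE DataType = 'DATE' OR DataType = 'TIMESTAMP'
      Derivation = 'StringToTimestamp(In.' : ColName : ')'  ;* real conversion TBD
   CASE @TRUE   ;* decimals etc.: straight move, precision handled in the target
      Derivation = 'In.' : ColName
END CASE
PRINT Derivation   ;* -> Trim(In.CUST_NAME)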

Thanks for all the great suggestions so far. Even though they might not work for this particular flow, they have certainly opened a number of avenues of thought for me, and hopefully for those reading this thread as well.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, lots of good information here - but it sounds like you should stick with your original approach, Arnd. I've worked with someone in the past to do something very similar to this and it worked out rather... nicely. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Craig,

I'm just starting to pick apart the PX export file. It's pretty long and complex, but I think that I'm on the right track now. With all those columns, I have distilled it down to only 8 conversion rules - far fewer than I had feared!

Thanks again for all the suggestions and help. This really is a fantastic place to come, both for general questions and as a last resort with tricky issues.