Informatica Table Definition Conversion to Datastage

rajan.n
Premium Member
Posts: 96
Joined: Mon Oct 09, 2006 7:47 am

Informatica Table Definition Conversion to Datastage

Post by rajan.n »

Hi All,
We are in the process of converting Informatica jobs to DataStage. My question is: can we import Informatica table definitions into DataStage? I tried exporting the Informatica source object in XML format, converting it to .xls and then taking the column names from there, but that involves a lot of manual work. I am looking for a tool or a better method to convert all the Informatica metadata to DataStage.

We have a variety of source systems and we don't have any table definitions for them. They are all .dat files (COBOL files without a copybook or COBOL file definition), and we don't know where they were generated (some are mainframe, SAS and so on).

I heard that IBM FastTrack can do something like this, but I am not sure how far it helps. Can anyone help with this? Thank you.

Rajan
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

FastTrack will help you create a source to target mapping, which then could lead to the generation of a DataStage Job....and it can import spreadsheets that already have source/target mappings....but that won't help you with the pure table definitions....

IBM Services has tools that can assist with this as a services engagement, and they have been discussed elsewhere in the forum.

There are lots of ways to get table metadata "into" DataStage. Among them is reverse engineering a table definition .dsx (export one and take a look at it -- it's not that difficult, and it has been discussed in the forum in the past). You could also create a dummy COBOL FD or a SQL CREATE statement from your Informatica metadata and then import from those, or import from ERwin or another design tool once you've created such an intermediary.
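
As a rough sketch of the SQL CREATE route, something along these lines would do it, assuming the PowerCenter source export uses SOURCE/SOURCEFIELD elements with NAME/DATATYPE/PRECISION/SCALE attributes (verify against your own export; the type mapping is illustrative only). Run the printed DDL in any scratch database, then point the ordinary DataStage table import at it:

# Sketch: turn an Informatica PowerCenter source export (XML) into CREATE TABLE
# statements. Element/attribute names and the type mapping below are
# assumptions to check against a real export.
import sys
import xml.etree.ElementTree as ET

TYPE_MAP = {  # Informatica datatype -> SQL type; extend as needed
    "string": "VARCHAR", "nstring": "VARCHAR", "text": "VARCHAR",
    "decimal": "DECIMAL", "integer": "INTEGER", "double": "FLOAT",
    "date/time": "TIMESTAMP",
}

def ddl_from_export(xml_path):
    """Yield one CREATE TABLE statement per SOURCE element in the export."""
    tree = ET.parse(xml_path)
    for source in tree.iter("SOURCE"):
        cols = []
        for fld in source.iter("SOURCEFIELD"):
            base = TYPE_MAP.get(fld.get("DATATYPE", "").lower(), "VARCHAR")
            length = fld.get("PRECISION", "255")
            scale = fld.get("SCALE", "0")
            if base == "VARCHAR":
                base = f"VARCHAR({length})"
            elif base == "DECIMAL":
                base = f"DECIMAL({length},{scale})"
            name = fld.get("NAME")
            cols.append(f'    "{name}" {base}')
        yield "CREATE TABLE {} (\n{}\n);".format(source.get("NAME"), ",\n".join(cols))

if __name__ == "__main__":
    for stmt in ddl_from_export(sys.argv[1]):
        print(stmt)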

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

Ernie,

I have the following questions regarding your answer.
eostic wrote:There are lots of ways to get table metadata "into" DataStage. Among them is reverse engineering a table definition .dsx (export one and take a look at it -- it's not that difficult, and it has been discussed in the forum in the past).
Can you shed more light on reverse engineering of a "table .dsx"? Did you mean, export the table definition as a .dsx file and edit it accordingly? If so, how does this help in importing tables that are not yet present in IIS?
eostic wrote:You could also create a dummy COBOL FD or a SQL CREATE statement from your Informatica metadata and then import from those, or import from ERwin or another design tool once you've created such an intermediary.
As far as my understanding of the repository in IIS goes (which is not much :( ), the table definitions get stored in an external database (like Oracle), as opposed to the earlier versions, right? I believe Informatica stores its repository in an external database (such as Oracle) as well.
> If so, do you know whether it is feasible to access those tables and convert them into a convenient format for IIS?
> How do we know where IIS stores its repository (I mean in which database and what schema/tables)?
> And if we somehow manage to do the conversion, would we be able to view them directly inside the Table Definitions folder in DataStage?

These questions are in addition to what Rajan.n might have!!

Thanks for your help...
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

vivekgadwal wrote:Can you shed more light on reverse engineering of a "table .dsx"? Did you mean, export the table definition as a .dsx file and edit it accordingly? If so, how does this help in importing tables that are not yet present in IIS?
He basically means to export an existing table definition (any one that would serve as a good example) and study it, learning the structures, which are pretty straightforward. Then you could build something to take the data from your Informatica files and 'convert' it into that .dsx format. That 'something' could even be a DataStage Server job.
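
To illustrate the "study it" step, a few lines of Python like the following will print just the column sub-records from an exported .dsx so you can see which attributes your version writes out. Only the BEGIN/END DSSUBRECORD markers are assumed (they come up again later in this thread); the file name is made up:

# Print the column sub-record blocks from an exported table definition .dsx.
# Only the BEGIN/END DSSUBRECORD markers are assumed; everything between them
# is whatever your own export contains.
def dump_subrecords(dsx_path):
    inside = False
    with open(dsx_path) as fh:
        for line in fh:
            if "BEGIN DSSUBRECORD" in line:
                inside = True
            if inside:
                print(line.rstrip())
            if "END DSSUBRECORD" in line:
                inside = False
                print("-" * 40)  # separator between columns

dump_subrecords("ExampleTableDef.dsx")  # hypothetical export of one table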
-craig

"You can never have too many knives" -- Logan Nine Fingers
rajan.n
Premium Member
Posts: 96
Joined: Mon Oct 09, 2006 7:47 am

Post by rajan.n »

Hi Eostic,

Yes, we are looking for IBM tools that help with the conversion of table definitions. I tried searching the forum and the web for IBM tools but did not find anything relevant. If you happen to remember, can you please name them?

regarding "Reverse Engineering" Chulett has cleared me. Thanks to Chulett and Vivek.

I can export the Informatica source object as an XML file. Is that what you mean by creating a dummy COBOL file definition from Informatica? The process would be to convert the XML to an XLS file, understand the data and then create a CF definition. There are hundreds of files, and each one would involve this kind of manual work.

The SQL statement route is probably the one I might want to try. Please correct me if I am wrong: Informatica can generate a SQL statement no matter what kind of source is being used (COBOL file, flat file, DB2 and so on). If that is true, is there a chance to get it imported into DataStage? If not, can you please let me know what tools need to be used here apart from ERwin.

Thanks
Rajan
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

All the solutions are the same thing --- albeit with perhaps a few more steps in some cases than others.....

Find something (DataStage?) to read the xml metadata from the other tool. Manipulate that metadata and create whatever other file type that you are most comfortable with..... study a .dsx.......if you know cobol, create a COBOL FD.....if you like rdbms, generate a CREATE statement with complete DDL...... and if you still have access to the tool, use it to create SQL DDL.

Then import the new .dsx .....or import the COBOL, or generate the SQL table in any rdbms and import the table......

Each of these techniques merely generates a type of metadata that DS "can" import. You don't want to waste any time trying to do things directly with either tool's relational repository.
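
If the COBOL FD route is more comfortable, a throwaway script along these lines could write the dummy FD from a simple column listing. The columns.csv layout (name,datatype,length,scale) is something you would produce yourself from the Informatica export, and the PIC mapping is deliberately simplistic:

# Sketch: print a dummy 01-level COBOL record layout from a column listing,
# suitable for the DataStage COBOL FD import. The CSV layout and the PIC
# mapping are assumptions; extend them for your real datatypes.
import csv
import sys

def pic_for(datatype, length, scale):
    """Very rough datatype-to-PIC mapping."""
    datatype = (datatype or "").lower()
    if datatype in ("decimal", "number") and int(scale or 0) > 0:
        return f"PIC S9({int(length) - int(scale)})V9({int(scale)})"
    if datatype in ("integer", "decimal", "number"):
        return f"PIC S9({length})"
    return f"PIC X({length})"  # default: treat as character

def write_fd(csv_path, record_name="INFA-RECORD"):
    print(f"       01  {record_name}.")
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            name = row["name"].upper().replace("_", "-")
            pic = pic_for(row["datatype"], row["length"], row.get("scale"))
            print(f"           05  {name:<30} {pic}.")

if __name__ == "__main__":
    write_fd(sys.argv[1])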

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

eostic wrote:Find something (DataStage?) to read the xml metadata from the other tool. Manipulate that metadata and create whatever other file type that you are most comfortable with..... study a .dsx.......
Ernie,

Thanks a lot for your help. I am trying the "STUDY DSX AND EDIT IT" approach and I coded a few jobs for it. It seems to work fine (knock on wood). Just in case somebody else wonders what we meant by this discussion, the approach is as follows:

> Export one of the existing CFD table definitions
> Create 3 template files from this existing IIS table definition (for COBOL FDs): 1) Header, 2) Sub-record, 3) Footer
>> The Header file has the header information from the .dsx file. This has to be changed for the date, time etc.
>> The Sub-record file has only the portion that starts with BEGIN DSSUBRECORD and ends with END DSSUBRECORD
>> The Footer, just like the header, has the footer information from the .dsx file. This too has to be edited.
> Export the source object from Informatica (it exports as an XML file)
> Save that as a .csv file (remove any records with unnecessary fields, like the row for a folder in Infa)
> Copy it to the IIS server box
> Build a DS job to edit each of the Header, Footer and Sub-record files.
> Build a DataStage job to read the Informatica .csv file and call the job that edits the Sub-record template for each field.
> All of this edited data gets captured into one file with a '.dsx' extension
> Bring the new .dsx file back to the workstation (client) and import it back into IIS DS.

The caveat is that I don't know how this process reacts when it encounters any OCCURS/REDEFINES in the CFD, because the template CFD table definition I exported (in the first step) doesn't have any.
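
In case it helps anyone following the same route, here is a bare-bones sketch of the template-fill step (done in Python rather than a Server job, just to show the mechanics). header.txt, subrecord.txt and footer.txt are the three pieces cut out of a real exported .dsx, so no attribute names need to be guessed; the {{COLUMN_NAME}}/{{SQL_TYPE}} markers and the columns.csv layout (name,sqltype) are invented for the sketch and would be whatever you put in by hand:

# Stitch header + one filled sub-record per column + footer into a new .dsx.
# The placeholder markers must match whatever you added to subrecord.txt.
import csv

def build_dsx(header_path, subrecord_tmpl_path, footer_path, columns_csv, out_path):
    with open(header_path) as fh:
        header = fh.read()
    with open(subrecord_tmpl_path) as fh:
        template = fh.read()
    with open(footer_path) as fh:
        footer = fh.read()
    with open(columns_csv, newline="") as cols, open(out_path, "w") as out:
        out.write(header)
        for row in csv.DictReader(cols):
            filled = (template
                      .replace("{{COLUMN_NAME}}", row["name"])
                      .replace("{{SQL_TYPE}}", row["sqltype"]))
            out.write(filled)
        out.write(footer)

build_dsx("header.txt", "subrecord.txt", "footer.txt",
          "columns.csv", "GeneratedTableDef.dsx")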

Please suggest if there is a better approach.

Thanks for your help...
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

AFAIK, neither 'occurs' nor 'redefines' is supported.
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Now you are treading into deeper water. Complex flat files (with occurs, etc.) are handled in a more proprietary way by all the tools out there. Things aren't as standard. I'm sure it can still be done, but now you probably are going to need to have some deeper experience with the CFF stage and its behavior, and exports of the tables that are defined for use with it.

Non-occurs or non-recordtype structures and their metadata representation will be much more consistent among tools.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

chulett wrote:AFAIK, neither 'occurs' nor 'redefines' is supported.
Craig, the CFF stage supports OCCURS as well as REDEFINES. I did not understand what exactly you meant by "not supported"!
eostic wrote:Now you are treading into deeper water. Complex flat files (with occurs, etc.) are handled in a more proprietary way by all the tools out there. Things aren't as standard. I'm sure it can still be done, but now you probably are going to need to have some deeper experience with the CFF stage and its behavior, and exports of the tables that are defined for use with it.
Very true, Ernie. To achieve what I am doing with a table definition containing OCCURS and REDEFINES, I would have to do more "STUDY THE DSX" on them: see what the pattern is, what DS populates in the sub-record fields, etc. It can be coded for, but doing this much was already a little messy, as I had to do pattern searches and the like. So I thought I would cross that bridge when I come to it.

The whole problem is that the client doesn't have any table structures available on hand (I mean CFDs). So we are stuck with either doing all of this manually (copying field by field from Informatica) or following the approach you suggested.

Question: I was going through some blogs/posts online and they mentioned something about IBM MetaBrokers, which can read many kinds of metadata and store them in a repository. Does anyone know about it? Maybe Ray Wurlod or Kim Duke can chip in if they know something about this too. (Sorry to be so vague; I did not understand what those posts were trying to say.)
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

vivekgadwal wrote:Craig, the CFF stage supports OCCURS as well as REDEFINES. I did not understand what exactly you meant by "not supported"!
Well, by 'not supported' I literally meant not supported, as in it didn't support them and you had to remove / flatten them in the CFD before importing. Perhaps that's not true or no longer true in 8.x; I don't know, as I don't use the stage personally and was just going on what I thought I recalled reading here. Ah well.
Last edited by chulett on Fri Aug 08, 2014 11:42 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

Well... you are right, Craig, in the sense that you have to flatten OCCURS in order to read the data; CFF does that flattening work for us. Now, in 8.x, IBM has included support for the following:

> GROUP
> REDEFINES
> OCCURS
> OCCURS DEPENDING ON

However, I too have only limited experience with the CFF stage, but now I have to use it extensively to read data, so I am finding out the "nuances" of this stage!
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Interesting that they added support for OCCURS DEPENDING ON as well. 8)
-craig

"You can never have too many knives" -- Logan Nine Fingers
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

True... and from what I understand from the documentation, an ODO (OCCURS DEPENDING ON) clause will not get flattened, as opposed to a plain OCCURS clause.
Vivek Gadwal

Experience is what you get when you didn't get what you wanted