
Export -- xml vs .dsx format

Posted: Mon Aug 02, 2004 11:39 am
by nkreddy
Hello,

I was wondering what is the advantage in exporting the project as an xml format rather than the proprietary .dsx format. Any clarification would be appreciated.

Thanks

Posted: Mon Aug 02, 2004 12:07 pm
by kduke
I think you said it. One is available to anyone who knows XML. That gives some flexibility, maybe using it to create metadata. I don't know of anyone who has done that yet.

I think that dsx never has a problem importing and I am just more comfortable using it. It has been around a lot longer.

Most of us use these only to back up our jobs, so I always use dsx. Until I become better at XML, I think I will stick to dsx.

Posted: Mon Aug 02, 2004 12:40 pm
by chulett
Ditto what Kim said. I use the dsx format for backups and import/export. On occasion, I'll pull a full xml export, stash it away somewhere on my UNIX server, and use it to search for metadata, typically grepping for table names or routines when I want to find out which jobs use them.

The first time you take a large xml export and try to re-import it, you'll realize why you want to stick with .dsx for something like that. ;-)
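The grep described above can also be done per job, so the hits come back as job names rather than raw lines. A minimal Python sketch, assuming the export wraps each job in a `Job` element that carries its name in an `Identifier` attribute; both names are assumptions to check against a real export from your release, and `jobs_referencing` is a hypothetical helper, not part of DataStage.

```python
import xml.etree.ElementTree as ET


def jobs_referencing(xml_text: str, needle: str, ignore_case: bool = False,
                     job_tag: str = "Job", name_attr: str = "Identifier") -> list[str]:
    """Return the names of jobs whose subtree mentions `needle`.

    `job_tag` and `name_attr` are assumed element/attribute names.
    """
    if ignore_case:
        needle = needle.lower()
    root = ET.fromstring(xml_text)
    hits = []
    for job in root.iter(job_tag):
        # Collect every attribute value and text node inside this job.
        chunks = []
        for el in job.iter():
            chunks.extend(el.attrib.values())
            if el.text:
                chunks.append(el.text)
        if ignore_case:
            chunks = [c.lower() for c in chunks]
        if any(needle in c for c in chunks):
            hits.append(job.get(name_attr, "<unnamed>"))
    return hits


# Tiny made-up export, only to show the call.
sample = """<DSExport>
  <Job Identifier="DailyLoad"><Record><Property Name="TableName">CUSTOMER</Property></Record></Job>
  <Job Identifier="WeeklyLoad"><Record><Property Name="TableName">ORDERS</Property></Record></Job>
</DSExport>"""

print(jobs_referencing(sample, "CUSTOMER"))  # ['DailyLoad']
```

Like grep, the match is case-sensitive unless `ignore_case=True` is passed.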

Posted: Mon Aug 02, 2004 2:33 pm
by ogmios
Just a little footnote: on the client side you have the application xml2dsx which will allow you to convert DataStage xml to dsx files.

If you want to do your own processing on jobs, the XML format is probably easier to handle in Perl/Java/... than the dsx format. For everything else, stay with dsx.

Ogmios
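To see that point from the other side: an XML export goes straight into a stock parser, while a .dsx file needs a scanner of your own. A minimal Python sketch that lists job names, assuming each job sits in a `BEGIN DSJOB` ... `END DSJOB` block whose top-level `Identifier "..."` line holds the job name. That layout is an assumption to check against your own export, `dsx_job_names` is a hypothetical helper, and the sketch ignores complications such as multi-line property values.

```python
import re

# Assumed .dsx layout:
#   BEGIN DSJOB
#      Identifier "JobName"
#      ...nested BEGIN/END blocks...
#   END DSJOB
_IDENT = re.compile(r'^\s*Identifier\s+"([^"]*)"')


def dsx_job_names(dsx_text: str) -> list[str]:
    """Return the top-level Identifier of every DSJOB block."""
    names = []
    in_job = False
    depth = 0  # BEGIN/END nesting depth inside the current job
    for line in dsx_text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN DSJOB":
            in_job, depth = True, 0
            continue
        if not in_job:
            continue  # e.g. the HEADER block
        if stripped == "END DSJOB":
            in_job = False
        elif stripped.startswith("BEGIN "):
            depth += 1
        elif stripped.startswith("END "):
            depth -= 1
        elif depth == 0:
            m = _IDENT.match(line)
            if m:
                names.append(m.group(1))
    return names


sample_dsx = """BEGIN HEADER
   CharacterSet "CP1252"
END HEADER
BEGIN DSJOB
   Identifier "DailyLoad"
   BEGIN DSRECORD
      Identifier "ROOT"
   END DSRECORD
END DSJOB
BEGIN DSJOB
   Identifier "WeeklyLoad"
END DSJOB
"""

print(dsx_job_names(sample_dsx))  # ['DailyLoad', 'WeeklyLoad']
```

The equivalent XML version is a single `root.iter(...)` loop, which is the asymmetry being described.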

Posted: Mon Aug 02, 2004 2:58 pm
by chulett
ogmios wrote: on the client side you have the application xml2dsx which will allow you to convert DataStage xml to dsx files.
Which is run automatically when you import an xml export. This step can be rather painful for your PC to run, hence my comment.

Posted: Mon Aug 02, 2004 10:18 pm
by ray.wurlod
DSX usually gives a smaller export file than XML.

Since I often have to email these, that's an important consideration for me, as some of my interlocutors use dial-up connections.

Posted: Tue Aug 03, 2004 8:35 am
by clshore
At my client site, they use ClearCase. It doesn't play nicely with *.dsx files but is OK with *.xml, so that's what we export for check-in/checkout.

I also prefer *.dsx for my own uses.

Carter

Posted: Wed Nov 10, 2004 7:43 pm
by Gazelle
Can someone please expand on why ClearCase doesn't play nicely with .dsx files?

We are about to install ClearCase to control things like unix scripts and java code, and would like to also use it for Datastage jobs (we are using Datastage PX, v7.1).

We will need the ability to merge changes made by different developers to the same datastage job.

Version Control doesn't seem to do it.
ClearCase can do a merge, but I'm worried about whether it can handle .dsx (or even .xml) files, especially since the "elements" of the code can be in any order within the file.

Words of Wisdom would be much appreciated!

Thanks,

- g
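On the worry about element order: one approach, independent of ClearCase, is to compare exports only after putting them into a canonical form in which attributes and sibling elements are sorted, so two exports that differ only in element order produce identical text. A minimal Python sketch; `canonical_text` is a hypothetical helper, and sorting is only safe where order is genuinely insignificant (the order of columns on a link, for instance, probably does matter).

```python
import xml.etree.ElementTree as ET


def _canonical_lines(elem: ET.Element, depth: int = 0) -> list[str]:
    """Render `elem` one tag per line, with attributes and children sorted."""
    pad = "  " * depth
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
    text = (elem.text or "").strip()
    # Render each child independently, then sort the rendered blocks so
    # the original sibling order no longer affects the output.
    child_blocks = sorted(_canonical_lines(child, depth + 1) for child in elem)
    lines = [f"{pad}<{elem.tag}{attrs}>{text}"]
    for block in child_blocks:
        lines.extend(block)
    lines.append(f"{pad}</{elem.tag}>")
    return lines


def canonical_text(xml_text: str) -> str:
    """Order-insensitive, line-oriented form of an XML document for diffing."""
    return "\n".join(_canonical_lines(ET.fromstring(xml_text)))


a = '<Job><Stage Name="A"/><Stage Name="B"/></Job>'
b = '<Job><Stage Name="B"/><Stage Name="A"/></Job>'
print(canonical_text(a) == canonical_text(b))  # True
```

This does not merge anything by itself; it only removes spurious ordering differences before a line-based diff. The output drops whitespace between elements and does not escape values, so it is for comparison only, never for re-import.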

Posted: Wed Nov 10, 2004 8:20 pm
by chulett
Gazelle wrote: We will need the ability to merge changes made by different developers to the same datastage job.
Curious how you are managing this. Version Control doesn't support this functionality because DataStage doesn't either. :? Only one developer can have a job open at any given time, so unless you are going out of your way with export/import... there won't be anything to 'merge'.

Posted: Wed Nov 10, 2004 8:48 pm
by Gazelle
The actual structure is still being debated, but there may be multiple Datastage Projects, with some "common" jobs. If the common jobs are changed, then the changes will need to be consolidated before they are released to the production environment.

But you are right; if we cannot easily consolidate changes, then we may need to keep one Project, or be very disciplined with who changes "common" jobs.

Has anyone worked in such a "parallel development" project?
How were changes "merged"?

- g

Posted: Wed Nov 10, 2004 8:55 pm
by ray.wurlod
Best practice in this case is to make the jobs as atomic as possible, so that there's never any need to merge except at the control (job sequence) level. IMHO.

Posted: Wed Nov 10, 2004 9:17 pm
by chulett
Agreed. You really shouldn't have any jobs so large that they require multiple developers to work on different parts of them.

We manage many projects as well (currently around 15), primarily divided by subject area. We also have common jobs and routines, but there are rules in place for where they are modified. We've settled on a main 'home' project for common objects, which is the only place changes are allowed. Any changes made there are then propagated out to the other projects, where they are (typically) made read-only. Version Control helps make this process fairly painless.

Posted: Wed Nov 10, 2004 9:52 pm
by dsxdev
Hi
Though an .xml file is larger than a .dsx file, it is much easier to read and go through. An .xml export of a DataStage job can be easily formatted and is more readable.

It also has the advantage of letting you feed the code and metadata into some other program for parsing. This is much harder with a .dsx file, which has no off-the-shelf parser.
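For the formatting point, Python's standard library can re-indent an export. A minimal sketch using `xml.etree.ElementTree.indent` (available from Python 3.9); `pretty` is a hypothetical helper name, and the re-indented copy is meant for reading, not for re-import.

```python
import xml.etree.ElementTree as ET


def pretty(xml_text: str, space: str = "  ") -> str:
    """Return `xml_text` re-indented with one element per line."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space=space)  # rewrites whitespace in place; Python 3.9+
    return ET.tostring(root, encoding="unicode")


print(pretty("<Job><Record><Property>x</Property></Record></Job>"))
```

For a file on disk, `ET.parse(path)` returns a tree that `ET.indent` and `tree.write(...)` accept directly.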

Posted: Wed Nov 10, 2004 10:52 pm
by Gazelle
It is not that the jobs are so large that they need multiple developers, but that the development will be split into separate projects.
With some jobs, since we are using PX, we may deliberately "combine" jobs into one large job to minimise the number of times the data lands to disk... but I imagine that this will be done by a single developer.

I like the idea of creating a separate project for "common" jobs, and sending "read-only" copies of the job to the other projects. I'll have to hit the manuals and work out how to do it! Thank you all for your comments.

Regarding the use of xml:
Has anyone experienced problems with using ClearCase, with either *.dsx files or *.xml files?
It looks like xml is preferred, since it traps metadata changes. If we export the routines, do they also get included in the xml file?

Thanks,

- g

Posted: Thu Nov 11, 2004 2:55 am
by jzparad
At the site I'm currently working at, there are a whole lot of QA standards used with DS jobs (e.g. all stages must be commented, all stage variables must be commented, and short and long job descriptions must be filled in).

Normally you would have to open various DS objects to check all these when doing a review. I've found the XML export much easier for this, especially as it allows you to define a reference to an XSLT document.

I wrote an XSLT document that looks for all the requirements and highlights any contraventions of the standard.

It's a lot quicker and easier.
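Python's standard library has no XSLT processor, so what follows is the same kind of review check written in Python rather than XSLT; it is not the stylesheet described above. A minimal sketch: the `Job`, `Record` and `Property` element names, the `Type` values, and the property names in `RULES` are all assumptions standing in for whatever your export actually uses.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    record_type: str  # value of a Record's Type attribute (assumed)
    prop: str         # Property Name that must be non-blank (assumed)
    message: str


# Hypothetical rules; replace the type and property names with real ones.
RULES = (
    Rule("JobDefn", "Description", "short job description missing"),
    Rule("JobDefn", "FullDescription", "long job description missing"),
    Rule("TransformerStage", "Description", "stage not commented"),
)


def violations(xml_text: str, rules=RULES) -> list[tuple[str, str, str]]:
    """Return (job, record, message) for every rule a record breaks."""
    root = ET.fromstring(xml_text)
    found = []
    for job in root.iter("Job"):
        job_name = job.get("Identifier", "<unnamed>")
        for rec in job.iter("Record"):
            for rule in rules:
                if rec.get("Type") != rule.record_type:
                    continue
                # Only direct Property children of this record are checked.
                values = [p.text for p in rec.findall("Property")
                          if p.get("Name") == rule.prop]
                if not any(v and v.strip() for v in values):
                    found.append((job_name, rec.get("Identifier", "?"), rule.message))
    return found


qa_sample = """<DSExport>
 <Job Identifier="DailyLoad">
  <Record Identifier="ROOT" Type="JobDefn">
   <Property Name="Description">Loads customers</Property>
   <Property Name="FullDescription"></Property>
  </Record>
  <Record Identifier="V0S1" Type="TransformerStage">
   <Property Name="Description">Derives keys</Property>
  </Record>
 </Job>
</DSExport>"""

for job, rec, msg in violations(qa_sample):
    print(f"{job}/{rec}: {msg}")  # DailyLoad/ROOT: long job description missing
```

Keeping the rules as data means a new standard becomes one more `Rule` entry rather than new code.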