Maximum export size for a DSX

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Maximum export size for a DSX

Post by clarcombe »

A customer has a 75 MB project DSX file which they are trying to move from integration to production.

Normally we get around the import timeouts by using an 'import selected' and importing in 4 steps; however, now we cannot even get DS to show us the jobs to import!

We have looked at splitting the DSX, but this would mean incorporating the common routines in different DSXs, thus doubling the maintenance work each time a common routine is modified.

1) Does DS have a maximum size for a DSX? Is there a recommended maximum size?
2) If we leave the executables off, would this reduce the DSX file size by much?
3) Any suggestions as to other workarounds?

Many Thanks

Colin
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I've done exports of projects of that size and haven't had issues - but that was on fast and mostly idle hardware. By excluding the executables from your DSX files you will save quite a bit of space; the binaries are uuencoded (remember that old technology from old e-mail systems?), which increases their already large size. The DSX and XML formats are, unfortunately for you, anything but compact.
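
As a rough illustration of that overhead (just a Python sketch, nothing DataStage-specific: standard uuencode turns each 45 bytes of binary into roughly a 62-character text line):

Code: Select all

import binascii, os

payload = os.urandom(45 * 1000)          # stand-in for a compiled job binary
encoded = b"".join(
    binascii.b2a_uu(payload[i:i + 45])   # uuencode works on 45-byte input chunks
    for i in range(0, len(payload), 45)
)
print(len(payload), len(encoded), round(len(encoded) / len(payload), 2))
# roughly: 45000 62000 1.38 - about a third bigger, before any other DSX markup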

I've found the selective import takes a long time and would have expected timeouts there, since it has to process the whole text file in one pass to get the list of valid objects.

What is keeping you from splitting your DSX file, either manually (re-inserting the headers) or by exporting the routines/functions and jobs separately?
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

Thanks for the information

The issue is that a lot of the jobs are interdependent. Whilst we can separate the routines, transforms, etc. relatively simply, the jobs are more difficult.

I am trying to avoid a manual cut and paste of a 75 MB file!

I think I will have to get the developer to really analyse his code, work out where he can break it up, and then import in an order that takes the interdependencies into account.
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

What problems are you getting importing interdependent jobs? If you import a job sequence which calls a server job that hasn't been imported yet, you won't get a problem (at least not until you compile it).
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I've imported from 200 or 300 MB exports without anything timing out. Depending on the speed of your machine and how much RAM you have, it can take quite some time, however. An XML export will take even longer as it is first converted to a dsx 'under the covers' and then imported. Patience is the key here.

If the timeouts are from DataStage, use the Administrator to change your timeout value to 'Do not timeout' and see if that helps. Windows will definitely get to a state where it says it is 'not responding' but that can be ignored.

If you know you want everything in the export, then you really don't want to be doing a 'selective' import, as you are at least doubling the amount of time it will take end to end. I'd also suggest you select the 'Overwrite without query' option unless you actually want it to stop if an object already exists in the repository. Nothing is more frustrating than finding that 'Are you sure?' query waiting for you, buried behind other windows, when it should have been done hours ago, dagnabbit. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Get more memory on your client machine!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Specification of client and server

Post by clarcombe »

DS version 6
Server: HP, 6 CPU PA-8900, 16 GB memory
Client: Pentium 4, 2.8 GHz, 512 MB memory, running XP

Is the 512 MB not enough? Are there any ways of working out what size the client memory should be?

Thanks
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

512 MB is barely enough to run XP. :wink: Is there a workstation there with more? I have 2 GB, but even 1 GB would help tremendously.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Instead of 'import selected', search the forum for "dsx cutter". Steve Boyce posted a Perl script (parsedsx) for exploding a large .dsx into individual files for each job and routine. He also posted a concatenator script (catdsx) for combining all the files in a directory.

My point is: explode your fat dsx file using parsedsx and then pick the items for import using Windoze Explorer. Move them into another folder and run catdsx on them, then use the resulting file for the import. Even better, catdsx lets you specify how many equal-sized import files you want, so that you can run multiple imports simultaneously to set up a project more quickly.
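
If you can't get hold of parsedsx, the idea behind it is straightforward. Here is a rough Python sketch of the same concept - not Steve's script, it only handles job blocks, and it assumes the usual DSX layout of an export header followed by BEGIN DSJOB ... END DSJOB sections with an Identifier "JobName" line inside each:

Code: Select all

import os, re, sys

def split_dsx(dsx_path, out_dir):
    """Write the export header plus each BEGIN DSJOB ... END DSJOB block
    to its own .dsx file so the pieces can be imported individually."""
    os.makedirs(out_dir, exist_ok=True)
    with open(dsx_path) as f:
        lines = f.readlines()

    first = next(i for i, l in enumerate(lines) if l.strip().startswith("BEGIN DSJOB"))
    header = lines[:first]                       # shared export header

    i = first
    while i < len(lines):
        if lines[i].strip().startswith("BEGIN DSJOB"):
            j = i
            while not lines[j].strip().startswith("END DSJOB"):
                j += 1
            block = lines[i:j + 1]
            m = re.search(r'Identifier\s+"([^"]+)"', "".join(block))
            name = m.group(1) if m else "block_at_line_%d" % (i + 1)
            with open(os.path.join(out_dir, name + ".dsx"), "w") as out:
                out.writelines(header + block)   # re-insert the header per piece
            i = j + 1
        else:
            i += 1

if __name__ == "__main__":
    split_dsx(sys.argv[1], sys.argv[2])          # e.g. split_dsx("big.dsx", "pieces")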
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

I don't see how that would work. Surely if I explode a DSX I have to ensure the import follows a bottom-up order, i.e. routines first, then jobs that use those routines, and then jobs that use other jobs?

Unless, that is, the dsx cutter is intelligent and knows which routines to associate with which jobs.
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Colin,

As I stated before, the order isn't important within the dsx file.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Multiple imports running simultaneously may again eat up the memory.
A command-line import could be tried instead, scheduled to import all the dsx files from a folder.
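
Something like this could be scheduled (a sketch only; I'm assuming the Windows client's dscmdimport tool is available in your release, and the /H /U /P arguments below should be checked against the usage text dscmdimport prints on your own client - the host, user, password and project names are placeholders):

Code: Select all

import glob, subprocess

HOST, USER, PASSWORD = "dsserver", "dsadm", "secret"    # placeholders - use your own
PROJECT = "MYPROJECT"
FOLDER = r"C:\exports\to_import"

for dsx in sorted(glob.glob(FOLDER + r"\*.dsx")):
    cmd = ["dscmdimport",
           "/H=" + HOST, "/U=" + USER, "/P=" + PASSWORD,
           PROJECT, dsx]
    print("Importing", dsx)
    subprocess.run(cmd, check=True)                     # stop on the first failed import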
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

There's no import order in a dsx file. Routines do not have to precede jobs. Trust me, I've used this product for 8 years. :wink: Steve and I wrote those Perl scripts to facilitate interfacing DS with third-party version control tools.

Since you're doing an 'import selected', you're manually choosing which jobs to import. The same thing can be done without waiting for the import selected to load the job/routine list into the selection window (which requires parsing the large dsx file twice, which is time consuming). By exploding the large dsx file into a file per job and routine, you can just use Explorer to tag the jobs for import and then merge them back together.
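
To round out the sketch from my earlier post, merging the pieces you tagged back into one import file is just the reverse: keep one copy of the header and append the job blocks. Again this is a rough Python sketch, not the real catdsx, and it assumes there is nothing after the last END DSJOB in a normal export:

Code: Select all

import glob, os, sys

def merge_dsx(in_dir, out_path):
    """Concatenate the per-job .dsx pieces in in_dir into a single import
    file, writing the shared export header only once."""
    header_written = False
    with open(out_path, "w") as out:
        for path in sorted(glob.glob(os.path.join(in_dir, "*.dsx"))):
            with open(path) as f:
                lines = f.readlines()
            start = next(i for i, l in enumerate(lines)
                         if l.strip().startswith("BEGIN DSJOB"))
            if not header_written:
                out.writelines(lines[:start])   # header from the first piece
                header_written = True
            out.writelines(lines[start:])       # append this piece's job block

if __name__ == "__main__":
    merge_dsx(sys.argv[1], sys.argv[2])         # e.g. merge_dsx("to_import", "subset.dsx")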
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

kumar_s wrote: Multiple imports running simultaneously may again eat up the memory.
The problem is the "import selected". On large dsx files this is a fundamentally BAD METHOD. You basically wait while it "imports" the file into a selection list, then wait again while it rescans the dsx file and imports your chosen few. This is really slow and also uses a lot of memory.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

Trust me, I've used this product for 8 years. Steve and I wrote those Perl scripts to facilitate interfacing DS with third-party version control tools.

I would love to believe you are right, but when I worked with the developer we did an 'export selected' to 3 separate DSXs and then imported them into a new project, and this failed with a whole bunch of constraint errors (I don't have the exact error codes that were returned).

Going back to the memory issue, I thought this was a good route until I was told it would be easier to transport the Eiffel Tower to Egypt than to get a memory upgrade on a standard homogenised firewalled locked company PC :cry:

At lunchtime, we exported the DSX hierarchically and I am awaiting the results of that import.

Thanks all for your assistance
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer