limitations on handling larger files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

shivan
Participant
Posts: 70
Joined: Mon Jul 25, 2005 9:29 am

limitations on handling larger files

Post by shivan »

Hi All,
Does DataStage have a limitation on handling files? Whenever the feed is more than 100 MB, it fails.

Thanks
shivan
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Shivan,

Even a cursory look at some of the posts here shows mentions of 'hundreds of thousands' and 'millions' of rows, so the answer to your question is: "Yes, DataStage has limits, but they are quite high" {often the OS limits files to 2 GB, but that can be overridden}. At my site a 100 MB file is usually used as a small sample test file; the normal volumes are orders of magnitude higher.

The cause of your job failing at 100 MB is most likely not an inherent DataStage limit but something else.
shivan
Participant
Posts: 70
Joined: Mon Jul 25, 2005 9:29 am

Post by shivan »

Thanks for replying. At the company I am working with, everyone has the opinion that it fails after 100 MB, and I am new to DataStage.
So is there any other reason, such as server capacity or anything else, that may be causing it to fail?
Is there any official DataStage site which explains the capacity that DataStage can handle?
thanks
shivan
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Shivan,

You can see that DataStage generally handles large volumes. Most of us deal with files > 100 MB daily, so the opinions of your colleagues don't carry much weight here; we disprove their assertion each time we load or extract large amounts of data.

1. Sequential files. If you have limited your UNIX settings or file system settings to 2 GB, then the DataStage limit for sequential files is {guess what...} 2 GB. If you have not limited your maximum sequential file size, then DataStage sequential files can be (2**64)-1 bits long. That is a hard limit, but it is somewhat bigger than 100 MB. (A quick way to check the OS side of this is sketched after this list.)

2. Hashed files. These default to approximately 2 GB of data; the limit can be increased.

3. Database files. The limits depend on the database settings, and also on transaction size, rollback segments, logging status, etc. Most of the load methods work row by row, so DataStage can merrily process records all day long.
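Regarding point 1: a quick way to confirm whether the operating system (rather than DataStage) is capping file sizes is to inspect the per-process file-size limit under the account that starts the DataStage engine. This is a minimal Python sketch, nothing DataStage-specific, and it assumes a UNIX host with Python installed:

# Minimal sketch: report the per-process file-size limit (what "ulimit -f" enforces).
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)  # limits are in bytes

def fmt(limit):
    return "unlimited" if limit == resource.RLIM_INFINITY else "%.0f MB" % (limit / (1024 ** 2))

print("Soft file-size limit:", fmt(soft))
print("Hard file-size limit:", fmt(hard))
# A soft limit near 100 MB would explain jobs aborting at that size and would
# point at the shell/OS configuration, not at DataStage itself.

If the limit comes back as "unlimited", the cause of the 100 MB failures lies elsewhere (scratch space, database settings, job design, and so on).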

You really should let the people here assist you by telling us what the error messages are and what you are doing with your data. I seem to recall that you mentioned XML in another post today, and I recall that there are issues with large XML files - but 100 MB should still be OK.

Just as an example: if you are doing an aggregation or a sort in your job, it aborts with something along the lines of "no space", and you find out that your /tmp has filled up after processing 100 MB of data, do you consider this to be a DataStage limitation?
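As a concrete version of that check (illustrative only; Python on a UNIX host, with example paths that you would replace with your project's actual scratch directories):

# Minimal sketch: check free space where a sort or aggregation would spill,
# before concluding that DataStage has hit some internal limit.
import shutil

for path in ("/tmp", "/var/tmp"):  # example paths only
    usage = shutil.disk_usage(path)
    print("%s: %.1f GB free of %.1f GB" % (path, usage.free / 1024 ** 3, usage.total / 1024 ** 3))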

I am fairly confident that your problem is along the lines of the example above or something similar. Some sort of limit has been reached that is due to the system layout, the configuration, or perhaps even DataStage settings. But without an error message nobody has a chance to assist.
shivan
Participant
Posts: 70
Joined: Mon Jul 25, 2005 9:29 am

Post by shivan »

Thanks.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Shivan,

Please remember to post the cause of your problem when you find out, so that others may profit from this thread.

Thanks,
shivan
Participant
Posts: 70
Joined: Mon Jul 25, 2005 9:29 am

Post by shivan »

It was a limitation (100 MB) the company set up while installing DataStage. That was the reason why DataStage couldn't handle more than 100 MB.
thanks
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I'll summarize for you what's been posted:

There is NO 100 MB limitation on DS.


However, you can do many things to make it appear that there are arbitrary limits. For example, you may attempt to load a 100 MB file into a table with the commit interval set to 0 and blow through your rollback space, thus giving the impression that there's a 100 MB limit even though it's database related.
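As a generic illustration of that commit point (a sketch only, using Python's built-in sqlite3 rather than anything DataStage- or Oracle-specific; the table, column, and file names are made up), committing every N rows keeps each transaction, and therefore the rollback/undo it needs, bounded instead of holding one huge transaction open for the whole file:

# Sketch: periodic commits bound the size of any one open transaction.
import sqlite3

COMMIT_EVERY = 10_000  # illustrative transaction size

conn = sqlite3.connect("example.db")  # hypothetical target database
conn.execute("CREATE TABLE IF NOT EXISTS feed (id INTEGER, payload TEXT)")

def load(rows):
    for i, row in enumerate(rows, start=1):
        conn.execute("INSERT INTO feed VALUES (?, ?)", row)
        if i % COMMIT_EVERY == 0:
            conn.commit()  # release the work done so far
    conn.commit()          # flush the final partial batch

load((i, "x" * 100) for i in range(50_000))  # stand-in for the real feed
conn.close()

With a commit interval of 0 (commit only at the end), the same load has to be held as a single transaction, which is where rollback space runs out long before any DataStage limit is reached.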

You may send 10,000,000 rows of 10-character unique values into the Aggregator stage and blow its mind, because of such a large set of unique values, if you don't give it some assistance through sorting. Again, there are practical things to do with certain stages to make them work well (a small illustration follows).
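To show why sorted input helps an aggregation, here is a tiny sketch in plain Python (illustrative only, not the Aggregator stage itself): with input sorted on the grouping key, each group can be totalled and released as soon as the key changes, whereas unsorted input forces every distinct key to be held in memory until the end.

# Sketch: streaming aggregation over key-sorted input keeps memory flat.
from itertools import groupby
from operator import itemgetter

rows = [("A", 1), ("A", 2), ("B", 5), ("B", 1), ("C", 7)]  # already sorted on the key

for key, group in groupby(rows, key=itemgetter(0)):
    print(key, sum(amount for _, amount in group))  # one group in memory at a time

# Without the sort, an aggregator must keep a running total for every distinct
# key at once, which is exactly what hurts with millions of unique values.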

You may just create abysmal job designs that run forever and ultimately time out database connections, FTP transfers, etc. (And I've seen plenty of horrendous job designs in almost 8 years with this product and 30-40 customer sites.)

Just to let you know, 100 MB is less than a daily load cycle in most of the projects I've designed. My current customer project involves processing files in the GBs hourly using just plain old Server jobs.

I'd suggest your management didn't know what they were doing, got bad advice, or had "experts" who weren't. That's my opinion, and I'll hold it until you give us more information about how the 100 MB figure came about.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle