How to read a text file of 5GB


Rajee
Participant
Posts: 46
Joined: Thu Mar 13, 2008 7:06 am
Location: India

How to read a text file of 5GB

Post by Rajee »

Hi,

Can anyone tell me how to process a pipe-delimited text file that is more than 5 GB in size?
Other than splitting the file into multiple files, is there any other way, or any stage, that can read the file in one pass?

Thanks,
Rajee
karthikdsexchange
Participant
Posts: 15
Joined: Thu Aug 07, 2008 2:56 am

Re: How to read a text file of 5GB

Post by karthikdsexchange »

Use the File Set stage instead of the Sequential File stage as the source reader in your job.
Rajee wrote:Can anyone let me know,how can we process a text file that is pipe delimited and is of more than 5GB in size.
Other than splitting the files into multiple is there any other way/stage through which the file can be read at one stretch.

Thanks,
Rajee
Karthik
Make It Work Make It Right Make It Fast
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Why can't you read a file of over 5 GB in DataStage?
Rajee
Participant
Posts: 46
Joined: Thu Mar 13, 2008 7:06 am
Location: India

Re: How to read a text file of 5GB

Post by Rajee »

karthikdsexchange wrote:Use fileset stage instead of sequential file stage in your job as source reader.
Rajee wrote:Can anyone let me know,how can we process a text file that is pipe delimited and is of more than 5GB in size.
Other than splitting the files into multiple is there any other way/stage through which the file can be read at one stretch.

Thanks,
Rajee
Thanks for your response, Karthik.
But can a File Set stage read a text file?

Thanks,
Rajee
Rajee
Participant
Posts: 46
Joined: Thu Mar 13, 2008 7:06 am
Location: India

Post by Rajee »

ArndW wrote:Why can't you read a file of over 5Gb in DataStage? ...
Thanks for your response.
I do know that DataStage can do this, but my question is which stage can be used for it.
Thanks,
Rajee
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Sequential file stage for one. CFF as well.
Rajee
Participant
Posts: 46
Joined: Thu Mar 13, 2008 7:06 am
Location: India

Post by Rajee »

ArndW wrote:Sequential file stage for one. CFF as well. ...
The Sequential File stage takes about 35 minutes just to read a 2 GB file.
What does CFF mean?
Thanks,
Rajee
anandsiva
Participant
Posts: 41
Joined: Wed May 21, 2008 7:58 pm
Location: Sydney

Post by anandsiva »

CFF is the Complex Flat File stage.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

CFF is the Complex-flat-file stage. The Sequential stage is one of the fastest possible means of reading a file. Is the file fixed length or variable length?
stefanfrost1
Premium Member
Posts: 99
Joined: Mon Sep 03, 2007 7:49 am
Location: Stockholm, Sweden

Post by stefanfrost1 »

Depending on your DataStage version, you can improve performance (with varied results, depending on how your file is structured) by setting one of the following properties in the Sequential File stage:

Read From Multiple Nodes = Yes
or
Number of Readers Per Node = N

Furthermore, performance also depends on what you do after reading the file. For example, you might improve performance by reading the file without metadata, with each row read as a single column, and then applying the schema in parallel with a Column Import stage immediately afterwards.
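
To illustrate the idea outside DataStage, here is a minimal Python sketch of the same pattern: one sequential reader hands over raw, unparsed lines, and the splitting into columns happens in parallel workers. This is not DataStage code; the function names (`read_batches`, `split_batch`, `parse_file`) and the `pool_factory` parameter are hypothetical, and the plain `split("|")` assumes fields never contain a quoted or escaped pipe.

```python
from collections import deque
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import islice


def split_batch(lines):
    """Split raw pipe-delimited lines into lists of fields.

    Assumption: no quoting or escaping, so a plain split is enough.
    """
    return [line.rstrip("\r\n").split("|") for line in lines]


def read_batches(path, batch_size=50_000):
    """Single sequential reader: yield batches of raw lines, no parsing."""
    with open(path, "r", encoding="utf-8") as fh:
        while True:
            batch = list(islice(fh, batch_size))
            if not batch:
                return
            yield batch


def parse_file(path, workers=4, batch_size=50_000,
               pool_factory=ProcessPoolExecutor):
    """Yield parsed rows in file order; splitting runs in the worker pool.

    Only a bounded number of batches is in flight at once, so memory use
    stays flat regardless of file size.
    """
    with pool_factory(max_workers=workers) as pool:
        pending = deque()
        for batch in read_batches(path, batch_size):
            pending.append(pool.submit(split_batch, batch))
            if len(pending) >= workers * 2:
                yield from pending.popleft().result()
        while pending:
            yield from pending.popleft().result()
```

When run as a script with the default process pool, call `parse_file` from under an `if __name__ == "__main__":` guard; passing `pool_factory=ThreadPoolExecutor` is a simpler option for quick experiments, although threads will not speed up the CPU-bound split.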
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
Rajee
Participant
Posts: 46
Joined: Thu Mar 13, 2008 7:06 am
Location: India

Post by Rajee »

ArndW wrote:CFF is the Complex-flat-file stage. The Sequential stage is one of the fastest possible means of reading a file. Is the file fixed length or variable length?
The file is variable length.
Thanks,
Rajee
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

That makes multiple readers a non-option. How many columns does the file have, and how long does it take to read the file when the only other stage in your job is a Copy stage (or a Peek stage)?
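
A similar baseline can be taken outside DataStage to see what the disk itself delivers. The following is a rough Python sketch, not a DataStage feature: it reads the file in binary chunks without parsing anything and reports throughput. The function names and the 8 MiB chunk size are arbitrary choices for illustration.

```python
import time


def time_raw_read(path, chunk_size=8 * 1024 * 1024):
    """Read the whole file in binary chunks; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            total += len(chunk)
    return total, time.perf_counter() - start


def throughput_mib_per_s(num_bytes, seconds):
    """Convert a byte count and elapsed time into MiB per second."""
    return num_bytes / (1024 * 1024) / seconds


# The figure quoted earlier in the thread: 2 GB in 35 minutes.
reported = throughput_mib_per_s(2 * 1024**3, 35 * 60)
print(f"Reported job throughput: {reported:.2f} MiB/s")
```

The reported 35 minutes for 2 GB works out to just under 1 MiB/s. If a raw read of the same file is far faster than that, the time is being spent in the downstream stages rather than in reading the file.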
Rajee
Participant
Posts: 46
Joined: Thu Mar 13, 2008 7:06 am
Location: India

Post by Rajee »

stefanfrost1 wrote:Depending on your datastage version you can improve performance (with varied results) based on how your file is structured using sequential file stage by setting the following properties in it.

Read From Multiple Nodes = Yes
or
Number of Readers Per Node = N

Furthermore, your performance is also depending on what you are doing after you've processed your file. As an example, you might improve performance by reading your file without metadata or everything on 1 row as one column and directly after that adding the schema in parallel using a Column Import stage....
Thanks for your response.
Let me cross-check my job to make sure the options you mention are set.
I am already reading the entire row as a single column.
I had heard that the Sequential File stage can read files of at most 2 GB, and that this is a limitation of the stage. After this discussion, I understand that this is just a myth.

Thanks,
Rajee
Rajee
Participant
Posts: 46
Joined: Thu Mar 13, 2008 7:06 am
Location: India

Post by Rajee »

ArndW wrote:That makes multiple readers a nonoption. How many columns does the file have and how long does it take to read the file when the only other stage in your job is a copy stage (or a peek stage)?
The maximum number of columns is 30, but I do have Lookup and Transformer stages in the job.

Thanks,
Rajee