
How to read a text file of 5GB

Posted: Sun Aug 16, 2009 11:25 pm
by Rajee
Hi,

Can anyone let me know how we can process a text file that is pipe-delimited and more than 5GB in size?
Other than splitting the file into multiple smaller files, is there any other way/stage through which the file can be read in one pass?

Thanks,
Rajee

Re: How to read a text file of 5GB

Posted: Mon Aug 17, 2009 12:54 am
by karthikdsexchange
Use fileset stage instead of sequential file stage in your job as source reader.
Rajee wrote:Can anyone let me know how we can process a text file that is pipe-delimited and more than 5GB in size? ...

Posted: Mon Aug 17, 2009 1:19 am
by ArndW
Why can't you read a file of over 5GB in DataStage?

Re: How to read a text file of 5GB

Posted: Mon Aug 17, 2009 1:19 am
by Rajee
karthikdsexchange wrote:Use fileset stage instead of sequential file stage in your job as source reader. ...
Thanks for your response, Karthik.
But can a Fileset stage read a text file?

Thanks,
Rajee


Posted: Mon Aug 17, 2009 1:25 am
by Rajee
ArndW wrote:Why can't you read a file of over 5Gb in DataStage? ...
Thanks for your response.
I do know that we can do that through DataStage, but my question is which stage can be used for it.

Posted: Mon Aug 17, 2009 1:29 am
by ArndW
Sequential file stage for one. CFF as well.

Posted: Mon Aug 17, 2009 1:36 am
by Rajee
ArndW wrote:Sequential file stage for one. CFF as well. ...
The Sequential File stage takes about 35 minutes to read even a 2GB file.
What does CFF mean?

Posted: Mon Aug 17, 2009 1:42 am
by anandsiva
CFF is complex flat file

Posted: Mon Aug 17, 2009 1:42 am
by ArndW
CFF is the Complex-flat-file stage. The Sequential stage is one of the fastest possible means of reading a file. Is the file fixed length or variable length?

Posted: Mon Aug 17, 2009 2:17 am
by stefanfrost1
Depending on your DataStage version, and on how your file is structured, you can improve performance (with varying results) in the Sequential File stage by setting the following properties in it:

Read From Multiple Nodes = Yes
or
Number of Readers Per Node = N

Furthermore, your performance also depends on what you are doing after you've read the file. For example, you might improve performance by reading the file without metadata (the entire row as one column) and then applying the schema in parallel afterwards using a Column Import stage.
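To make the "one column first, parse later" idea concrete, here is a minimal sketch outside DataStage (plain Python, not the product's own API): stage 1 streams each line as a single unparsed value, and stage 2 splits the pipe-delimited fields afterwards, analogous to a Sequential File stage reading one varchar column feeding a Column Import stage.

```python
def read_raw_rows(path):
    """Stage 1: stream each line as a single unparsed 'column'.

    Streaming (a generator) keeps memory flat no matter how big the file is.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def import_columns(rows, delimiter="|"):
    """Stage 2: split each raw row into fields (the 'Column Import' step)."""
    for row in rows:
        yield row.split(delimiter)


# Usage sketch (big_file.txt is a hypothetical pipe-delimited file):
# for fields in import_columns(read_raw_rows("big_file.txt")):
#     process(fields)
```

The point of the split is that the cheap raw read is decoupled from the more expensive field parsing, which DataStage can then run in parallel.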

Posted: Mon Aug 17, 2009 2:40 am
by Rajee
ArndW wrote:CFF is the Complex-flat-file stage. The Sequential stage is one of the fastest possible means of reading a file. Is the file fixed length or variable length?
The file is variable length.

Posted: Mon Aug 17, 2009 2:47 am
by ArndW
That makes multiple readers a non-option. How many columns does the file have, and how long does it take to read the file when the only other stage in your job is a Copy stage (or a Peek stage)?
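The benchmarking idea here (read with only a Copy/Peek downstream) is to isolate raw read time from downstream processing. As an illustrative sketch outside DataStage, the same baseline can be measured in plain Python by reading the file in chunks and doing nothing with them:

```python
import time


def raw_read_throughput(path, chunk_size=1 << 20):
    """Read the file in 1MB chunks, doing no work with the data,
    to establish a baseline read time independent of any
    downstream processing (the equivalent of a Copy/Peek-only job)."""
    start = time.perf_counter()
    nbytes = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            nbytes += len(chunk)
    elapsed = time.perf_counter() - start
    return nbytes, elapsed
```

If this baseline is already slow, the bottleneck is disk I/O rather than any stage; if it is fast, the time is going into the lookups and transforms further down the job.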

Posted: Mon Aug 17, 2009 2:49 am
by Rajee
stefanfrost1 wrote:Depending on your datastage version you can improve performance (with varied results) based on how your file is structured using sequential file stage by setting the following properties in it. ...
Thanks for your response.
Let me cross-check my job to make sure the options you have referred to here are addressed.
I am reading the entire row into a single column only.
I had heard that a Sequential File stage can read a file of at most 2GB and no more, and that this was a limitation of the stage. But now, after this discussion with you all, I understand that what I heard is just a myth.

Thanks,
Rajee
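The old 2GB figure comes from signed 32-bit file offsets (2^31 bytes); on modern 64-bit systems with large-file support the boundary is gone. As a quick illustrative check outside DataStage, one can create a sparse file with a logical size well past 2GB (it occupies almost no real disk space on most filesystems) and confirm the OS handles the offset:

```python
import os


def make_sparse_file(path, size_bytes):
    """Create a sparse file of the given logical size by seeking past
    the end and writing a single byte. On most filesystems this
    allocates almost no actual disk space."""
    with open(path, "wb") as f:
        f.seek(size_bytes - 1)
        f.write(b"\0")
    return os.path.getsize(path)


# A 5GB logical size is comfortably past the old 2GB (2**31) limit:
# make_sparse_file("probe.dat", 5 * 2**30)
```

If the seek and size check succeed past 2**31 bytes, the 2GB ceiling is not imposed by the operating system, so any limit would have to come from the application, not the file itself.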

Posted: Mon Aug 17, 2009 2:53 am
by Rajee
ArndW wrote:That makes multiple readers a non-option. How many columns does the file have and how long does it take to read the file ... ...
The maximum number of columns is 30, but I do have Lookup and Transformer stages in the job.

Thanks,
Rajee