How to read a text file of 5GB

Rajee · Post by **Rajee** » Sun Aug 16, 2009 11:25 pm

Hi,

Can anyone tell me how to process a pipe-delimited text file that is more than 5GB in size?
Other than splitting the file into multiple files, is there any other way or stage through which the file can be read in one pass?

Thanks,
Rajee

karthikdsexchange · Post by **karthikdsexchange** » Mon Aug 17, 2009 12:54 am

Use the File Set stage instead of the Sequential File stage as the source reader in your job.

Rajee wrote: Can anyone tell me how to process a pipe-delimited text file that is more than 5GB in size?
Other than splitting the file into multiple files, is there any other way or stage through which the file can be read in one pass?

Thanks,
Rajee

ArndW · Post by **ArndW** » Mon Aug 17, 2009 1:19 am

Why can't you read a file of over 5GB in DataStage?

Rajee · Post by **Rajee** » Mon Aug 17, 2009 1:19 am

karthikdsexchange wrote: Use the File Set stage instead of the Sequential File stage as the source reader in your job.
Rajee wrote: Can anyone tell me how to process a pipe-delimited text file that is more than 5GB in size?
Other than splitting the file into multiple files, is there any other way or stage through which the file can be read in one pass?

Thanks,
Rajee

Thanks for your response, Karthik.
But can a File Set stage read a text file?

Thanks,
Rajee

Rajee · Post by **Rajee** » Mon Aug 17, 2009 1:25 am

ArndW wrote: Why can't you read a file of over 5GB in DataStage? ...

Thanks for your response.
I do know that this can be done in DataStage, but my question is which stage can be used for it.

ArndW · Post by **ArndW** » Mon Aug 17, 2009 1:29 am

Sequential file stage for one. CFF as well.

Rajee · Post by **Rajee** » Mon Aug 17, 2009 1:36 am

ArndW wrote:Sequential file stage for one. CFF as well. ...

The Sequential File stage takes about 35 minutes to read just a 2GB file.
What does CFF mean?

anandsiva · Post by **anandsiva** » Mon Aug 17, 2009 1:42 am

CFF is the Complex Flat File stage.

ArndW · Post by **ArndW** » Mon Aug 17, 2009 1:42 am

CFF is the Complex Flat File stage. The Sequential File stage is one of the fastest possible means of reading a file. Is the file fixed length or variable length?
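
One likely reason the fixed versus variable length question matters: with fixed-length records, each reader can work out where its share of the file starts from arithmetic alone, without scanning for row boundaries. The Python sketch below only illustrates that idea and is not how DataStage implements its readers. The function name and parameters are hypothetical, and `record_len` is assumed to include any line terminator.

```python
def reader_byte_ranges(file_size, record_len, num_readers):
    """Split a fixed-length-record file into per-reader byte ranges.

    Every boundary is a multiple of record_len, so no record is cut in half.
    record_len is assumed to include the line terminator, if there is one.
    """
    if file_size % record_len != 0:
        raise ValueError("file size is not a whole number of records")

    total_records = file_size // record_len
    base, extra = divmod(total_records, num_readers)

    ranges = []
    start = 0
    for i in range(num_readers):
        # The first `extra` readers take one additional record each.
        count = base + (1 if i < extra else 0)
        end = start + count * record_len
        ranges.append((start, end))
        start = end
    return ranges
```

With variable-length rows there is no such formula: a reader dropped at an arbitrary byte offset has to search for the next row delimiter before it can begin.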

stefanfrost1 · Post by **stefanfrost1** » Mon Aug 17, 2009 2:17 am

Depending on your DataStage version and on how your file is structured, you can improve performance (with varied results) by setting the following properties in the Sequential File stage:

Read From Multiple Nodes = Yes
or
Number of Readers Per Node = N

Furthermore, performance also depends on what you do after reading the file. For example, you might improve performance by reading the file without metadata, with each row as a single column, and then applying the schema in parallel with a Column Import stage immediately afterwards....
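
As a rough analogy for the "read each row as one column, then add the schema in parallel" idea, here is a minimal Python sketch. It is not DataStage code. The names `batches`, `parse_batch`, `parse_lines`, `batch_size` and `max_in_flight` are hypothetical, and the only detail taken from this thread is the pipe delimiter.

```python
import concurrent.futures
import itertools
from collections import deque

DELIMITER = "|"  # the file in this thread is pipe delimited


def batches(lines, size):
    """Yield lists of up to `size` items from any iterable of raw lines."""
    it = iter(lines)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def parse_batch(batch):
    """Split each raw line into fields (the 'add the schema' step)."""
    return [line.rstrip("\r\n").split(DELIMITER) for line in batch]


def parse_lines(lines, executor, batch_size=10_000, max_in_flight=8):
    """Read rows as single strings, then split them on the executor's workers.

    At most `max_in_flight` batches are queued at once, so a very large
    input is never held in memory as a whole. Results are yielded in the
    original row order.
    """
    pending = deque()
    for batch in batches(lines, batch_size):
        pending.append(executor.submit(parse_batch, batch))
        if len(pending) >= max_in_flight:
            yield from pending.popleft().result()
    while pending:
        yield from pending.popleft().result()
```

The reader stays a cheap single pass over raw lines, and the splitting is handed to whatever executor is passed in. A `concurrent.futures.ProcessPoolExecutor` would spread the parsing across CPU cores, while a `ThreadPoolExecutor` is enough to try the function out on a small sample.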

Rajee · Post by **Rajee** » Mon Aug 17, 2009 2:40 am

ArndW wrote: CFF is the Complex Flat File stage. The Sequential File stage is one of the fastest possible means of reading a file. Is the file fixed length or variable length?

The file is variable length.

ArndW · Post by **ArndW** » Mon Aug 17, 2009 2:47 am

That makes multiple readers a non-option. How many columns does the file have, and how long does it take to read the file when the only other stage in your job is a Copy stage (or a Peek stage)?
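
For comparison, the same kind of baseline can be taken outside DataStage by timing a plain read of the file with no processing at all. This is a minimal Python sketch under that assumption. The function names are hypothetical, and it measures only raw read speed on the machine where it runs.

```python
import time


def time_raw_read(path, chunk_size=1024 * 1024):
    """Read the file in binary chunks and return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def mib_per_second(num_bytes, seconds):
    """Convert a byte count and a duration into MiB per second."""
    return num_bytes / (1024 * 1024) / seconds
```

Using the figure quoted earlier in the thread, `mib_per_second(2 * 1024**3, 35 * 60)` comes to just under 1 MiB/s. If a raw read of the same file is much faster than that, the time is probably going into the stages after the read rather than into the read itself.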

Rajee · Post by **Rajee** » Mon Aug 17, 2009 2:49 am

stefanfrost1 wrote: Depending on your DataStage version and on how your file is structured, you can improve performance (with varied results) by setting the following properties in the Sequential File stage:

Read From Multiple Nodes = Yes
or
Number of Readers Per Node = N

Furthermore, performance also depends on what you do after reading the file. For example, you might improve performance by reading the file without metadata, with each row as a single column, and then applying the schema in parallel with a Column Import stage immediately afterwards....

Thanks for your response.
Let me cross-check my job to make sure the options you mention are set.
I am already reading the entire row as a single column.
I had heard that a Sequential File stage can read a file of at most 2GB, and that this was a limitation of the stage. But after this discussion with you all, I understand that what I heard is just a myth.

Thanks,
Rajee

Rajee · Post by **Rajee** » Mon Aug 17, 2009 2:53 am

ArndW wrote: That makes multiple readers a non-option. How many columns does the file have, and how long does it take to read the file when the only other stage in your job is a Copy stage (or a Peek stage)?

The maximum number of columns is 30, but I do have lookup and transform stages in the job.

Thanks,
Rajee

DSXchange
