Split a source file into multiple files dynamically

its_jkm · Post by **its_jkm** » Wed Jan 27, 2016 12:01 pm

Hi ,

I have following requirement:

I have source file with 10 columns.
I need to split the file into target files to be used downstream dynamically based on the selective columns specified in a metadata table for the target files

Can you please let me know how this can be done in datastage.

Example:

Code: Select all

Source file: A.csv

empno, empname,age,telephone number,salary

Metadata Table 1:

Source file Source column 
A.csv         empno
              empname
              age
              telephone 
              Salary

Metadata Table 2:

Source file Target File Target Column 
A.csv        X.csv         empno
                           empname
A.csv        Y.csv         empno
                           age 
                           salary

So in this scenario I need the job to split source file a.csv into x.csv and y.csv which can increase to n number of files with different set of columns.

It need to be a generic job.

its_jkm · Post by **its_jkm** » Wed Jan 27, 2016 12:04 pm

to clarify:

x.csv should have data for empno,empname
y.csv should have data for empno,age,salary

chulett · Post by **chulett** » Wed Jan 27, 2016 12:35 pm

Define what you mean by 'generic' here, please.

mouthou · Post by **mouthou** » Wed Jan 27, 2016 9:32 pm

As far as my understanding goes, I guess this may be achieved by using MODIFY stage which handles the dynamic selection & dropping of columns. Please check the documentation for the stage usage.

If the DataStage implementation gets complicated, as another resort unix scripting will also help as the source is the file and its the matter of picking the column wanted. Just pass the field# to Cut command -f option or awk command

source file
-bash-4.3$ cat a.txt
col1,col2,col3,col4
1,2,3,4
11,22,33,44

-bash-4.3$ cut -d',' -f1,4 a.txt
col1,col4
1,4
11,44
-bash-4.3$

its_jkm · Post by **its_jkm** » Thu Jan 28, 2016 5:30 am

Generic mean...

We need a single job that read input file once and splits into target files.

There will be different set of input files with different metadata so the job should read the input file based on the metadata defined and split into target files as defined in the meteadata table.

The example I gave is just one instance.
Example 2:

Input file
B.csv
company name, turnover, location, profilt,COMPANY CODE

Output files:

xx.csv
company code,company name

yy.csv
company code, location,turnover

zz.csv
companycode, company name, profit

In awk as well, can we read the file ones and split into multiple target files depending on the requirement defined in the metadata table?

please let me know if you need any further info

naveenkumar.ssn · Post by **naveenkumar.ssn** » Fri Feb 05, 2016 12:40 am

Hi,

This can be performed using schema files in datastage.

Create the schema file for the target as per your requirement.

Regards
Naveen

ShaneMuir · Post by **ShaneMuir** » Mon Feb 08, 2016 10:30 am

You could also try using a combination of a peek stage and a lookup then outputting the values to an external target stage.

Eg: Set a peek to return all records, all columns, all partitions, and the output mode to Output (Rather than log), Show column names should be true. This will output the data in the format
ColumnName:Value|ColumnName:Value etc (where | is the delimter as set in the peek stage (under options)). You should then be able to split the values into column name and value columns in a transformer stage, and perform a subsequent lookup on the column name, keeping only those where there is a match. Also return the target file name.

If you then concat all the required output columns and put the fully qualified target file name as the first column, then you should be able to then use an external target stage with a simple awk command to split the records into as many files you require.

DSXchange

Split a source file into multiple files dynamically

Split a source file into multiple files dynamically

Re: Split a source file into multiple files dynamically

Re: Split a source file into multiple files dynamically

Re: Split a source file into multiple files dynamically