
Selecting fields using Schema files

Posted: Sat Sep 19, 2015 11:55 pm
by samratisking
Hi All,

I am trying to build a job with SequentialFile-->Transformer-->Join--->Target(SQL Server Table).

For this I am trying to use a schema file to supply the table definition for the Sequential File stage (a .csv file), as my source metadata might change frequently and I'm using RCP in my job.

My sequential file has around 600 fields, of which only 60 are relevant, and these might change later (new fields may be added or some deleted). I am trying to find out whether the schema file matches the metadata by field name or by column position, using the specified delimiter.

As the fields I need are scattered randomly across the file, I need to select them dynamically (my reason for using a schema file).
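[Editor's note: to make the desired behaviour concrete, here is a minimal Python sketch (outside DataStage) of name-based selection, where the subset of fields is picked from the header row so that column positions can change freely. The column names and sample data are illustrative only.]

```python
import csv
import io

# Hypothetical sample mirroring the 600-column file: select fields by NAME,
# using the header row, so the physical column positions do not matter.
data = "Col1,Col2,Col3,Col4\nHi,Hello,First,Last\nHow,are,you,?\n"
wanted = ["Col3", "Col1"]  # the subset of relevant fields

reader = csv.DictReader(io.StringIO(data))
rows = [[row[name] for name in wanted] for row in reader]
print(rows)  # [['First', 'Hi'], ['you', 'How']]
```

This is the outcome the poster expects from the schema file; whether DataStage actually does name-based matching is exactly the question under discussion.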

Please let me know if you have any ideas of implementing this.

Best Regards,
Sam

Posted: Sun Sep 20, 2015 7:44 am
by chulett
Don't have any direct experience with this but it seems to me it would be easy enough to create a small test job with a handful of columns and try this out for yourself. Then let us know. :wink:

What purpose is the "Join" serving? That seems to me to be a deal breaker with RCP...

Posted: Sun Sep 20, 2015 6:13 pm
by samratisking
I tried it and could not get the result I was looking for.

The join uses a key that I bring out of the source explicitly, and I have no concerns about the join itself.

My only issue is with the Schema file and the way it works.

Any ideas please?

Regards,
Sam

Posted: Sun Sep 20, 2015 6:26 pm
by chulett
Why not detail exactly what the "it" was that you tried and what result you actually received? People should be able to give more targeted help from there.

Posted: Sun Sep 20, 2015 6:55 pm
by samratisking
Sure.

Here you go.

To test the schema file, I took a simple job SeqFile--->SeqFile.

I defined a simple schema file as below.

Code:

record {final_delim=end,delim=',',record_delim='\n',quote=none,padchar='#'}
(
        Col3:string[max=255];
        Col1:string[max=255]
)

My input file is as follows.

Col1,Col2,Col3,Col4
Hi,Hello,First,Last
How,are,you,?
How,is,your,Day?

I have selected "First line is columns" property in my job.

The output that I expected is:

Col3,Col1
First,Hi
you,How
your,How

But the output I got is:

Col3
Col1,Col2,Col3,Col4
Hi,Hello,First,Last
How,are,you,?
How,is,your,Day?

When I set the "First line is columns" property in the source file to "False", this is the output:

Col3
Hi,Hello,First,Last
How,are,you,?
How,is,your,Day?
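[Editor's note: the results above are consistent with position-based import, i.e. the names in the schema file simply label the physical fields in declaration order rather than locating columns by name. A minimal Python sketch of that interpretation, with hypothetical data:]

```python
# Positional reading, assuming the schema assigns field 1 -> Col3, field 2 -> Col1.
schema = ["Col3", "Col1"]  # names from the schema file, in declaration order
line = "Hi,Hello,First,Last"

fields = line.split(",")
# Each schema name is bound to the field at the same position, not to the
# column that happens to carry that name in the header row.
record = dict(zip(schema, fields[:len(schema)]))
print(record)  # {'Col3': 'Hi', 'Col1': 'Hello'}
```

Under this model "Col3" receives the first physical field ("Hi"), not the value from the column headed Col3, which matches the unexpected output reported above.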


Regards,
Sam

Posted: Sun Sep 20, 2015 10:35 pm
by ray.wurlod
It is the very nature of sequential files that you must read every column. You must, indeed, read past every single byte to get to the next. Using a schema file does not get around this requirement, though you may be able to leverage the "Drop on Import" property - something I've not seen done with schema files, but it may nonetheless be possible. Over to you to investigate.
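[Editor's note: Ray's point, parse every field, then discard, can be sketched in Python as follows. The keep-list positions are hypothetical; the idea is that filtering happens only after the full record has been read past.]

```python
# Every byte of the record must be read to find the field boundaries;
# only then can the unwanted fields be dropped (cf. "Drop on Import").
keep = {0, 2}  # zero-based positions of the relevant fields (illustrative)
line = "Hi,Hello,First,Last"

all_fields = line.split(",")  # the parser still touches every field
kept = [f for i, f in enumerate(all_fields) if i in keep]
print(kept)  # ['Hi', 'First']
```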

Posted: Mon Sep 21, 2015 7:34 am
by chulett
Dang, should have caught that aspect of this right away... the nature of sequential media, indeed.

Posted: Mon Sep 21, 2015 11:44 pm
by moalik
Hi Samrat,

I think the concept of partial schemas might come in handy. I haven't tried it myself, but you can have a look at the link below from IBM.

https://www-01.ibm.com/support/knowledg ... hemas.html

Please update us if it resolves your problem :D

Thanks