Dataset and schemas

Posted: Wed Jun 01, 2011 12:48 am
by evee1
I was wondering whether I can somehow use a schema to define the input from a data set.
I was planning to use data sets to pass the data between my jobs, as the documentation suggests "using data sets wisely can be key to good performance in a set of linked jobs".
However, I can't find in the doco how to actually do it.

Job 1 looks like this:
SeqFile --> ColumnImport -> Transformer --> Dataset
\
\--> SeqFileOut

ColumnImport and SeqFileOut use the same schema file. RCP is enabled for all output links. This job works fine. I can't verify exactly what is stored in the dataset, but the contents of SeqFileOut have all the expected columns and values.

Job 2 should be able to read in the dataset created by Job 1. Something like this:
Dataset --> <Some processing> --> DBTable

I'm not sure how to retrieve the data from the dataset created in Job 1 using the same schema. Can I instruct the Dataset stage to use a schema at all?
I suspect not, as I can't find any such option in the Dataset stage.
Are there any alternative ways to read the dataset using a schema file?

Re: Dataset and schemas

Posted: Wed Jun 01, 2011 12:50 am
by evee1
evee1 wrote: SeqFile --> ColumnImport -> Transformer --> Dataset
\
\--> SeqFileOut
Oops! The link to SeqFileOut should start in the Transformer stage.

Posted: Wed Jun 01, 2011 12:52 am
by ray.wurlod
Wrap it in Code tags.

Posted: Wed Jun 01, 2011 1:02 am
by evee1
Never used them and not sure what they are, but will try to find out.

Would you be able to point me to where I can read about them, please? A generic heading will do.
Thanks!

Posted: Wed Jun 01, 2011 5:26 am
by ray.wurlod
Select all of your "picture" and click on the Code button just above the editing area.

Posted: Wed Jun 01, 2011 6:27 am
by chulett
Whitespace (i.e. formatting) is preserved only in 'code', otherwise the forum software removes all those silly extra, unneeded spaces it thinks you accidentally put in.

Posted: Wed Jun 01, 2011 4:08 pm
by jwiles
You could use a fileset instead of a dataset if you wish, then you can specify a schema file. If your schema won't be changing over time, you can store the metadata in a table definition and load the column definitions from that in Job 2.
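
For illustration only, a schema file shared by the Column Import stage in Job 1 and a File Set stage in Job 2 might look roughly like the sketch below. The column names, types, and delimiter properties are hypothetical (the thread never shows the real ones), so treat this as a shape to adapt rather than the actual file.

```
// Hypothetical schema file; columns and format properties are assumptions
record {final_delim=end, delim=',', quote=double}
(
  customer_id: int32;
  customer_name: string[max=50];
  signup_date: date;
  balance: nullable decimal[10,2];
)
```

With RCP enabled, pointing the stage in Job 2 at the same file should let the columns flow through without being defined on the link, assuming the stage exposes a schema file option as described above.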

View the contents of the dataset using either the dataset management tool GUI or orchadmin from a command shell.

Regards,

Posted: Wed Jun 01, 2011 5:07 pm
by evee1
ray.wurlod wrote: Wrap it in Code tags.
I first thought it was a solution to my dataset/schema problem :oops: :lol:

James,
File sets work perfectly. Thanks for the suggestion!!