Dataset and schemas

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

evee1
Premium Member
Posts: 96
Joined: Tue Oct 06, 2009 4:17 pm
Location: Melbourne, AU

Dataset and schemas

Post by evee1 »

I was wondering whether I can somehow use a schema to define the input from a data set.
I was planning to use data sets to pass the data between my jobs, as the documentation suggests that "using data sets wisely can be key to good performance in a set of linked jobs".
However, I can't find in the documentation how to actually do it.

Job 1 looks like this:
SeqFile --> ColumnImport -> Transformer --> Dataset
\
\--> SeqFileOut

ColumnImport and SeqFileOut use the same schema file. RCP is enabled for all output links. This job works fine. I can't verify exactly what is stored in the dataset, but the contents of SeqFileOut have all the expected columns and values.

Job 2 should be able to read in the dataset created by Job 1. Something like this:
Dataset --> <Some processing> --> DBTable

I'm not sure how I can retrieve the data from the dataset created in Job 1 using the same schema. Can I instruct the Dataset stage to use a schema at all?
I suspect not, as I can't find any such option in the Dataset stage.
Are there any alternative ways to read the dataset using a schema file?
evee1
Premium Member
Posts: 96
Joined: Tue Oct 06, 2009 4:17 pm
Location: Melbourne, AU

Re: Dataset and schemas

Post by evee1 »

evee1 wrote: SeqFile --> ColumnImport -> Transformer --> Dataset
\
\--> SeqFileOut
Oops! The link to SeqFileOut should start in the Transformer stage.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Wrap it in Code tags.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
evee1
Premium Member
Posts: 96
Joined: Tue Oct 06, 2009 4:17 pm
Location: Melbourne, AU

Post by evee1 »

Never used them and not sure what they are, but will try to find out.

Would you be able to point me to where I can read about them, please? A generic heading will do.
Thanks!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Select all of your "picture" and click on the Code button just above the editing area.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Whitespace (i.e. formatting) is preserved only in 'code'; otherwise the forum software removes all those silly extra, unneeded spaces it thinks you accidentally put in.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

You could use a fileset instead of a dataset if you wish, then you can specify a schema file. If your schema won't be changing over time, you can store the metadata in a table definition and load the column definitions from that in Job 2.
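
For reference, the schema file a file set can point at is plain text in the Orchestrate record syntax. Below is a minimal sketch in Python that renders one, assuming three made-up columns (CUSTOMER_ID, CUSTOMER_NAME, CREATED_DATE) and comma-delimited, double-quoted data. The helper name render_schema is hypothetical, and the real columns and format properties would have to match whatever Job 1 actually writes.

```python
def render_schema(columns, delim=",", quote="double"):
    """Render an Orchestrate-style record schema as text.

    columns is a list of (name, type_spec, nullable) tuples.
    The format properties in braces are assumptions for delimited text.
    """
    lines = [
        "record {final_delim=end, delim='%s', quote=%s}" % (delim, quote),
        "(",
    ]
    for name, type_spec, nullable in columns:
        prefix = "nullable " if nullable else ""
        lines.append("  %s: %s%s;" % (name, prefix, type_spec))
    lines.append(")")
    return "\n".join(lines) + "\n"


# Made-up columns, for illustration only.
COLUMNS = [
    ("CUSTOMER_ID", "int32", False),
    ("CUSTOMER_NAME", "string[max=50]", True),
    ("CREATED_DATE", "date", False),
]

if __name__ == "__main__":
    # Print the schema text; redirect it to a file to use it as a schema file.
    print(render_schema(COLUMNS), end="")
```

Both jobs would then reference the same schema file, so the column layout lives in one place.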

View the contents of the dataset using either the dataset management tool GUI or orchadmin from a command shell.
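
As a rough sketch of the orchadmin route, the Python wrapper below calls the `describe` and `dump` subcommands on a data set descriptor file. It assumes orchadmin is on the PATH and the parallel engine environment (including APT_CONFIG_FILE) is already set up; the descriptor path and the function names are hypothetical.

```python
import shutil
import subprocess


def orchadmin_commands(descriptor):
    """Build the orchadmin calls for one data set descriptor (.ds) file."""
    return [
        ["orchadmin", "describe", descriptor],  # describe the data set
        ["orchadmin", "dump", descriptor],      # write its records out as text
    ]


def inspect_dataset(descriptor):
    """Run the commands and return their stdout, or None if orchadmin is not on the PATH."""
    if shutil.which("orchadmin") is None:
        return None
    outputs = []
    for cmd in orchadmin_commands(descriptor):
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        outputs.append(result.stdout)
    return outputs


if __name__ == "__main__":
    # Hypothetical path; use the descriptor file written by Job 1.
    print(inspect_dataset("/data/job1_output.ds"))
```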

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
evee1
Premium Member
Posts: 96
Joined: Tue Oct 06, 2009 4:17 pm
Location: Melbourne, AU

Post by evee1 »

ray.wurlod wrote: Wrap it in Code tags.
I first thought it was a solution to my dataset/schema problem :oops: :lol:

James,
File sets work perfectly. Thanks for the suggestion!!