Dataset retrieval
-
- Participant
- Posts: 5
- Joined: Wed Aug 31, 2005 9:19 pm
- Location: Mumbai
- Contact:
Dataset retrieval
Hi all,
This is regarding the Data Set stage. I am not able to read my source data file from the server with a Data Set stage; the job shows an Orchestrate error. The same source file can be read with a Sequential File stage.
Also, when I run a separate job of the form SEQ ----> Dataset, the job runs successfully. I saved the target Data Set as target.ds.
When I used target.ds as the source for a Data Set stage in an earlier job, it worked fine. What I want to know is whether only files with a .ds extension can be loaded by the Data Set stage. If not, what is the alternative for using a Data Set as the source stage instead of a Sequential File?
jamshid
Hello jamshid,
Datasets in PX can have any filename and can be located anywhere on the system. They contain only the schema and other information pointing to the actual data files, and thus are small. I don't quite understand your problem or question, especially where you are getting an Orchestrate error. What is the error?
You cannot take a sequential file, rename it as a .ds file and then read it using the PX dataset file type; but I am not sure if that is your question.
-
- Participant
- Posts: 103
- Joined: Wed Jul 06, 2005 12:29 am
Re: Dataset retrieval
Hi Jamshid,
Can you tell us more precisely what exactly you are trying to do?
Cheers,
Rajeev.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
A Data Set contains data in internal (binary) format. A persistent Data Set (one that is on disk) must have been created with a Data Set stage - there is no other way. There are one or more data files on each processing node; the control file (the one whose name ends in ".ds") describes the location and number of these data files (each is at most 2GB). The control file for a virtual Data Set (which is in memory) has a name ending in ".v"; you can see the use of these by inspecting the generated osh script.
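To make the control-file idea concrete, here is a toy model in Python. It is only an illustration of the concept, not the real ".ds" layout: the JSON encoding and the names `write_descriptor` and `load_descriptor` are assumptions made up for this sketch. It shows why pointing a reader at a plain flat file fails: the reader expects a small descriptor that lists the data files, not the data itself.

```python
import json


def write_descriptor(path, schema, data_files):
    """Write a toy descriptor: a schema plus the locations of the data files.

    JSON is an assumption for this sketch; it is not the real .ds format.
    """
    with open(path, "w") as f:
        json.dump({"schema": schema, "data_files": data_files}, f)


def load_descriptor(path):
    """Return the descriptor as a dict, or raise ValueError if it is not one."""
    with open(path) as f:
        try:
            desc = json.load(f)
        except json.JSONDecodeError as exc:
            # A flat data file ends up here: it holds records, not a descriptor.
            raise ValueError(f"{path} is not a descriptor file") from exc
    if not isinstance(desc, dict) or "data_files" not in desc:
        raise ValueError(f"{path} is not a descriptor file")
    return desc
```

In this model a reader calls `load_descriptor` first and only then opens the files it lists, so a flat file such as country.dat is rejected before any record is read.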
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 5
- Joined: Wed Aug 31, 2005 9:19 pm
- Location: Mumbai
- Contact:
Hi,
I am trying to load a file from my Unix server (/auto/user/jstage/country.dat) with a Data Set stage, which is the source in my job. When I use the Data Set stage directly by entering the above path in the stage, the job shows an Orchestrate error at run time, saying that the path is missing from the Orchestrate framework.
The same file can be read using a Sequential File stage, and that job runs successfully. But when I use the Data Set stage instead of the Sequential File stage as my source, I get the above problem.
Thanks,
jamshid
Jamshid,
You cannot read a flat file using the Data Set stage. Read it using the Sequential File stage and write it to a Data Set stage.
-
- Participant
- Posts: 5
- Joined: Wed Aug 31, 2005 9:19 pm
- Location: Mumbai
- Contact:
ArndW wrote: Jamshid,
you cannot read a flat file using the dataset stage. Read it using the sequential file stage and write it to a dataset stage.
Hi Arnd,
So in exactly which scenarios do we need to use Data Set stages, and when exactly do they help increase performance?
And if I stick with the Sequential File stage for my job, will it affect performance? Or do I need to go for some other alternative?
Thanks,
jamshid
If your source is a sequential file of variable-length records then you will not experience any gains by first writing it to a dataset and then processing it. PX's speed comes from its ability to do things in parallel, but a sequential file read cannot be processed in parallel (unless it is of fixed record length).
Datasets are used to store and read data where you would use a sequential file in Server jobs. These files can be read and written very quickly in PX. They can also be used directly as lookups. Think of datasets as parallel sequential files and try to use them where possible instead of sequential files.
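The fixed-record-length exception mentioned above can be sketched in a few lines of Python. This is a rough illustration of the general idea, not how PX implements it; `RECORD_LEN`, `partition_ranges`, `read_range` and `parallel_read` are names invented for the sketch. With fixed-length records every reader can compute its starting byte offset by arithmetic alone, whereas with variable-length records the file has to be scanned for delimiters before anyone knows where a record starts.

```python
import os
from concurrent.futures import ThreadPoolExecutor

RECORD_LEN = 8  # hypothetical fixed record length in bytes


def partition_ranges(file_size, record_len, readers):
    """Split a fixed-length file into per-reader (byte_offset, record_count) pairs."""
    total = file_size // record_len
    base, extra = divmod(total, readers)
    ranges, start = [], 0
    for i in range(readers):
        # The first `extra` readers take one additional record each.
        count = base + (1 if i < extra else 0)
        ranges.append((start * record_len, count))
        start += count
    return ranges


def read_range(path, offset, count, record_len):
    """Read `count` records starting at byte `offset`; no delimiter scan needed."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(count * record_len)
    return [data[i:i + record_len] for i in range(0, len(data), record_len)]


def parallel_read(path, record_len, readers):
    """Read the whole file with several independent readers, preserving order."""
    ranges = partition_ranges(os.path.getsize(path), record_len, readers)
    with ThreadPoolExecutor(max_workers=readers) as pool:
        chunks = pool.map(
            lambda r: read_range(path, r[0], r[1], record_len), ranges
        )
        return [rec for chunk in chunks for rec in chunk]
```

For example, `parallel_read(path, RECORD_LEN, 3)` on a file of ten 8-byte records gives the three readers 4, 3 and 3 records respectively, each starting at an offset known in advance.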
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Persistent (on disk) Data Sets are intended to be used for those occasions where one parallel job prepares and stages data for a subsequent parallel job to use.
If no staging is required, persistent Data Sets are not required; the data can be passed to and fro between virtual (in memory) Data Sets.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.