Read same Persistent DataSet multiple times in a job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Folks,

I'm creating a job that needs to look up multiple fields of the incoming file against a single list of values. In server jobs it is quite common practice to set up the list in a hashed file and run multiple links from that single hashed file to the transformer performing the lookup.

The EE version doesn't like that so much. What I've done is created multiple Dataset stages, all pointing to the same underlying persistent dataset and linked each one separately to the lookup stage.
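As a rough illustration of the pattern (plain Python, not DataStage code), the sketch below checks several fields of one incoming row against the same reference list. The field names, the reference values, and the `lookup_fields` helper are all hypothetical and do not come from the job itself.

```python
# Hypothetical reference list; stands in for the shared list of values.
valid_codes = {"A": "Alpha", "B": "Beta", "C": "Gamma"}


def lookup_fields(row, fields, reference):
    """Look up each named field of the row in the same reference mapping.

    Fields whose value is not in the reference map to None.
    """
    return {name: reference.get(row[name]) for name in fields}


# Hypothetical incoming row with three fields that all use the same list.
row = {"code_1": "A", "code_2": "C", "code_3": "Z"}
result = lookup_fields(row, ["code_1", "code_2", "code_3"], valid_codes)
print(result)
```

Each field is resolved independently against one shared reference, which is the role the multiple reference links play in the job design described above.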

Thus far I have been quite successful with this approach, encountering no errors.

I have not been able to find anything in the manuals or on these forums that explicitly states that this is or is not a permissible thing to do with persistent datasets.

Does anyone out there have any experience using datasets this way?

Thanks in advance for your responses,

Rob W
Rob Wierdsma
Toronto, Canada
bartonbishop.com
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

DataSets can be read from simultaneously by different processes, but cannot be written to and read from at the same time.

Look at them as glorified sequential files when it comes to concurrency control.
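Taking the sequential-file analogy literally, here is a minimal Python sketch (not DataStage code) of several readers opening the same file at once, each with its own read-only handle. The file contents, the helper names, and the reader count are assumptions made only for this illustration; the file is written completely before any reader starts, so writing and reading never overlap.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_all(path):
    """Open an independent read-only handle and return every line."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read().splitlines()


def concurrent_reads(path, readers=3):
    """Read the same file from several threads, one handle per reader."""
    with ThreadPoolExecutor(max_workers=readers) as pool:
        return list(pool.map(read_all, [path] * readers))


# Write the file completely first; only then start the readers.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("A\nB\nC\n")

results = concurrent_reads(path)
os.remove(path)
print(results)
```

Every reader gets the full contents because nothing is modifying the file while it is being read. This only shows the file-level analogy; it does not model how the parallel engine handles dataset segments.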
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Thanks, Arnd.

No internal locking for reading or anything like that?

Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Post by Nageshsunkoji »

rwierdsm wrote: Thanks, Arnd.

No internal locking for reading or anything like that?

Rob

Hi,

Rather than using multiple Dataset stages, a better approach is to use one Dataset stage in your job, follow it with a Copy stage downstream, and make as many copies as you need to filter your data. That should save memory and improve performance. As Arnd said, you can read the data from the same underlying persistent dataset at the same time, but you can't write to it while it is being read.
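The shape of this suggestion, one read followed by a fan-out, can be sketched in plain Python (not DataStage code). The `read_source` and `copy_stage` functions, the counter, and the row values are hypothetical stand-ins; the sketch only shows the data flow and says nothing about memory use or performance in DataStage.

```python
def read_source(rows, counter):
    """Stand-in for the single Dataset stage; counts how often the source is read."""
    counter["reads"] += 1
    return list(rows)


def copy_stage(rows, outputs):
    """Stand-in for a Copy stage: give every output link its own copy of the rows."""
    return [list(rows) for _ in range(outputs)]


counter = {"reads": 0}
reference_rows = ["A", "B", "C"]  # hypothetical dataset contents

rows = read_source(reference_rows, counter)
links = copy_stage(rows, 3)
print(counter["reads"], len(links))
```

The source is read once and each of the three output links receives the same rows, in contrast to the multiple-stage design, where each stage performs its own read.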
NageshSunkoji

If you know anything SHARE it.............
If you Don't know anything LEARN it...............
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Hi Nageshsunkoji,

I am only using one DataSet, but referring to it multiple times by using multiple stages. Each stage points to the same underlying dataset in the OS.

No need to copy anything. I just needed to know that I could access the same dataset multiple times in the same job.

Rob W
Rob Wierdsma
Toronto, Canada
bartonbishop.com
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Post by Nageshsunkoji »

rwierdsm wrote: Hi Nageshsunkoji,

I am only using one DataSet, but referring to it multiple times by using multiple stages. Each stage points to the same underlying dataset in the OS.

No need to copy anything. I just needed to know that I could access the same dataset multiple times in the same job.

Rob W

That's OK if you're using only one dataset. But in your post you wrote: "What I've done is created multiple Dataset stages, all pointing to the same underlying persistent dataset and linked each one separately to the lookup stage."

If you're using only one dataset and accessing it multiple times, I don't think there is any problem. We use datasets in a similar manner and haven't faced any problems.
NageshSunkoji

If you know anything SHARE it.............
If you Don't know anything LEARN it...............
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Nageshsunkoji wrote: If you're using only one dataset and accessing it multiple times, I don't think there is any problem. We use datasets in a similar manner and haven't faced any problems.

Good news.

Thanks.
Rob Wierdsma
Toronto, Canada
bartonbishop.com
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I've used this in the past; no problems with concurrent reads on DataSets or LookupFileSets.
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

Thanks for the input, folks.

Rob W
Rob Wierdsma
Toronto, Canada
bartonbishop.com