Read and write from single dataset in job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

shaimil
Charter Member
Posts: 37
Joined: Fri Feb 28, 2003 5:37 am
Location: UK


Post by shaimil »

I'm trying to use a single dataset (once as a lookup and again as a target) in the same job. Can someone please let me know if this is possible, as I keep getting the following error:

Operator initialization: A link between two operators should be named with a .v; insert a copy operator to save a persistent copy of the data

Thanks
sanjumsm
Premium Member
Posts: 64
Joined: Tue Oct 17, 2006 11:29 pm
Location: Toronto

Post by sanjumsm »

In a job, two datasets cannot have the same name. You have to split the job.
sanjeev kumar
shaimil
Charter Member
Posts: 37
Joined: Fri Feb 28, 2003 5:37 am
Location: UK

Post by shaimil »

So is it not possible to implement the equivalent of a hashed file read/write in a single job using PX?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It is not. It's what's called a "blocking operation" and would interfere with pipeline parallelism, so is forbidden.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shaimil
Charter Member
Posts: 37
Joined: Fri Feb 28, 2003 5:37 am
Location: UK

Post by shaimil »

Ray,

I guess the Lookup File Set doesn't suffer from the same restriction, as that seemed to work, although I'm still testing to establish whether the Lookup File Set does the equivalent of a truncate-and-load rather than an append.

What, in your mind, would be a reasonable approach to replicating the server hashed file lookup/write?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There isn't one. Stop thinking like a server job developer. Start envisaging virtual Data Sets, and learn how an index to a virtual Data Set is built "on the fly" on a reference input link to a Lookup stage, except when that link is serviced by a Lookup File Set or when sparse lookup is specified.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Post by olgc »

This is not a question of server jobs versus enterprise jobs; it's a problem of how Enterprise Edition can handle it. Take the surrogate key case: you generate a surrogate key for a new customer. In the same run, that new customer can be present with more than one service provider, and so can have more than one instance. When you generate a key for the first instance and then the second instance comes in, you need to find his/her key in the dataset, or another key will be generated for him/her.

So how do you handle this scenario?
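The scenario above can be sketched outside DataStage. The following is a minimal, hypothetical Python illustration (field names like `customer_id` are assumptions): keys assigned earlier in the same run are held in an in-memory cache, so a repeated customer reuses the key generated for its first instance instead of getting a new one.

```python
def assign_surrogate_keys(rows, existing_keys, next_key):
    """Assign one surrogate key per customer, reusing keys already
    assigned either before this run or earlier within it.

    rows          -- iterable of dicts carrying a 'customer_id' natural key
    existing_keys -- dict natural_key -> surrogate_key loaded from the target
    next_key      -- first unused surrogate key value
    """
    run_cache = dict(existing_keys)  # keys known before or during this run
    out = []
    for row in rows:
        nk = row["customer_id"]
        if nk not in run_cache:      # first time this customer appears
            run_cache[nk] = next_key
            next_key += 1
        out.append({**row, "surrogate_key": run_cache[nk]})
    return out, next_key

rows = [
    {"customer_id": "C1", "provider": "A"},
    {"customer_id": "C2", "provider": "A"},
    {"customer_id": "C1", "provider": "B"},  # same customer, second provider
]
keyed, _ = assign_surrogate_keys(rows, existing_keys={}, next_key=100)
# both C1 rows share key 100; C2 gets 101
```

In DataStage terms this corresponds to keeping the reference data and the run's own output in one lookup structure, which is exactly the read-and-write-same-dataset pattern the thread says PX forbids.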

Other situations fit this scenario too, such as a web service where a session token from a previous request is used in a following request, ...

I tried looking up directly from a Transformer; DataStage states that a cyclic operation is not allowed. I tried it with a sequential file; that is allowed, but the lookup returns nothing even when a match is there.

Thanks,
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

olgc - let's keep your question in one place rather than repeating it when you find a similar topic from the past. This one is much more 'on topic' so I removed your post in the other thread.

Thanks.
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

olgc wrote: I tried looking up directly from a Transformer; DataStage states that a cyclic operation is not allowed. I tried it with a sequential file; that is allowed, but the lookup returns nothing even when a match is there.
If you are doing a normal lookup, then it reads all the reference data into memory before processing the main input data.

If you are able to do a sparse lookup, then it does not read any reference data into memory, rather it looks up each record one by one. Sparse lookup should handle your scenarios.
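The difference between the two lookup styles can be sketched in plain Python (this is an illustrative analogy, not DataStage internals; the function and field names are assumptions):

```python
def normal_lookup(input_rows, read_reference):
    """Normal lookup: read ALL reference rows into memory first,
    then probe each input row against the in-memory table."""
    ref = {r["key"]: r["value"] for r in read_reference()}
    return [ref.get(row["key"]) for row in input_rows]

def sparse_lookup(input_rows, query_one):
    """Sparse lookup: preload nothing; issue one query against the
    reference source per input row (e.g. one SQL statement per row)."""
    return [query_one(row["key"]) for row in input_rows]

reference = [{"key": 1, "value": "a"}, {"key": 2, "value": "b"}]
inputs = [{"key": 2}, {"key": 3}]

print(normal_lookup(inputs, lambda: iter(reference)))           # ['b', None]
print(sparse_lookup(inputs,
                    lambda k: next((r["value"] for r in reference
                                    if r["key"] == k), None)))  # ['b', None]
```

Both return the same matches; the difference is when the reference source is read. A sparse lookup sees reference rows as they exist at query time, which is why it can pick up keys written earlier in the same run, while a normal lookup only ever sees the snapshot loaded up front.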
Choose a job you love, and you will never have to work a day in your life. - Confucius
olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Post by olgc »

Thanks, Eric, for your good suggestion. But only a database table lookup can be configured as a sparse lookup, per the statement below; for a lookup against a sequential file, no sparse lookup is possible. Thanks,
Configuring sparse lookup operations
Data that is read by a database stage can serve as reference data to a Lookup stage. By default, this reference data is loaded into memory like any other reference link. When directly connected as the reference link to a Lookup stage, you can configure the Lookup Type property of the DB2 connector to Sparse and send individual SQL statements to the database for each incoming Lookup row.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Perhaps you could remove duplicates prior to assigning surrogate keys.
Choose a job you love, and you will never have to work a day in your life. - Confucius
olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Post by olgc »

Even though they are the same customer, they come from different service providers, so each is a different account; you cannot remove them. The business needs both of them in the system.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I guess I don't understand what you are asking.
olgc wrote: In the same run, the new customer can be present with more than one service provider, and so can have more than one instance. When you generate a key for the first instance and the second comes in, you need to find his/her key in the dataset, or another key will be generated for him/her.

So how do you handle this scenario?
If both of them are needed in the system, then shouldn't each of them have unique keys generated?
Choose a job you love, and you will never have to work a day in your life. - Confucius
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

What you can do is hold the records, not write them until you have de-duplicated, and then insert only one customer record into the target.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink: