Read and write from single dataset in job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

shaimil
Charter Member
Posts: 37
Joined: Fri Feb 28, 2003 5:37 am
Location: UK


Post by shaimil »

I'm trying to use a single dataset (once as a lookup and again as a target) in the same job. Can someone please let me know if this is possible, as I keep getting the following error:

Operator initialization: A link between two operators should be named with a .v; insert a copy operator to save a persistent copy of the data

Thanks
sanjumsm
Premium Member
Posts: 64
Joined: Tue Oct 17, 2006 11:29 pm
Location: Toronto

Post by sanjumsm »

In a job, two datasets cannot have the same name. You have to split the job.
sanjeev kumar
shaimil
Charter Member
Posts: 37
Joined: Fri Feb 28, 2003 5:37 am
Location: UK

Post by shaimil »

So is it not possible to implement the equivalent of a hashed file read/write in a single job using PX?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It is not. It's what's called a "blocking operation" and would interfere with pipeline parallelism, so is forbidden.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shaimil
Charter Member
Posts: 37
Joined: Fri Feb 28, 2003 5:37 am
Location: UK

Post by shaimil »

Ray,

I guess the Lookup File Set doesn't suffer from the same restriction, as that seemed to work, although I'm still testing to establish whether the Lookup File Set does the equivalent of a truncate-and-load rather than an append.

What, in your mind, would be a reasonable approach to replicating the server hashed file lookup/write?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There isn't one. Stop thinking like a server job developer. Start envisaging virtual Data Sets, and learn how an index to a virtual Data Set is built "on the fly" on a reference input link to a Lookup stage, except when that link is serviced by a Lookup File Set or when sparse lookup is specified.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Post by olgc »

This is not a question of server jobs versus enterprise jobs; it's a problem of how Enterprise Edition can handle it. Take the surrogate key case: you generate a surrogate key for a new customer. In the same run, that new customer can be present with more than one service provider, and so can have more than one instance. When you generate a key for the first instance and then the second instance comes in, you need to find his/her key in the dataset, or another key will be generated for him/her.

So how do you handle this scenario?
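The scenario above can be sketched outside DataStage. The following is a minimal, hypothetical Python illustration (field names like `customer_id` are assumptions): keys assigned earlier in the same run are held in an in-memory cache, so a repeated customer reuses the key generated for its first instance instead of getting a new one.

```python
def assign_surrogate_keys(rows, existing_keys, next_key):
    """Assign one surrogate key per customer, reusing keys already
    assigned either before this run or earlier within it.

    rows          -- iterable of dicts carrying a 'customer_id' natural key
    existing_keys -- dict natural_key -> surrogate_key loaded from the target
    next_key      -- first unused surrogate key value
    """
    run_cache = dict(existing_keys)  # keys known before or during this run
    out = []
    for row in rows:
        nk = row["customer_id"]
        if nk not in run_cache:      # first time this customer appears
            run_cache[nk] = next_key
            next_key += 1
        out.append({**row, "surrogate_key": run_cache[nk]})
    return out, next_key

rows = [
    {"customer_id": "C1", "provider": "A"},
    {"customer_id": "C2", "provider": "A"},
    {"customer_id": "C1", "provider": "B"},  # same customer, second provider
]
keyed, _ = assign_surrogate_keys(rows, existing_keys={}, next_key=100)
# both C1 rows share key 100; C2 gets 101
```

In DataStage terms this corresponds to keeping the reference data and the run's own output in one lookup structure, which is exactly the read-and-write-same-dataset pattern the thread says PX forbids.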

Other situations fit this scenario too, such as a web service where a session token from a previous request is used in a following request, ...

I tried looking up directly from a Transformer; DataStage states that a cyclic operation is not allowed. I tried it with a sequential file; that is allowed, but the lookup returns nothing even when a match is there.

Thanks,
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

olgc - let's keep your question in one place rather than repeating it when you find a similar topic from the past. This one is much more 'on topic' so I removed your post in the other thread.

Thanks.
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

olgc wrote: I tried looking up directly from a Transformer; DataStage states that a cyclic operation is not allowed. I tried it with a sequential file; that is allowed, but the lookup returns nothing even when a match is there.
If you are doing a normal lookup, then it reads all the reference data into memory before processing the main input data.

If you are able to do a sparse lookup, then it does not read any reference data into memory, rather it looks up each record one by one. Sparse lookup should handle your scenarios.
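The difference between the two lookup styles can be sketched in plain Python (this is an illustrative analogy, not DataStage internals; the function and field names are assumptions):

```python
def normal_lookup(input_rows, read_reference):
    """Normal lookup: read ALL reference rows into memory first,
    then probe each input row against the in-memory table."""
    ref = {r["key"]: r["value"] for r in read_reference()}
    return [ref.get(row["key"]) for row in input_rows]

def sparse_lookup(input_rows, query_one):
    """Sparse lookup: preload nothing; issue one query against the
    reference source per input row (e.g. one SQL statement per row)."""
    return [query_one(row["key"]) for row in input_rows]

reference = [{"key": 1, "value": "a"}, {"key": 2, "value": "b"}]
inputs = [{"key": 2}, {"key": 3}]

print(normal_lookup(inputs, lambda: iter(reference)))           # ['b', None]
print(sparse_lookup(inputs,
                    lambda k: next((r["value"] for r in reference
                                    if r["key"] == k), None)))  # ['b', None]
```

Both return the same matches; the difference is when the reference source is read. A sparse lookup sees reference rows as they exist at query time, which is why it can pick up keys written earlier in the same run, while a normal lookup only ever sees the snapshot loaded up front.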
Choose a job you love, and you will never have to work a day in your life. - Confucius
olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Post by olgc »

Thanks, Eric, for your good suggestion. But only a database table lookup can be configured as a sparse lookup, per the statement below; for a lookup against a sequential file, no sparse lookup is possible. Thanks,
Configuring sparse lookup operations
Data that is read by a database stage can serve as reference data to a Lookup stage. By default, this reference data is loaded into memory like any other reference link. When directly connected as the reference link to a Lookup stage, you can configure the Lookup Type property of the DB2 connector to Sparse and send individual SQL statements to the database for each incoming Lookup row.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Perhaps you could remove duplicates prior to assigning surrogate keys.
Choose a job you love, and you will never have to work a day in your life. - Confucius
olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Post by olgc »

Even though they are the same customer, they come from different service providers, so each is a different account; you cannot remove them. The business needs both of them in the system.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I guess I don't understand what you are asking.
olgc wrote: In the same run, the new customer can be present with more than one service provider, and so can have more than one instance. When you generate a key for the first instance and the second comes in, you need to find his/her key in the dataset, or another key will be generated for him/her.

So how do you handle this scenario?
If both of them are needed in the system, then shouldn't each of them have unique keys generated?
Choose a job you love, and you will never have to work a day in your life. - Confucius
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

What you can do is hold the records, not write them until you have de-duplicated, and then insert only one customer record into the target.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink: