need a help on job design

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

dr.murthy
Participant
Posts: 224
Joined: Sun Dec 07, 2008 8:47 am
Location: delhi

need a help on job design

Post by dr.murthy »

hi all,

could you please share your thoughts for job design of the following reqirement
mine source data is

ID,LOC,PHONE,COUNTRY
1,Home,1234,IND
1,Work,0012,IND
2,Home,0345,AUS
3,Work,0234,JAP
4,Work,1357,USA
4,Home,5678,USA

i need an out put like

ID,HOME_PHONE,WORK_PHONE,COUNTRY
1,1234,0012,IND
2,0345,NULL,AUS
3,NULL,0234,JAP
4,5678,1357,USA


ANY SUGGESTIONS????
D.N .MURTHY
dhanashreepanse
Participant
Posts: 25
Joined: Fri Jan 11, 2008 12:49 am
Location: Pune, India

Post by dhanashreepanse »

1. Split your input file into 2 streams based on the LOC (home or work). This will split into 3 rows each. Have all the columns except LOC.

2. Do a Full outer join on the 2 streams thus generated. Keys should be ID and COUNTRY.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Split the phone into two columns and pass through aggregator to max
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Will the country ever be different for the same value of ID?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dr.murthy
Participant
Posts: 224
Joined: Sun Dec 07, 2008 8:47 am
Location: delhi

Post by dr.murthy »

ray.wurlod wrote:Will the country ever be different for the same value of ID? ...

yes ray country code would be different .different id's having same country code.
D.N .MURTHY
jhmckeever
Premium Member
Premium Member
Posts: 301
Joined: Thu Jul 14, 2005 10:27 am
Location: Melbourne, Australia
Contact:

Post by jhmckeever »

Just to clarify: You're stating that a specific ID value may be associated with more than one COUNTRY? (In your example data, each ID has only a single COUNTRY associated to it)
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
dr.murthy
Participant
Posts: 224
Joined: Sun Dec 07, 2008 8:47 am
Location: delhi

Post by dr.murthy »

jhmckeever wrote:Just to clarify: You're stating that a specific ID value may be associated with more than one COUNTRY? (In your example data, each ID has only a single COUNTRY associated to it)
yes, here id means it's an emp_id field,so in each COUNTRY n number of emp_id's are available.COUNTRY it's not an key field
D.N .MURTHY
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

dr.murthy wrote:
jhmckeever wrote:Just to clarify: You're stating that a specific ID value may be associated with more than one COUNTRY? (In your example data, each ID has only a single COUNTRY associated to it)
yes, here id means it's an emp_id field,so in each COUNTRY n number of emp_id's are available.COUNTRY it's not an key field
So Which Country (Home or work) you are planning to put in country field (Output) if they can be different for same Emp ID.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
dr.murthy
Participant
Posts: 224
Joined: Sun Dec 07, 2008 8:47 am
Location: delhi

Post by dr.murthy »

So Which Country (Home or work) you are planning to put in country field (Output) if they can be different for same Emp ID.[/quote]

hi ,
please find my source data format MY SOURCE DATA IS

EMPID,LOC,PHONE,COUNTRY
1,Home,1234,IND
2,Home,0345,IND
3,Work,0234,JAP
4,Work,1357,USA
1,Work,0012,IND
4,Home,5678,USA

I NEED AN OUT PUT SOME THING LIKE

EMPID,HOME_PHONE,WORK_PHONE,COUNTRY
1,1234,0012,IND
2,0345,NULL,IND
3,NULL,0234,JAP

in source data some empid's doesnot have neither home nor work phone numbers.so i design the job some thing like

SEQFILE---->SWITCH----->


IN SWITCH I SPLITTED THE DATA BASED ON LOC MEANS HOME REC GOES TO ONE LINK AND WORK RECORDS GOES TO SOME OTHER LINK
FINALLY I JOINED THEM BOTH BASED ON EMPID AND RENAME THE FIELD NAMES.SUPPOSE IF THE RECORDS COMMING FROM LOC LINK THEN I RENAMED THE PHONE FIELD AS LOC_PHONE SIMILARLY IF THE RECORDS COMMING FROM HOME LINK IT WAS RENAMED AS HOME_PHONE

O/P GIVES ONLY FOUR FIELDS

EMPID,HOME_PHONE,WORK_PHONE,COUNTRY

BUT ISSUE IS IF SOME RECORDS DOESNOT HAVE HOME PHONE OR WORK PHONE WE ARE NOT ABLE TO HANDLE THOSE RECORDS
D.N .MURTHY
dhanashreepanse
Participant
Posts: 25
Joined: Fri Jan 11, 2008 12:49 am
Location: Pune, India

Post by dhanashreepanse »

First time when you split your file, capture all such records that do not have HOME PHONE OR WORK PHONE in a separate file.

In the end you can append this file to the actual output file.
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

dr.murthy wrote:So Which Country (Home or work) you are planning to put in country field (Output) if they can be different for same Emp ID.
hi ,
please find my source data format MY SOURCE DATA IS

EMPID,LOC,PHONE,COUNTRY
1,Home,1234,IND
2,Home,0345,IND
3,Work,0234,JAP
4,Work,1357,USA
1,Work,0012,IND
4,Home,5678,USA

I NEED AN OUT PUT SOME THING LIKE

EMPID,HOME_PHONE,WORK_PHONE,COUNTRY
1,1234,0012,IND
2,0345,NULL,IND
3,NULL,0234,JAP

in source data some empid's doesnot have neither home nor work phone numbers.so i design the job some thing like

SEQFILE---->SWITCH----->


IN SWITCH I SPLITTED THE DATA BASED ON LOC MEANS HOME REC GOES TO ONE LINK AND WORK RECORDS GOES TO SOME OTHER LINK
FINALLY I JOINED THEM BOTH BASED ON EMPID AND RENAME THE FIELD NAMES.SUPPOSE IF THE RECORDS COMMING FROM LOC LINK THEN I RENAMED THE PHONE FIELD AS LOC_PHONE SIMILARLY IF THE RECORDS COMMING FROM HOME LINK IT WAS RENAMED AS HOME_PHONE

O/P GIVES ONLY FOUR FIELDS

EMPID,HOME_PHONE,WORK_PHONE,COUNTRY

BUT ISSUE IS IF SOME RECORDS DOESNOT HAVE HOME PHONE OR WORK PHONE WE ARE NOT ABLE TO HANDLE THOSE RECORDS[/quote]

Do you really need the records without any details.
If yes then pass the records having null values in the location field to third link and perform a full outer join on in all three links. But this depends on the requirement as if a value in number is present and location is not present then where to put that number.

EMPID,LOC,PHONE,COUNTRY
1,Home,1234,IND
1,Work,0345,USA

then what should be the output or you are sure as this will not happen.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
dr.murthy
Participant
Posts: 224
Joined: Sun Dec 07, 2008 8:47 am
Location: delhi

Post by dr.murthy »

yes i need all records,for example in my source data empid =2 doesnot have a work phone so i need a output record with work_phone is null.

but in my job design iam not able to capture those records in full outer join.can you please tell me is any other way to design he job.
D.N .MURTHY
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

dr.murthy wrote:yes i need all records,for example in my source data empid =2 doesnot have a work phone so i need a output record with work_phone is null.

but in my job design iam not able to capture those records in full outer join.can you please tell me is any other way to design he job.

Code: Select all

If yes then pass the records having null values in the location field to third link and perform a full outer join on in all three links. But this depends on the requirement as if a value in number is present and location is not present then where to put that number. 
As posted in other thread (Your current design). It may fail if data comes like.

EMPID,LOC,PHONE,COUNTRY
1,Home,1234,IND
1,,,IND

as you will get two records which might not be needed.

I cannot explain each and everything. Give the problem a thought based on all the suggestions you have got. Also think on the two probable resolutions you have in this thread.

One More thing is you haven't replied to my question in previous post mentioned in last few lines. :evil:
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
dr.murthy
Participant
Posts: 224
Joined: Sun Dec 07, 2008 8:47 am
Location: delhi

Post by dr.murthy »

hi,
i got a solution finally, i develop a job but it seems very complicated.i had to use four lookups in my job.

as you specified
EMPID,LOC,PHONE,COUNTRY
1,Home,1234,IND
1,,,IND

if the empid =1 doesnot have a work phone number in source only one record is comming like
EMPID,LOC,PHONE,COUNTRY
1,Home,1234,IND
but in output i need work phone number is null FORempid=1

like
EMPID,WRK_PHN,HOME_PHN,LOC
1, NULL ,1234,,IND
IS THERE ANY SIMPLE WAY TO DESIGN THIS JOB
D.N .MURTHY
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No need to shout! (All capitals = shouting)

There is a very simple solution, to which others have alluded and which requires no lookups at all.

The simple design is to use data sorted by ID, assemble the record in a Transformer stage populating the two phone numbers into separate fields with appropriate null handling, and using stage variables to remember the previous record. The output from the Transformer stage is then run through a single Aggregator stage which outputs the Last of each group; the data are grouped on ID. No lookups are required.

This solution is right for server job, because you posted in the server job forum. If it really is a parallel job, substitute a Remove Duplicates stage for the Aggregator stage and make sure that the data are hash or modulus partitioned by ID.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply