SQL query to be done using DataStage

ramesh_ct · Post by **ramesh_ct** » Mon Sep 13, 2004 1:02 am

Hi All,

I have the following SQL query to be done using DataStage...

SELECT distinct(Cd_corretor), Cd_administradora, Cd_filial,Dv_tipo_pessoa,Nm_endereco, Nm_bairro, Nm_municipio, Nm_uf, Nm_cep,Nm_fone,Nm_fax, Nm_corretor, Nm_email, Nm_endereco1,Nm_endereco2,a.Cd_uf,a.Cd_orgao_produtor,a.Dt_admissao, a.Cd_status
FROM Corretor a
LEFT OUTER JOIN Estado b
ON a.Cd_uf = b.Cd_uf
where Cd_corretor <> '0'
Order By Cd_corretor

I'm unable to accomplish the same in DataStage... Can someone please tell me how to achieve the same using DataStage

1. Selecting Distinct records
2. WHERE clause
3. Joining the tables

thanks & best regards
Ramesh

ray.wurlod · Post by **ray.wurlod** » Mon Sep 13, 2004 1:09 am

Probably the easiest thing to do is to specify user-defined SQL. Then you can use your exact query; depending on your stage type the tab caption may change from "View SQL" to "Edit SQL".

The other thing that must be borne in mind is that the columns in your design must agree absolutely with the number and data type of the columns in your query.

Were it not for the DISTINCT requirement, you can specify the GROUP BY in the column definitions (on the Columns tab), the join in the Table field (as "Corretor a LEFT OUTER JOIN Estado b ON a.Cd_uf = b.Cd_uf") and the WHERE and ORDER BY pieces on the Selection tab.

You did not mention which stage type you are using. You might even be able to get away with preceding the first column on the Columns tab with the keyword DISTINCT in the Derivation column of the grid. It's worth trying; the second best way to learn is by doing.

denzilsyb · Post by **denzilsyb** » Mon Sep 13, 2004 1:15 am

depending on how many records you are working with, you have various options open to you.

1. use the ODBC stage and specify the db and select USER defined query and paste this select into the SQL box. you need to specify the column names in the columns tab, once that is done you can run the select.

2. With this technique you will bring the data down to DS and then process the data within DS

2.1 ODBC to select relevant records from Corretor to a SEQuential stage
2.2 ODBC to select relevant records from Estado to a HASH stage
2.3 in a transformer have the SEQ stage from 2.1 designated as the STREAM input reference the HASH file (2.2) on the key column
2.4 The constraint in the transform stage should be something like:

NOT(LINK.NOTFOUND) and LINK.CD_CORRETOR <> '0'
2.5 have this output sent to a HASH stage and dupicates will be removed based on the key you select

the order by can be done using the SORT stage, but I would rather send it to the DB and select the data again

ogmios · Post by **ogmios** » Mon Sep 13, 2004 1:50 am

Note also that in SQL the distinct keyword applies to all of the attributes and cannot be applied to only a few of them (as the original query seems to indicate).

Ogmios

ramesh_ct · Post by **ramesh_ct** » Mon Sep 13, 2004 3:29 am

Dear Denzil

Thanks for your help.

I did not understand the step 2.3 and please explain me when to use Sequetial Stage & Hashed file stage.

best regards
Ramesh

denzilsyb · Post by **denzilsyb** » Mon Sep 13, 2004 4:00 am

Ramesh

We use the HASH file stage mostly as a lookup stage in our DS jobs. this means that we populate the HASH stage with a key column and the detail columns required if a lookup is successfull.

for example, we will place cust_id and cust_name in the HASH stage with cust_id defined as the key.

When we pull data from a database source we usually first populate a SEQuential file so that the load is off the db as fast as possible. The SEQuential file becomes our STREAM input into the transformer.

Lets say for instance we want an output SEQuential file generated that has cust_name and amount_spent columns. cust_name will come from the HASH stage where cust_id in the SEQuential file matches cust_id in the HASH file.

illustrated:

Code: Select all


             HASH
                |
SEQ ---------> TFM ------> OUTPUT

This is about as simple as i can make it. Just remember that the HASH file needs a key column and that your STREAM input (regardless of whether it is a ODBC or SEQ) needs to match up to the key column - much like a SELECT... FROM a, b WHERE a.col_a = b.col_b

good luck

vmcburney · Post by **vmcburney** » Mon Sep 13, 2004 5:32 pm

It's a good design to move the outer joined table into a hash file and bring it in as a lookup. It lets you select from the Estado table using column generated SQL instead of user defined, this is more maintainable and robust. It probably removes the need to use the DISTINCT keyword. If your Estado has a non unique primary key and you are selecting all the key fields in your job then you don't need the DISTINCT clause.