Duplicated entries although using RemoveDuplicates stage
Moderators: chulett, rschirm, roy
Could you add a peek of all rows between remove duplicates and lookup and manually check row 2257 +- 1?
-
- Premium Member
- Posts: 291
- Joined: Wed Sep 26, 2007 11:23 am
- Location: Madrid, Spain
ArndW wrote: Could you add a peek of all rows between remove duplicates and lookup and manually check row 2257 +- 1?
I know that the Peek stage gets data from an input (in this case, the Remove Duplicates stage), but can it also output this same data to the Lookup?
How should I check those records manually? In the Peek? How?
Thanks a lot
-
- Premium Member
- Posts: 291
- Joined: Wed Sep 26, 2007 11:23 am
- Location: Madrid, Spain
OK, I did it myself. I put a Peek stage between the Remove Duplicates and Lookup stages, but it did not help much.
I configured the Peek like this:
All records = False
Number of records = 3
Period = 2256
It gave me this, but I don't know if I am getting row 2257; it does not seem to be correct:
Peek_50,0: REF_OFERTA:10 CIF:A08829699 COD_GESTOR:mruizji COD_EST_OF:AC IN_ANEXOS:NO EST_PRO:NO
Peek_50,0: REF_OFERTA:12880 CIF:Q2800540C COD_GESTOR:acalvo1 COD_EST_OF:AC IN_ANEXOS:NO EST_PRO:NO
Peek_50,0: REF_OFERTA:15076 CIF:B38480752 COD_GESTOR:rperezd6 COD_EST_OF:AC IN_ANEXOS:NO EST_PRO:NO
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
Use a Copy stage between Remove Duplicates and the Lookup, and from the Copy stage pass the data to a Peek (to analyze it in the log), a Sequential File, or a Data Set, wherever you want to analyze the data.
Try this:
Split the job into two. In job 1, read from ODBC and, after the transformation, write the data to a Data Set.
In job 2, perform the lookup.
Run that and reply with the result.
Regards,
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
One more thing: check the keys defined in Remove Duplicates and in the Lookup; I hope both are the same.
And don't set Period: it gives you one record after every XXXX records, where XXXX is the value defined as the period.
Use Skip instead.
Get record number 2257, add its keys to a constraint in the Transformer above, and check how many records you get.
Can you tell us which properties you have explicitly defined in Remove Duplicates (not the default properties)?
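To illustrate the difference between Period and Skip, here is a rough Python model of how a Peek might select rows on a single partition. The function `peek` and its parameters are hypothetical stand-ins for the stage properties, not a DataStage API, and the selection rule is an assumption based on the behaviour described above.

```python
def peek(rows, number=10, period=1, skip=0):
    """Toy model of Peek row selection on one partition.

    Assumption: 'skip' drops the first `skip` rows, 'period' then keeps
    every `period`-th remaining row starting with the first one, and
    'number' caps how many rows are shown.
    """
    selected = rows[skip::period]
    return selected[:number]


# Row numbers 1..10000 stand in for the actual records.
rows = list(range(1, 10001))

# Settings from the thread: Number of records = 3, Period = 2256.
with_period = peek(rows, number=3, period=2256)

# Alternative: skip the first 2255 rows and show the next three.
with_skip = peek(rows, number=3, skip=2255)

print("period:", with_period)
print("skip:  ", with_skip)
```

Under this model, Period returns rows that are spread far apart, while Skip returns the consecutive rows around 2257, which is what a manual check of neighbouring records needs.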
Priyadarshi Kunal
-
- Premium Member
- Posts: 291
- Joined: Wed Sep 26, 2007 11:23 am
- Location: Madrid, Spain
priyadarshikunal wrote: 1 more thing check the keys defined in remove duplicate and lookup i hope both are same
Oops... it seems you found the problem.
Anyway, this fixes the warning in the Lookup stage, but I still can't get the desired results.
This has just become a SQL query issue. Maybe you can help me, because I must be missing something really stupid.
How is it possible that this query returns 9851 rows:
SELECT G.REF_OFERTA , R.COD_GR_EMP
FROM
COFER_LISTA_CIFS C, COFER_DATOS_GRAL G,
REL_GR_EMP_EMP R, GRUPOS_EMPRESARIALES GR
WHERE
R.COD_GR_EMP = GR.COD_GR_EMP AND
R.CIF = C.CIF AND
C.REF_OFERTA = G.REF_OFERTA AND
G.COD_EST_OF='AC'
and this one only 9785?
To me they are doing the same thing (but obviously they aren't, as I don't get the same results):
SELECT A.COD_GR_EMP
FROM
(SELECT
REL_GR_EMP_EMP.COD_GR_EMP,
REL_GR_EMP_EMP.CIF
FROM
dbo.GRUPOS_EMPRESARIALES AS GRUPOS_EMPRESARIALES
INNER JOIN
dbo.REL_GR_EMP_EMP AS REL_GR_EMP_EMP
ON GRUPOS_EMPRESARIALES.COD_GR_EMP = REL_GR_EMP_EMP.COD_GR_EMP
GROUP BY
REL_GR_EMP_EMP.COD_GR_EMP,
REL_GR_EMP_EMP.CIF ) A ,
(SELECT COFER_LISTA_CIFS.CIF
FROM
dbo.COFER_DATOS_GRAL AS COFER_DATOS_GRAL
INNER JOIN
dbo.COFER_LISTA_CIFS AS COFER_LISTA_CIFS
ON COFER_DATOS_GRAL.REF_OFERTA = COFER_LISTA_CIFS.REF_OFERTA
WHERE COFER_DATOS_GRAL.COD_EST_OF = 'AC'
GROUP BY COFER_LISTA_CIFS.CIF ) B
WHERE A.CIF = B.CIF
Thanks for your help!!!!
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
As we don't know the keys and the relationships between the tables used in the query, I am unable to answer exactly.
However, the two queries are different: the second one groups on a few columns, which makes each derived result a set of unique records, whereas the first returns every matching joined row. Get help from someone on your team; others can often spot one's mistakes more easily.
Also, don't post two queries in one thread.
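The effect of the GROUP BY can be reproduced on a tiny example. The sketch below uses Python's `sqlite3` with made-up rows; the column lists are reduced to what the queries reference, the `dbo.` prefix is dropped because SQLite has no such schema, and the data values are hypothetical. It only shows one way the counts can diverge: a CIF that appears in two active offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GRUPOS_EMPRESARIALES (COD_GR_EMP TEXT);
CREATE TABLE REL_GR_EMP_EMP (COD_GR_EMP TEXT, CIF TEXT);
CREATE TABLE COFER_LISTA_CIFS (REF_OFERTA INTEGER, CIF TEXT);
CREATE TABLE COFER_DATOS_GRAL (REF_OFERTA INTEGER, COD_EST_OF TEXT);
-- Hypothetical data: one company (CIF A001) listed on two active offers.
INSERT INTO GRUPOS_EMPRESARIALES VALUES ('G1');
INSERT INTO REL_GR_EMP_EMP VALUES ('G1', 'A001');
INSERT INTO COFER_LISTA_CIFS VALUES (10, 'A001'), (20, 'A001');
INSERT INTO COFER_DATOS_GRAL VALUES (10, 'AC'), (20, 'AC');
""")

# First query: plain join, one output row per matching offer.
ungrouped_rows = conn.execute("""
SELECT G.REF_OFERTA, R.COD_GR_EMP
FROM COFER_LISTA_CIFS C, COFER_DATOS_GRAL G,
     REL_GR_EMP_EMP R, GRUPOS_EMPRESARIALES GR
WHERE R.COD_GR_EMP = GR.COD_GR_EMP
  AND R.CIF = C.CIF
  AND C.REF_OFERTA = G.REF_OFERTA
  AND G.COD_EST_OF = 'AC'
""").fetchall()

# Second query: each derived table is grouped, so the CIF appears once.
grouped_rows = conn.execute("""
SELECT A.COD_GR_EMP
FROM (SELECT R.COD_GR_EMP, R.CIF
      FROM GRUPOS_EMPRESARIALES GR
      INNER JOIN REL_GR_EMP_EMP R ON GR.COD_GR_EMP = R.COD_GR_EMP
      GROUP BY R.COD_GR_EMP, R.CIF) A,
     (SELECT C.CIF
      FROM COFER_DATOS_GRAL G
      INNER JOIN COFER_LISTA_CIFS C ON G.REF_OFERTA = C.REF_OFERTA
      WHERE G.COD_EST_OF = 'AC'
      GROUP BY C.CIF) B
WHERE A.CIF = B.CIF
""").fetchall()

print(len(ungrouped_rows), len(grouped_rows))
```

With this data the plain join yields one row per offer, while the grouped version collapses them to a single row per CIF, so the second count is lower.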
Priyadarshi Kunal
manuel.gomez - just an additional tidbit of information regarding the PEEK stage. You can put it inline into a data stream, but you need to ensure that you make it use ALL data rows and just drag the input columns to output.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Relating to "duplicates coming out of Remove Duplicates" stage, no-one seems to have picked up on the possibility that the data are not partitioned on the keys used to identify duplicates. This would cause the symptom described.
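A small simulation may make the partitioning point concrete. This is only a sketch in plain Python: `remove_duplicates` and `run_partitioned` are hypothetical helpers that mimic a per-partition duplicate removal, and the sample rows are invented.

```python
def remove_duplicates(rows, key):
    """Keep the first row per key value; assumes `rows` is sorted on `key`."""
    out = []
    previous = object()  # sentinel that never equals a real key value
    for row in rows:
        if row[key] != previous:
            out.append(row)
            previous = row[key]
    return out


def run_partitioned(rows, key, partition_of, n_partitions=2):
    """Split rows into partitions, then sort and dedupe each one separately."""
    partitions = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        partitions[partition_of(i, row) % n_partitions].append(row)
    result = []
    for part in partitions:
        part_sorted = sorted(part, key=lambda r: r[key])
        result.extend(remove_duplicates(part_sorted, key))
    return result


# Hypothetical input with one duplicated key (REF_OFERTA 10).
rows = [
    {"REF_OFERTA": 10, "CIF": "A001"},
    {"REF_OFERTA": 10, "CIF": "A001"},
    {"REF_OFERTA": 12880, "CIF": "B002"},
]

# Round-robin: the two copies of key 10 land in different partitions.
round_robin = run_partitioned(rows, "REF_OFERTA", lambda i, row: i)

# Key-based (hash) partitioning: equal keys always share a partition.
hashed = run_partitioned(rows, "REF_OFERTA",
                         lambda i, row: hash(row["REF_OFERTA"]))

print(len(round_robin), len(hashed))
```

In the round-robin case each partition sees only one copy of the key, so neither copy is dropped and the duplicate survives; partitioning on the key brings both copies together and one is removed.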
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 22
- Joined: Wed Jul 02, 2008 7:01 am
- Location: London
Sort the row before removing duplicates
Hi,
Whenever you use the Remove Duplicates stage, the incoming data should be sorted. Looking at your design, I don't think your incoming data are properly sorted. Please try sorting the rows before the Remove Duplicates stage.
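As a rough illustration of why the sort matters, the sketch below assumes the duplicate removal only compares each record with the one immediately before it. `remove_adjacent_duplicates` is a hypothetical helper and the key values are sample data.

```python
def remove_adjacent_duplicates(keys):
    """Drop a key only when it equals the key kept immediately before it."""
    out = []
    for k in keys:
        if not out or out[-1] != k:
            out.append(k)
    return out


# Sample keys in which the duplicates are not next to each other.
unsorted_keys = [10, 12880, 10, 15076, 12880]

without_sort = remove_adjacent_duplicates(unsorted_keys)
with_sort = remove_adjacent_duplicates(sorted(unsorted_keys))

print(without_sort)
print(with_sort)
```

Without the sort, no two equal keys are adjacent, so nothing is removed; after sorting, equal keys sit next to each other and only one of each remains.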