RxNorm to HL7 RIM

Here we try to reconstruct SPL like data element from RxNorm.

Roles

drop table rxn_role_stage
go
create table rxn_role_stage as
select distinct * from (
select id, 
       scoperId, scoperType, scoperName,
       case when playerId is null then moietyId else playerId end as playerId,
       playerName, 
       moietyId, moietyName,
       str as roleName, 
       substr(str,0,a-1) as roleNameStem, 
       substr(str,a+1,b-a-1) as quantityNumber, 
       substr(str,b+1) as quantityUnit
  from (
select scdc.rxcui AS id, 
       scd.rxcui AS scoperId,  scd.tty AS scoperType, scd.str AS scoperName,
       ping.rxcui AS playerId, ping.str AS playerName,
       ing.rxcui AS moietyId,  ing.str AS moietyName,
       scdc.str, instr(scdc.str,' ',-1,1) b, instr(scdc.str,' ',-1,2) a
  from umls.rxnconso scdc
    inner join umls.rxnrel ring on(ring.rxcui2 = scdc.rxcui and ring.rela='has_ingredient')
    inner join umls.rxnconso ing on(ing.rxcui = ring.rxcui1)
    inner join umls.rxnrel rscd on(rscd.rxcui2 = scdc.rxcui and rscd.rela='constitutes')
    inner join umls.rxnconso scd on(scd.rxcui = rscd.rxcui1)
    left outer join (
      select rm1.rxcui2 as ing_rxcui, rm2.rxcui1 as scdc_rxcui, m1.* 
        from umls.rxnrel rm1
          inner join umls.rxnconso m1 on(m1.rxcui = rm1.rxcui1)
          inner join umls.rxnrel rm2 on(rm2.rxcui2 = m1.rxcui and rm2.rela='precise_ingredient_of')
        where rm1.rela='has_form'
          and m1.tty = 'IN' and m1.sab='RXNORM'
    ) ping on(ping.ing_rxcui = ing.rxcui AND ping.scdc_rxcui = scdc.rxcui)
  where scdc.tty='SCDC' and scdc.sab='RXNORM'
    and scd.tty IN ('SCD', 'SBD') and scd.sab='RXNORM'
    and ing.tty='IN' and ing.sab = 'RXNORM'
))

71339 record(s) affected

Now with this staging data, we should be able to create Entity and Role objects to RxNorm. We won't create them now, but we shall run the same analysis as for SPL on them and then compare.

We can now have the ingredient vs. precise ingredient lined up with SPL:

  • precise ingredient is only mentioned if it is different from the moiety
  • ingredient is the active moiety and always present

Then we compare how many of the abstract forms match in RxNorm.

Ingredient Classes

The short version that we used for SPL classification is somewhat inefficient because we have such huge numbers. So here is a version which should be right yet much faster (a minute instead of 6 hours.) It should work because if we match all the pairs and count the number of features that match between them, that number of features should be equal to the number of features that each item in the pair has.

DROP TABLE rxn_ecm_am_SubjectTrait 
GO
CREATE TABLE rxn_ecm_am_SubjectTrait AS (
  SELECT DISTINCT
         i.scoperId AS subjectId,
         i.playerId AS traitValue
    FROM rxn_role_stage i
) --  71251 record(s) affected 
go
CREATE INDEX rxn_ecm_am_SubjectTrait_idx ON rxn_ecm_am_SubjectTrait(subjectId, traitValue)
go
DROP TABLE rxn_ecm_am_SubjectTraitCount
GO
CREATE TABLE rxn_ecm_am_SubjectTraitCount AS (
  SELECT subjectId,
         count(1) AS traitCount
    FROM rxn_ecm_am_SubjectTrait
    GROUP BY subjectId
) -- 48273 record(s) affected 
go
DROP INDEX rxn_ecm_am_SubjectTraitCnt_pk
GO
CREATE UNIQUE INDEX rxn_ecm_am_SubjectTraitCnt_pk ON rxn_ecm_am_SubjectTraitCount(subjectId, traitCount)
go
DROP TABLE rxn_ecm_am_eq_candidate
GO
CREATE TABLE rxn_ecm_am_eq_candidate AS (
  SELECT s1.subjectId AS subjectId,
         s2.subjectId AS subject2Id,
         count(1) AS traitCount
    FROM rxn_ecm_am_SubjectTrait s1
      INNER JOIN rxn_ecm_am_SubjectTrait s2 USING(traitValue)
   GROUP BY s1.subjectId, s2.subjectId
)
-- 26833301 record(s) affected 
GO
DROP TABLE rxn_ecm_am_Equivalence
go
CREATE TABLE rxn_ecm_am_Equivalence AS (
  SELECT s.*
    FROM rxn_ecm_am_eq_candidate s
      INNER JOIN rxn_ecm_am_SubjectTraitCount c1 ON(c1.subjectId = s.subjectId AND c1.traitCount = s.traitCount)
      INNER JOIN rxn_ecm_am_SubjectTraitCount c2 ON(c2.subjectId = s.subject2Id AND c2.traitCount = s.traitCount)
)
-- 2478829 record(s) affected 
go
DROP TABLE rxn_ecm_am_ClassMember
go
CREATE TABLE rxn_ecm_am_ClassMember AS
 SELECT MIN(subject2Id) AS classId, subjectId AS memberId, traitCount
    FROM rxn_ecm_am_Equivalence
    GROUP BY subjectId, traitCount
-- 48273 record(s) affected 

We can already see that there are exactly as many class-member relationships here as there are available members (48273) which is good. Secondly we know that the number of traits needs to be the same for all classes, so

select classId, count(1) as memberCount,  min(traitCount) as minTraitCount
  from rxn_ecm_am_ClassMember 
  group by CLASSID having  min(traitCount) <> max(traitCount)

returns nothing, correct. Here is the class makeup:

select count(1) as classCount,
       min(memberCount), median(memberCount), avg(memberCount), max(memberCount),
       min(traitCount), median(traitCount), avg(traitCount), max(traitCount)
  from (
select classId, count(1) as memberCount, min(traitCount) as traitCount
  from rxn_ecm_am_ClassMember
  group by classId
)
thennow
CLASSESCOUNT72847353
MEMBER COUNTMIN11
MEDIAN22
AVG6.62736.7431
MAX346344
TRAIT COUNTMIN11
MEDIAN22
AVG1.88921.8930
MAX3232

Then we think that there should be one SCD for each class, because that really is the class? Or how about the dose form? O.K. we would have to classify by dose form to make that happen.

select count(1) as classCount,
       min(memberCount), median(memberCount), avg(memberCount), max(memberCount),
       min(traitCount), median(traitCount), avg(traitCount), max(traitCount)
  from (
select classId, count(1) as memberCount, min(traitCount) as traitCount
  from (
    select classId, memberId, traitCount
      from rxn_ecm_am_ClassMember cm
        inner join rxn_role_stage on(scoperId = memberId and scoperType = 'SCD')
    group by classId, memberId, traitCount
  ) group by classId
)
CLASSESCOUNT72787353
MEMBER COUNTMIN11
MEDIAN22
AVG4.18854.2485
MAX135135
TRAIT COUNTMIN11
MEDIAN22
AVG1.88951.8930
MAX3232

We find there are more SCDs than SBDs. Which is weird. We think we should only consider SBDs to be fair. So, let's exclude all classes that have no SBDs. That would get rid of 4565 classes leaving only 2487 classes. Now that looks a bit closer to SPL!

SELECT COUNT(1) FROM (
SELECT classId, sum(scd) as scd, sum(sbd) as sbd FROM (
SELECT classId, memberId,
       case when scoperType = 'SCD' then 1 else 0 end as SCD,
       case when scoperType = 'SBD' then 1 else 0 end as SBD
  from rxn_ecm_am_ClassMember 
    inner join rxn_role_stage on(scoperId = memberId)
) GROUP BY classId
) WHERE sbd > 0

So, let's see how many of these classes we can match up with SPL.

We need to go through our UNII_RXN_MAP for that.

create index unii_rxn_map_unii_idx on unii_rxn_map(code_code, code_codeSystem)
go
drop table spl_rxn_match_candidate
go
create table spl_rxn_match_candidate as
select e.classId as spl_classId, 
       e.memberId as spl_memberId, 
       e.traitCount as spl_traitCount,
       r.playerInternalId as spl_ingredientId,
       q.subject2Id as spl_matching_stuffId,
       m.code_code as spl_matching_unii,
       m.rxcui as rxn_ingredientId,
       rxn_eq.traitCount as rxn_traitCount,
       rxn_eq.memberId as rxn_memberId,
       rxn_eq.classId as rxn_classId
  from ecm_ActiveIngredients e
    inner join mw.Role_ r on(r.scoperInternalId = classId and r.classCode='ACTI')
    inner join SPL_STUFFEQUIVALENCE q on(q.subject1Id = r.playerInternalId)
    inner join mw.Entity s on(s.internalId = q.subject2Id)
    inner join unii_rxn_map m on(m.code_code = s.code_code and m.code_codeSystem = s.code_codeSystem)
    inner join rxn_role_stage rxn on(rxn.moietyId = m.rxcui)
    inner join rxn_ecm_am_ClassMember rxn_eq on(rxn_eq.memberId = rxn.scoperId)

741600 record(s) affected (now: 1096478)

drop table spl_rxn_match
go
create table spl_rxn_match as
select spl_classId, spl_memberId, rxn_memberId, rxn_classId, spl_traitCount as traitCount, count(1) as matchingTraitCount
  from spl_rxn_match_candidate
  where spl_traitCount = rxn_traitCount
  group by spl_classId, spl_memberId, rxn_memberId, rxn_classId, spl_traitCount
  having spl_traitCount = count(1)

178496 record(s) affected (now: 268823)

select 'spl classes', count(distinct classId) from ecm_ActiveMoieties
union all
select 'rxn classes', count(distinct classId) from rxn_ecm_am_ClassMember
union all
select 'matching spl classes', count(distinct spl_classId) from spl_rxn_match
union all
select 'matched rxn classes', count(distinct rxn_classId) from spl_rxn_match
union all
select 'candidate spl classes', count(distinct spl_classId) from spl_rxn_match_candidate
union all
select 'candidate rxn classes', count(distinct rxn_classId) from spl_rxn_match_candidate

Then:

  • spl classes 950
  • rxn classes 7284
  • matching spl classes 954
  • matched rxn classes 955
  • candidate spl classes 1011
  • candidate rxn classes 3101

Now:

  • spl classes 1172
  • rxn classes 7353
  • matching spl classes 1177
  • matched rxn classes 1146
  • candidate spl classes 1250
  • candidate rxn classes 3514

Now we need to go two ways.

  1. Check if those drugs that match on NDC code also agree on their ingredients.
  2. Gap analysis: what are the ones that don't match from SPL into RxN and vice versa?
  3. NDC codes vendor analysis, are the non matching vendors explained, do the matching vendors match completely?
  4. NDC listing files, are the ones discontinued by the innovator marked in the FDA files?

Check NDC Codes Have Equivalent Ingredients

with a as (
select ndc, rxcui as rxn_memberId, e.internalId as spl_memberId
  from rxn_ndc
    inner join spl_druglisting using(ndc)
    inner join mw.Entity e on(code_code = mndc)
)
select count(distinct ndc) from a
union all
select count(distinct ndc) 
  from a
    inner join spl_rxn_match using(spl_memberId, rxn_memberId)
  • total 7083 (now: 12703)
  • with matching ingredients 6476 (now: 12064)

If NDCs agree but content does not agree it could be because of a mapping problem of one ingredient. It should never be because of difference in the number of ingredients. And bingo:

  • MATCHED_NDCS 7083
  • AGREEING_NDCS 6794

Most of them match fine, but some of them are different, and most of those that are different are disagreeing in number of ingredients. It gets even better:

with ndc_match as (
select ndc, rxcui as rxn_memberId, e.internalId as spl_memberId
  from rxn_ndc
    inner join spl_druglisting using(ndc)
    inner join mw.Entity e on(code_code = mndc)
), matched_ndcs as (
  select distinct ndc from ndc_match
), agreeing_ndcs as (
  select distinct ndc from ndc_match
    inner join spl_rxn_match using(spl_memberId, rxn_memberId)
), ndcs_totally_disagreeing as (
  select ndc from ndc_match
  minus
  select distinct ndc from ndc_match
    inner join spl_rxn_match_candidate using(spl_memberId, rxn_memberId)
), ndcs_disagreeing_in_number as (
  select distinct ndc from ndc_match
    inner join spl_rxn_match_candidate using(spl_memberId, rxn_memberId)
    where spl_traitCount <> rxn_traitCount
), ndcs_disagreeing_spl_lt_rxn as (
  select distinct ndc from ndc_match
    inner join spl_rxn_match_candidate using(spl_memberId, rxn_memberId)
    where spl_traitCount < rxn_traitCount
), ndcs_disagreeing_spl_gt_rxn as (
  select distinct ndc from ndc_match
    inner join spl_rxn_match_candidate using(spl_memberId, rxn_memberId)
    where spl_traitCount > rxn_traitCount
)
select
 (select count(1) from matched_ndcs) as matched_ndcs,
 (select count(1) from agreeing_ndcs) as agreeing_ndcs,
 (select count(1) from ndcs_disagreeing_in_number) as ndcs_disagreeing_in_number,
 (select count(1) from ndcs_disagreeing_spl_lt_rxn) as ndcs_disagreeing_spl_lt_rxn,
 (select count(1) from ndcs_disagreeing_spl_gt_rxn) as ndcs_disagreeing_spl_gt_rxn,
 (select count(1) from ndcs_totally_disagreeing) as ndcs_totally_disagreeing
from dual
MATCHED_NDCS 7083 12703
AGREEING_NDCS 6794 12064
NDCS_DISAGREEING_IN_NUMBER 70 90
NDCS_DISAGREEING_SPL_LT_RXN 8 22
NDCS_DISAGREEING_SPL_GT_RXN 62 68
NDCS_TOTALLY_DISAGREEING 128 355
drop table ndc_analysis
go
create table ndc_analysis as 
with ndc_match as (
select ndc, rxcui as rxn_memberId, e.internalId as spl_memberId
  from rxn_ndc
    inner join spl_druglisting using(ndc)
    inner join mw.Entity e on(code_code = mndc)
), matched_ndcs as (
  select distinct ndc from ndc_match
), agreeing_ndcs as (
  select distinct ndc from ndc_match
    inner join spl_rxn_match using(spl_memberId, rxn_memberId)
), ndcs_totally_disagreeing as (
  select ndc from ndc_match
  minus
  select distinct ndc from ndc_match
    inner join spl_rxn_match_candidate using(spl_memberId, rxn_memberId)
), ndcs_disagreeing_in_number as (
  select distinct ndc from ndc_match
    inner join spl_rxn_match_candidate using(spl_memberId, rxn_memberId)
    where spl_traitCount <> rxn_traitCount
), ndcs_disagreeing_spl_lt_rxn as (
  select distinct ndc from ndc_match
    inner join spl_rxn_match_candidate using(spl_memberId, rxn_memberId)
    where spl_traitCount < rxn_traitCount
), ndcs_disagreeing_spl_gt_rxn as (
  select distinct ndc from ndc_match
    inner join spl_rxn_match_candidate using(spl_memberId, rxn_memberId)
    where spl_traitCount > rxn_traitCount
)
select 'matched_ndcs' as what, ndc from matched_ndcs union all 
select 'agreeing_ndcs' as what, ndc from agreeing_ndcs union all 
select 'ndcs_totally_disagreeing' as what, ndc from ndcs_totally_disagreeing union all 
select 'ndcs_disagreeing_in_number' as what, ndc from ndcs_disagreeing_in_number union all 
select 'ndcs_disagreeing_spl_lt_rxn' as what, ndc from ndcs_disagreeing_spl_lt_rxn union all 
select 'ndcs_disagreeing_spl_gt_rxn' as what, ndc from ndcs_disagreeing_spl_gt_rxn

14145 record(s) affected

with pair as (
  select distinct what, ndc, rxcui as rxn_scoperId, e.internalId as spl_scoperId
    from ndc_analysis
      inner join rxn_ndc r using(ndc)
      inner join spl_druglisting l using(ndc)
      inner join mw.Entity e on(e.code_code = mndc and e.code_codeSystem = '2.16.840.1.113883.6.69')
    where what in ('ndcs_totally_disagreeing', 'ndcs_disagreeing_spl_lt_rxn', 'ndcs_disagreeing_spl_gt_rxn')
), spl_traits as (
  select r.scoperInternalId as scoperId, max(n.trivialname) as playerName, min(traitCount)
    from mw.Role_ r 
      inner join mw.Entity_name n on(n.entityInternalId = r.playerInternalId)
      inner join ecm_ActiveIngredients eam on(memberId = r.scoperInternalId)
    where r.classCode='ACTI'
    group by r.scoperInternalId, r.playerInternalId
), rxn_traits as (
  select scoperId as scoperId, moietyName||'|'||playerName as playerName, traitCount
    from rxn_role_stage
      inner join rxn_ecm_am_ClassMember eam on(memberId = scoperId)
) select p.*, 'spl', t.*
    from pair p
      inner join spl_traits t on(t.scoperId = p.spl_scoperId)
union all
select p.*, 'rxn', t.*
    from pair p
      inner join rxn_traits t on(t.scoperId = p.rxn_scoperId)

Shows some minor differences. Shows more some glitches in SPL is than any real interesting discrepancies.

There was an issue with ecm_ActiveMoieties

And that again, oh sadness, falls flat on its face. Turns out the differences in ingredient numbers come about because of a problem in the ActiveMoiety? traits. There are some things listed as ActiveMoieties? that should really be ingredients.

Example:

medingrmoietyname
692bc0af95f9triamcinolone
692bc0afc0aftriamcinolone acetonide

So, let's do the entire spiel again with ecm_ActiveIngredient instead.