Interaction Study
Objective
To test the value of SPL and its related terminologies for improving drug-interaction issue detection in general. Specifically to show that a "definitional" approach to drug classes using NDF-RT Mechanism of Action (MoA) and Physiologic Effect (PE) as well as ingredient chemical classes (from NDF-RT which is same as MeSH) will reduce the knowledge management effort and increase the sensitivity of drug-interaction issue detection systems while retaining a reasonable specificity.
Approach
- Defined Gopher interaction sets and computed the extension of these sets in Gopher terms.
- Analyze the difference between the ad-hoc defined Gopher sets and the extension computed from our definitions.
- Emulate the the Gopher drug-interaction method on our database
- Implement our detection method on our database
- Find all interaction issues using either method and compare the difference
The set-up of the evaluation is quite analogous to my AllergyStudy.
NDF-RT Definitions
See the attached spreadsheet.
In order to make the following faster, I just materialize Linas' Gopher to NDF-RT mapping:
create table MAP_GOPHER_NDFRT as select * from lsimonaitis.MAP_TERM_NDFRT_200903 gi create index MAP_GOPHER_NDFRT_I2 on MAP_GOPHER_NDFRT(SOURCE_ID)
Computing the extension of these sets.
First we noticed that we were missing Gopher terms that were mapped but not included in the gopher set definitions, because our gopher set extensions were outdated. See PseudoGopher for detial. This creates the tables PSEUDO_GOPHER_CONCEPT and PSEUDO_GOPHER_SET_PLUS.
drop table lsimonaitis.set_extension_20090309
go
create table lsimonaitis.set_extension_20090309 as
(
select distinct to_number(sd.term_id) as set_term, to_number(m.TERM) as term, query_form_2
-- select *
from lsimonaitis.SET_DEFINITION sd
inner join umls.NDFRT_CONCEPT c on(c.NAME = sd.parameter_1) -- !!
inner join umls.NDFRT_ISA_CLOSURE isa on (isa.TARGET_ID = c.id)
inner join umls.NDFRT_ROLE r on(r.TARGET_ID = isa.SOURCE_ID AND r.ROLE_DEF_ID = sd.parameter_t)
inner join umls.NDFRT_ISA_CLOSURE isa2 on (isa2.TARGET_ID = r.SOURCE_ID)
inner join lsimonaitis.MAP_TERM_NDFRT_200903 m on (m.source_ID = isa2.SOURCE_ID)
where sd.query_form_2 in ('QFA', 'QFB')
and sd.deprecated is null
intersect
select distinct to_number(sd.term_id) as set_term, to_number(m.TERM) as term, query_form_2
from lsimonaitis.SET_DEFINITION sd
inner join umls.NDFRT_CONCEPT c on(c.NAME = sd.parameter_2) -- !!
inner join umls.NDFRT_ISA_CLOSURE isa on (isa.TARGET_ID = c.id)
inner join umls.NDFRT_ROLE r on(r.TARGET_ID = isa.SOURCE_ID AND r.ROLE_DEF_ID = sd.parameter_t2)
inner join umls.NDFRT_ISA_CLOSURE isa2 on (isa2.TARGET_ID = r.SOURCE_ID)
inner join lsimonaitis.MAP_TERM_NDFRT_200903 m on (m.source_ID = isa2.SOURCE_ID)
where sd.query_form_2 in ('QFA', 'QFB')
and sd.deprecated is null
)
union
select distinct to_number(sd.term_id) as set_term, to_number(m.TERM) as term, query_form_2
from lsimonaitis.SET_DEFINITION sd
inner join umls.NDFRT_CONCEPT c on(c.NAME IN (sd.parameter_1, sd.parameter_2)) -- !!
inner join umls.NDFRT_ISA_CLOSURE isa on (isa.TARGET_ID = c.id)
inner join umls.NDFRT_ROLE r on(r.TARGET_ID = isa.SOURCE_ID AND r.ROLE_DEF_ID = sd.parameter_t)
inner join umls.NDFRT_ISA_CLOSURE isa2 on (isa2.TARGET_ID = r.SOURCE_ID)
inner join lsimonaitis.MAP_TERM_NDFRT_200903 m on (m.source_ID = isa2.SOURCE_ID)
where sd.query_form_2 in ('QF1', 'QF2', 'QF3', 'QF4', 'QF6', 'QF7', 'QF9')
and sd.deprecated is null
go
drop table lsimonaitis.set_diff_20090309
go
create table lsimonaitis.set_diff_20090309 as
select distinct set_term, term,
case when query_form_2 is null then 0 else 1 end as ndfrt,
case when flag is null then 0 else 1 end as gopher
from (
select to_number(s.ID) as set_term, to_number(t.ID) as term, '1' as flag
from lsimonaitis.pseudo_GOPHER_CONCEPT s
inner join lsimonaitis.pseudo_GOPHER_SET_PLUS r on (s.id=r.TARGET_ID)
inner join lsimonaitis.pseudo_GOPHER_CONCEPT t on (t.id=r.SOURCE_ID)
where t.isSet IS NULL
and exists (select 1 from lsimonaitis.set_definition sd where sd.TERM_ID = s.ID)
) full outer join lsimonaitis.set_extension_20090309 using(set_term, term)
(added distinct in the last query, for we cannot be sure that there aren't replicate links between the connected classes).
Analysis of Differences
select sum(ndfrt), sum(gopher) from lsimonaitis.set_diff_20090309
- NDFRT: 1912 now 1937 now 2089 now, after dedupe 2056
- GOPHER: 1229 now 1290 now 1320 now, after dedupe 1271
And:
select x.*, ndfrt - gopher from (
select set_term,
sum(ndfrt) as ndfrt, sum(gopher) as gopher
from lsimonaitis.set_diff_20090309
group by set_term
) x
- 65 sets we win
- 53 sets gopher wins
- there are 142 sets total (so the rest we are on par with Gopher)
(all unchanged after dedupe)
Spreadsheet for Review
This table is best for review:
select set_term, sc.name as set_name, x.term, dc.name, diff, sndfrt, ndfrt, (ndfrt * gopher) as agree, gopher, sgopher,
(CASE WHEN EXISTS (SELECT 1 FROM MAP_GOPHER_NDFRT m WHERE m.term = x.term) THEN 1 ELSE 0 END) as mapped
from (
select * from (
select x.*, sndfrt - sgopher as diff from (
select set_term,
sum(ndfrt) as sndfrt, sum(gopher) as sgopher
from lsimonaitis.set_diff_20090309
group by set_term
) x
) inner join lsimonaitis.set_diff_20090309 USING(set_term)
) x
inner join lsimonaitis.pseudo_gopher_concept sc ON(sc.id = x.set_term)
inner join lsimonaitis.pseudo_gopher_concept dc ON(dc.id = x.term)
order by diff, set_name, name
-- optionally filter by where agree = 0
(not regenerated after dedupe, as this was done outside on the current worksheet.)
Gopher Interaction Detection
As discussed here, Gopher has two places to store interaction information, a master RX_INTERACTIONS table and concept relationships (from term dictionary table). It turns out that the master RX_INTERACTIONS table is non-redundant (801 rows vs. ~5700) and actually has more interesting data.
It should be noted that the interaction table does not only relate 2 sets but can relate other things.
SELECT '||'||a.type||'||'||b.type||'||'||UPPER(x.type)||'||'||count(1)
FROM UMLS.GOPHER_RXINTERACTION x
INNER JOIN UMLS.GOPHER_CONCEPT a ON(a.id = x.drug_a)
INNER JOIN UMLS.GOPHER_CONCEPT b ON(b.id = x.drug_b)
GROUP BY ROLLUP(a.type, b.type, UPPER(x.type))
| DS | RX EFFECTS RX | 1 | |
| DS | 1 | ||
| 1 | |||
| DR | RX EFFECTS RX | 1 | |
| DR | 1 | ||
| DR | AN | 4 | |
| DR | AN | RX CAUSES DX | 1 |
| DR | AN | DX CHANGES RX | 54 |
| DR | AN | RX AGGREVATES DX | 86 |
| DR | AN | 145 | |
| DR | DR | 10 | |
| DR | DR | RX CAUSES DX | 1 |
| DR | DR | RX EFFECTS RX | 143 |
| DR | DR | RX AGGREVATES DX | 1 |
| DR | DR | 155 | |
| DR | DS | 16 | |
| DR | DS | RX CAUSES DX | 4 |
| DR | DS | RX AFFECTS RX | 3 |
| DR | DS | RX EFFECTS RX | 125 |
| DR | DS | 148 | |
| DR | 449 | ||
| DS | AN | 4 | |
| DS | AN | RX CAUSES DX | 7 |
| DS | AN | DX CHANGES RX | 24 |
| DS | AN | RX EFFECTS RX | 1 |
| DS | AN | RX AGGREVATES DX | 61 |
| DS | AN | 97 | |
| DS | DR | 18 | |
| DS | DR | RX AFFECTS RX | 2 |
| DS | DR | RX EFFECTS RX | 86 |
| DS | DR | 106 | |
| DS | DS | 38 | |
| DS | DS | RX CAUSES DX | 1 |
| DS | DS | RX AFFECTS RX | 5 |
| DS | DS | RX EFFECTS RX | 97 |
| DS | DS | RX CHANGES LAB | 2 |
| DS | DS | RX AGGREVATES DX | 1 |
| DS | DS | RX EFFECTS RX;RX CAUSES DX | 1 |
| DS | DS | RX EFFECTS RX;RX CHANGES LAB | 1 |
| DS | DS | RX AGGREVATES DX;RX CAUSES DX | 1 |
| DS | DS | 147 | |
| DS | NT | RX EFFECTS RX | 1 |
| DS | NT | 1 | |
| DS | 351 | ||
| 801 |
Especially where DR terms are direct participants, we need to expand them in our NDF-RT approach through ingredient anyway.
Test Interaction Detection
Here goes our first interaction detection query:
SELECT a.id, a.name, i.type, b.id, b.name, i.*
FROM (
SELECT --+FIRST_ROWS
a.code AS a_code, b.code AS b_code
FROM ALG_MED_ORDER a INNER JOIN ALG_MED_ORDER b USING(pid)
WHERE ABS(a.time - b.time) < 30
AND a.code < b.code
) x
INNER JOIN UMLS.GOPHER_CONCEPT a ON(a.id = x.a_code AND a.type IN ('DR','DS'))
INNER JOIN UMLS.GOPHER_CONCEPT b ON(b.id = x.b_code AND b.type IN ('DR','DS'))
INNER JOIN LSIMONAITIS.SET_EXTENSION sa ON(sa.term = x.a_code)
INNER JOIN LSIMONAITIS.SET_EXTENSION sb ON(sb.term = x.b_code)
ON( (i.drug_a = sa.set_term AND i.drug_b = sb.set_term)
OR (i.drug_b = sa.set_term AND i.drug_a = sb.set_term))
and lo and behold we find a whole bunch of stuff. But it's not pretty the way we narrow down the combinations and then check the interaction link both ways.
But first we meditate over the detail:
147 Morphine Inj NARCOTIC MEDS RX CAUSES DX 7721 Temazepam BENZODIAZEPINES Excessive respiratory and CNS depression
That type "RX CAUSES DX" is obviously wrong here. Seems like errors in the data entry to the knowledge base. Similarly:
17778 Benazepril Ace Inhibitors RX CHANGES LAB 8850 Triamterene-75/HCTZ-50 K+SPARERS Hyperkalemia
The "RX CHANGES LAB" is bad here. The issue is not that it changes lab (and thereby makes lab hard to interpret) but it is that it favors hyperkalemia, which is a "DX" not a "LAB", but "RX CAUSES DX" wouldn't be right either. It is the interaction situation which has the risk of hyperkalemia.
Evaluation
First we make an RX - RX co-incidence summary table from our test data
CREATE TABLE INT_RX_RX_SUMMARY AS
SELECT a_code, b_code, sum(icount) as icount, count(1) as pcount FROM (
SELECT pid, a.code AS a_code, b.code AS b_code, count(1) as icount
FROM ALG_MED_ORDER a INNER JOIN ALG_MED_ORDER b USING(pid)
WHERE ABS(a.time - b.time) < 1
AND a.code < b.code
AND EXISTS (SELECT 1 FROM UMLS.GOPHER_CONCEPT a WHERE a.id = x.a_code AND a.type IN ('DR','DS'))
AND EXISTS (SELECT 1 FROM UMLS.GOPHER_CONCEPT b WHERE b.id = x.b_code AND b.type IN ('DR','DS'))
GROUP BY pid, a.code, b.code
) GROUP BY a_code, b_code
and let's do the stronger criterion where the time difference needs to be less than one day.
Makes 173038 such summary record (oops, initailly forgot to limit to DR and DS terms only, i.e., added the AND EXISTS ... clauses later, that removes 491 items)
NDF RT
Let's make sure we have a reflexive closure of the set extension relationship. Really that makes it a class extension, not a set extension (a set is never included in itself, but it is true to say that a class A is an A, with classes the relationship is actually that of a subset-or-equal, not element-of.)
DROP TABLE LSIMONAITIS.SET_EXTENSION_0 GO CREATE TABLE LSIMONAITIS.SET_EXTENSION_0 AS SELECT to_number(set_term) AS set_term, to_number(term) as term FROM LSIMONAITIS.SET_EXTENSION UNION SELECT to_number(set_term), to_number(set_term) FROM LSIMONAITIS.SET_EXTENSION UNION SELECT to_number(term), to_number(term) FROM LSIMONAITIS.SET_EXTENSION
Then we do the interaction table:
DROP TABLE INT_ISSUES_NDF_SUMMARY
GO
CREATE TABLE INT_ISSUES_NDF_SUMMARY AS
SELECT a_code, sa.set_term AS a_class,
b_code, sb.set_term AS b_class,
icount, pcount
FROM INT_RX_RX_SUMMARY x
INNER JOIN LSIMONAITIS.SET_EXTENSION_0 sa ON(sa.term = x.a_code)
INNER JOIN LSIMONAITIS.SET_EXTENSION_0 sb ON(sb.term = x.b_code)
INNER JOIN UMLS.GOPHER_RXINTERACTION i ON(i.drug_a = sa.set_term AND i.drug_b = sb.set_term)
UNION
SELECT b_code, sb.set_term,
a_code, sa.set_term,
icount, pcount
FROM INT_RX_RX_SUMMARY x
INNER JOIN LSIMONAITIS.SET_EXTENSION_0 sa ON(sa.term = x.a_code)
INNER JOIN LSIMONAITIS.SET_EXTENSION_0 sb ON(sb.term = x.b_code)
INNER JOIN UMLS.GOPHER_RXINTERACTION i ON(i.drug_b = sa.set_term AND i.drug_a = sb.set_term)
It's very fast to run, less than a second. And finds 3694 kinds of interaction issues.
Gopher
Let's make sure we have a reflexive closure of the set relationship, see comment above on classes.
CREATE TABLE UMLS.GOPHER_SET_PLUS_0 AS SELECT * FROM UMLS.GOPHER_SET_PLUS UNION SELECT SOURCE_ID, SOURCE_ID, 0 FROM UMLS.GOPHER_SET_PLUS UNION SELECT TARGET_ID, TARGET_ID, 0 FROM UMLS.GOPHER_SET_PLUS
Now the interactions:
DROP TABLE INT_ISSUES_GPH_SUMMARY
GO
CREATE TABLE INT_ISSUES_GPH_SUMMARY AS
SELECT a_code, sa.target_id AS a_class,
b_code, sb.target_id AS b_class,
icount, pcount
FROM INT_RX_RX_SUMMARY x
INNER JOIN UMLS.GOPHER_SET_PLUS_0 sa ON(sa.source_id = x.a_code)
INNER JOIN UMLS.GOPHER_SET_PLUS_0 sb ON(sb.source_id = x.b_code)
INNER JOIN UMLS.GOPHER_RXINTERACTION i ON(i.drug_a = sa.target_id AND i.drug_b = sb.target_id)
UNION
SELECT b_code, sb.target_id,
a_code, sa.target_id,
icount, pcount
FROM INT_RX_RX_SUMMARY x
INNER JOIN UMLS.GOPHER_SET_PLUS_0 sa ON(sa.source_id = x.a_code)
INNER JOIN UMLS.GOPHER_SET_PLUS_0 sb ON(sb.source_id = x.b_code)
INNER JOIN UMLS.GOPHER_RXINTERACTION i ON(i.drug_b = sa.target_id AND i.drug_a = sb.target_id)
This time we get 1877 records, cool, that is again about half! Our better approach to everything seems to always increase our sensitivity over 2-fold.
Analysis of Differences
CREATE TABLE INT_DIFF_SUMMARY AS
SELECT a_code, a_class, b_code, b_class,
COALESCE(g.icount,0) AS g_icount, COALESCE(g.pcount,0) AS g_pcount,
COALESCE(n.icount,0) AS n_icount, COALESCE(n.pcount,0) AS n_pcount
FROM INT_ISSUES_NDF_SUMMARY n
FULL OUTER JOIN INT_ISSUES_GPH_SUMMARY g USING(a_code, a_class, b_code, b_class)
First let's do the numbers:
SELECT sum(g_icount), sum(g_pcount),
sum(n_icount), sum(n_pcount),
sum(n_icount - g_icount),
sum(n_pcount - g_pcount),
sum(c_count)
FROM (SELECT a_class, b_class,
sum(g_icount) AS g_icount,
sum(g_pcount) AS g_pcount,
sum(n_icount) AS n_icount,
sum(n_pcount) AS n_pcount,
count(1) AS c_count
FROM INT_DIFF_SUMMARY
GROUP BY a_class, b_class)
| Gopher | NDF-RT | Diff | |
| Issue instances | 147374 | 197456 | 50082 |
| Patients with same issue instance | 41891 | 64799 | 22908 |
Total number of kinds of issues: 4291
And now to view it all.
SELECT a_code, a.name AS a_name, a_class, ac.name AS a_class_name,
b_code, b.name AS b_name, b_class, bc.name AS b_class_name,
g_icount, g_pcount, n_icount, n_pcount,
n_icount - g_icount AS idiff,
n_pcount - g_pcount AS pdiff,
x.effect, x.management, x.mode_of_action, x.type, x.id, x.reference, x.study
FROM INT_DIFF_SUMMARY
INNER JOIN UMLS.GOPHER_CONCEPT a ON(a.id = a_code)
INNER JOIN UMLS.GOPHER_CONCEPT b ON(b.id = b_code)
INNER JOIN UMLS.GOPHER_CONCEPT ac ON(ac.id = a_class)
INNER JOIN UMLS.GOPHER_CONCEPT bc ON(bc.id = b_class)
INNER JOIN UMLS.GOPHER_RXINTERACTION x
ON( (x.drug_a = a_class AND x.drug_b = b_class)
OR (x.drug_b = a_class AND x.drug_a = b_class))
ORDER BY n_pcount - g_pcount DESC
That's a long table, why don't we roll it up by interaction classes:
SELECT a_class, ac.name AS a_class_name,
b_class, bc.name AS b_class_name,
g_icount, g_pcount, n_icount, n_pcount,
n_icount - g_icount AS idiff,
n_pcount - g_pcount AS pdiff,
x.effect, x.management, x.mode_of_action, x.type, x.id, x.reference, x.study
FROM (SELECT a_class, b_class,
sum(g_icount) AS g_icount,
sum(g_pcount) AS g_pcount,
sum(n_icount) AS n_icount,
sum(n_pcount) AS n_pcount,
count(1) AS c_count
FROM INT_DIFF_SUMMARY
GROUP BY a_class, b_class) x
INNER JOIN UMLS.GOPHER_CONCEPT ac ON(ac.id = a_class)
INNER JOIN UMLS.GOPHER_CONCEPT bc ON(bc.id = b_class)
INNER JOIN UMLS.GOPHER_RXINTERACTION x
ON( (x.drug_a = a_class AND x.drug_b = b_class)
OR (x.drug_b = a_class AND x.drug_a = b_class))
ORDER BY n_pcount - g_pcount DESC
Now it's also ordered by discrepancy first the ones where we win, then where Gopher wins. Wow, that is very interesting. But we are not quite done here. I think we need to weigh the difference by severity of interaction. We should probably go back to the list of effects and weigh them by seriousness: 1, 2 (similar to the AERS seriousness assessment.) We should also map the effects to MedDRA and SNOMED CT.
Move this to SPL Terminology
drop table umls.spl_interactions
go
create table umls.spl_interactions as
select a.QUERY_FORM_2 as aqf, a.PARAMETER_T as aqpt, a.PARAMETER_1 as aqp1, an.target_value as a_nui, amc.code as a_mesh,
b.QUERY_FORM_2 as bqf, b.PARAMETER_T as bqpt, b.PARAMETER_1 as bqp1, bn.target_value as b_nui, bmc.code as b_mesh,
effect, management, mode_of_action, reference, type
from umls.gopher_rxinteraction
inner join lsimonaitis.set_definition a on(a.term_id = drug_a)
inner join umls.ndfrt_concept ac on(ac.name = a.parameter_1)
inner join umls.ndfrt_role an on(an.source_id = ac.id and an.role_def_id = 'P262794')
left outer join umls.ndfrt_role amr on(amr.source_id = ac.id and amr.role_def_id = 'P135')
left outer join umls.mesh_chemical amc on(amc.code = amr.target_value)
inner join lsimonaitis.set_definition b on(b.term_id = drug_b)
inner join umls.ndfrt_concept bc on(bc.name = b.parameter_1)
inner join umls.ndfrt_role bn on(bn.source_id = bc.id and bn.role_def_id = 'P262794')
left outer join umls.ndfrt_role bmr on(bmr.source_id = bc.id and bmr.role_def_id = 'P135')
left outer join umls.mesh_chemical bmc on(bmc.code = bmr.target_value)
where a.PARAMETER_2 is null and b.parameter_2 is null
Now create the issues:
- subjectOf - substanceAdmin - consumable - materialKind - code
- subjectOf - subatanceAdmin - consumable - materialKind - code
- text combine text into one
Use NUI if no MeSH exists.
select case when a_mesh is null then a_nui else a_mesh end as a_code, aqp1 as a_displayName, case when b_mesh is null then b_nui else b_mesh end as b_code, bqp1 as b_displayName, case when effect is not null then 'Using '||aqp1||' with '||bqp1||' may cause '||effect||'.' end || case when management is not null then CHR(10)||management end || case when mode_of_action is not null then CHR(10)||'Mode of action: '||mode_of_action end as text from umls.spl_interactions
Closer Analysis of NDFRT queries
Attachments
- rxinteractions_sets_as_NDFRT_queries.xls (0.7 MB) - added by gschadow 18 months ago.
- rx_interactions_dated_20061221.xls (0.9 MB) - added by gschadow 18 months ago.
- Interaction Study Tally.2.xls (28.0 KB) - added by anonymous 18 months ago.
- Interaction Study Tally.xls (1.4 MB) - added by gschadow 18 months ago.
-
set-diff.xls
(344.0 KB) - added by gschadow
9 months ago.
The review of differences spreadsheet from 2009-03
-
set-diff2.xls
(355.0 KB) - added by gschadow
9 months ago.
using the PseudoGopher set definitions
-
drug-class-diff_modified_by_Linas_and_Gunther.xls
(1.2 MB) - added by gschadow
9 months ago.
Worksheet with analysis and some fixes
-
rxinteractions_sets_as_NDFRT_queries-2.xls
(0.8 MB) - added by gschadow
9 months ago.
Worksheet with analysis and some fixes
-
set-diff3.xls
(372.5 KB) - added by gschadow
9 months ago.
Final data with all 142 sets, definition forms for combinations
-
set-diff_20090312_1500.xls
(3.6 MB) - added by gschadow
8 months ago.
Final data, fixed, and with all counts, including weights from order counts.
![(please configure the [header_logo] section in trac.ini)](/mw/chrome/site/logo100.png)