This repository has been archived by the owner on Jan 10, 2023. It is now read-only.
forked from OHNLP/MedXN
############
Introduction
############
Medication Extraction and Normalization (MedXN, pronounced [med-eks-en]) is an Apache UIMA-based medication information extraction system that focuses on assigning the most specific RxNorm RxCUI to a medication description. MedXN finds each medication and its complete attributes and normalizes them to the most specific RxNorm RxCUI using flexible matching, abbreviation expansion, inference, etc. MedXN uses externalized resources (i.e., a medication dictionary, attribute definitions, and regular expression attribute patterns) so that end users can customize it easily.
<MedXN High-level Algorithm>
Text: "Sulfasalazine [AZULFIDINE] 500-mg 2 tabs by mouth two times a day"
Step 1: Medication Extraction
Eg) Sulfasalazine [AZULFIDINE] RxCUI="9524::IN::202770::BN"
Step 2: Attribute Extraction
Eg) 500-mg (strength), 2 (dose), tabs (form), mouth (route), two times a day (frequency)
Step 3: Medication & Attribute Association
Eg) <Sulfasalazine [AZULFIDINE]> + <500-mg, 2, tabs, mouth, two times a day>
Step 4: Convert to RxNorm Standard
Eg) sulfasalazine <in>500 mg<st> oral tablet<df>azulfidine<bn>
Step 5: Convert to RxCUI Representation
Eg) 9524<in>500 mg<st>317541<df>202770<bn>
Step 6: Normalize to Specific RxCUI
Eg) Sulfasalazine 500 MG Oral Tablet [AZULFIDINE] RxCUI=208437::SBD
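The move from Step 4 to Step 5 can be sketched as follows. This is a minimal illustration, not MedXN's actual implementation: the function name and the two lookup tables are hypothetical, and the tables hold only the values shown in the example above. The sketch assumes that each value in the Step 4 string is followed by its tag (<in>, <st>, <df>, <bn>).

```python
import re

# Toy lookup tables (assumption): only the values from the example above.
NAME_TO_RXCUI = {"sulfasalazine": "9524", "azulfidine": "202770"}
DOSE_FORM_TO_RXCUI = {"oral tablet": "317541"}

# A segment is some text followed by one of the four tags.
SEGMENT = re.compile(r"([^<>]*)<(in|st|df|bn)>")


def to_rxcui_representation(tagged):
    """Replace ingredient, brand name, and dose form text with RxCUIs."""
    out = []
    for value, tag in SEGMENT.findall(tagged):
        text = value.strip()
        if tag in ("in", "bn"):
            # Unknown names are left unchanged.
            text = NAME_TO_RXCUI.get(text, text)
        elif tag == "df":
            text = DOSE_FORM_TO_RXCUI.get(text, text)
        # Strength (<st>) is kept as text.
        out.append(f"{text}<{tag}>")
    return "".join(out)


step4 = "sulfasalazine <in>500 mg<st> oral tablet<df>azulfidine<bn>"
print(to_rxcui_representation(step4))
```

For the Step 4 string above, this prints the Step 5 string: 9524<in>500 mg<st>317541<df>202770<bn>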
######################################
Sub directories included in MedXN v1.0
######################################
javasrc - source folder containing the distributed Java source code
autosrc - source folder generated automatically by JCasGen for the centralized type system
descsrc - source folder containing types, primitive analysis engines, collection readers, and CAS consumers (modifying it is not recommended unless you are familiar with UIMA)
desc - example descriptors for aggregate analysis engines and collection processing engines
medtaggerdescsrc - source folder containing the types, primitive analysis engines, collection readers, and CAS consumers of MedTagger
lib - libraries that MedXN depends on
resources - resources required to run MedXN (including those for running MedTagger)
testdata - shipped test data
##########################
Installation and Execution
##########################
To install, download the distribution and unzip the package.
To run MedXN on a collection of documents, go to the MedXN installation home
and run runMedXNCVD.bat (runMedXNCVD.sh) or runMedXNCPE.bat (runMedXNCPE.sh),
which start the tools for testing analysis engines and collection processing engines, respectively.
In Windows:
>java -Xms512M -Xmx2000M -cp resources;desc;descsrc;medtaggerdescsrc;MedXN-1.0.1.jar org.apache.uima.tools.cvd.CVD
>java -Xms512M -Xmx2000M -cp resources;desc;descsrc;medtaggerdescsrc;MedXN-1.0.1.jar org.apache.uima.tools.cpm.CpmFrame
In Unix/Linux:
>java -Xms512M -Xmx2000M -cp resources:desc:descsrc:medtaggerdescsrc:MedXN-1.0.1.jar org.apache.uima.tools.cvd.CVD
>java -Xms512M -Xmx2000M -cp resources:desc:descsrc:medtaggerdescsrc:MedXN-1.0.1.jar org.apache.uima.tools.cpm.CpmFrame
These commands start the UIMA CAS Visual Debugger (CVD) or the collection processing engine (CPE) GUI, respectively.
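The Windows and Unix/Linux commands differ only in the classpath separator (";" versus ":"). A small wrapper can pick the separator for the current platform. The sketch below is illustrative only: build_command is a hypothetical helper, not part of the MedXN distribution, and the classpath entries and jar name are copied from the commands above.

```python
import os

# Classpath entries and jar name copied from the commands above.
CLASSPATH_ENTRIES = [
    "resources",
    "desc",
    "descsrc",
    "medtaggerdescsrc",
    "MedXN-1.0.1.jar",
]


def build_command(main_class, separator=os.pathsep):
    """Return the java invocation as an argument list.

    os.pathsep is ";" on Windows and ":" on Unix/Linux.
    """
    classpath = separator.join(CLASSPATH_ENTRIES)
    return ["java", "-Xms512M", "-Xmx2000M", "-cp", classpath, main_class]


print(" ".join(build_command("org.apache.uima.tools.cvd.CVD")))
```

The resulting list can be passed to subprocess.run with the working directory set to the MedXN installation home, since the classpath entries are relative paths.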
To visualize a specific aggregate engine in CVD, select Load AE under the Run menu and choose a descriptor from
$MedXNHOME/desc/medxndesc/aggregate_analysis_engine
To process a collection of documents, open the File menu and load the corresponding CPE descriptor file,
available in $MedXNHOME/desc/collection_processing_engine
##########################
MedXN v1.0 UIMA components
##########################
<MedXN descriptors>
Aggregate TAE
MedXNAggregateTAE.xml
Collection Processing Engine
MedXN_CPE.xml
Primary Annotators
ACLookupDrugAE.xml: extracts medication name
MedAttrAE.xml: extracts medication attributes
MedExtAE.xml: associates medication name and its attributes
MedNormAE.xml: normalizes medication information to RxNorm standard
ACLookupDrugNormAE.xml: maps medication information to a specific RxNorm name
MedNormRxCUIAE.xml: converts medication information to RxCUI representation
ACLookupRxCUIDrugNormAE.xml: maps RxCUI-represented medication information to a specific RxNorm name
Cas Consumer
MedXNCC.xml: prints out results.
Parameters:
OutputFile – output file path and name
Delimiter – a delimiter of medication information in the output
Output format:
filename|medication::b::e|medication Rxcui|strength::b::e|dose::b::e|form::b::e|route::b::e|frequency::b::e|duration::b::e|specific RxNorm name|specific RxCUI|sentence (b: begin offset, e: end offset)
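A downstream script can split each output line into named fields. The sketch below makes several assumptions: the Delimiter parameter is set to "|", missing attributes appear as empty fields, and the medication RxCUI field uses the "RxCUI::type" pairs shown in the introduction. The sample line is constructed by hand from the introduction example and is not actual MedXN output. All function and field names are hypothetical.

```python
FIELDS = [
    "filename", "medication", "medication_rxcui", "strength", "dose",
    "form", "route", "frequency", "duration",
    "specific_rxnorm_name", "specific_rxcui", "sentence",
]
# Fields written as text::b::e (b: begin offset, e: end offset).
SPAN_FIELDS = ["medication", "strength", "dose", "form",
               "route", "frequency", "duration"]


def parse_span(field):
    """Split 'text::b::e' into its parts; empty fields become None."""
    if not field:
        return None
    text, begin, end = field.rsplit("::", 2)
    return {"text": text, "begin": int(begin), "end": int(end)}


def parse_rxcui_pairs(field):
    """Split '9524::IN::202770::BN' into (RxCUI, term type) pairs."""
    parts = field.split("::") if field else []
    return list(zip(parts[0::2], parts[1::2]))


def parse_line(line, delimiter="|"):
    # Limit the split so a delimiter inside the sentence does not break it.
    parts = line.rstrip("\n").split(delimiter, len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    for name in SPAN_FIELDS:
        record[name] = parse_span(record[name])
    record["medication_rxcui"] = parse_rxcui_pairs(record["medication_rxcui"])
    return record


# Hand-built illustrative line (assumption), not real MedXN output.
SAMPLE = (
    "note1.txt|Sulfasalazine [AZULFIDINE]::0::26|9524::IN::202770::BN|"
    "500-mg::27::33|2::34::35|tabs::36::40|mouth::44::49|"
    "two times a day::50::65||"
    "Sulfasalazine 500 MG Oral Tablet [AZULFIDINE]|208437|"
    "Sulfasalazine [AZULFIDINE] 500-mg 2 tabs by mouth two times a day"
)
record = parse_line(SAMPLE)
print(record["strength"], record["specific_rxcui"])
```

Splitting the span fields from the right keeps medication text that itself contains "::" intact, and the split limit on the line keeps the trailing sentence whole.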
<MedXN dictionary resources>
- under /resources/medxnresources/lookup
RxNorm_BNIN.alphanum.BnInPinMinSyn.txt: a dictionary of medication names compiled from RxNorm ingredients and brand names (i.e., IN, PIN, MIN, BN, and manually compiled abbreviations). It also includes any other medication variations that have the same RxCUI as the above medications.
Format: medication name (lower-cased, non-alphanumeric characters replaced with spaces, tokens separated by tabs)|RxCUI|RxNorm term type|RxNorm name
Example: aspirin|1191|IN|Aspirin
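To look up a mention in this dictionary, the mention has to be normalized the same way as the dictionary keys. The sketch below is one way to do that under stated assumptions: both function names are hypothetical, and keys are re-joined with single spaces on both sides so that the lookup works regardless of the whitespace character that separates tokens in the shipped file.

```python
import re


def normalize_name(text):
    """Lower-case, replace non-alphanumeric characters with spaces,
    and collapse runs of whitespace."""
    cleaned = re.sub(r"[^0-9a-z]+", " ", text.lower())
    return " ".join(cleaned.split())


def load_name_dictionary(lines):
    """Map each normalized name to a list of (RxCUI, term type, RxNorm name)."""
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, rxcui, term_type, rxnorm_name = line.split("|", 3)
        key = " ".join(name.split())
        entries.setdefault(key, []).append((rxcui, term_type, rxnorm_name))
    return entries


dictionary = load_name_dictionary(["aspirin|1191|IN|Aspirin"])
print(dictionary.get(normalize_name("Aspirin")))
```

A name can map to more than one entry, so each key holds a list rather than a single tuple.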
RxNorm_Name.norm.txt: a dictionary of full medication descriptions compiled from RxNorm SCDC, SCDF, SCD, SBDC, SBDF, SBD, and SY
Format: full medication description (lower-cased, [] removed, tokens separated by tabs)|RxCUI|RxNorm term type|RxNorm name
Example: aspirin 81 mg oral tablet|243670|SCD|Aspirin 81 MG Oral Tablet
RxCUI.norm.txt: RxCUI representation of RxNorm_Name.norm.txt, i.e., the medication name and dose form are replaced with their RxCUIs
Format: RxCUI representation|RxCUI|RxNorm term type|RxNorm name
Example: 1191 81 mg 317541|243670|SCD|Aspirin 81 MG Oral Tablet
doseDict.txt: list of RxNorm dose forms and their RxCUIs
Format: dose form (lower-cased)|RxCUI|RxNorm name
Example: oral tablet|317541|Oral Tablet
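The relation between the three example entries above (full description, RxCUI representation, dose form) can be sketched as a phrase replacement followed by a lookup. This is an illustration under stated assumptions, not MedXN's matching algorithm: replace_phrases is a hypothetical helper, and the tables contain only the example values listed above.

```python
import re

# Example values taken from the dictionary entries above.
PHRASE_TO_RXCUI = {"aspirin": "1191", "oral tablet": "317541"}
RXCUI_NORM = {
    "1191 81 mg 317541": ("243670", "SCD", "Aspirin 81 MG Oral Tablet"),
}


def replace_phrases(description, phrase_to_rxcui):
    """Replace whole-token phrases with their RxCUIs, longest phrase first."""
    result = description
    for phrase in sorted(phrase_to_rxcui, key=len, reverse=True):
        # Match the phrase only when bounded by whitespace or string edges.
        pattern = r"(?<!\S)" + re.escape(phrase) + r"(?!\S)"
        replacement = phrase_to_rxcui[phrase]
        result = re.sub(pattern, lambda _m, r=replacement: r, result)
    return result


key = replace_phrases("aspirin 81 mg oral tablet", PHRASE_TO_RXCUI)
print(key, RXCUI_NORM.get(key))
```

Replacing the longest phrases first avoids a short phrase consuming part of a longer one.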
falseMedDic.txt: list of potential false medications, i.e., terms that are in RxNorm but are likely not drugs when they appear in clinical notes
Format: lower-cased medication
Example: today
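A list like this can be applied as a simple exclusion filter. The sketch below assumes exact matching on the lower-cased mention. How MedXN applies the list internally is not described here, and both function names are hypothetical.

```python
def load_false_medications(lines):
    """Read one lower-cased term per line, skipping blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def drop_false_medications(mentions, false_meds):
    """Keep only mentions whose lower-cased text is not in the list."""
    return [m for m in mentions if m.lower() not in false_meds]


false_meds = load_false_medications(["today\n"])
print(drop_false_medications(["Aspirin", "today"], false_meds))
```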
<Regular expressions file>
- under /resources/medxnresources
regExPatterns.txt: contains medication attribute patterns written as Java regular expressions (includes usage descriptions).
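As an illustration of what an attribute pattern does, the sketch below matches a strength such as "500-mg" and reports its begin and end offsets. The pattern is hypothetical and is not taken from regExPatterns.txt. It uses only syntax that Python and Java regular expressions share, and the unit list is deliberately short.

```python
import re

# Hypothetical strength pattern (assumption): a number, an optional
# space or hyphen, and a unit from a short illustrative list.
STRENGTH = re.compile(r"\b\d+(?:\.\d+)?[ -]?(?:mg|mcg|g|ml)\b", re.IGNORECASE)


def find_strengths(text):
    """Return (matched text, begin offset, end offset) for each match."""
    return [(m.group(), m.start(), m.end()) for m in STRENGTH.finditer(text)]


sentence = "Sulfasalazine [AZULFIDINE] 500-mg 2 tabs by mouth two times a day"
print(find_strengths(sentence))
```

The dose "2" in "2 tabs" is not matched because "tabs" is not one of the listed units.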