2.2 Eligibility for HFrEF trials

Unstructured Information Management Architecture Ruta Rules.[16] Both steps were implemented within a natural language processing (NLP) tool known as CLAMP (Clinical Language Annotation, Modelling, and Processing).[17] During manual annotation, we randomly sampled ten percent (n=37) of the phase III trials and annotated the clinical entities within the eligibility criteria. Relationships between interdependent entities, such as a laboratory measurement and its value, were also specified. We developed a first version of an annotation guide based on categories consistent with the Observational Health Data Sciences and Informatics Common Data Model health data standards.[18] These categories were condition (including diagnosis and medical history), measurement (numerical or categorical) with its corresponding values and units, demographic characteristic, and drug. Information not present in routine medical records, for example ability to comply with follow-up, was excluded from analysis.

Manual text annotation was done within LANN, a team-based annotation tool that is compatible with CLAMP.[19] We compared annotations between the two annotators (YMFL and WJW) and revised the guidelines iteratively. A set of gold standard criteria was determined based on the final agreement between annotators. Overall inter-annotator agreement was 0.615 for Cohen's Kappa and 0.841 for the F1 performance measure, the harmonic mean of precision and recall, which ranges between 0 and 1.[20]

The finalized annotations were used as input data to train a machine learning (ML) model using five-fold cross-validation and the conditional random fields algorithm from the CRFsuite library. F1 scores for condition, measurement, measurement value, demographics and drugs were 0.581, 0.646, 0.719, 0.364 and 0.448, respectively. The model output was cross-checked against the original text until every relevant entity had been extracted (in 40% of trials).
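The two agreement measures reported above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names `cohen_kappa` and `f1_score` are hypothetical, the toy labels are invented, and the sketch assumes that Kappa is computed over items labelled by both annotators while F1 is computed by treating one annotator's entity set as the reference.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def f1_score(reference, candidate):
    """F1 (harmonic mean of precision and recall) between two entity sets."""
    reference, candidate = set(reference), set(candidate)
    true_pos = len(reference & candidate)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(candidate)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)


# Invented toy annotations (not study data).
annotator_1 = ["condition", "condition", "drug", "drug"]
annotator_2 = ["condition", "condition", "drug", "condition"]
print(cohen_kappa(annotator_1, annotator_2))

# Entities identified by (start offset, end offset, category); also invented.
spans_1 = {(0, 5, "condition"), (10, 14, "drug"), (20, 24, "measurement")}
spans_2 = {(0, 5, "condition"), (10, 14, "drug"), (30, 33, "drug")}
print(f1_score(spans_1, spans_2))
```

Kappa corrects raw agreement for chance and therefore tends to be lower than F1 on the same annotations, which is consistent with the gap between the two values reported above.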
The final structured clinical entities were then exported as individual text files for analysis in R. In calculating trial eligibility, binary criteria such as the presence of comorbidities could be applied as inclusion or exclusion criteria, whereas continuous variables (i.e., laboratory, ECG, and physical examination measurements) were specified as ranges with minimum and maximum thresholds. Arbitrary limits of a minimum of 0 and a maximum of 2000 were used where lower and upper limits were not explicitly specified.[21]
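The eligibility logic described above can be sketched as follows. The study analysis was performed in R; this Python version is illustrative only. The class names, variable names and thresholds are hypothetical, and two behaviours are assumptions not stated in the text: range bounds are treated as inclusive, and a missing measurement is treated as failing the criterion.

```python
from dataclasses import dataclass
from typing import Optional

# Arbitrary fallback limits used when a trial states only one bound (from the text).
DEFAULT_MIN = 0.0
DEFAULT_MAX = 2000.0


@dataclass
class RangeCriterion:
    """Continuous criterion expressed as a [minimum, maximum] range."""
    variable: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def is_met(self, patient: dict) -> bool:
        low = DEFAULT_MIN if self.minimum is None else self.minimum
        high = DEFAULT_MAX if self.maximum is None else self.maximum
        value = patient.get(self.variable)
        # Assumption: a missing value fails the criterion; bounds are inclusive.
        return value is not None and low <= value <= high


@dataclass
class BinaryCriterion:
    """Presence/absence criterion, usable for inclusion or exclusion."""
    variable: str
    exclusion: bool = False

    def is_met(self, patient: dict) -> bool:
        present = bool(patient.get(self.variable, False))
        return not present if self.exclusion else present


def is_eligible(patient: dict, criteria: list) -> bool:
    """A patient is eligible only if every criterion is met."""
    return all(criterion.is_met(patient) for criterion in criteria)


# Hypothetical trial criteria and patient record (invented values).
criteria = [
    RangeCriterion("lvef_percent", maximum=40),   # lower bound falls back to 0
    RangeCriterion("egfr", minimum=30),           # upper bound falls back to 2000
    BinaryCriterion("heart_failure"),             # inclusion criterion
    BinaryCriterion("dialysis", exclusion=True),  # exclusion criterion
]
patient = {"lvef_percent": 35, "egfr": 55, "heart_failure": True}
print(is_eligible(patient, criteria))
```

Filling an unspecified bound with 0 or 2000 lets every continuous criterion be evaluated with the same two-sided comparison, at the cost of assuming that no plausible measurement falls outside that interval.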