Doc2Hpo

A web service that extracts Human Phenotype Ontology (HPO) terms from free-text clinical notes or literature.

Which parsing engine should I use?

Speed ranking (fast to slow): string-based (default) > MetaMap Lite > MetaMap = NCBO annotator > Ensemble

Recall rate (high to low): Ensemble > MetaMap > MetaMap Lite = NCBO annotator > string-based (default)

In general, if the input is long (more than one Word page), we suggest using the default method or MetaMap Lite. If higher recall matters more than speed, use the ensemble method.

What is the performance of fully automated recognition?

With negation detection turned off, the precision is ~0.4 and the recall is ~0.7.

With negation detection turned on, the precision rises to ~0.7.

How does the negation detection work?

Negation detection is provided by Wendy Chapman's NegEx, a rule- and keyword-based method.
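
To give a feel for what a rule- and keyword-based approach looks like, here is a minimal sketch. It is not NegEx itself: the trigger list is a tiny illustrative subset, and the five-token window is an assumption made for this example.

```python
import re

# Illustrative subset of pre-negation triggers (assumption); the real NegEx
# trigger list is much larger and also handles post-negation and pseudo-negation.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for", "no evidence of")
WINDOW = 5  # assumed scope: how many tokens before the term are inspected


def is_negated(sentence, term):
    """Return True if a trigger occurs within WINDOW tokens before `term`."""
    text = sentence.lower()
    start = text.find(term.lower())
    if start < 0:
        return False
    # Tokens that precede the term, limited to the assumed window
    preceding = re.findall(r"[a-z]+", text[:start])[-WINDOW:]
    window_text = " ".join(preceding)
    return any(
        re.search(r"\b" + re.escape(trigger) + r"\b", window_text)
        for trigger in NEGATION_TRIGGERS
    )
```

For example, `is_negated("The patient denies seizures.", "seizures")` is true, while the same call on "The patient has seizures." is false.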

How are the overlapped and repeated annotations processed?

If two annotations overlap in the text, only the longest one is displayed, but both are stored in the JSON output.

If two annotations are repeated, one is picked at random, but both are stored in the JSON output.
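
The display rule can be sketched as a greedy longest-first selection. This is an illustration only: the field names `start`, `end`, and `id` are hypothetical, and where the description above picks a repeated annotation at random, this sketch keeps the first one in sort order so the result is reproducible.

```python
def select_for_display(annotations):
    """Keep the longest annotation among any group of overlapping spans.

    Each annotation is a dict with 'start', 'end' (exclusive) and 'id'.
    The input list is left untouched, mirroring the full JSON output.
    """
    # Longest spans first; ties are broken by start offset for a stable result
    ordered = sorted(annotations, key=lambda a: (-(a["end"] - a["start"]), a["start"]))
    kept = []
    for ann in ordered:
        overlaps = any(ann["start"] < k["end"] and k["start"] < ann["end"] for k in kept)
        if not overlaps:
            kept.append(ann)
    # Return the survivors in reading order
    return sorted(kept, key=lambda a: a["start"])
```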

What is the ensemble method?

The ensemble method takes the union of the results generated by the other parsing engines.
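
As a rough sketch of the idea, assuming each engine returns annotations as `(start, end, hpo_id)` tuples (a representation chosen for this example, not necessarily the one Doc2Hpo uses):

```python
def ensemble(*engine_results):
    """Union the annotations from several engines, dropping exact duplicates."""
    merged = set()
    for results in engine_results:
        merged.update(results)
    # Sort by position so the merged list reads in text order
    return sorted(merged)
```

Because a union can only add annotations, it explains why the ensemble has the highest recall and, since every engine must run, the lowest speed.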

What is the string-based method?

The string-based method leverages the Aho–Corasick algorithm for fast concept extraction. The full input is treated as a single string and searched against all HPO terms and their synonyms under ‘phenotypic abnormality’ (HP:0000118).

When 'allow partial search' is unchecked, post-processing rules are applied to remove partial matches, such as 'tic' in 'genetics'.
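
The following is a compact, self-contained sketch of Aho–Corasick matching with an optional whole-word filter. It is not the Doc2Hpo implementation: the `allow_partial` flag and the alphanumeric boundary check are assumptions about how such a post-processing rule could work, and the caller is expected to lowercase both terms and text.

```python
from collections import deque


def build_automaton(terms):
    """Build Aho-Corasick goto, failure, and output tables for `terms`."""
    goto, fail, out = [{}], [0], [[]]
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(term)
    # Breadth-first pass to fill in failure links and inherited outputs
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def is_whole_word(text, start, end):
    """True if text[start:end] is not embedded in a longer alphanumeric run."""
    before_ok = start == 0 or not text[start - 1].isalnum()
    after_ok = end == len(text) or not text[end].isalnum()
    return before_ok and after_ok


def find_terms(text, automaton, allow_partial=True):
    """Scan `text` once and return (start, end, term) for every match."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for term in out[state]:
            start, end = i - len(term) + 1, i + 1
            if allow_partial or is_whole_word(text, start, end):
                hits.append((start, end, term))
    return hits
```

With the single term "tic" and the text "genetics tic", the partial search reports two matches, while `allow_partial=False` keeps only the standalone word.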

What is the MetaMap method?

It splits the input into sentences and then feeds each sentence to a locally configured MetaMap server via the Java API.

MetaMap first identifies candidate clinical terms through lexical and syntactic analysis and maps them to standard UMLS concepts.

The UMLS concepts are then mapped to HPO concepts using the mapping available here.
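
The three stages above (sentence splitting, concept recognition, UMLS-to-HPO mapping) can be sketched as follows. Everything here is a stand-in: `recognize_cuis` is a hypothetical callable playing the role of the MetaMap server, the regular-expression splitter is a naive assumption, and the identifiers in `CUI_TO_HPO` are placeholders rather than real UMLS or HPO codes.

```python
import re

# Hypothetical excerpt of a UMLS-to-HPO table; identifiers are placeholders
CUI_TO_HPO = {"CUI_A": "HPO_X", "CUI_B": "HPO_Y"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def annotate(text, recognize_cuis, cui_to_hpo=CUI_TO_HPO):
    """Run `recognize_cuis` per sentence and keep concepts that map to HPO."""
    results = []
    for sentence in split_sentences(text):
        for cui in recognize_cuis(sentence):
            # UMLS concepts without an HPO counterpart are dropped
            if cui in cui_to_hpo:
                results.append((sentence, cui_to_hpo[cui]))
    return results
```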

What is the MetaMapLite method?

MetaMapLite is a fast version of MetaMap. It provides a near real-time named-entity recognizer that is not as rigorous as MetaMap but is much faster.

Currently, MetaMapLite does not support dynamic variant generation. Named entities are found using longest match.

What is the NCBO annotator method?

It employs the online NCBO Annotator API for HPO concept recognition. Different options for NCBO Annotator are exposed to users via the Doc2Hpo interface to customize the parsing.

Because this engine calls a remote service, a slow network connection can cause noticeable delays when the input is large.
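
For orientation, a request to the NCBO Annotator REST endpoint restricted to the HPO ontology might be assembled as below. This is a sketch, not Doc2Hpo's code: the function name is hypothetical, the API key is a placeholder you must obtain from BioPortal, and `longest_only` and `whole_word_only` are shown only as examples of Annotator options; which options Doc2Hpo exposes is not specified here.

```python
from urllib.parse import urlencode

ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def build_annotator_request(text, api_key, longest_only=True, whole_word_only=True):
    """Return a GET URL that asks the NCBO Annotator for HPO matches in `text`."""
    params = {
        "text": text,
        "ontologies": "HP",  # BioPortal acronym for the Human Phenotype Ontology
        "longest_only": str(longest_only).lower(),
        "whole_word_only": str(whole_word_only).lower(),
        "apikey": api_key,  # placeholder; supply your own key
    }
    return ANNOTATOR_URL + "?" + urlencode(params)
```

The resulting URL can be fetched with any HTTP client. For long inputs, sending the text in a POST body instead of the query string is likely the safer choice.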