| Brief description of the program
|
NLProt is a tool for finding protein names in natural-language text. It is based on Support Vector Machines (SVMs), which are trained on
contextual features of named entities in scientific language. In addition, simple filtering rules and a protein-name dictionary are used to improve performance.
NLProt reached a precision (fraction of tagged names that are correct) of 70% at a recall (fraction of true names that are found) of 85% when run on the 166 most recent abstracts of EMBL and Cell (Nov/Dec 2003). When run from
the command line, NLProt takes about 1 second per abstract.
|
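For readers unfamiliar with the two evaluation measures quoted above, the following is a minimal sketch of how precision and recall are commonly computed for a name tagger. It is not part of NLProt: the function name `precision_recall` and the `(abstract_id, start, end)` span format are assumptions made for illustration, and the toy data is invented and unrelated to the reported 70% / 85% figures.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of tagged name spans.

    A predicted span counts as correct only if it exactly matches a gold span.
    This exact-match criterion is an assumption of this sketch.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted names that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard names that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: (abstract_id, start, end) spans, NOT NLProt output.
gold = {(1, 0, 4), (1, 10, 15), (2, 3, 8), (2, 20, 26)}
predicted = {(1, 0, 4), (1, 10, 15), (2, 3, 8), (2, 30, 35), (3, 1, 5)}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.60 recall=0.75"
```

In this toy case three of the five predicted spans are correct (precision 0.60) and three of the four gold spans are found (recall 0.75).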