The Core of the System: Text Analysis and Term Classification Process
The text analysis process is a complex operation and one of the key parts of Tox-Hub (Figure 2), since it is the means by which the internal dictionary is built.
The word analysis and classification process is the most complex part of Tox-Hub development, requiring skills and expertise that extend well beyond both toxicology and informatics. An overly strict process (i.e., a narrow “funnel” in Figure 2) would discard useful words; conversely, an overly permissive process (a broad funnel) would admit more words into the dictionary, but at the cost of undesirable “noise”.
For this reason, the analysis/classification process should be regarded as subject to continuous evolution and refinement through the addition of new tools and procedures (or the improvement of existing ones), which will be incorporated into forthcoming versions of Tox-Hub.
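The trade-off between a narrow and a broad funnel can be illustrated with a toy filter. This is a minimal sketch, not Tox-Hub's actual procedure: it assumes a hypothetical admission criterion (a word enters the dictionary if it is not a stop word and occurs at least `min_freq` times in the analyzed text), and `STOP_WORDS`, `tokenize` and `funnel` are illustrative names only. Raising `min_freq` narrows the funnel; lowering it widens it.

```python
import re
from collections import Counter

# Hypothetical stop-word list (illustration only; not Tox-Hub's actual criteria).
STOP_WORDS = {"the", "of", "and", "in", "a", "was", "were", "to", "at"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic (optionally hyphenated) words."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def funnel(text: str, min_freq: int) -> set[str]:
    """Keep non-stop-word tokens that occur at least `min_freq` times.

    Hypothetical criterion: a larger `min_freq` is a narrower funnel
    (less noise, more useful words lost); a smaller one is a broader funnel.
    """
    counts = Counter(tokenize(text))
    return {word for word, n in counts.items() if n >= min_freq and word not in STOP_WORDS}
```

For the text "Hepatotoxicity was observed in rats. Hepatotoxicity and nephrotoxicity were dose dependent. Rats were dosed daily.", `funnel(text, 2)` keeps only `hepatotoxicity` and `rats`, losing the relevant term `nephrotoxicity`, whereas `funnel(text, 1)` keeps it but also admits generic words such as `observed` and `daily`.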
Several specific tools have been developed and implemented internally to support the analysis and classification process:
- Identification of chemical names
- Identification of CAS numbers
- Identification of numbers and units
- Online English dictionary (WordNet, Princeton University)
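The identification of CAS numbers listed above can be approached with pattern matching plus check-digit validation. A CAS Registry Number has up to 10 digits in three hyphen-separated groups: 2 to 7 digits, then 2 digits, then a single check digit. The check digit equals the sum of the preceding digits, each multiplied by its position counted from the right (1, 2, 3, ...), modulo 10. This is a minimal sketch under the assumption that the tool works this way; the function names are hypothetical, and the source does not describe Tox-Hub's actual implementation.

```python
import re

# CAS RN layout: 2 to 7 digits, a hyphen, 2 digits, a hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def cas_check_digit_ok(part1: str, part2: str, check: str) -> bool:
    """Return True if `check` matches the CAS checksum of the first two groups.

    The digits of the first two groups are read from right to left and
    multiplied by 1, 2, 3, ...; the sum modulo 10 must equal the check digit.
    """
    digits = (part1 + part2)[::-1]
    total = sum(weight * int(d) for weight, d in enumerate(digits, start=1))
    return total % 10 == int(check)


def find_cas_numbers(text: str) -> list[str]:
    """Return every CAS-formatted string in `text` whose check digit is valid."""
    return [
        match.group(0)
        for match in CAS_PATTERN.finditer(text)
        if cas_check_digit_ok(*match.groups())
    ]
```

Validating the check digit filters out strings that merely look like CAS numbers: in "Water (7732-18-5) and ethanol (64-17-5); 1234-56-7 is invalid.", only the first two are returned.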
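The identification of numbers and units listed above can be sketched as a regular expression that pairs a numeric value with a unit from a controlled list. This is a minimal sketch, assuming a hypothetical unit vocabulary (`UNITS`) and hypothetical names (`QUANTITY`, `find_quantities`); the source does not say how Tox-Hub implements this step. Longer units are tried first so that compound units such as `mg/kg/day` are not cut short to `mg`.

```python
import re

# Hypothetical unit vocabulary for illustration; a production system would
# load a curated list of units used in toxicology.
UNITS = ["mg/kg/day", "mg/kg", "mg/L", "ppm", "mg", "kg", "mM", "g", "%"]

# Try longer units first so that "mg/kg/day" wins over "mg/kg" and "mg".
_UNIT_ALTERNATION = "|".join(
    re.escape(unit) for unit in sorted(UNITS, key=len, reverse=True)
)

# A number (integer or decimal) not glued to a preceding word character or ".",
# optional whitespace, then a known unit not followed by more letters or "/".
QUANTITY = re.compile(
    rf"(?<![\w.])(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>{_UNIT_ALTERNATION})(?![A-Za-z/])"
)


def find_quantities(text: str) -> list[tuple[float, str]]:
    """Return (value, unit) pairs for each number followed by a known unit."""
    return [(float(m["value"]), m["unit"]) for m in QUANTITY.finditer(text)]
```

In "Rats received 50 mg/kg/day for 28 days; plasma levels reached 1.5 mg/L, and 2 groups served as controls (0.1% DMSO).", the sketch returns `(50.0, "mg/kg/day")`, `(1.5, "mg/L")` and `(0.1, "%")`; "28 days" and "2 groups" are skipped because `days` and `groups` are not in the unit list.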
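The identification of chemical names listed above can be illustrated with one common technique, greedy longest-match lookup against a reference dictionary of names. This is a minimal sketch under stated assumptions: the dictionary `CHEMICALS`, the tokenizer `TOKEN` and the function `find_chemical_names` are hypothetical, and the source does not specify which resources or algorithms Tox-Hub actually uses.

```python
import re

# Hypothetical reference dictionary (illustration only).
CHEMICALS = {"benzene", "sodium chloride", "carbon tetrachloride", "2,4-dichlorophenol"}

# A token is a run of letters/digits, optionally joined by commas or hyphens,
# so that systematic names such as "2,4-dichlorophenol" stay in one piece.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:[,-][A-Za-z0-9]+)*")


def find_chemical_names(text: str, dictionary: set[str] = CHEMICALS) -> list[str]:
    """Greedy, case-insensitive longest-match lookup of (multi-word) names."""
    tokens = TOKEN.findall(text)
    lowered = [t.lower() for t in tokens]
    max_words = max(len(name.split()) for name in dictionary)
    found: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            if " ".join(lowered[i:i + n]) in dictionary:
                found.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No dictionary entry starts at this token; move to the next one.
            i += 1
    return found
```

Preferring the longest match keeps multi-word names such as "carbon tetrachloride" together as a single term instead of reporting their parts separately.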