Recognizing speculative language in research texts
Supervisor(es): Wonsever, Dina - Minel, Jean-Luc
Resumen:
This thesis studies the use of sequential supervised learning methods on two tasks related to the detection of hedging in scientific articles: those of hedge cue identification and hedge cue scope detection. Both tasks are addressed using a learning methodology that proposes the use of an iterative, error-based approach to improve classification performance, suggesting the incorporation of expert knowledge into the learning process through the use of knowledge rules. Results are promising: for the first task, we improved baseline results by 2.5 points in terms of F-score by incorporating cue cooccurence information, while for scope detection, the incorporation of syntax information and rules for syntax scope pruning allowed us to improve classification performance from an F-score of 0.712 to a final number of 0.835. Compared with state-of-the-art methods, the results are very competitive, suggesting that the approach to improving classifiers based only on the errors commited on a held out corpus could be successfully used in other, similar tasks. Additionaly, this thesis presents a class schema for representing sentence analysis in a unique structure, including the results of different linguistic analysis. This allows us to better manage the iterative process of classifier improvement, where different attribute sets for learning are used in each iteration. We also propose to store attributes in a relational model, instead of the traditional text-based structures, to facilitate learning data analysis and manipulation.
2013 | |
Inglés | |
Universidad de la República | |
COLIBRI | |
https://hdl.handle.net/20.500.12008/34294 | |
Acceso abierto | |
Licencia Creative Commons Atribución - No Comercial - Sin Derivadas (CC - By-NC-ND 4.0) |