Typical approaches to automatic summarization try to produce a coherent document by ordering sentences according to criteria such as the publication date of the text from which each sentence is drawn. When describing a gene, however, there is no obvious order among the facts to be presented. In this work, when generating a summary about a gene, we create an order from the unordered set of facts by introducing new sentences that associate the main concepts of those facts.
The rapidly growing volume of biological and medical scientific articles calls for automatic language processing and information extraction techniques that help people access useful information efficiently, particularly relationships or interactions between molecular substances. In this paper we propose a framework of Biomedical Relationship Networks (BRNs), which aims to represent explicit and implicit biomedical relationships, as well as other hidden information, across an entire corpus. The knowledge contained in BRNs can be applied directly to information extraction and information retrieval tasks on corpora of medical documents.
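The abstract does not say how implicit relationships are derived. One common heuristic, assumed here purely for illustration, treats two entities as implicitly related when they are not linked directly but share an intermediate entity in the network of explicit relationships. The sketch below builds an undirected network from explicit pairs and lists such two-hop candidates together with their intermediates; the functions `build_network` and `implicit_relationships` and the entity names are hypothetical and are not the BRN framework itself.

```python
from collections import defaultdict
from itertools import combinations


def build_network(explicit_pairs):
    """Build an undirected adjacency map from explicitly stated relationship pairs."""
    graph = defaultdict(set)
    for a, b in explicit_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def implicit_relationships(graph):
    """Return {(a, c): intermediates} for pairs that are not directly linked
    but share at least one neighbour (a two-hop heuristic, assumed here)."""
    implicit = {}
    for intermediate, neighbours in graph.items():
        # Every pair of neighbours of the same node is a two-hop candidate.
        for a, c in combinations(sorted(neighbours), 2):
            if c not in graph[a]:  # skip pairs that are already explicit
                implicit.setdefault((a, c), set()).add(intermediate)
    return implicit


# Hypothetical explicit relationships extracted from a corpus.
pairs = [("geneA", "geneB"), ("geneB", "geneC"), ("geneA", "geneD")]
print(implicit_relationships(build_network(pairs)))
```

Recording the intermediates, rather than only the candidate pair, keeps the evidence for each inferred relationship available for later ranking or inspection.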
The purpose of a textual link is to provide a one-to-one connection between a term and a related data object. However, such a link is insufficient for conceptual and complex terms, which often refer to multiple data objects in heterogeneous databases. In this paper, we present a method that dynamically creates a link for a biological term by automatically constructing a database query that retrieves the corresponding data object(s). This method can help users quickly build hypotheses from data drawn from text, and understand the text by providing access to relevant information about its biological terms.
In the framework of zone analysis of biological texts, we are creating an annotation dataset of the Results sections of 100 journal articles, annotated by four annotators. This dataset plays an important role in verifying our annotation scheme in terms of inter-annotator and intra-annotator agreement, and in preparing a sufficient amount of training data for machine learning toward automatic annotation. In this paper, we discuss the design process of our dataset and the theoretical and practical issues identified through a comparative analysis of the first set of annotation results from the four annotators and through mutual feedback on our insights. These discussions will form an important basis for the creation of a high-quality zone annotation dataset.
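The abstract does not name the agreement statistic used. With more than two annotators, Fleiss' kappa is one standard choice, so the following is a minimal sketch under that assumption: each item (for example, a sentence) carries one zone label per annotator, and kappa compares the observed per-item agreement with the agreement expected from the overall label distribution. The zone labels in the example are hypothetical.

```python
from collections import Counter


def fleiss_kappa(labels_per_item):
    """Fleiss' kappa for a fixed number of annotators per item.

    labels_per_item: list of lists, one inner list of labels per item,
    with the same number of annotators (>= 2) for every item.
    """
    n = len(labels_per_item[0])  # annotators per item
    if n < 2 or any(len(item) != n for item in labels_per_item):
        raise ValueError("every item needs the same number (>= 2) of labels")
    N = len(labels_per_item)
    counts = [Counter(item) for item in labels_per_item]
    categories = set().union(*counts)

    # Overall proportion of all assignments that went to each category.
    p = {c: sum(cnt[c] for cnt in counts) / (N * n) for c in categories}
    # Observed agreement on each item: agreeing annotator pairs / all pairs.
    P_i = [(sum(v * v for v in cnt.values()) - n) / (n * (n - 1)) for cnt in counts]
    P_bar = sum(P_i) / N
    P_e = sum(pj * pj for pj in p.values())  # chance agreement
    if P_e == 1.0:
        return 1.0  # every annotator used one single category throughout
    return (P_bar - P_e) / (1 - P_e)


# Four annotators labelling three sentences with hypothetical zone labels.
items = [
    ["RESULT"] * 4,
    ["METHOD"] * 4,
    ["RESULT", "RESULT", "METHOD", "METHOD"],
]
print(round(fleiss_kappa(items), 4))
```

For intra-annotator agreement, the same annotator's two passes over the same articles could instead be compared with a two-rater statistic such as Cohen's kappa.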
In this paper, we define a Combinatory Categorial Grammar (CCG) to model and predict RNA secondary structures. The proposed CCG can be used to capture various RNA secondary structures, including stem-loop and pseudoknot structures. We also argue that the CCG can be used to predict possibly unknown RNA secondary structures, for example, an undiscovered structure, 'ternary-pseudoknots'.
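The structural distinction the grammar has to capture can be stated in terms of base pairs: stem-loops consist only of nested pairs, whereas a pseudoknot contains two pairs (i, j) and (k, l) with i < k < j < l. Nested pairings are commonly modeled with context-free grammars, while such crossing dependencies are generally considered beyond context-free power. The sketch below is not the proposed CCG; it only parses an extended dot-bracket string (an assumed input format, where each bracket type has its own stack) and reports whether any pairs cross.

```python
def parse_dot_bracket(structure):
    """Return sorted base pairs (i, j), i < j, from extended dot-bracket notation.

    Each bracket type is matched on its own stack, so '(' ')' and '[' ']'
    may interleave to express pseudoknots.
    """
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unclosed bracket")
    return sorted(pairs)


def crossing_pairs(pairs):
    """List pairs of base pairs (i, j), (k, l) with i < k < j < l."""
    return [
        (p, q)
        for idx, p in enumerate(pairs)
        for q in pairs[idx + 1:]
        if p[0] < q[0] < p[1] < q[1]
    ]


def classify(structure):
    """'pseudoknot' if any base pairs cross, otherwise 'nested' (e.g. a stem-loop)."""
    return "pseudoknot" if crossing_pairs(parse_dot_bracket(structure)) else "nested"


print(classify("((...))"))         # a simple stem-loop
print(classify("((..[[..))..]]"))  # a simple H-type pseudoknot
```

Because `parse_dot_bracket` sorts the pairs by their opening position, `crossing_pairs` only needs to test the i < k < j < l pattern once per ordered pair.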