CORPUS-BASED ANALYSIS OF TERM EXTRACTION FOR ENGLISH MEDICAL TEXTS
Keywords:
CLIL, corpus, extract, medical terms, statistical software
Abstract
This methodological study is set against a context in which Content and Language Integrated Learning (CLIL) is regarded as an innovative educational philosophy across Europe and is to be adopted in Vietnam by 2020. It is a corpus-based study that uses complementary searches, with a focus on search precision and recall, to extract medical terms from 250 English medical texts in the British Academic Written English (BAWE) corpus, for which permission was granted for academic research. The searches rest on two elements: specialised occurrences (words bearing prefixes in Stedman’s 2011 list) and frequency count (with a threshold of 12 occurrences). With the assistance of two free yet powerful software tools, AntConc and R (with logged instructions executed using the Text Mining package), a statistically workable definition of an English medical term is empirically established while generating a sample list of 45 items. The list is validated by 10 Vietnamese medical experts, working both in Vietnam and abroad, through an in-depth survey. The key findings are analysed, followed by some pedagogical implications.
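The two-element filter and the precision and recall measures mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the AntConc or R workflow used in the study. The prefix tuple is a small hypothetical sample rather than Stedman's 2011 list, and the function names `extract_candidates` and `precision_recall` are assumptions introduced here.

```python
import re
from collections import Counter

# Hypothetical sample of prefixes; the study itself draws on Stedman's 2011 list.
SAMPLE_PREFIXES = ("cardio", "hepat", "neuro", "hyper", "hypo")
MIN_FREQUENCY = 12  # frequency threshold stated in the abstract


def extract_candidates(texts, prefixes=SAMPLE_PREFIXES, min_freq=MIN_FREQUENCY):
    """Return {word: count} for words that start with a listed prefix
    and occur at least min_freq times across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens only (a simplifying assumption).
        counts.update(re.findall(r"[a-z]+", text.lower()))
    prefix_tuple = tuple(prefixes)
    return {
        word: n
        for word, n in counts.items()
        if n >= min_freq and word.startswith(prefix_tuple)
    }


def precision_recall(extracted, reference):
    """Compare extracted terms with a reference set (e.g. expert-validated terms)."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall
```

In this sketch, a word is kept only when both conditions hold, so a frequent general word without a listed prefix and a rare word with one are both excluded. Precision is the share of extracted items that appear in the reference set, and recall is the share of the reference set that was extracted.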