CORPUS-BASED ANALYSIS OF TERM EXTRACTION FOR ENGLISH MEDICAL TEXTS

Hoàng Thị Khánh Tâm

Abstract


This methodological study is set against a context in which Content and Language Integrated Learning (CLIL) is regarded as an innovative educational philosophy across Europe and is to be adopted in Vietnam by 2020. It is a corpus-based study that uses complementary searches, with a focus on search precision and recall, to extract medical terms from 250 English medical texts in the British Academic Written English (BAWE) corpus, the use of which has been authorised for academic research. The searches rest on two elements: specialised occurrences (with prefixes in Stedman’s 2011 list) and frequency count (with a threshold of 12 occurrences). With the help of two free yet powerful statistical tools, AntConc and R (with logged instructions executed using the Text Mining package), a statistically workable definition of an English medical term is established empirically while a sample list of 45 items is generated. The list is validated by 10 Vietnamese medical experts, working both in Vietnam and abroad, through an in-depth survey used to analyse the key findings, and some pedagogical implications follow.
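The two extraction criteria and the precision and recall check described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the study's AntConc or R workflow. The prefix tuple is a hypothetical sample rather than Stedman's list, the function names are invented for this example, and requiring a word to meet both criteria at once is an assumption about how the complementary searches are combined.

```python
from collections import Counter
import re

# Hypothetical sample of prefixes; the study draws its prefixes from Stedman (2011).
ILLUSTRATIVE_PREFIXES = ("cardio", "hepat", "neuro", "haem")


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_candidates(tokens, prefixes, min_freq=12):
    """Return word types that begin with a listed prefix and occur
    at least min_freq times (assumed combination of the two criteria)."""
    counts = Counter(tokens)
    prefix_tuple = tuple(prefixes)
    return {
        word
        for word, freq in counts.items()
        if freq >= min_freq and word.startswith(prefix_tuple)
    }


def precision_recall(extracted, gold):
    """Precision = hits / extracted items; recall = hits / gold items."""
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy text standing in for the corpus; the gold set stands in for expert judgement.
    text = "cardiology " * 12 + "hepatitis " * 5 + "neuron " * 15 + "patient " * 30
    candidates = extract_candidates(tokenize(text), ILLUSTRATIVE_PREFIXES)
    gold = {"cardiology", "hepatitis", "neuron"}
    print(sorted(candidates))
    print(precision_recall(candidates, gold))
```

In this toy run, "hepatitis" carries a listed prefix but falls below the threshold of 12, so it is missed and recall drops while precision is unaffected. This is the kind of trade-off that a focus on both values is meant to expose.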


Keywords


CLIL, corpus, extract, medical terms, statistical software

References


Anthony, L. (2004). AntConc: A learner and classroom friendly, multi-platform corpus analysis toolkit. Proceedings of IWLeL 2004: An Interactive Workshop on Language e-Learning, pp. 7-13. Tokyo: Waseda University.

Ball, P. (2008). What is CLIL? Retrieved on July 8, 2013, from http://www.onestopenglish.com.

Banay, G. L. (1948). An introduction to medical terminology I. Greek and Latin derivations. Bulletin of the Medical Library Association, 36(1), 1.

Bentley, K. (2010). The TKT course CLIL module. Cambridge: Cambridge University Press.

Bodenreider, O. (2004). The unified medical language system (UMLS): Integrating biomedical terminology. Nucleic Acids Research, 32, 267-270.

Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.

Ehrlich, A., & Schroeder, C. L. (2013). Medical terminology for health professions (7th ed.). Clifton Park, NY: Delmar, Cengage Learning.

Fabozzi, N. (2010). Kaiser’s donation of its convergent medical terminology dictionary puts the spotlight on the role of clinical terminology services in driving meaningful use of EHRs. Healthcare and Life Sciences, Frost and Sullivan.

Feinerer, I., & Hornik, K. (2016). tm: Text mining package. R package version 0.6. Retrieved from http://CRAN.R-project.org/package=tm.

Fletcher, W. H. (2007). Concordancing the web: promise and problems, tools and techniques. In M. Hundt, N. Nesselhauf & C. Biewer (Eds.), Corpus linguistics and the web (pp. 25-46). Amsterdam: Rodopi.

Flowerdew, L. (2008). Corpus-based analyses of the problem-solution pattern. Amsterdam: John Benjamins.

Gardner, S., & Nesi, H. (2012). A classification of genre families in university student writing. Applied Linguistics, 34(1), 1-29.

Geeraerts, D. (2010). The doctor and the semantician. In D. Glynn & K. Fischer (Eds.), Quantitative methods in cognitive semantics: Corpus-driven approaches (pp. 63-78). Berlin: De Gruyter Mouton.

Gries, S. T. (2009). Quantitative corpus linguistics with R: A practical introduction. The United Kingdom: Taylor & Francis.

Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University Press.

Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27, 4-21.

Jalali, Z.S., Moini, M.R., & Arani, M.A. (2015). Structural and functional analysis of lexical bundles in medical research articles: A corpus-based study. International Journal of Information Science and Management, 13(1), 51-69.

Kennedy, G. (1998). An introduction to corpus linguistics. London: Longman.

Lindmark, K., Natt och Dag, J., & Willners, C. (2007). Lexical semantics for software requirements engineering – a corpus-based approach. In R. Facchinetti (Ed.), Corpus Linguistics 25 years on (pp. 365-385). Amsterdam: Rodopi.

Marco, L. (2000). Collocational frameworks in medical research papers. English for Specific Purposes, 19, 63-86.

Marsh, D. (2002). CLIL/EMILE - The European dimension: Actions, trends and foresight potential. Brussels, Belgium: The European Union.

McCray, A. T., & Nelson, S. J. (1995). The representation of meaning in the UMLS. Methods of Information in Medicine, 34, 193-201.

Neufeld, S., Hancioğlu, N., & Eldridge, J. (2011). Beware the range in RANGE, and the academic in AWL. System, 39, 533-538.

Norvig, P. (2013). English letter frequency counts: Mayzner revisited or ETAOIN SRHLDCU. Retrieved on June 1, 2014 from http://norvig.com/mayzner.html.

Stedman, T. L. (2011). Stedman’s medical dictionary – illustrated in colour (28th ed.). Philadelphia, PA: Lippincott Williams & Wilkins.

Steiner, S. S. (2003). Quick medical terminology: A self-teaching guide (4th ed.). Hoboken, NJ: John Wiley & Sons.

Ting, Y-L. T. (2010). CLIL appeals to how the brain likes its information: Examples from CLIL-(Neuro)Science. International CLIL Research Journal, 1(3), 13-73.

Van de Craen, P. (2013). The emergence of a new paradigm. Approaches to language teaching and learning for multilingual education (December 18, 2013). Lecture conducted from Vrije Universiteit Brussel, Brussels, Belgium.

Venables, W. N., Smith, D. M., et al. (2016). An introduction to R - Notes on R: Programming environment for data analysis and graphics. Retrieved February 16, 2016 from CRAN.R-project.org.

Wang, J., Liang, S., & Ge, G. (2008). Establishment of a medical academic word list. English for Specific Purposes, 27, 442-458.

Wehrli, E., Seretan, V., & Nerima, L. (2010). Sentence analysis and collocation identification. Beijing: COLING Workshop on Multiword Expressions (MWE 2010).

Wermter, J. (2009). Collocation and term extraction using linguistically enhanced statistical methods. Thuringia, Germany: Friedrich Schiller University of Jena.

West, M. (1953). A general service list of English words with semantic frequencies and a supplementary word-list for the writing of popular science and technology. London: Longmans.

Williams, G. (2016). Hands-on data science with R: Text mining. Retrieved February 16, 2016 from Graham@togaware.com.

Xue, G., & Nation, I.S.P. (1984). A university word list. Language Learning and Communication, 3(2), 215-229.

