Rose Acen Upor


This study combines language assessment processes and interlanguage analysis techniques to determine rater agreement and disagreement in assessing English article acquisition. Native English-speaking and non-native English-speaking raters coded and scored picture sequence narratives written by English as a Foreign Language (EFL) learners (n = 97) for suppliance in obligatory context (SOC) and target-like utterance (TLU). Although the kappa statistic revealed fair agreement between raters (0.17–0.33), content analysis methods revealed much higher agreement (88.29%–94.07%). Furthermore, language background effects between the raters could not be substantiated; however, the results demonstrated a discernible pattern of disagreement between them. Thus, the study recommends including a foreign language teaching background as a factor in rater selection to minimize language background effects on the rating of language assessments.
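The gap between a low kappa and a high percentage of agreement can be illustrated with a minimal sketch. The data below are hypothetical and are not taken from the study: two raters code 100 article uses as "correct" or "error", and almost all items fall into one category. The function names `percent_agreement` and `cohen_kappa` are chosen here for illustration, and the simple two-rater Cohen's kappa is an assumption about the kind of statistic involved, not a reconstruction of the study's procedure.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which the two raters assign the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of the raters' marginal proportions,
    # summed over every category either rater used.
    p_expected = sum(
        counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical category; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codings (not data from the study): 90 items both raters
# call "correct", 2 both call "error", and 8 on which they disagree.
pairs = (
    [("correct", "correct")] * 90
    + [("error", "error")] * 2
    + [("correct", "error")] * 4
    + [("error", "correct")] * 4
)
rater_a = [p[0] for p in pairs]
rater_b = [p[1] for p in pairs]

print(f"percent agreement: {percent_agreement(rater_a, rater_b):.2%}")
print(f"Cohen's kappa:     {cohen_kappa(rater_a, rater_b):.2f}")
```

With these invented counts the raters agree on 92% of items, yet kappa comes out at roughly 0.29, because the skewed category distribution makes chance agreement very high. This is one plausible reading of how the two indices reported above can diverge.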


Article acquisition; Inter-rater agreement; SOC; TLU; EFL; Inter-rater disagreement; Language background effects





DOI: https://doi.org/10.24071/llt.v24i1.2603

DOI (PDF): https://doi.org/10.24071/llt.v24i1.2603.g2198



Copyright (c) 2021 Rose Acen Upor

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


LLT Journal: A Journal on Language and Language Teaching has been nationally accredited Sinta 3 by the Ministry of Research, Technology and Higher Education of the Republic of Indonesia, based on decree SK No. 30/E/KPT/2018 (validity: Vol 20 No 1, 2017 to Vol 24 No 1, 2021; 5 years).




 LLT Journal: A Journal on Language and Language Teaching is published twice a year, namely in April and October by the English Language Education Study Programme of Teacher Training and Education Faculty of Sanata Dharma University, Yogyakarta, Indonesia.