<?xml version="1.0" encoding="UTF-8"?><?xml-model type="application/xml-dtd" href="https://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "https://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd">
<article xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" specific-use="Marcalyc 1.3" dtd-version="1.3" article-type="research-article" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="index">3314</journal-id>
<journal-title-group>
<journal-title specific-use="original" xml:lang="en">RIED-Revista Iberoamericana de Educación a Distancia</journal-title>
<abbrev-journal-title abbrev-type="publisher" xml:lang="en">RIED</abbrev-journal-title>
</journal-title-group>
<issn pub-type="ppub">1138-2783</issn>
<issn pub-type="epub">1390-3306</issn>
<publisher>
<publisher-name>Asociación Iberoamericana de Educación Superior a Distancia</publisher-name>
<publisher-loc>
<country>España</country>
<email>ried@edu.uned.es</email>
</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="art-access-id" specific-use="redalyc">331483192028</article-id>
<article-id pub-id-type="doi">10.5944/ried.45486</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Estudios e investigaciones</subject>
</subj-group>
</article-categories>
<title-group>
<article-title xml:lang="en">Measuring writing skills in Spanish as a foreign language with generative artificial intelligence</article-title>
<trans-title-group>
<trans-title xml:lang="es">
<bold>Medición de la habilidad escrita en español como lengua extranjera con inteligencia artificial generativa</bold>
</trans-title>
</trans-title-group>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="no">
<contrib-id contrib-id-type="orcid">https://orcid.org/0009-0008-7052-7322</contrib-id>
<name name-style="western">
<surname>Cantero Romero</surname>
<given-names>María-Victoria</given-names>
</name>
<xref ref-type="aff" rid="aff1"/>
</contrib>
<contrib contrib-type="author" corresp="no">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-2874-0401</contrib-id>
<name name-style="western">
<surname>Martín-Valdivia</surname>
<given-names>María-Teresa</given-names>
</name>
<xref ref-type="aff" rid="aff2"/>
</contrib>
<contrib contrib-type="author" corresp="no">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-0440-6107</contrib-id>
<name name-style="western">
<surname>Ortiz-Colón</surname>
<given-names>Ana María</given-names>
</name>
<xref ref-type="aff" rid="aff3"/>
</contrib>
<contrib contrib-type="author" corresp="no">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-3274-8825</contrib-id>
<name name-style="western">
<surname>Jiménez-Zafra</surname>
<given-names>Salud María</given-names>
</name>
<xref ref-type="aff" rid="aff4"/>
</contrib>
</contrib-group>
<aff id="aff1">
<institution content-type="original">Universidad de Jaén, UJA</institution>
<country country="ES">España</country>
<institution-wrap>
<institution content-type="orgname">Universidad de Jaén, UJA</institution>
<institution-id institution-id-type="crossref">https://api.crossref.org/funders/501100007064</institution-id>
</institution-wrap>
</aff>
<aff id="aff2">
<institution content-type="original">Universidad de Jaén, UJA</institution>
<country country="ES">España</country>
<institution-wrap>
<institution content-type="orgname">Universidad de Jaén, UJA</institution>
<institution-id institution-id-type="crossref">https://api.crossref.org/funders/501100007064</institution-id>
</institution-wrap>
</aff>
<aff id="aff3">
<institution content-type="original">Universidad de Jaén, UJA</institution>
<country country="ES">España</country>
<institution-wrap>
<institution content-type="orgname">Universidad de Jaén, UJA</institution>
<institution-id institution-id-type="crossref">https://api.crossref.org/funders/501100007064</institution-id>
</institution-wrap>
</aff>
<aff id="aff4">
<institution content-type="original">Universidad de Jaén, UJA</institution>
<country country="ES">España</country>
<institution-wrap>
<institution content-type="orgname">Universidad de Jaén, UJA</institution>
<institution-id institution-id-type="crossref">https://api.crossref.org/funders/501100007064</institution-id>
</institution-wrap>
</aff>
<pub-date pub-type="epub-ppub">
<season>January-June</season>
<year>2026</year>
</pub-date>
<volume>29</volume>
<issue>1</issue>
<fpage>353</fpage>
<lpage>379</lpage>
<history>
<date date-type="received" publication-format="dd mes yyyy">
<day>01</day>
<month>06</month>
<year>2025</year>
</date>
<date date-type="accepted" publication-format="dd mes yyyy">
<day>09</day>
<month>09</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-year>2026</copyright-year>
<copyright-holder>Asociación Iberoamericana de Educación a Distancia (AIESAD)</copyright-holder>
<ali:free_to_read/>
<license xlink:href="https://creativecommons.org/licenses/by-nc/4.0/">
<ali:license_ref>https://creativecommons.org/licenses/by-nc/4.0/</ali:license_ref>
<license-p>Esta obra está bajo una Licencia Creative Commons Atribución-NoComercial 4.0 Internacional.</license-p>
</license>
</permissions>
<abstract xml:lang="en">
<title>Abstract</title>
<p>The emergence of Generative Artificial Intelligence (GAI)—particularly Large Language Models (LLMs) such as ChatGPT—is transforming the educational landscape, especially in the field of foreign language instruction. This article explores the potential of these technologies to automate the assessment of writing proficiency in Spanish as a Foreign Language (SFL), a task that is especially time-consuming at the beginning of university-level courses for Erasmus students. The study is based on three experiments conducted using the Spanish Learner Corpus compiled by the Instituto Cervantes. The first experiment applied a zero-shot learning approach by prompting the model with level descriptors from the Instituto Cervantes’s Curriculum Plan. In the second and third experiments, the model was adjusted through fine-tuning using 90% and 80% of the corpus, respectively, with the remaining data reserved for testing and validation. The results indicate that the fine-tuned models significantly outperform the zero-shot configuration in identifying the correct proficiency levels of learner texts. These findings demonstrate that LLMs can be effectively employed to streamline the initial placement process in SFL courses, thus reducing the workload of instructors and improving efficiency. The study concludes that GAI can serve as a valuable complementary tool in multilingual and multicultural educational settings, provided its use is guided by sound pedagogical principles.</p>
</abstract>
<trans-abstract xml:lang="es">
<title>Resumen</title>
<p>La irrupción de la Inteligencia Artificial Generativa (IAG), y en particular de los Modelos de Lenguaje de gran tamaño (LLMs) como ChatGPT, está transformando el ámbito educativo, especialmente en la enseñanza de lenguas extranjeras. Este artículo analiza el potencial de estas tecnologías para automatizar la evaluación de la competencia escrita en español como lengua extranjera (ELE), una tarea especialmente laboriosa al inicio de los cursos universitarios dirigidos a estudiantes Erasmus. La metodología se basa en tres experimentos con el Corpus de Aprendices de Español del Instituto Cervantes. En el primero, se utilizó la técnica de <italic>zero-shot learning</italic>, proporcionando al modelo un <italic>prompt </italic>con los descriptores del Plan Curricular del Instituto Cervantes. En el segundo y tercer experimentos, se ajustó el modelo mediante <italic>fine-tuning</italic> con el 90 % y el 80 % del corpus, respectivamente, reservando el resto para validación y prueba. Los resultados muestran que los modelos ajustados son capaces de identificar el nivel de competencia escrita con una precisión significativamente superior al enfoque sin entrenamiento previo. Estos hallazgos evidencian que los LLMs pueden emplearse para agilizar procesos de evaluación inicial en cursos de ELE, reduciendo la carga docente y mejorando la eficiencia. Se concluye que la IAG representa una herramienta complementaria valiosa en contextos educativos multiculturales y multilingües, siempre que su uso esté guiado por criterios pedagógicos sólidos.</p>
</trans-abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>language instruction</kwd>
<kwd>writing</kwd>
<kwd>artificial intelligence</kwd>
<kwd>teaching method innovations</kwd>
</kwd-group>
<kwd-group xml:lang="es">
<title>Palabras clave</title>
<kwd>enseñanza de lenguas</kwd>
<kwd>expresión escrita</kwd>
<kwd>inteligencia artificial</kwd>
<kwd>innovación pedagógica relevante</kwd>
</kwd-group>
<counts>
<fig-count count="4"/>
<table-count count="4"/>
<equation-count count="0"/>
<ref-count count="38"/>
</counts>
<custom-meta-group>
<custom-meta>
<meta-name>How to cite</meta-name>
<meta-value>Cantero Romero, M.-V., Martín-Valdivia, M.-T., Ortiz-Colón, A. M., &amp; Jiménez-Zafra, S. M. (2026). Measuring writing skills in Spanish as a foreign language with generative artificial intelligence [Medición de la habilidad escrita en español como lengua extranjera con inteligencia artificial generativa]. <italic>RIED-Revista Iberoamericana de Educación a Distancia, 29</italic>(1), 353-379. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5944/ried.45486">https://doi.org/10.5944/ried.45486</ext-link>
</meta-value>
</custom-meta>
</custom-meta-group>
<custom-meta-group>
<custom-meta>
<meta-name>redalyc-journal-id</meta-name>
<meta-value>3314</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec>
<title>
<bold>INTRODUCTION</bold>
</title>
<p>The evolution of Artificial Intelligence has marked a turning point in all areas, especially in education (<xref ref-type="bibr" rid="redalyc_331483192028_ref1">Aparicio Gómez, 2023</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref19">Hernández-León &amp; Rodríguez-Conde, 2024</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref36">Zambrano Campozano, 2025</xref>). Advances in this discipline are opening up a field of new research and work with which to implement Artificial Intelligence, specifically generative AI, in the classroom. As <xref ref-type="bibr" rid="redalyc_331483192028_ref6">Bolaño-García and Duarte-Acosta (2024)</xref> point out, generative Artificial Intelligence has gained attention in education because it improves the personalisation of learning and real-time feedback. <xref ref-type="bibr" rid="redalyc_331483192028_ref38">Zapata Ros (2024)</xref> supports this idea of personalisation in addition to the availability of information. Likewise, <xref ref-type="bibr" rid="redalyc_331483192028_ref16">Fajardo et al. (2023)</xref> highlight the use of these tools for personalising learning in university education, adapting it to the preferences and needs of each student through guided and virtual tutorials. <xref ref-type="bibr" rid="redalyc_331483192028_ref18">García-Peñalvo et al. (2024)</xref> point out the importance of preparing both teachers and learners in the use of generative artificial intelligence, as it will be present in all aspects of life. Furthermore, as <xref ref-type="bibr" rid="redalyc_331483192028_ref25">Moreno (2019)</xref> notes, it is important to highlight the potential of generative Artificial Intelligence to transform education by creating adaptive learning environments tailored to student performance, such as for students with special educational needs. In their study on generative Artificial Intelligence tools, <xref ref-type="bibr" rid="redalyc_331483192028_ref2">Area-Moreira et al. (2024)</xref>, in addition to the functions already mentioned, indicate that these tools can be used to automate tasks to support teachers in their work and even as anti-plagiarism systems. In line with the above, <xref ref-type="bibr" rid="redalyc_331483192028_ref12">Chan and Tsi (2023)</xref> point to the use of generative Artificial Intelligence as a supplementary tool for teachers and not as a replacement for them. <xref ref-type="bibr" rid="redalyc_331483192028_ref25">Moreno (2019)</xref> also points to three approaches to work with in education: generative Artificial Intelligence, educational robotics, and self-learning platforms. We will focus on the first of them in this study. Both <xref ref-type="bibr" rid="redalyc_331483192028_ref3">Barroso-Osuna and Cabero-Almenara (2025)</xref> and <xref ref-type="bibr" rid="redalyc_331483192028_ref27">Owan et al. (2023)</xref> identify a number of benefits of using Artificial Intelligence, specifically generative AI, in education, including the optimisation of teaching time and automated and accurate assessment. In this regard, <xref ref-type="bibr" rid="redalyc_331483192028_ref15">Crespo Mendoza et al. (2024)</xref> point out that it can improve the accuracy and reliability of assessments.</p>
<p>As mentioned above, the main focus of this work is generative Artificial Intelligence, specifically Language Models (LLMs), which have provided a new tool for conducting studies aimed at improving teaching tasks. LLMs are natural language generation tools trained with a large amount of text (<xref ref-type="bibr" rid="redalyc_331483192028_ref34">Wang, 2024</xref>). <xref ref-type="bibr" rid="redalyc_331483192028_ref18">García-Peñalvo et al. (2024)</xref> identify various functions that LLMs can perform, such as supporting research, creating educational content, generating chatbots to interact with students by offering self-directed feedback, complementing search engines, paraphrasing text, teaching languages, and generating exams and questionnaires. It is in connection with these last two functions that the present study is framed: placement in language teaching, and specifically in Spanish as a foreign language.</p>
<p>Generative Artificial Intelligence offers the opportunity to carry out adaptive assessments for each student with immediate and specific feedback, suggesting possible solutions (<xref ref-type="bibr" rid="redalyc_331483192028_ref3">Barroso-Osuna &amp; Cabero-Almenara, 2025</xref>). Likewise, Language Models can be used to automate (<xref ref-type="bibr" rid="redalyc_331483192028_ref17">García-Peñalvo, 2024</xref>) the correction of both multiple-choice tests and open-ended responses (<xref ref-type="bibr" rid="redalyc_331483192028_ref25">Moreno, 2019</xref>). The advantages of using LLMs for assessment include efficiency (<xref ref-type="bibr" rid="redalyc_331483192028_ref2">Area-Moreira et al., 2024</xref>) and plagiarism detection (<xref ref-type="bibr" rid="redalyc_331483192028_ref18">García-Peñalvo et al., 2024</xref>). <xref ref-type="bibr" rid="redalyc_331483192028_ref18">García-Peñalvo et al. (2024)</xref> also point out that LLMs improve teacher productivity.</p>
<p>With regard to the use of Language Models and language teaching, their usefulness is backed by various studies (<xref ref-type="bibr" rid="redalyc_331483192028_ref4">Baskara &amp; Mukarto, 2023</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref31">Salguero Romero, 2023</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref34">Wang, 2024</xref>). <xref ref-type="bibr" rid="redalyc_331483192028_ref4">Baskara and Mukarto (2023)</xref> point out how ChatGPT is capable of generating realistic texts that bring students closer to the reality of the language. On the other hand, <xref ref-type="bibr" rid="redalyc_331483192028_ref20">Hong (2023)</xref> points out the advantage of being able to use these tools to speed up exam correction, freeing teachers from workload and giving them the opportunity to focus on lesson preparation.</p>
<p>One of the challenges of language teaching, and in this case, teaching Spanish as a foreign language, is the appropriate level assessment of students when they begin Spanish courses. It is essential for a student to be at the right level of Spanish learning in order to progress appropriately, since if they are placed at a higher level than what is appropriate for them, they may become frustrated, and conversely, if they are placed at a lower level, they may lack motivation. That is why level assessment is key when starting language courses.</p>
<p>Until now, this level assessment has been carried out through multiple-choice tests or interviews (<xref ref-type="bibr" rid="redalyc_331483192028_ref5">Biedma Torrecillas et al., 2012</xref>). These tests mainly focus on determining the student’s level of oral and written expression and comprehension. Existing Spanish level tests, such as the Instituto Cervantes level test, consist of answering a series of written multiple-choice questions (true-false; matching or ordering) of increasing difficulty (<xref ref-type="bibr" rid="redalyc_331483192028_ref11">Centro Virtual Cervantes, n.d.</xref>). This same typology is currently observed in international contexts, such as in the Spanish level tests at Columbia University and the University of Wisconsin–Madison, which are also based on multiple-choice exercises without including free written production (<xref ref-type="bibr" rid="redalyc_331483192028_ref13">Columbia University, n.d.</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref33">University of Wisconsin–Madison, n.d.</xref>). However, Spanish language proficiency tests have not focused directly on written expression tests involving the writing of complete texts, but rather indirectly, through item responses, due to the limitations this entails, such as the lack of immediacy or the complexity of level assessment processes when dealing with large groups.</p>
<p>To solve this problem, this study will focus on the level assessment of the written expression test. As mentioned above, Language Models are capable of automating the grading of written work (<xref ref-type="bibr" rid="redalyc_331483192028_ref18">García-Peñalvo et al., 2024</xref>), and they do so quickly, saving teachers time (<xref ref-type="bibr" rid="redalyc_331483192028_ref2">Area-Moreira et al., 2024</xref>). This is the main reason why written expression tests have not been included in current level tests.</p>
<p>In this context, it is necessary to establish a conceptual basis for understanding the strengths and limitations of automated writing assessment and, in particular, the linguistic foundations that underpin the classification of levels A1–C1. The following section develops this theoretical framework, which will serve as a basis for the methodological proposal of this study.</p>
<p>This article is structured as follows. After the introduction and theoretical framework, the technological basis and choice of model, the objectives of the study and the methodology are presented. This is followed by the results, followed by a section on the pedagogical relevance of the model. Finally, the article concludes with a discussion and conclusions, to which a subsection on ethical aspects and licences for use has been added.</p>
</sec>
<sec>
<title>
<bold>THEORETICAL FRAMEWORK</bold>
</title>
<p>This section begins with a review of the state of the art in automated writing assessment and the linguistic foundations underpinning the level classification of the Common European Framework of Reference (CEFR) and the Instituto Cervantes Curriculum Plan (PCIC). This conceptual basis will serve to contextualise the study proposal and, finally, to present the objectives guiding the research.</p>
<sec>
<title>
<bold>Automated handwriting assessment: current developments and approaches</bold>
</title>
<p>The automatic classification of learners' texts using language models is part of a broader tradition of automated writing assessment (AWE), the development of which has given rise to multiple tools and systems that should be taken into account in order to contextualise this work. AWE has evolved considerably in recent decades, becoming an increasingly common tool in educational contexts. Pioneering systems such as e-rater (<xref ref-type="bibr" rid="redalyc_331483192028_ref8">Burstein et al., 2003</xref>), developed by ETS, have been widely used in standardised tests, using linguistic, grammatical and discursive metrics to estimate textual quality. Coh-Metrix (<xref ref-type="bibr" rid="redalyc_331483192028_ref23">McNamara et al., 2014</xref>), meanwhile, allows for detailed analysis of cohesion, syntactic complexity and readability, providing a multifactorial approach to written discourse. Recent studies, such as that by <xref ref-type="bibr" rid="redalyc_331483192028_ref37">Zhang (2021)</xref>, offer systematic reviews of these systems, highlighting their transition from rule-based approaches to models driven by machine learning and natural language processing. Along the same lines, <xref ref-type="bibr" rid="redalyc_331483192028_ref35">Wang et al. (2022)</xref> analyse current approaches to the evaluation of argumentative texts, focusing on discursive components such as reasoning structure, evidence, and organisation. Other proposals, such as Writing Mentor (<xref ref-type="bibr" rid="redalyc_331483192028_ref7">Burstein et al., 2018</xref>), integrate automatic assessment with formative feedback, promoting self-regulation processes in academic writing. Likewise, tools such as Write &amp; Improve, developed by the University of Cambridge, exemplify how it is possible to provide immediate automatic feedback on texts produced by foreign language learners, facilitating autonomous and guided learning (<xref ref-type="bibr" rid="redalyc_331483192028_ref9">Cambridge English, n.d.</xref>).</p>
</sec>
<sec>
<title>
<bold>Linguistic foundations of the A1–C1 level classification</bold>
</title>
<p>The classification of texts produced by learners of Spanish as a foreign language at levels A1–C1 is based on the descriptors established by the Common European Framework of Reference for Languages (<xref ref-type="bibr" rid="redalyc_331483192028_ref14">Council of Europe, 2002</xref>) and the Instituto Cervantes Curriculum Plan (<xref ref-type="bibr" rid="redalyc_331483192028_ref21">Instituto Cervantes, 2006</xref>). These documents define in detail the linguistic, pragmatic and sociolinguistic competences associated with each level, providing a solid basis for assessment.</p>
<p>At level A1, the lexical repertoire is very limited and restricted to basic transactions and everyday expressions. Texts are very short and simple, with simple sentences and a low average number of words per utterance. The use of regular forms of the present indicative predominates and an elementary grammatical repertoire is employed (<xref ref-type="bibr" rid="redalyc_331483192028_ref14">Council of Europe, 2002</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref21">Instituto Cervantes, 2006</xref>).</p>
<p>At level A2, learners can produce short texts that convey simple information on familiar topics. A larger lexical repertoire and slightly more complex structures are observed, incorporating past indicative tenses (preterite, imperfect and indefinite) and some irregular forms of the present tense. The use of the affirmative imperative also appears (<xref ref-type="bibr" rid="redalyc_331483192028_ref14">Council of Europe, 2002</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref21">Instituto Cervantes, 2006</xref>).</p>
<p>At level B1, the lexical repertoire is broader and allows for the creation of texts that fulfil a specific communicative task, maintaining a coherent structure. Grammatically, tenses such as the simple future, the simple conditional and the past perfect are handled with a certain fluency, in addition to introducing the present subjunctive and the negative imperative. Discourse shows greater cohesion and a wider variety of connecting devices (<xref ref-type="bibr" rid="redalyc_331483192028_ref14">Council of Europe, 2002</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref21">Instituto Cervantes, 2006</xref>).</p>
<p>At level B2, the user has a broad and precise linguistic repertoire, capable of sustaining complex arguments and detailed descriptions. Compound and subordinate clauses are used fluently, as well as confident use of indicative tenses (present, past, future and conditional) and subjunctive tenses (present, imperfect, perfect and pluperfect). Textual cohesion is consistent and lexical nuances appropriate to different registers are used (<xref ref-type="bibr" rid="redalyc_331483192028_ref14">Council of Europe, 2002</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref21">Instituto Cervantes, 2006</xref>).</p>
<p>At level C1, the linguistic and non-linguistic repertoire is sufficiently broad and flexible to handle any type of communicative transaction or interaction, even in demanding academic or professional contexts. The learner is able to produce long, complex texts with a clear, well-organised structure, using all tenses accurately and a wide range of syntactic and lexical resources (<xref ref-type="bibr" rid="redalyc_331483192028_ref14">Council of Europe, 2002</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref21">Instituto Cervantes, 2006</xref>).</p>
</sec>
</sec>
<sec>
<title>
<bold>TECHNOLOGICAL BASIS AND MODEL SELECTION</bold>
</title>
<p>This study used version 3.5 of the ChatGPT language model developed by OpenAI, released in November 2022 (<xref ref-type="bibr" rid="redalyc_331483192028_ref26">OpenAI, 2022</xref>). Although this version is not open access, it allows fine-tuning through OpenAI’s API. The choice of version 3.5 over the latest version is justified by the possibility of performing this process, which is not possible in the latest version of ChatGPT.</p>
<p>For this research, GPT-3.5 was chosen over more recent or open-source models due to a combination of technical, economic, and methodological factors. As this is a novel task—the automatic level assessment of written proficiency in Spanish as a foreign language—it was considered appropriate to evaluate the performance of a widely tested generalist model, such as GPT-3.5, both in its base configuration and through fine-tuning. This model, accessible through the OpenAI API, does not require advanced computational infrastructure and offers a robust architecture with good coverage of Spanish (<xref ref-type="bibr" rid="redalyc_331483192028_ref22">Li, 2023</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref29">Pourpanah et al., 2023</xref>). In addition, it offers an adequate balance between performance, cost and response speed, key factors for validating the viability of the approach in this exploratory phase (<xref ref-type="bibr" rid="redalyc_331483192028_ref30">Roumeliotis et al., 2024</xref>). In contrast, at the time the experiments were carried out, models such as GPT-4 involved a considerably higher economic cost and longer inference times, which reinforced the decision to use GPT-3.5 as the initial reference.</p>
<p>However, it should be noted that GPT-3.5 has limitations compared to more recent models, such as GPT-4, which offer more accurate contextual understanding, training with larger and more heterogeneous datasets, and greater reasoning ability. These characteristics make them particularly suitable candidates for future research, both in the automated assessment of written competence aligned with CEFR levels and in the generation of high-quality synthetic corpora for training specialised systems.</p>
</sec>
<sec>
<title>
<bold>STUDY OBJECTIVE</bold>
</title>
<p>To carry out this study, the descriptors established by the Common European Framework of Reference for Languages (<xref ref-type="bibr" rid="redalyc_331483192028_ref14">Council of Europe, 2002</xref>) and the Instituto Cervantes Curriculum Plan have been adapted in order to establish clear and operational criteria that allow a language model, in this case ChatGPT, to automatically classify the written productions of Spanish learners according to their level of competence.</p>
<p>Despite the aforementioned advances in automatic writing assessment, few studies have focused on the level assessment of written texts produced by foreign language learners and, specifically, on the specific case of teaching Spanish as a foreign language. Existing research tends to address automatic feedback or grading, but not the classification of written work according to CEFR or Instituto Cervantes Curriculum levels, especially in initial assessment tasks. This gap is particularly relevant in contexts such as universities, where the heterogeneity of international students, as in Erasmus programmes, requires efficient tools for level assessment. This study proposes an innovative solution based on generative language models (ChatGPT), which allows for the automatic classification of written texts by learners of Spanish as a foreign language (ELE) according to their level of proficiency, reducing the teaching load and improving the management of workload at the start of the academic year. Thus, this work not only complements previous research focused on textual improvement, but also broadens the scope of automated assessment to include initial diagnostic tasks in second language teaching contexts.</p>
<p>That is why, as the general objective of the study, we have set out to evaluate the effectiveness of the ChatGPT language model as an innovative tool for assessing the written expression level of Spanish learners. We also hope to achieve a series of specific objectives: i) to find out whether ChatGPT, with its prior training, is capable of carrying out level assessment adequately; ii) to verify whether ChatGPT is capable of level assessment if it is adjusted with a corpus levelled with the Instituto Cervantes Curriculum Plan and the Common European Framework of Reference for Languages; and iii) to determine the impact of ChatGPT on the efficiency of the level assessment of Spanish as a foreign language.</p>
</sec>
<sec>
<title>
<bold>METHODOLOGY</bold>
</title>
<p>To carry out this research, we used the Corpus of Spanish Learners (hereinafter CAES) (<xref ref-type="bibr" rid="redalyc_331483192028_ref28">Palacios Martínez et al., 2019</xref>), developed by the Instituto Cervantes in collaboration with the University of Santiago de Compostela. We also used version 3.5 of the ChatGPT language model, developed by OpenAI (<xref ref-type="bibr" rid="redalyc_331483192028_ref26">OpenAI, 2022</xref>).</p>
<p>Three types of experimental tests were carried out in this study: one using the zero-shot learning technique and the other two using model fine-tuning.</p>
<p>The fine-tuning procedure was carried out with a single training epoch, keeping the rest of the parameters at their default values according to the OpenAI API. To ensure reproducibility, a fixed random seed (value 42) was used, i.e., a reference value that allows experiments to be reproduced with the same results every time. No class balancing techniques were applied, as it was decided to preserve the actual distribution of the corpus, thus reflecting authentic levelling conditions in the classroom. In this way, the system's results are better suited to the challenges of real educational scenarios, without introducing artificial modifications to the representation of levels. However, it is recognised that in future research, compensation strategies could be applied to compare their effect on the fairness and robustness of the model. Cross-validation was also not used, in line with the initial and experimental nature of the study.</p>
<p>Regarding the technical parameters of the training, the default values of OpenAI’s API were maintained in aspects such as batch size (i.e., the number of examples processed at a time), learning rate (which indicates the speed at which the model adjusts its parameters during training), and loss functions (metrics that measure the difference between the model's prediction and the expected result). No additional regularisation techniques or early stopping strategies (early termination of training to avoid overfitting) were applied, as the main objective was to validate the feasibility of the approach rather than to optimise the model for maximum performance. This decision is in line with the exploratory nature of the research, which aims to test the applicability of the model to the task of automatic text levelling.</p>
<sec>
<title>
<bold>CAES Corpus</bold>
</title>
<p>The CAES corpus was compiled by the University of Santiago de Compostela and funded by the Instituto Cervantes. Computerised data was collected from October 2011 to December 2020 from centres, mostly universities, in different countries such as Spain, the United States, Brazil, Egypt, Ireland and Portugal. The students who took part in the project had eleven different native languages (English, Mandarin Chinese, Portuguese, Arabic, Russian, German, French, Greek, Italian, Japanese and Polish).</p>
<p>This study used version 2.1 of the CAES corpus from March 2022, which updated the first data collection, which had a smaller amount of data, with a total of 1,423 participants, compared to the 2,544 participants in the 2022 update.</p>
<p>The corpus contains examples of different levels of Spanish according to the Common European Framework of Reference for Languages, from A1 to C1. At levels A1, A2 and B1, texts belonging to three different types of tasks that students had to write, ranging from 30 to 200 words in length, were collected. Levels B2 and C1 have a sample of two tasks per level, ranging from 275 to 500 words in length.</p>
<p>The topics identified at each level are as follows: At A1, the first task consists of an email about changing jobs, with a total of 728 samples; the second task consists of an email about their family, with 703 samples; the last task consists of a note about being late, with a sample of 705. At level A2, the first task is a biography, with 673 samples; the second task is booking a hotel room, with a sample of 603 texts; and the last task is writing a postcard about your holidays, with a sample of 701 texts. As it can be seen, around 2,000 examples were collected per level at the initial levels, providing a significant sample of these levels. Likewise, the texts’ themes are matched to the functions and corresponding textual products of each level according to the Instituto Cervantes Curriculum Plan (<xref ref-type="bibr" rid="redalyc_331483192028_ref21">Instituto Cervantes, 2006</xref>).</p>
<p>At level B1, there are also three different tasks. The first is to write a letter to a friend, with a sample of 528 texts; the second task is to write an email about a complaint to an airline, with a sample of 454; and the last task is to narrate a story, with a sample of 382. It should be noted that these tasks, like those at levels A1 and A2, are matched to the functions described in the Curriculum Plan.</p>
<p>At levels B2 and C1, the tasks are reduced to two. At level B2, the first task is to write an application for admission, with a total of 375 samples; while the second task at level B2 is to write a text arguing the case for smoking in public places, with a sample size of 356.</p>
<p>At level C1, the first task consists of writing a complaint to a gas company, with a sample of 169; and in the second task, they must write a film review, with a sample of 184 texts. It can be seen how the sample size is significantly reduced at these levels as the number of tasks is reduced. It is also notable how the samples are smaller at level C1, with fewer than 400 samples at this level.</p>
<p>As in the previous levels, it can be seen that the tasks in levels B2 and C1 also correspond to the functions described in the Curriculum Plan.</p>
<p>The corpus was annotated by specialists in teaching Spanish as a foreign language at the University of Santiago de Compostela. Each text was classified into one of the CEFR levels according to the criteria established in the Instituto Cervantes's Curriculum and the guidelines defined in the project itself (<xref ref-type="bibr" rid="redalyc_331483192028_ref28">Palacios Martínez et al., 2019</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref32">University of Santiago de Compostela, n.d.</xref>). To ensure the reliability of the classification, the texts were evaluated independently by several annotators and then reviewed jointly until a consensus was reached. This procedure ensures that level assessment is coherent and consistent, making the corpus a solid resource for research and automatic assessment of written production.</p>
<p>
<xref ref-type="table" rid="gt1">Table 1</xref> presents a summary of the levels with respect to the tasks addressed and the total number of samples collected.</p>
<p>
<xref ref-type="bibr" rid="redalyc_331483192028_ref10">Cantero (2024)</xref> also conducted a study of this corpus, from which the following results can be drawn. At level A1, the average number of words per sentence is between 10.9 and 11.7, and the most frequent words are simple connectors and a limited lexicon. At level A2, the average number of words per sentence is higher, ranging from 12.5 to 14.6, with a simple, frequent lexicon. At the next level, B1, the average number of words per sentence increases to 12.0-16.5, and the vocabulary is broader and more complex than at previous levels. At level B2, the average number of words per sentence is even more complex, ranging from 17.7 to 21.5, and in terms of vocabulary, more complex connectors and more specialised vocabulary are used. Finally, at level C1, the average number of words per sentence is between 20.7 and 23.3, with complex and varied vocabulary.</p>
<p>
<table-wrap id="gt1">
<label>Table 1</label>
<caption>
<title>
<italic>Sample summary</italic>
</title>
</caption>
<alt-text>Table 1 Sample summary</alt-text>
<alternatives>
<graphic xlink:href="331483192028_gt2.png" position="anchor" orientation="portrait"/>
<table style="border-collapse:collapse;border:none" id="gt2-526564616c7963">
<thead style="display:none;">
<tr style="display:none;">
<th style="display:none;"/>
</tr>
</thead>
<tbody>
<tr style="height:16.95pt">
<td style="width:2.0cm;border-top:solid windowtext 1.0pt;border-left:   none;border-bottom:solid windowtext 1.0pt;border-right:none;padding:0cm 5.4pt 0cm 5.4pt;   height:16.95pt">
<bold>Level</bold>
</td>
<td style="width:233.9pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:16.95pt">
<bold>Ta</bold>
<bold>sk</bold>
</td>
<td style="width:70.9pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:16.95pt">
<bold>Exhibit</bold>
</td>
</tr>
<tr style="height:5.0pt">
<td style="width:2.0cm;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt" rowspan="3">A1</td>
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">Email regarding change of job</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">728</td>
</tr>
<tr style="height:5.0pt">
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">Family email</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">703</td>
</tr>
<tr style="height:5.0pt">
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">Note: arriving late</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">705</td>
</tr>
<tr style="height:5.0pt">
<td style="width:2.0cm;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt" rowspan="3">A2</td>
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">Biography of a person you admire</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">673</td>
</tr>
<tr style="height:5.0pt">
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">Book a hotel room</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">603</td>
</tr>
<tr style="height:5.0pt">
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">Holiday postcard</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">701</td>
</tr>
<tr style="height:5.0pt">
<td style="width:2.0cm;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt" rowspan="3">B1</td>
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">Letter to a friend</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">528</td>
</tr>
<tr style="height:5.0pt">
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">Funny story</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">382</td>
</tr>
<tr style="height:5.0pt">
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">Airline claim</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:5.0pt">454</td>
</tr>
<tr style="height:7.5pt">
<td style="width:2.0cm;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:7.5pt" rowspan="2">B2</td>
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:7.5pt">Application for admission</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:7.5pt">375</td>
</tr>
<tr style="height:7.5pt">
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:7.5pt">Smoking in public places</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:7.5pt">356</td>
</tr>
<tr style="height:7.5pt">
<td style="width:2.0cm;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:7.5pt" rowspan="2">C1</td>
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:7.5pt">Film review</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:7.5pt">184</td>
</tr>
<tr style="height:7.5pt">
<td style="width:233.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:7.5pt">Gas company complaint</td>
<td style="width:70.9pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:7.5pt">169</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
</p>
</sec>
<sec>
<title>
<bold>Prompt used</bold>
</title>
<p>Once the corpus had been analysed, a specific prompt was developed to carry out the tests with the model. As <xref ref-type="bibr" rid="redalyc_331483192028_ref24">Morales-Chan (2023)</xref> points out, a good prompt can guarantee the success of the task. Therefore, it is important to define the objective and provide sufficient context.</p>
<p>The following prompt was used in the three tests carried out in this study —zero-shot learning (Test 1) and fine-tuning (Tests 2 and 3)— in order to maintain methodological consistency in the evaluation criteria. The prompt<underline>
<sup>[1]</sup>
</underline> design was based on the linguistic descriptors of the Cervantes Institute Curriculum Plan and the Common European Framework of Reference for Languages.</p>
<p>
<disp-quote>
<p>
<italic>Tú eres un experto lingüista especializado en enseñanza de español como lengua extranjera. Tu tarea es indicar el nivel de español como lengua extranjera de los textos siguiendo el Plan Curricular del Instituto Cervantes.</italic>
</p>
<p>
<italic>Aquí tienes una descripción de los distintos niveles.</italic>
</p>
<p>
<italic>Niveles A1 y A2 Transacciones básicas relacionadas con su entorno.</italic>
</p>
<p>
<italic>A1: Repertorio limitado de léxico, textos muy breves y sencillos, un promedio de 10 palabras por oración. Formas regulares del presente de indicativo.</italic>
</p>
<p>
<italic>A2: Textos breves con información sencilla, un promedio de 12 palabras por oración. Tiempos verbales del pasado de indicativo: Pretérito perfecto, imperfecto e indefinido.Formas irregulares de presente de indicativo. Imperativo afirmativo.</italic>
</p>
<p>
<italic>Niveles B1 y B2 desenvolverse con textos sobre temas de su interés gustos y preferencias.</italic>
</p>
<p>
<italic>B1: Vocabulario amplio pero sencillo, realizar textos con una tarea concreta. Presente de indicativo, pretérito perfecto, imperfecto e indefinido de indicativo, futuro simple, condicional simple, pretérito pluscuamperfecto de indicativo, presente de subjuntivo. Imperativo negativo.</italic>
</p>
<p>
<italic>B2: Repertorio lingüístico amplio, oraciones subordinadas. Tiempos verbales de indicativo: presente, pretérito perfecto, imperfecto, indefinido, futuro simple y compuesto, condicional simple y compuesto, pretérito pluscuamperfecto. Tiempos verbales de subjuntivo: presente, pretérito imperfecto, pretérito perfecto y pluscuamperfecto.</italic>
</p>
<p>
<italic>C1 transacciones de todo tipo. Disponen de un repertorio de recursos lingüísticos y no lingüísticos lo suficientemente amplio y rico. Pueden enfrentarse a una amplia serie de textos extensos y complejos. Todos los tiempos verbales de indicativo y de subjuntivo el presente, pretérito perfecto, imperfecto y pluscuamperfecto.</italic>
</p>
<p>
<italic>Ahora vas a recibir un TEXTO y teniendo en cuenta lo explicado anteriormente y los errores gramaticales indica al final de tu respuesta con la etiqueta 'NIVEL:' el nivel del TEXTO (A1, A2, B1, B2 o C1).</italic>
</p>
<p>
<italic>TEXTO: “...”</italic>
</p>
</disp-quote>
</p>
<p>With this prompt, brief descriptive information has been added for each level, following the analysis of the corpus mentioned above. Likewise, following the Instituto Cervantes Curriculum Plan (<xref ref-type="bibr" rid="redalyc_331483192028_ref21">Instituto Cervantes, 2006</xref>), a description of the verb tenses used in each of the levels and the types of texts has been included, following the indications of the textual products. In this way, the model is provided with a broader context so that its response is more accurate and tailored to the needs requested.</p>
</sec>
<sec>
<title>
<bold>Testing with the ChatGPT Language Model</bold>
</title>
<p>A total of three different tests were carried out in the study to assess the model's ability to assess the level of the texts in the corpus.</p>
<p>The first test was zero-shot learning. In this process, the model does not receive specific examples, but rather relies on its prior knowledge. To carry out this test, only the prompt mentioned in the previous section was used.</p>
<p>In tests two and three, fine-tuning was performed. This process consists of specialising a pre-trained model to perform a specific task, adapting it to a specific set of data provided to it. We obtained this dataset from the CAES corpus, given that it provides with a set of clear examples for the model. To perform fine-tuning, the model is equipped with an input-output specifying the input and the type of output we want it to provide. In this case, the prompt is the one mentioned in the previous section. Furthermore, the output requested was the level of Spanish. Likewise, to perform fine-tuning, part of the dataset was reserved to verify the response.</p>
<p>Tests 2 and 3 differ from each other in the division of the corpus. For the second test, the corpus was divided into 90%, 5% and 5%. Ninety per cent of the corpus was used to train the model, 5% to validate it and the remaining 5% to test it. In the third test, an 80%–20% division was made, using 80% of the corpus for training and 20% for testing.</p>
</sec>
</sec>
<sec>
<title>
<bold>RESULTS </bold>
</title>
<p>To analyse the results, three evaluation measures widely used in classification tasks have been employed:</p>
<p>
<list list-type="bullet">
<list-item>
<p>
<bold>Precision: </bold>indicates the percentage of examples that the model classified at a certain level and that actually belong to that level.</p>
</list-item>
<list-item>
<p>
<bold>Recall</bold>: indicates the proportion of examples of a specific level that the model correctly identified. For example, the percentage of A1-level texts detected as A1 out of the total number of A1 texts in the corpus.</p>
</list-item>
<list-item>
<p>
<bold>F1-score</bold>: A single value that combines accuracy and coverage through its harmonic mean, providing a balanced measure of performance. This metric is particularly useful when it is important for the model not only to be correct, but also to detect all possible cases in each category.</p>
</list-item>
</list>
</p>
<p>The results of the three experiments are presented below:</p>
<sec>
<title>
<bold>Zero-shot learning experimentation </bold>
</title>
<p>As mentioned above, in this technique, the model is not provided with any examples; it is carried out using the prompt developed. In this case, the results obtained are as follows:</p>
<p>
<table-wrap id="gt2">
<label>Table 2</label>
<caption>
<title>
<italic>ZSL experimental results</italic>
</title>
</caption>
<alt-text>Table 2 ZSL experimental results</alt-text>
<alternatives>
<graphic xlink:href="331483192028_gt3.png" position="anchor" orientation="portrait"/>
<table style="border-collapse:collapse;border:none" id="gt3-526564616c7963">
<thead style="display:none;">
<tr style="display:none;">
<th style="display:none;"/>
</tr>
</thead>
<tbody>
<tr style="height:16.1pt">
<td style="width:120.5pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:16.1pt">
<bold>Tags</bold>
</td>
<td style="width:86.4pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:16.1pt">
<bold>Precision</bold>
</td>
<td style="width:86.45pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:16.1pt">
<bold>Recall</bold>
</td>
<td style="width:86.45pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:16.1pt">
<bold>F1-score</bold>
</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">A1</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9375</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.1402</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.2439</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">A2</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.3333</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.7677</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.4648</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">B1</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.2400</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.2609</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.2500</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">B2</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.4286</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.8110</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.1364</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">C1</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.0000</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.0000</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.0000</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
</p>
<p>
<xref ref-type="table" rid="gt2">Table 2</xref> shows that, at level A1, accuracy is high, i.e. texts classified as level A1 have a high probability of being at this level. However, coverage is quite low, as the model has detected only a small percentage of texts in the corpus as A1. Therefore, the F1 score is low.</p>
<p>At level A2, the opposite phenomenon to that described at level A1 occurs. Accuracy is low, so it is less successful, but coverage is high. Therefore, we can say that at this level, although it detects A2 texts to a greater extent, it has low accuracy when it comes to detecting the correct level.</p>
<p>In relation to level B1, both factors, precision and coverage, are low. At this level, the model has problems both detecting B1-level texts and identifying them at their correct level.</p>
<p>With regard to level B2, coverage is high, detecting most B2-level texts, but precision is moderate, with most classifications being incorrect.</p>
<p>Finally, the case of level C1 is striking, as both accuracy and coverage are 0; it does not detect any texts at this level.</p>
<p>Furthermore, as this was an experiment using a prompt, the model not only gave the level of the text in its responses, but also added comments on each one about the errors found. Complete examples of these responses, including the original texts and the corrections proposed by the model, are presented in <xref ref-type="app" rid="app1">Appendix 1</xref>.</p>
<p>
<xref ref-type="fig" rid="gf1">Figure 1</xref> complements this information by showing a comparison of the accuracy, coverage and F1-score values for each level evaluated in the zero-shot learning configuration.</p>
<p>
<fig id="gf1">
<label>Figure 1</label>
<caption>
<title>
<italic>Results by level in zero-shot learning experimentation</italic>
</title>
</caption>
<alt-text>Figure 1 Results by level in zero-shot learning experimentation</alt-text>
<graphic xlink:href="331483192028_gf5.png" position="anchor" orientation="portrait">
<alt-text>Figure 1 Results by level in zero-shot learning experimentation</alt-text>
</graphic>
</fig>
</p>
</sec>
<sec>
<title>
<bold>Fine-tuning experimentation 90-5-5</bold>
</title>
<p>In this second experiment, as mentioned above, the corpus was fine-tuned and divided into 90%, 5% and 5% for training, validation and testing of the model. The results of this experiment are shown below:</p>
<p>
<table-wrap id="gt3">
<label>Table 3</label>
<caption>
<title>
<italic>Fine-tuning experimentation 90-5-5</italic>
</title>
</caption>
<alt-text>Table 3 Fine-tuning experimentation 90-5-5</alt-text>
<alternatives>
<graphic xlink:href="331483192028_gt4.png" position="anchor" orientation="portrait"/>
<table style="border-collapse:collapse;border:none" id="gt4-526564616c7963">
<thead style="display:none;">
<tr style="display:none;">
<th style="display:none;"/>
</tr>
</thead>
<tbody>
<tr style="height:19.35pt">
<td style="width:120.5pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:19.35pt">
<bold>Tags</bold>
</td>
<td style="width:86.4pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:19.35pt">
<bold>Precisi</bold>
<bold>on</bold>
</td>
<td style="width:86.45pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:19.35pt">
<bold>Recall</bold>
</td>
<td style="width:86.45pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:19.35pt">
<bold>F1-score</bold>
</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">A1</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9905</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9720</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9811</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">A2</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9519</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">1.0000</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9754</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">B1</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">1.0000</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9565</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9778</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">B2</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">1.0000</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">1.0000</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">1.0000</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">C1</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">1.0000</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">1.0000</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">1.0000</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
</p>
<p>
<xref ref-type="table" rid="gt3">Table 3</xref> shows that the results have higher levels of accuracy and coverage than those obtained with zero-shot learning. At level A1, we can see that the model makes almost no errors and detects almost all texts at level A1.</p>
<p>At level A2, as we can see, the model detects all texts at level A2 with very high accuracy.</p>
<p>With regard to level B1, the model correctly predicts all texts and also has high coverage, with a small percentage undetected.</p>
<p>Finally, with levels B2 and C1, the results show that the model detects all texts at these levels and is correct on all occasions. However, the results of this experiment show that the model can correctly predict all levels with values close to or equal to 1. <xref ref-type="fig" rid="gf2">Figure 2</xref> complements this information by showing a comparison of the accuracy, coverage and F1-score values for each level evaluated in the 90-5-5 fine-tuning configuration.</p>
<p>
<fig id="gf2">
<label>Figure 2</label>
<caption>
<title>
<italic>Results by level in the 90-5-5 fine-tuning experiment</italic>
</title>
</caption>
<alt-text>Figure 2 Results by level in the 90-5-5 fine-tuning experiment</alt-text>
<graphic xlink:href="331483192028_gf6.png" position="anchor" orientation="portrait">
<alt-text>Figure 2 Results by level in the 90-5-5 fine-tuning experiment</alt-text>
</graphic>
</fig>
</p>
</sec>
<sec>
<title>
<bold>Fine-tuning experimentation 80-20</bold>
</title>
<p>In the third experiment, fine-tuning was performed by dividing the corpus into two percentages, 80% for training and 20% for model validation. The results obtained are shown below.</p>
<p>In this experiment, as shown in <xref ref-type="table" rid="gt4">Table 4</xref>, at level A1, both precision and coverage show a high level of accuracy and text detection.</p>
<p>Levels A2 and B1 show a similar result to A1, although with a slightly lower result in accuracy at level A2 and coverage at level B1.</p>
<p>With regard to levels B2 and C1, we observe that the accuracy of the model is excellent, as it correctly predicts these levels, although the coverage is moderately lower.</p>
<p>
<table-wrap id="gt4">
<label>Table 4</label>
<caption>
<title>
<italic>Fine-tuning experimentation 80-20</italic>
</title>
</caption>
<alt-text>Table 4 Fine-tuning experimentation 80-20</alt-text>
<alternatives>
<graphic xlink:href="331483192028_gt5.png" position="anchor" orientation="portrait"/>
<table style="border-collapse:collapse;border:none" id="gt5-526564616c7963">
<thead style="display:none;">
<tr style="display:none;">
<th style="display:none;"/>
</tr>
</thead>
<tbody>
<tr style="height:19.35pt">
<td style="width:120.5pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:19.35pt">
<bold>Tags</bold>
</td>
<td style="width:86.4pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:19.35pt">
<bold>Precisi</bold>
<bold>on</bold>
</td>
<td style="width:86.45pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:19.35pt">
<bold>Recall</bold>
</td>
<td style="width:86.45pt;border-top:solid windowtext 1.0pt;   border-left:none;border-bottom:solid windowtext 1.0pt;border-right:none;   padding:0cm 5.4pt 0cm 5.4pt;height:19.35pt">
<bold>F1-score</bold>
</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">A1</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9884</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9953</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9918</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">A2</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9727</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9899</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9812</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">B1</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9925</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9707</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9815</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">B2</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">1.0000</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9795</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9896</td>
</tr>
<tr style="height:12.8pt">
<td style="width:120.5pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">C1</td>
<td style="width:86.4pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">1.0000</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9859</td>
<td style="width:86.45pt;border:none;border-bottom:solid windowtext 1.0pt;   padding:0cm 5.4pt 0cm 5.4pt;height:12.8pt">0.9929</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
</p>
<p>
<xref ref-type="fig" rid="gf3">Figure 3</xref> complements this information by showing a comparison of the precision, coverage, and F1-score values for each level evaluated in the 80-20 fine-tuning configuration.</p>
<p>
<fig id="gf3">
<label>Figure 3</label>
<caption>
<title>
<italic>Results by level in the 80-20 fine-tuning experiment</italic>
</title>
</caption>
<alt-text>Figure 3 Results by level in the 80-20 fine-tuning experiment</alt-text>
<graphic xlink:href="331483192028_gf7.png" position="anchor" orientation="portrait">
<alt-text>Figure 3 Results by level in the 80-20 fine-tuning experiment</alt-text>
</graphic>
</fig>
</p>
</sec>
<sec>
<title>
<bold>Statistical analysis</bold>
</title>
<p>To assess the statistical significance of the results, 95% confidence intervals (95% CI) were estimated for the Macro-F1 values using <bold>bootstrapping</bold>, a resampling technique with replacement that allows confidence intervals to be calculated without assuming a specific statistical distribution. One thousand replicates were performed to obtain these intervals, which indicate the range within which the true value of Macro-F1 is expected to fall with a 95% probability. Additionally, hypothesis tests were performed to verify the results. The zero-shot learning experiment obtained a Macro-F1 value of 0.2190 with a confidence interval CI95% = [0.1785, 0.2617], while the fine-tuning experiments achieved values close to perfection: a Macro-F1 value of 0.9869 with a confidence interval CI95% = [0.9748, 0.9959] on the 90-5-5 split and a Macro-F1 value of 0.9874 (95% CI = [0.9812, 0.9933]) on the 80-20 split. The comparisons were made using non-parametric bootstrap-based hypothesis tests—i.e., statistical contrasts that do not require assuming a specific data distribution and are based on multiple random resamples. These analyses showed that both fine-tuning experiments statistically significantly outperformed the zero-shot experiment (Δ<sup>
<xref ref-type="fn" rid="fn5">1</xref>
</sup> ≈ 0.77; <italic>p</italic> &lt; 0.001 in both cases). In contrast, no significant differences were observed between the two fine-tuning experiments (Δ = −0.0005; 95% CI = [−0.0133, 0.0113]; <italic>p</italic> = 0.964). These results confirm that fine-tuning substantially improves the model's ability to assess written competence in ELE.</p>
</sec>
</sec>
<sec>
<title>
<bold>PEDAGOGICAL RELEVANCE OF THE MODEL AND ITS APPLICATION IN BOTH FACE-TO-FACE AND DISTANCE LEARNING CONTEXTS</bold>
</title>
<p>The results obtained show that a language model adjusted with a specialised corpus can achieve very high performance in the automatic classification of written texts according to CEFR levels. This capability has clear potential to facilitate teaching, especially in the initial stage of student placement, constituting a pedagogical innovation in the use of artificial intelligence for language teaching.</p>
<p>In a real learning environment, the system could be integrated into a Learning Management System (LMS) as a diagnostic assessment module. The flow of use would be simple: the student submits a text, the model automatically classifies it by level and presents the result to the teacher using a rubric aligned with the descriptors of the Instituto Cervantes Curriculum Plan (PCIC). In this way, teachers could assign students to the course corresponding to their actual level, optimising the adjustment of groups and avoiding gaps that could affect learning progress.</p>
<p>In future implementations, and based on more exhaustive analyses, the system could generate more detailed reports based on the PCIC, identifying specific linguistic areas to be reinforced (e.g. use of verb tenses, textual cohesion or lexical repertoire). This information would allow the course to be tailored not only to the student’s level, but also to their main weaknesses, guiding the teaching programme towards improving these aspects.</p>
<p>In distance education and online learning contexts, integration into an LMS would have the same diagnostic function, automatically classifying students from their first access. This would allow for efficient group organisation even in non-face-to-face environments, which is especially relevant in massive courses or virtual programmes with continuous enrolment.</p>
<p>Compared to tools such as Write &amp; Improve (<xref ref-type="bibr" rid="redalyc_331483192028_ref9">Cambridge English, n.d.</xref>), which focus mainly on corrective feedback and formative assessment, the proposal presented here offers a complementary approach: automatic classification by CEFR levels. This feature, combined with the possibility of direct integration into educational platforms, reinforces its value as a teaching support tool for optimising initial placement and streamlining course planning.</p>
<p>In all cases, the system is designed as a teaching aid and not as a replacement, with the aim of streamlining initial assessment tasks and freeing up time for higher value-added activities, such as individualised feedback or progress monitoring.</p>
<p>Among the possible future improvements are expanding the corpus to include more genres and topics, adding additional lexical and pragmatic descriptors, experimenting with other language models, and adapting the system to other languages and educational levels. It would also be relevant to evaluate its real impact on student performance and motivation, as well as its smooth integration into different LMSs and educational contexts.</p>
</sec>
<sec>
<title>
<bold>DISCUSSION AND CONCLUSIONS</bold>
</title>
<p>Once the tests have been carried out and the results obtained, there is a clear contrast between the model's performance in the zero-shot learning configuration and in the fine-tuning experiments. In the former, the model does not achieve adequate accuracy at any of the levels, with only a relatively high value in A1, albeit with very low coverage. At levels A2 and B2, coverage is medium-high (0.76 and 0.81), indicating that the model detects most of the texts at those levels, but with low accuracy. Level C1 is particularly problematic, with zero values for both accuracy and coverage. These data confirm that, with its original pre-training, the model is not capable of reliably assessing the level of ELE texts according to the Instituto Cervantes Curriculum Plan.</p>
<p>In contrast, the two fine-tuning tests (90-5-5 and 80-20) provided accuracy and coverage values very close to 1 at all levels, with particular strength at B2 and C1. The comparison between the two experiments suggests that a larger volume of training data can improve performance, although even with less data the model maintains a high predictive capacity. This behaviour coincides with that observed in previous work on model adaptation for specific natural language processing tasks ((<xref ref-type="bibr" rid="redalyc_331483192028_ref17">García-Peñalvo, 2024</xref>; <xref ref-type="bibr" rid="redalyc_331483192028_ref18">García-Peñalvo et al., 2024</xref>), where customisation of the system significantly increases its effectiveness.</p>
<p>
<fig id="gf4">
<label>
<bold>Figure 4</bold>
</label>
<caption>
<title>
<italic>Comparison of experimental results</italic>
</title>
</caption>
<alt-text>Figure 4 Comparison of experimental results</alt-text>
<graphic xlink:href="331483192028_gf8.png" position="anchor" orientation="portrait">
<alt-text>Figure 4 Comparison of experimental results</alt-text>
</graphic>
</fig>
</p>
<p>Compared to tools such as Write &amp; Improve (<xref ref-type="bibr" rid="redalyc_331483192028_ref9">Cambridge English, n.d.</xref>), which focus on formative feedback and error detection, this study offers a complementary approach: automatic classification by CEFR levels, which could be integrated into learning environments to optimise the initial placement of students and streamline teaching. Likewise, studies such as those by <xref ref-type="bibr" rid="redalyc_331483192028_ref8">Burstein et al. (2003</xref>, e-rater) and <xref ref-type="bibr" rid="redalyc_331483192028_ref23">McNamara et al. (2014</xref>, Coh-Metrix) had already shown the usefulness of combining linguistic features and automatic metrics to evaluate writing; our results reinforce this line of thinking, showing that a model adjusted with specific corpora achieves very high performance levels.</p>
<p>On the other hand, in the zero-shot learning experiment, the model not only indicated the texts’ level but also generated comments on errors and suggestions for correcting the texts. Although this functionality is not relevant to the main objective of this study, it could be explored in future research as a teaching support resource to help students identify and correct their errors.</p>
<p>In terms of limitations, the study was conducted on a single corpus due to the scarcity of specialised and level-assessed ELE corpora that meet homogeneous criteria and are managed by experts. Furthermore, the classification was based on grammatical descriptors present in the prompt (e.g., verbal repertoire, average number of words per sentence, use of certain tenses and modes), without systematically integrating complex syntactic structures, specific lexical resources, or pragmatic aspects such as appropriateness or discursive coherence, which are particularly relevant at intermediate and advanced levels.</p>
<p>As lines of future work, we propose:</p>
<p>
<list list-type="order">
<list-item>
<p>Design and validate new corpora for Spanish as a foreign language, level-assessed by experts, with greater thematic diversity and representativeness of levels.</p>
</list-item>
<list-item>
<p>Explore the application of the method with other language models to compare their performance.</p>
</list-item>
<list-item>
<p>Expand the set of linguistic descriptors, incorporating syntactic, lexical, and pragmatic indicators.</p>
</list-item>
<list-item>
<p>Experiment with controlled variations of the prompt to evaluate their impact on classification.</p>
</list-item>
<list-item>
<p>Validate the system with authentic student texts in real educational contexts and integrate the tool into learning management systems (LMS).</p>
</list-item>
</list>
</p>
<p>Overall, the results achieved confirm that generative artificial intelligence, and in particular language models adjusted with specific data, can become an effective resource for streamlining the initial level assessment of ELE students, supporting teaching work, and optimizing teaching-learning processes.</p>
<p>Within this reflection, it is also pertinent to address the ethical and legal aspects involved in the use of learner corpora, which are discussed in the following subsection.</p>
<sec>
<title>
<bold>Ethical and licensing issues</bold>
</title>
<p>This study used the Corpus of Spanish Learners (CAES), which is available online for academic and research purposes (<xref ref-type="bibr" rid="redalyc_331483192028_ref32">University of Santiago de Compostela, n.d.</xref>). Its texts have been anonymised beforehand so that they do not include any identifiable personal data, and they were compiled in regulated educational contexts and with expert validation, which guarantees the appropriate treatment of the information.</p>
<p>From an ethical point of view, it is necessary to consider that the expansion or combination of the corpus with other resources must address both clarity in usage licences and the prevention of biases associated with mother tongue or sociocultural context. These factors are essential to ensure fair and reproducible classification of texts and, ultimately, to guarantee the responsible use of artificial intelligence in educational environments.</p>
</sec>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and Plan de Recuperación, Transformación y Resiliencia - Funded by EU – NextGenerationEU within the framework of the project Desarrollo Modelos ALIA. This work has also been partially supported by Project CONSENSO (PID2021-122263OB-C21), Project MODERATES (TED2021-130145B-I00) and Project SocialTox (PDC2022-133146-C21) funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR, Project ROMANET (CERV-2024-CHAR-LITI-101215052), funded by the European Union under the Citizens, Equality, Rights and Values programme, and Project HEART-NLP-UJA (PID2024-156263OB-C21) funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU. The research work conducted by Salud María Jiménez-Zafra is part of the grant RYC2023-044481-I, supported by MICIU/AEI/10.13039/501100011033 and by ESF+.</p>
</ack>
<ref-list>
<title>
<bold>REFERENCES</bold>
</title>
<ref id="redalyc_331483192028_ref1">
<mixed-citation publication-type="journal">Aparicio Gómez, W. O. (2023). La inteligencia artificial y su incidencia en la educación: transformando el aprendizaje para el siglo XXI. <italic>Revista Internacional de Pedagogía e Innovación Educativa</italic>, <italic>3</italic>(2), 217-229. <ext-link ext-link-type="uri" xlink:href="https://dialnet.unirioja.es/servlet/articulo?codigo=9624350">https://dialnet.unirioja.es/servlet/articulo?codigo=9624350</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aparicio Gómez</surname>
<given-names>W. O.</given-names>
</name>
</person-group>
<article-title>La inteligencia artificial y su incidencia en la educación: transformando el aprendizaje para el siglo XXI</article-title>
<source>Revista Internacional de Pedagogía e Innovación Educativa</source>
<year>2023</year>
<volume>3</volume>
<issue>2</issue>
<fpage>217</fpage>
<lpage>229</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://dialnet.unirioja.es/servlet/articulo?codigo=9624350">https://dialnet.unirioja.es/servlet/articulo?codigo=9624350</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref2">
<mixed-citation publication-type="journal">Area-Moreira, M., Del Prete, A., Sanabria-Mesa, A. L., &amp; Sannicolás-Santos, M. B. (2024). No todas las herramientas de IA son iguales: análisis de aplicaciones inteligentes para la enseñanza universitaria. <italic>Digital Education Review</italic>, <italic>45</italic>, 141-149. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1344/der.2024.45.141-149">https://doi.org/10.1344/der.2024.45.141-149</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Area-Moreira</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Del Prete</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sanabria-Mesa</surname>
<given-names>A. L.</given-names>
</name>
<name>
<surname>Sannicolás-Santos</surname>
<given-names>M. B.</given-names>
</name>
</person-group>
<article-title>No todas las herramientas de IA son iguales: análisis de aplicaciones inteligentes para la enseñanza universitaria</article-title>
<source>Digital Education Review</source>
<year>2024</year>
<volume>45</volume>
<fpage>141</fpage>
<lpage>149</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1344/der.2024.45.141-149">https://doi.org/10.1344/der.2024.45.141-149</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref3">
<mixed-citation publication-type="book">Barroso-Osuna, J., &amp; Cabero-Almenara, J. (2025). Potencialidades de la inteligencia artificial en la personalización de la educación. In P. Román-Graván, J. Barroso-Osuna, J. Cabero-Almenara, &amp; C. Llorente-Cejudo (Eds.), <italic>Visiones sobre la integración educativa de la inteligencia artificial</italic> (1st ed.). Dykinson. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.14679/4177">https://doi.org/10.14679/4177</ext-link>
</mixed-citation>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Barroso-Osuna</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Cabero-Almenara</surname>
<given-names>J.</given-names>
</name>
</person-group>
<source>Visiones sobre la integración educativa de la inteligencia artificial</source>
<year>2025</year>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.14679/4177">https://doi.org/10.14679/4177</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref4">
<mixed-citation publication-type="journal">Baskara, R., &amp; Mukarto, M. (2023). Exploring the implications of ChatGPT for language learning in higher education. <italic>Indonesian Journal of English Language Teaching and Applied Linguistics</italic>, <italic>7</italic>(2), 343-358. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.21093/ijeltal.v7i2.1387">https://doi.org/10.21093/ijeltal.v7i2.1387</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Baskara</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Mukarto</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Exploring the implications of ChatGPT for language learning in higher education</article-title>
<source>Indonesian Journal of English Language Teaching and Applied Linguistics</source>
<year>2023</year>
<volume>7</volume>
<issue>2</issue>
<fpage>343</fpage>
<lpage>358</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.21093/ijeltal.v7i2.1387">https://doi.org/10.21093/ijeltal.v7i2.1387</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref5">
<mixed-citation publication-type="confproc">Biedma Torrecillas, A., Chamorro Guerrero, M. D., Lozano, G., &amp; Sánchez Cuadrado, A. (2012). Diseño y validación de las pruebas de nivel del CLM de la Universidad de Granada. In <italic>Actas del VII Congreso ACLES: Multilingüismo en los centros de lengua universitarios: evaluación, acreditación, calidad y política lingüística</italic> (pp. 26-37). ACLES. <ext-link ext-link-type="uri" xlink:href="https://dialnet.unirioja.es/servlet/libro?codigo=501925">https://dialnet.unirioja.es/servlet/libro?codigo=501925</ext-link>
</mixed-citation>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Biedma Torrecillas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Chamorro Guerrero</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Lozano</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Sánchez Cuadrado</surname>
<given-names>A.</given-names>
</name>
</person-group>
<source>Actas del VII Congreso ACLES: Multilingüismo en los centros de lengua universitarios: evaluación, acreditación, calidad y política lingüística</source>
<year>2012</year>
<fpage>26</fpage>
<lpage>37</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://dialnet.unirioja.es/servlet/libro?codigo=501925">https://dialnet.unirioja.es/servlet/libro?codigo=501925</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref6">
<mixed-citation publication-type="journal">Bolaño-García, M., &amp; Duarte-Acosta, N. (2024). Una revisión sistemática del uso de la inteligencia artificial en la educación. <italic>Revista Colombiana de Cirugía, 39</italic>(1), 51-63. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.30944/20117582.2365">https://doi.org/10.30944/20117582.2365</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bolaño-García</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Duarte-Acosta</surname>
<given-names>N.</given-names>
</name>
</person-group>
<article-title>Una revisión sistemática del uso de la inteligencia artificial en la educación</article-title>
<source>Revista Colombiana de Cirugía</source>
<year>2024</year>
<volume>39</volume>
<issue>1</issue>
<fpage>51</fpage>
<lpage>63</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.30944/20117582.2365">https://doi.org/10.30944/20117582.2365</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref7">
<mixed-citation publication-type="journal">Burstein, J., Elliot, N., Beigman Klebanov, B., Madnani, N., Napolitano, D., Schwartz, M., Houghton, P., &amp; Molloy, H. (2018). Writing mentor: Writing progress using self-regulated writing support. <italic>Journal of Writing Analytics</italic>, <italic>2</italic>, 285-313. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.37514/JWA-J.2018.2.1.12">https://doi.org/10.37514/JWA-J.2018.2.1.12</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Burstein</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Elliot</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Beigman Klebanov</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Madnani</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Napolitano</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Schwartz</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Houghton</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Molloy</surname>
<given-names>H.</given-names>
</name>
</person-group>
<article-title>Writing mentor: Writing progress using self-regulated writing support</article-title>
<source>Journal of Writing Analytics</source>
<year>2018</year>
<volume>2</volume>
<fpage>285</fpage>
<lpage>313</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.37514/JWA-J.2018.2.1.12">https://doi.org/10.37514/JWA-J.2018.2.1.12</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref8">
<mixed-citation publication-type="confproc">Burstein, J., Kukich, K., Wolff, S., Lu, C., Chodorow, M., Braden-Harder, L., &amp; Harris, M. D. (2003). E-rater as a diagnostic tool for writing instruction. In <italic>Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Demonstrations</italic> (pp. 79-81). Association for Computational Linguistics.</mixed-citation>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Burstein</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kukich</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Wolff</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Chodorow</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Braden-Harder</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>M. D.</given-names>
</name>
</person-group>
<source>Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Demonstrations</source>
<year>2003</year>
<fpage>79</fpage>
<lpage>81</lpage>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref9">
<mixed-citation publication-type="webpage">Cambridge English. (n. d.). <italic>Write &amp; Improve</italic>. <ext-link ext-link-type="uri" xlink:href="https://writeandimprove.com">https://writeandimprove.com</ext-link>
</mixed-citation>
<element-citation publication-type="webpage">
<source>Write &amp; Improve</source>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://writeandimprove.com">https://writeandimprove.com</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref10">
<mixed-citation publication-type="book">Cantero, M. V. (2024). Aproximación a un posible uso de ChatGPT para nivelar la expresión escrita en ELE. In F. M. Sirignano, R. Martínez Roig, &amp; A. López Padrón (Eds.), <italic>Enseñanza y aprendizaje en la era digital desde la investigación y la innovación</italic> (pp. 55-64). Octaedro.</mixed-citation>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Cantero</surname>
<given-names>M. V.</given-names>
</name>
</person-group>
<source>Enseñanza y aprendizaje en la era digital desde la investigación y la innovación</source>
<year>2024</year>
<fpage>55</fpage>
<lpage>64</lpage>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref11">
<mixed-citation publication-type="webpage">Centro Virtual Cervantes. (n. d.). <italic>Ítem de respuesta cerrada.</italic>
<ext-link ext-link-type="uri" xlink:href="https://cvc.cervantes.es/ensenanza/biblioteca_ele/diccio_ele/diccionario/itemrespuestacerrada.htm">https://cvc.cervantes.es/ensenanza/biblioteca_ele/diccio_ele/diccionario/itemrespuestacerrada.htm</ext-link>
</mixed-citation>
<element-citation publication-type="webpage">
<source>Ítem de respuesta cerrada.</source>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://cvc.cervantes.es/ensenanza/biblioteca_ele/diccio_ele/diccionario/itemrespuestacerrada.htm">https://cvc.cervantes.es/ensenanza/biblioteca_ele/diccio_ele/diccionario/itemrespuestacerrada.htm</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref12">
<mixed-citation publication-type="database">Chan, C. K. Y., &amp; Tsi, L. H. Y. (2023). <italic>The AI revolution in education: Will AI replace or assist teachers in higher education</italic> [Preprint]. arXiv. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2305.01185">https://doi.org/10.48550/arXiv.2305.01185</ext-link>
</mixed-citation>
<element-citation publication-type="database">
<person-group person-group-type="author">
<name>
<surname>Chan</surname>
<given-names>C. K. Y.</given-names>
</name>
<name>
<surname>Tsi</surname>
<given-names>L. H. Y.</given-names>
</name>
</person-group>
<article-title>The AI revolution in education: Will AI replace or assist teachers in higher education</article-title>
<source>arXiv</source>
<year>2023</year>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2305.01185">https://doi.org/10.48550/arXiv.2305.01185</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref13">
<mixed-citation publication-type="webpage">Columbia University, Department of Latin American and Iberian Cultures. (n. d.). <italic>Spanish placement exam</italic>. Recuperado el 25 de julio de 2025, de <ext-link ext-link-type="uri" xlink:href="https://laic.columbia.edu/content/spanish-second-language-placement-exam">https://laic.columbia.edu/content/spanish-second-language-placement-exam</ext-link>
</mixed-citation>
<element-citation publication-type="webpage">
<source>Spanish placement exam</source>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://laic.columbia.edu/content/spanish-second-language-placement-exam">https://laic.columbia.edu/content/spanish-second-language-placement-exam</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref14">
<mixed-citation publication-type="report">Council of Europe. (2002). <italic>Marco común europeo de referencia para las lenguas: aprendizaje, enseñanza, evaluación</italic>. Instituto Cervantes; Ministerio de Educación, Cultura y Deporte. <ext-link ext-link-type="uri" xlink:href="https://cvc.cervantes.es/ensenanza/biblioteca_ele/marco/cvc_mer.pdf">https://cvc.cervantes.es/ensenanza/biblioteca_ele/marco/cvc_mer.pdf</ext-link>
</mixed-citation>
<element-citation publication-type="report">
<person-group person-group-type="author">
<collab>Council of Europe</collab>
</person-group>
<source>Marco común europeo de referencia para las lenguas: aprendizaje, enseñanza, evaluación</source>
<year>2002</year>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://cvc.cervantes.es/ensenanza/biblioteca_ele/marco/cvc_mer.pdf">https://cvc.cervantes.es/ensenanza/biblioteca_ele/marco/cvc_mer.pdf</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref15">
<mixed-citation publication-type="journal">Crespo Mendoza, R., Rodríguez López, W., Montenegro Patrel, M., &amp; Tomalá Tomalá, G. (2024). IA: una herramienta para asistir a los docentes en la evaluación de los estudiantes. <italic>Conocimiento Global, 9</italic>(2), 305-323. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.70165/cglobal.v9i2.423">https://doi.org/10.70165/cglobal.v9i2.423</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crespo Mendoza</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Rodríguez López</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Montenegro Patrel</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tomalá Tomalá</surname>
<given-names>G.</given-names>
</name>
</person-group>
<article-title>IA: una herramienta para asistir a los docentes en la evaluación de los estudiantes</article-title>
<source>Conocimiento Global</source>
<year>2024</year>
<volume>9</volume>
<issue>2</issue>
<fpage>305</fpage>
<lpage>323</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.70165/cglobal.v9i2.423">https://doi.org/10.70165/cglobal.v9i2.423</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref16">
<mixed-citation publication-type="journal">Fajardo, G. M., Ayala, D. C., Arroba, E. M., &amp; López, M. (2023). Inteligencia artificial y la educación universitaria: una revisión sistemática. <italic>Magazine de las Ciencias: Revista de Investigación e Innovación, 8</italic>(1), 109-131. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.33262/rmc.v8i1.2935">https://doi.org/10.33262/rmc.v8i1.2935</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fajardo</surname>
<given-names>G. M.</given-names>
</name>
<name>
<surname>Ayala</surname>
<given-names>D. C.</given-names>
</name>
<name>
<surname>Arroba</surname>
<given-names>E. M.</given-names>
</name>
<name>
<surname>López</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Inteligencia artificial y la educación universitaria: una revisión sistemática</article-title>
<source>Magazine de las Ciencias: Revista de Investigación e Innovación</source>
<year>2023</year>
<volume>8</volume>
<issue>1</issue>
<fpage>109</fpage>
<lpage>131</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.33262/rmc.v8i1.2935">https://doi.org/10.33262/rmc.v8i1.2935</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref17">
<mixed-citation publication-type="journal">García-Peñalvo, F. J. (2024). Cómo afecta la inteligencia artificial generativa a los procesos de evaluación. <italic>Cuadernos de Pedagogía, (549)</italic>.</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>García-Peñalvo</surname>
<given-names>F. J.</given-names>
</name>
</person-group>
<article-title>Cómo afecta la inteligencia artificial generativa a los procesos de evaluación</article-title>
<source>Cuadernos de Pedagogía</source>
<year>2024</year>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref18">
<mixed-citation publication-type="journal">García-Peñalvo, F. J., Llorens-Largo, F., &amp; Vidal, J. (2024). La nueva realidad de la educación ante los avances de la inteligencia artificial generativa. <italic>RIED-Revista Iberoamericana de Educación a Distancia, 27</italic>(1), 9-39. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5944/ried.27.1.37716">https://doi.org/10.5944/ried.27.1.37716</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>García-Peñalvo</surname>
<given-names>F. J.</given-names>
</name>
<name>
<surname>Llorens-Largo</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Vidal</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>La nueva realidad de la educación ante los avances de la inteligencia artificial generativa</article-title>
<source>RIED-Revista Iberoamericana de Educación a Distancia</source>
<year>2024</year>
<volume>27</volume>
<issue>1</issue>
<fpage>9</fpage>
<lpage>39</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5944/ried.27.1.37716">https://doi.org/10.5944/ried.27.1.37716</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref19">
<mixed-citation publication-type="journal">Hernández-León, N., &amp; Rodríguez-Conde, M. J. (2024). Inteligencia artificial aplicada a la educación y la evaluación educativa en la universidad: Introducción de sistemas de tutorización inteligentes, sistemas de reconocimiento y otras tendencias futuras. <italic>Revista de Educación a Distancia (RED), 24</italic>(78), Artículo 6. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6018/red.594651">https://doi.org/10.6018/red.594651</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hernández-León</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Rodríguez-Conde</surname>
<given-names>M. J.</given-names>
</name>
</person-group>
<article-title>Inteligencia artificial aplicada a la educación y la evaluación educativa en la universidad: Introducción de sistemas de tutorización inteligentes, sistemas de reconocimiento y otras tendencias futuras</article-title>
<source>Revista de Educación a Distancia (RED)</source>
<year>2024</year>
<volume>24</volume>
<issue>78</issue>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6018/red.594651">https://doi.org/10.6018/red.594651</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref20">
<mixed-citation publication-type="journal">Hong, W. C. H. (2023). The impact of ChatGPT on foreign language teaching and learning: Opportunities in education and research. <italic>Journal of Educational Technology and Innovation, 5</italic>(1), 38-53. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.61414/jeti.v5i1.103">https://doi.org/10.61414/jeti.v5i1.103</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hong</surname>
<given-names>W. C. H.</given-names>
</name>
</person-group>
<article-title>The impact of ChatGPT on foreign language teaching and learning: Opportunities in education and research</article-title>
<source>Journal of Educational Technology and Innovation</source>
<year>2023</year>
<volume>5</volume>
<issue>1</issue>
<fpage>38</fpage>
<lpage>53</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.61414/jeti.v5i1.103">https://doi.org/10.61414/jeti.v5i1.103</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref21">
<mixed-citation publication-type="webpage">Instituto Cervantes. (2006). <italic>Plan curricular del Instituto Cervantes: Niveles de referencia para el español</italic> (3 vols.). Biblioteca Nueva. <ext-link ext-link-type="uri" xlink:href="https://cvc.cervantes.es/ensenanza/biblioteca_ele/plan_curricular">https://cvc.cervantes.es/ensenanza/biblioteca_ele/plan_curricular</ext-link>/</mixed-citation>
<element-citation publication-type="webpage">
<person-group person-group-type="author">
<collab>Instituto Cervantes</collab>
</person-group>
<source>Plan curricular del Instituto Cervantes: Niveles de referencia para el español</source>
<year>2006</year>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://cvc.cervantes.es/ensenanza/biblioteca_ele/plan_curricular">https://cvc.cervantes.es/ensenanza/biblioteca_ele/plan_curricular</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref22">
<mixed-citation publication-type="database">Li, Y. (2023). <italic>A practical survey on zero-shot prompt design for in-context learning</italic> [Preprint]. arXiv. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2309.13205">https://doi.org/10.48550/arXiv.2309.13205</ext-link>
</mixed-citation>
<element-citation publication-type="database">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
</person-group>
<article-title>A practical survey on zero-shot prompt design for in-context learning</article-title>
<source>arXiv</source>
<year>2023</year>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2309.13205">https://doi.org/10.48550/arXiv.2309.13205</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref23">
<mixed-citation publication-type="book">McNamara, D. S., Graesser, A. C., McCarthy, P. M., &amp; Cai, Z. (2014). <italic>Automated evaluation of text and discourse with Coh-Metrix</italic>. Cambridge University Press. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1017/CBO9780511894664">https://doi.org/10.1017/CBO9780511894664</ext-link>
</mixed-citation>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>McNamara</surname>
<given-names>D. S.</given-names>
</name>
<name>
<surname>Graesser</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>McCarthy</surname>
<given-names>P. M.</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>Z.</given-names>
</name>
</person-group>
<source>Automated evaluation of text and discourse with Coh-Metrix</source>
<year>2014</year>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1017/CBO9780511894664">https://doi.org/10.1017/CBO9780511894664</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref24">
<mixed-citation publication-type="book">Morales-Chan, M. A. (2023). <italic>Explorando el potencial de ChatGPT: Una clasificación de prompts efectivos para la enseñanza</italic>. Universidad Galileo. <ext-link ext-link-type="uri" xlink:href="https://biblioteca.galileo.edu/tesario/handle/123456789/1348">https://biblioteca.galileo.edu/tesario/handle/123456789/1348</ext-link>
</mixed-citation>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Morales-Chan</surname>
<given-names>M. A.</given-names>
</name>
</person-group>
<source>Explorando el potencial de ChatGPT: Una clasificación de prompts efectivos para la enseñanza</source>
<year>2023</year>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://biblioteca.galileo.edu/tesario/handle/123456789/1348">https://biblioteca.galileo.edu/tesario/handle/123456789/1348</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref25">
<mixed-citation publication-type="journal">Moreno, R. D. (2019). La llegada de la inteligencia artificial a la educación. <italic>Revista de Investigación en Tecnologías de la Información, 7</italic>(14), 260-270. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.36825/riti.07.14.022">https://doi.org/10.36825/riti.07.14.022</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moreno</surname>
<given-names>R. D.</given-names>
</name>
</person-group>
<article-title>La llegada de la inteligencia artificial a la educación</article-title>
<source>Revista de Investigación en Tecnologías de la Información</source>
<year>2019</year>
<volume>7</volume>
<issue>14</issue>
<fpage>260</fpage>
<lpage>270</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.36825/riti.07.14.022">https://doi.org/10.36825/riti.07.14.022</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref26">
<mixed-citation publication-type="webpage">OpenAI. (2022). <italic>ChatGPT (versión 3.5) [Artificial intelligence language model].</italic>
<ext-link ext-link-type="uri" xlink:href="https://openai.com">https://openai.com</ext-link>
</mixed-citation>
<element-citation publication-type="webpage">
<person-group person-group-type="author">
<collab>OpenAI</collab>
</person-group>
<source>ChatGPT (versión 3.5) [Artificial intelligence language model].</source>
<year>2022</year>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://openai.com">https://openai.com</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref27">
<mixed-citation publication-type="journal">Owan, V. J., Abang, K. B., Idika, D. O., Etta, E. O., &amp; Bassey, B. A. (2023). Exploring the potential of artificial intelligence tools in educational measurement and assessment. <italic>EURASIA Journal of Mathematics, Science and Technology Education, 19</italic>(8), em2307. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.29333/ejmste/13428">https://doi.org/10.29333/ejmste/13428</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Owan</surname>
<given-names>V. J.</given-names>
</name>
<name>
<surname>Abang</surname>
<given-names>K. B.</given-names>
</name>
<name>
<surname>Idika</surname>
<given-names>D. O.</given-names>
</name>
<name>
<surname>Etta</surname>
<given-names>E. O.</given-names>
</name>
<name>
<surname>Bassey</surname>
<given-names>B. A.</given-names>
</name>
</person-group>
<article-title>Exploring the potential of artificial intelligence tools in educational measurement and assessment</article-title>
<source>EURASIA Journal of Mathematics, Science and Technology Education</source>
<year>2023</year>
<volume>19</volume>
<issue>8</issue>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.29333/ejmste/13428">https://doi.org/10.29333/ejmste/13428</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref28">
<mixed-citation publication-type="book">Palacios Martínez, I., Barcala Rodríguez, F. M., &amp; Rojo, G. (2019). El corpus de aprendices de español (CAES) y sus aplicaciones para la enseñanza y aprendizaje del español como lengua extranjera. In M. Blanco, H. Olbertz, &amp; V. Vázquez Rozas (Eds.), <italic>Corpus y construcciones: Perspectivas hispánicas</italic> (pp. 273-301). Universidade de Santiago de Compostela (Verba, Anexo 79). <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.15304/9788417595876">https://doi.org/10.15304/9788417595876</ext-link>
</mixed-citation>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Palacios Martínez</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Barcala Rodríguez</surname>
<given-names>F. M.</given-names>
</name>
<name>
<surname>Rojo</surname>
<given-names>G.</given-names>
</name>
</person-group>
<source>Corpus y construcciones: Perspectivas hispánicas</source>
<year>2019</year>
<fpage>273</fpage>
<lpage>301</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.15304/9788417595876">https://doi.org/10.15304/9788417595876</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref29">
<mixed-citation publication-type="journal">Pourpanah, F., Abdar, M., Luo, Y., Zhou, X., Wang, R., &amp; Lim, C. P. (2023). A review of generalized zero-shot learning methods. <italic>IEEE Transactions on Pattern Analysis and Machine Intelligence, 45</italic>(4), 4051-4070. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/TPAMI.2022.3182926">https://doi.org/10.1109/TPAMI.2022.3182926</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pourpanah</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Abdar</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Lim</surname>
<given-names>C. P.</given-names>
</name>
</person-group>
<article-title>A review of generalized zero-shot learning methods</article-title>
<source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
<year>2023</year>
<volume>45</volume>
<issue>4</issue>
<fpage>4051</fpage>
<lpage>4070</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/TPAMI.2022.3182926">https://doi.org/10.1109/TPAMI.2022.3182926</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref30">
<mixed-citation publication-type="journal">Roumeliotis, K. I., Tselikas, N. D., &amp; Nasiopoulos, D. K. (2024). Next-generation spam filtering: Comparative fine-tuning of LLMs, NLPs, and CNN models for email spam classification. <italic>Electronics, 13</italic>(11), 2034. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/electronics13112034">https://doi.org/10.3390/electronics13112034</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roumeliotis</surname>
<given-names>K. I.</given-names>
</name>
<name>
<surname>Tselikas</surname>
<given-names>N. D.</given-names>
</name>
<name>
<surname>Nasiopoulos</surname>
<given-names>D. K.</given-names>
</name>
</person-group>
<article-title>Next-generation spam filtering: Comparative fine-tuning of LLMs, NLPs, and CNN models for email spam classification</article-title>
<source>Electronics</source>
<year>2024</year>
<volume>13</volume>
<issue>11</issue>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/electronics13112034">https://doi.org/10.3390/electronics13112034</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref31">
<mixed-citation publication-type="thesis">Salguero Romero, P. (2023). <italic>La traducció pedagògica i l’ús de ChatGPT-3 a classes d’anglès com a segona llengua per a nens i nenes</italic> [Trabajo de fin de grado, Universitat Autònoma de Barcelona]. Repositorio UAB. <ext-link ext-link-type="uri" xlink:href="https://ddd.uab.cat/record/279383">https://ddd.uab.cat/record/279383</ext-link>
</mixed-citation>
<element-citation publication-type="thesis">
<person-group person-group-type="author">
<name>
<surname>Salguero Romero</surname>
<given-names>P.</given-names>
</name>
</person-group>
<source>La traducció pedagògica i l’ús de ChatGPT-3 a classes d’anglès com a segona llengua per a nens i nenes</source>
<year>2023</year>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://ddd.uab.cat/record/279383">https://ddd.uab.cat/record/279383</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref32">
<mixed-citation publication-type="webpage">Universidad de Santiago de Compostela. (n. d.). <italic>Corpus de aprendices de español (CAES)</italic>. <ext-link ext-link-type="uri" xlink:href="https://galvan.usc.es/caes">https://galvan.usc.es/caes</ext-link>
</mixed-citation>
<element-citation publication-type="webpage">
<source>Corpus de aprendices de español (CAES)</source>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://galvan.usc.es/caes">https://galvan.usc.es/caes</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref33">
<mixed-citation publication-type="webpage">University of Wisconsin–Madison, Testing and Evaluation Services. (n. d.). <italic>Spanish placement test</italic>. University of Wisconsin–Madison. <ext-link ext-link-type="uri" xlink:href="https://testing.wisc.edu/centerpages/spanishtest.html">https://testing.wisc.edu/centerpages/spanishtest.html</ext-link>
</mixed-citation>
<element-citation publication-type="webpage">
<source>Spanish placement test</source>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://testing.wisc.edu/centerpages/spanishtest.html">https://testing.wisc.edu/centerpages/spanishtest.html</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref34">
<mixed-citation publication-type="thesis">Wang, L. (2024). <italic>El uso de ChatGPT y Gemini para la preparación de actividades de comprensión oral de CLE: Un estudio comparativo</italic> [Trabajo de fin de máster, Universitat Autònoma de Barcelona]. Repositorio UAB. <ext-link ext-link-type="uri" xlink:href="https://ddd.uab.cat/record/300353">https://ddd.uab.cat/record/300353</ext-link>
</mixed-citation>
<element-citation publication-type="thesis">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
</person-group>
<source>El uso de ChatGPT y Gemini para la preparación de actividades de comprensión oral de CLE: Un estudio comparativo</source>
<year>2024</year>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://ddd.uab.cat/record/300353">https://ddd.uab.cat/record/300353</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref35">
<mixed-citation publication-type="database">Wang, X., Lee, Y., &amp; Park, J. (2022). <italic>Automated evaluation for student argumentative writing: A survey</italic> [Preprint]. arXiv. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2205.04083">https://doi.org/10.48550/arXiv.2205.04083</ext-link>
</mixed-citation>
<element-citation publication-type="database">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Automated evaluation for student argumentative writing: A survey</article-title>
<source>arXiv</source>
<year>2022</year>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2205.04083">https://doi.org/10.48550/arXiv.2205.04083</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref36">
<mixed-citation publication-type="journal">Zambrano Campozano, R. F. (2025). Impacto de la inteligencia artificial en la transformación digital de la educación. <italic>Serie Científica de la Universidad de las Ciencias Informáticas, 18</italic>(1), 177-192. <ext-link ext-link-type="uri" xlink:href="https://scielo.sld.cu/scielo.php?script=sci_arttext&amp;pid=S2306-24952025000100177">https://scielo.sld.cu/scielo.php?script=sci_arttext&amp;pid=S2306-24952025000100177</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zambrano Campozano</surname>
<given-names>R. F.</given-names>
</name>
</person-group>
<article-title>Impacto de la inteligencia artificial en la transformación digital de la educación</article-title>
<source>Serie Científica de la Universidad de las Ciencias Informáticas</source>
<year>2025</year>
<volume>18</volume>
<issue>1</issue>
<fpage>177</fpage>
<lpage>192</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://scielo.sld.cu/scielo.php?script=sci_arttext&amp;pid=S2306-24952025000100177">https://scielo.sld.cu/scielo.php?script=sci_arttext&amp;pid=S2306-24952025000100177</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref37">
<mixed-citation publication-type="journal">Zhang, S. (2021). Review of automated writing evaluation systems. <italic>Journal of China Computer-Assisted Language Learning, 1</italic>(1), 170-176. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1515/jccall-2021-2007">https://doi.org/10.1515/jccall-2021-2007</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Review of automated writing evaluation systems</article-title>
<source>Journal of China Computer-Assisted Language Learning</source>
<year>2021</year>
<volume>1</volume>
<issue>1</issue>
<fpage>170</fpage>
<lpage>176</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1515/jccall-2021-2007">https://doi.org/10.1515/jccall-2021-2007</ext-link>
</comment>
</element-citation>
</ref>
<ref id="redalyc_331483192028_ref38">
<mixed-citation publication-type="journal">Zapata Ros, M. (2024). IA generativa y ChatGPT en educación: Un reto para la evaluación y ¿una nueva pedagogía? <italic>Revista Paraguaya de Educación a Distancia (REPED), 5</italic>(1), 12-44. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.56152/reped2024-vol5num1-art2">https://doi.org/10.56152/reped2024-vol5num1-art2</ext-link>
</mixed-citation>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zapata Ros</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>IA generativa y ChatGPT en educación: Un reto para la evaluación y ¿una nueva pedagogía?</article-title>
<source>Revista Paraguaya de Educación a Distancia (REPED)</source>
<year>2024</year>
<volume>5</volume>
<issue>1</issue>
<fpage>12</fpage>
<lpage>44</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.56152/reped2024-vol5num1-art2">https://doi.org/10.56152/reped2024-vol5num1-art2</ext-link>
</comment>
</element-citation>
</ref>
</ref-list>
<fn-group>
<title>Notes</title>
<fn id="fn5" fn-type="other">
<label>1</label>
<p>To quantify the differences between models, the symbol Δ (delta) was used, which is conventionally employed in statistics to indicate the difference between two values. Thus, Δ = Macro-F1(fine-tuning) − Macro-F1(zero-shot) ≈ 0.77 means that the difference in Macro-F1 between the experiments being compared is approximately 0.77.</p>
</fn>
</fn-group>
<app-group>
<app id="app1">
<title>
<bold>APPENDICES</bold>
</title>
<sec>
<title>
<bold>Appendix 1. Complete examples of model responses.</bold>
</title>
<p>Corpus text:</p>
<p>“Hola Eva.</p>
<p>Que hace dos meses, hemos estudiado espanol en madrid.</p>
<p>He vuelt a Japon la semana pasada.</p>
<p>Glacias por tu ayuda.</p>
<p>Yo escribo mi familia, mi mujer, hija y hijo.</p>
<p>Mi mujer trabaya en Idioma.</p>
<p>Hija trabaya en una tienda de palo, hijo en un enterprise electrolico.</p>
<p>Vivimos en yokohama cerca de tokyo.</p>
<p>Vamos a ir a madrid que mes viene</p>
<p>Por favor, vernos en madrid.</p>
<p>Hasta plonto.” (<xref ref-type="bibr" rid="redalyc_331483192028_ref32">CAES, 2022</xref>)</p>
<p>Model response:</p>
<p>Errores encontrados en el texto:</p>
<p>- "Que hace dos meses, hemos estudiado espanol en madrid." La forma correcta sería "Hace dos meses, estudiamos español en Madrid."</p>
<p>- "He vuelt a Japon la semana pasada." La forma correcta sería "He vuelto a Japón la semana pasada."</p>
<p>- "Glacias por tu ayuda." Debería ser "Gracias por tu ayuda."</p>
<p>- "Yo escribo mi familia, mi mujer, hija y hijo." Se debería decir "Te escribo sobre mi familia: mi mujer, mi hija y mi hijo."</p>
<p>- "Mi mujer trabaya en Idioma." La forma correcta sería "Mi mujer trabaja en Idiomas."</p>
<p>- "Hija trabaya en una tienda de palo, hijo en un enterprise electrolico." Debería ser "Mi hija trabaja en una tienda de ropa, y mi hijo en una empresa electrónica."</p>
<p>- "Vamos a ir a madrid que mes viene." La forma correcta sería "Vamos a ir a Madrid el mes que viene."</p>
<p>- "Hasta plonto." Debería ser "Hasta pronto."</p>
<p>NIVEL: A2 (<xref ref-type="bibr" rid="redalyc_331483192028_ref26">OpenAI, 2022</xref>)</p>
<p>—</p>
<p>Corpus text:</p>
<p>“El 24 de Noviembre</p>
<p>Estimados señores</p>
<p>El 22 de noviembre pasado, viajé con Tonta Airlines de Badaros a Madrid con el vuelo TA 222.</p>
<p>Estoy aquí en Madrid hasta el 27 para algunas citas profesionales.</p>
<p>Cuando embarqué en Badaros, la azafata me dijo que mi maleta estaba demasiado grande y que tenia que facturarla.</p>
<p>La maleta esta de marca Samsonite, de modelo XYZ1234, color negra, con mis nombre y dirección muy visibles.</p>
<p>Cuando llegamos a Madrid, la maleta no fue encontrada.</p>
<p>Hace a dos dias que estoy esperando pero nadie de su compañía me ha llamado con información.</p>
<p>Evidentemente, no puedo quedar sin cambiarme o lavarme me.</p>
<p>Entonces compré algunas cosas y me parece que Tonta Airlines tenga que indemnizarme.</p>
<p>Yo incluido la lista de mis compras y les pide que se pongan en contacto para decirme si sean de acuerdo de pagar el 50% des estas compras.</p>
<p>Se pueden contactarme por correo electronico o llamarme en el movil 660 12 34 45 y evidentemente, en cualquier caso que Tonta Airlines encuentre mi maleta, me daría mucha alegria.</p>
<p>Un saludo” (<xref ref-type="bibr" rid="redalyc_331483192028_ref32">CAES, 2022</xref>)</p>
<p>Model response:</p>
<p>El texto presenta varios errores gramaticales y algunos elementos propios de un nivel de español como lengua extranjera B1:</p>
<p>- Uso incorrecto de tiempos verbales: "tenia" en lugar de "tenía".</p>
<p>- Errores en el uso de los artículos: "la maleta esta" en lugar de "la maleta está".</p>
<p>- Uso incorrecto de pronombres: "Hace a dos días" en lugar de "Hace dos días".</p>
<p>- Errores de concordancia de género y número: "me parece que Tonta Airlines tenga que indemnizarme" debería ser "me parece que Tonta Airlines debe indemnizarme".</p>
<p>NIVEL: B1 (<xref ref-type="bibr" rid="redalyc_331483192028_ref26">OpenAI, 2022</xref>)</p>
</sec>
<sec>
<title>
<bold>Appendix 2. English version prompt.</bold>
</title>
<p>You are a linguist expert specializing in teaching Spanish as a foreign language. Your task is to indicate the level of Spanish as a foreign language in the texts, following the Instituto Cervantes Curriculum Plan.</p>
<p>Here is a description of the different levels.</p>
<p>Levels A1 and A2 Basic transactions related to their environment.</p>
<p>A1: Limited vocabulary, very short and simple texts, an average of 10 words per sentence. Regular forms of the present indicative.</p>
<p>A2: Short texts with simple information, an average of 12 words per sentence. Past indicative tenses: present perfect, imperfect, and indefinite. Irregular forms of the present indicative. Affirmative imperative.</p>
<p>Levels B1 and B2: ability to handle texts on topics of interest, tastes, and preferences.</p>
<p>B1: Extensive but simple vocabulary, writing texts with a specific task. Present indicative, past perfect, imperfect, and indefinite indicative, simple future, simple conditional, past perfect indicative, present subjunctive. Negative imperative.</p>
<p>B2: Extensive linguistic repertoire, subordinate clauses. Indicative verb tenses: present, past perfect, imperfect, indefinite, simple and compound future, simple and compound conditional, past perfect. Subjunctive verb tenses: present, imperfect, past perfect and past perfect.</p>
<p>C1 transactions of all kinds. They have a sufficiently broad and rich repertoire of linguistic and non-linguistic resources. They can deal with a wide range of long and complex texts. All indicative and subjunctive verb tenses: present, past perfect, imperfect, and pluperfect.</p>
<p>Now you are going to receive a TEXT. Taking into account the above and the grammatical errors, indicate at the end of your answer with the label ‘LEVEL:’ the level of the TEXT (A1, A2, B1, B2, or C1).</p>
<p>TEXT: “...”</p>
</sec>
</app>
</app-group>
</back>
</article>