Authentic Mathematics Assessment Using an Integrated Deep Learning and Adiwiyata Testlet Model for Elementary Schools and Madrasah Ibtidaiyah
DOI:
https://doi.org/10.31538/nzh.v9i1.418
Keywords:
Adiwiyata, Authentic Mathematics Assessment, Deep Learning, Islamic Education, Testlet
Abstract
This study aimed to develop an authentic mathematics assessment for elementary schools in the form of a testlet-based instrument integrated with deep learning principles and the Adiwiyata context within the framework of transformative Islamic education in the Special Region of Yogyakarta. Employing basic research with an embedded mixed-methods design, the development process adapted the Plomp model and the instrument development framework of Oreondo and Antonio. Data were collected through teacher needs surveys, expert validation, readability testing, and field trials involving 462 fifth-grade students from elementary schools and Islamic elementary schools. Quantitative analyses included Content Validity Index (CVI), Aiken’s V, Cronbach’s Alpha, Classical Test Theory (CTT), and Item Response Theory (IRT) using the 2-PL Graded Response Model. The results indicate that the developed instruments meet acceptable psychometric standards, with Aiken’s V values ranging from 0.75 to 1.00 and high internal consistency for both the testlet instrument (Cronbach’s Alpha = 0.845) and the environmental awareness questionnaire (Cronbach’s Alpha = 0.850). Item analysis shows adequate discrimination and a structured progression of difficulty, although one item exhibited low discrimination in the 2-PL GRM, highlighting the importance of IRT-based diagnostics for testlet refinement. Descriptive findings reveal that students demonstrate high levels of environmental awareness, particularly in the knowledge and attitude dimensions, while mathematical achievement remains low on non-routine items. Correlation analysis shows no significant relationship between environmental awareness and mathematical ability. Methodologically, this study contributes a validated and contextually grounded assessment framework that integrates expert judgment, reliability analysis, and complementary CTT–IRT procedures. 
Theoretically, the findings reconceptualize authentic assessment as a diagnostic bridge rather than a direct causal link between affective values and cognitive performance, demonstrating that environmental concern functions as a potential cognitive resource only when explicitly activated within mathematical tasks.
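The coefficients named in the abstract follow standard textbook definitions, and a minimal sketch of three of them may help readers who want to reproduce similar checks on their own data. The code below is not the authors' analysis script: the function names (`aikens_v`, `cronbach_alpha`, `grm_category_probs`), the 1 to 5 rating scale, and all sample values are assumptions made for illustration only, and none of the numbers come from the study.

```python
import math
from statistics import variance


def aikens_v(ratings, lowest=1, categories=5):
    """Aiken's V for one item: sum(r - lowest) / (n * (categories - 1)).

    `ratings` holds one rating per expert; the 1..5 scale is an assumption.
    """
    n = len(ratings)
    return sum(r - lowest for r in ratings) / (n * (categories - 1))


def cronbach_alpha(scores):
    """Cronbach's alpha for a students-by-items matrix given as a list of rows."""
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # per-item sample variance
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a logistic graded response model.

    `a` is the item discrimination and `thresholds` the ascending boundary
    locations b_1 < b_2 < ...; the result has len(thresholds) + 1 entries.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cum = [1.0] + [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[i] - cum[i + 1] for i in range(len(cum) - 1)]


# Hypothetical inputs, not data from the study.
expert_ratings = [4, 5, 4, 5]
student_scores = [[2, 3, 1], [3, 3, 2], [1, 2, 0], [4, 4, 3]]

print("Aiken's V:", aikens_v(expert_ratings))
print("Cronbach's alpha:", round(cronbach_alpha(student_scores), 3))
print("GRM probabilities:", grm_category_probs(0.3, 1.2, [-1.0, 0.0, 1.5]))
```

In this sketch a low discrimination value `a` flattens the cumulative curves, so the category probabilities barely change across ability levels; that is the kind of pattern an IRT-based item diagnostic is meant to expose.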
References
Acharya, B. R. (2017). Factors affecting difficulties in learning mathematics by mathematics learners. International Journal of Elementary Education, 6(2), 8–15. https://doi.org/10.11648/j.ijeedu.20170602.11
Achmad, G. H., & Prastowo, A. (2022). Authentic assessment techniques on cognitive aspects in Islamic religious education learning at elementary school level. Jurnal Ilmiah Sekolah Dasar, 6(1), 75–84. https://doi.org/10.23887/jisd.v6i1.43470
Ahdhianto, E., & Santi, N. N. (2020). The Effect of Metacognitive-Based Contextual Learning Model on Fifth-Grade Students’ Problem-Solving and Mathematical Communication Skills. European Journal of Educational Research, 9(2), 753–764. https://doi.org/10.12973/eu-jer.9.2.753
Aiken, L. R. (1985). Three coefficients for analyzing the reliability and validity of ratings. Educational and Psychological Measurement, 45(1), 131–142. https://doi.org/10.1177/0013164485451012
Alqarni, A. M. (2019). A Design for Comparing CTT and IRT in Test Assembly, Scoring, and Argumentation: Differences Among Reliability, Information, and Validation. I-Manager’s Journal on Educational Psychology, 13(2), 1. https://doi.org/10.26634/jpsy.13.2.16084
Araújo, D., Davids, K., & Renshaw, I. (2020). Cognition, emotion and action in sport: an ecological dynamics perspective. In Handbook of sport psychology (pp. 535–555). Wiley Online Library. https://doi.org/10.1002/9781119568124.ch25
Bates, A. (2021). Moral emotions and human interdependence in character education: Beyond the one-dimensional self. Routledge.
Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability (Formerly: Journal of Personnel Evaluation in Education), 21(1), 5–31. https://doi.org/10.1007/s11092-008-9068-5
Bock, R. D., & Gibbons, R. D. (2021). Item response theory. John Wiley & Sons.
Brookhart, S. M. (2018). Appropriate criteria: Key to effective rubrics. Frontiers in Education, 3, 22. https://doi.org/10.3389/feduc.2018.00022
Darling-Hammond, L., & Hyler, M. E. (2020). Preparing educators for the time of COVID… and beyond. European Journal of Teacher Education, 43(4), 457–465. https://doi.org/10.1080/02619768.2020.1816961
de Ayala, R. J. (2022). The Theory and Practice of Item Response Theory. Guilford Publications.
DeMars, C. E. (2018). Classical test theory and item response theory. In The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development (pp. 49–73). Wiley Online Library. https://doi.org/10.1002/9781118489772.ch2
DeVellis, R. F. (2006). Classical test theory. Medical Care, 44(11), S50–S59. https://doi.org/10.1097/01.mlr.0000245426.10853.30
Gravett, K. (2025). Authentic assessment as relational pedagogy. Teaching in Higher Education, 30(3), 608–622. https://doi.org/10.1080/13562517.2024.2380997
Griffin, P. (2017). Assessment for teaching (2nd ed.). Cambridge University Press.
Guo, L., Zhou, W., & Li, X. (2024). Cognitive Diagnosis Testlet Model for Multiple-Choice Items. Journal of Educational and Behavioral Statistics, 49(1), 32–60. https://doi.org/10.3102/10769986231165622
Hagger, M. S., Cheung, M. W.-L., Ajzen, I., & Hamilton, K. (2022). Perceived behavioral control moderating effects in the theory of planned behavior: A meta-analysis. Health Psychology, 41(2), 155. https://doi.org/10.1037/hea0001153
Hagger, M. S., & Hamilton, K. (2019). Grit and self‐discipline as predictors of effort and academic attainment. British Journal of Educational Psychology, 89(2), 324–342. https://doi.org/10.1111/bjep.12241
Hajar, A. (2024). Transforming Islamic Education for Environmental and Social Sustainability. Sinergi International Journal of Islamic Studies, 2(2), 82–95. https://doi.org/10.61194/ijis.v2i2.601
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory (Vol. 2). Sage.
Hernandez-Martinez, P., & Vos, P. (2018). “Why do I have to learn this?” A case study on students’ experiences of the relevance of mathematical modelling activities. ZDM, 50(1), 245–257. https://doi.org/10.1007/s11858-017-0904-2
Holland, J., & Stevens, N. (2021). Guidelines for the development of multiple choice items & assessments. Royal College of Surgeons in Ireland. https://doi.org/10.25419/rcsi.13947164.v1
Jiang, R. (2022). Understanding, Investigating, and promoting deep learning in language education: A survey on chinese college students’ deep learning in the online EFL teaching context. Frontiers in Psychology, 13, 955565. https://doi.org/10.3389/fpsyg.2022.955565
Kablan, Z., & Uğur, S. S. (2021). The relationship between routine and non-routine problem solving and learning styles. Educational Studies, 47(3), 328–343. https://doi.org/10.1080/03055698.2019.1701993
Koh, K. H. (2017). Authentic assessment. In Oxford research encyclopedia of education.
Krajcik, J., & Shin, N. (2023). Student conceptions, conceptual change, and learning progressions. In Handbook of research on science education (pp. 121–157). Routledge.
Li, J.-B., Bi, S.-S., Willems, Y. E., & Finkenauer, C. (2021). The association between school discipline and self-control from preschoolers to high school students: a three-level meta-analysis. Review of Educational Research, 91(1), 73–111. https://doi.org/10.3102/0034654320979160
Lickona, T. (1996). Eleven principles of effective character education. Journal of Moral Education, 25(1), 93–100. https://doi.org/10.1080/0305724960250110
Lin, C.-L. (2018). The development of an instrument to measure the project competences of college students in online project-based learning. Journal of Science Education and Technology, 27(1), 57–69. https://doi.org/10.1007/s10956-017-9708-y
Lukitasari, M., Hasan, R., Sukri, A., & Handhika, J. (2021). Developing Student’s Metacognitive Ability in Science through Project-Based Learning with E-Portfolio. International Journal of Evaluation and Research in Education, 10(3), 948–955. https://doi.org/10.11591/ijere.v10i3.21370
Ma, W., Wang, C., & Xiao, J. (2023). A testlet diagnostic classification model with attribute hierarchies. Applied Psychological Measurement, 47(3), 183–199. https://doi.org/10.1177/01466216231165315
Maxwell, A. E., Warner, T. A., & Guillén, L. A. (2021). Accuracy assessment in convolutional neural network-based deep learning remote sensing studies—Part 1: Literature review. Remote Sensing, 13(13), 2450. https://doi.org/10.3390/rs13132450
Morris, R., Perry, T., & Wardle, L. (2021). Formative assessment and feedback for learning in higher education: A systematic review. Review of Education, 9(3), e3292. https://doi.org/10.1002/rev3.3292
Morse, J. M. (2016). Mixed method design: Principles and procedures. Routledge.
Moss, C. M., & Brookhart, S. M. (2019). Advancing formative assessment in every classroom: A guide for instructional leaders. ASCD.
Mundofi, A. A. (2025). Integration of Deep Learning Approach in Transforming Islamic Religious Education Learning in Schools: A Pedagogical and Technological Study. Journal of Asian Primary Education (JoAPE), 2(1), 79–90. https://doi.org/10.59966/joape.v2i1.1787
Olfos, R., & Zulantay, H. (2007). Reliability and validity of authentic assessment in a web based course. Journal of Educational Technology & Society, 10(4), 156–173. https://www.jstor.org/stable/jeductechsoci.10.4.156
Oreondo, L. L., & Antonio, E. M. D. (1984). Evaluating Educational Outcomes. Rex Book Store. https://books.google.co.id/books?id=8xNMBn4bn8oC
Plomp, T. (2013). Educational design research: An introduction. In T. Plomp & N. Nieveen (Eds.), Educational design research (Vol. 1).
Prayitno, S. H., & Jaedun, M. P. D. (2018). Authentic assessment competence of building construction teachers in indonesian vocational schools. Journal of Technical Education and Training, 10(1). https://doi.org/10.30880/jtet.2018.10.01.008
Rahman, N. A. (2025). Competency-Based and Ethical Assessment Models in Contemporary Islamic Pedagogy. Sinergi International Journal of Islamic Studies, 3(1), 57–69. https://doi.org/10.61194/ijis.v3i1.710
Saadah, L., Rusnaini, R., & Muchtarom, M. (2023). The internalization of school environmental care through Adiwiyata program. Jurnal Civics: Media Kajian Kewarganegaraan, 20(2), 205–213. https://doi.org/10.21831/jc.v20i2.56549
Sabri, M., & Retnawati, H. (2019). The implementation of authentic assessment in mathematics learning. Journal of Physics: Conference Series, 1200(1), 012006. https://doi.org/10.1088/1742-6596/1200/1/012006
Sachdeva, S., & Eggen, P.-O. (2021). Learners’ critical thinking about learning mathematics. International Electronic Journal of Mathematics Education, 16(3), em0644. https://doi.org/10.29333/iejme/11003
Safitri, D. I., Mudzanata, M., & Putri, A. D. S. (2020). The Implementation of Authentic Assessment in Thematic Learning in Elementary Schools. International Journal of Elementary Education, 4(2), 255–260. https://doi.org/10.23887/ijee.v4i2.25551
Sahin, A. (2018). Critical issues in Islamic education studies: Rethinking Islamic and Western liberal secular values of education. Religions, 9(11), 335. https://doi.org/10.3390/rel9110335
Schmucker, R., & Moore, S. (2025). The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory. arXiv preprint arXiv:2503.10533. https://doi.org/10.48550/arXiv.2503.10533
Seah, R., & Horne, M. (2020). The construction and validation of a geometric reasoning test item to support the development of learning progression. Mathematics Education Research Journal, 32(4), 607–628. https://doi.org/10.1007/s13394-019-00273-2
Selvaraj, A. M., Azman, H., & Wahi, W. (2021). Teachers’ Feedback Practice and Students’ Academic Achievement: A Systematic Literature Review. International Journal of Learning, Teaching and Educational Research, 20(1), 308–322. https://doi.org/10.26803/ijlter.20.1.17
Sewagegn, A. A. (2020). Learning Objective and Assessment Linkage: Its Contribution to Meaningful Student Learning. Universal Journal of Educational Research, 8(11), 5044–5052. https://doi.org/10.13189/ujer.2020.081104
Tavakol, M., & Wetzel, A. (2020). Factor Analysis: a means for theory and instrument development in support of construct validity. International Journal of Medical Education, 11, 245. https://doi.org/10.5116/ijme.5f96.0f4a
Torshizi, M. D., & Bahraman, M. (2019). I explain, therefore I learn: Improving students’ assessment literacy and deep learning by teaching. Studies in Educational Evaluation, 61, 66–73. https://doi.org/10.1016/j.stueduc.2019.03.002
Triyandana, A., Ibrohim, I., Yanuwiyadi, B., Amin, M., & Hajar, M. U. (2024). Strategies to Enhance Eco-Friendly Culture and Environmental Awareness by Green Curriculum Integration in Indonesian Elementary Science Classroom. International Electronic Journal of Elementary Education, 17(1), 217–232. https://doi.org/10.26822/iejee.2024.374
Utaya, S., & Wafaretta, V. (2021). The vision, mission, and implementation of environmental education of adiwiyata elementary school in Malang City. IOP Conference Series: Earth and Environmental Science, 802(1), 012048. https://doi.org/10.1088/1755-1315/802/1/012048
Vos, P. (2018). “How real people really need mathematics in the real world”—Authenticity in mathematics education. Education Sciences, 8(4), 195. https://doi.org/10.3390/educsci8040195
Wijaya, A., Van den Heuvel-Panhuizen, M., Doorman, M., & Veldhuis, M. (2018). Opportunity-to-learn to solve context-based mathematics tasks and students’ performance in solving these tasks–lessons from Indonesia. https://doi.org/10.29333/ejmste/93420
Zhang, D., Wang, C., Yuan, T., Li, X., Yang, L., Huang, A., Li, J., Liu, M., Lei, Y., & Sun, L. (2023). Psychometric properties of the Coronavirus Anxiety Scale based on Classical Test Theory (CTT) and Item Response Theory (IRT) models among Chinese front-line healthcare workers. BMC Psychology, 11(1), 224. https://doi.org/10.1186/s40359-023-01251-x
Zhang, J.-L. (2020). The application of human comprehensive development theory and deep learning in innovation education in higher education. Frontiers in Psychology, 11, 1605. https://doi.org/10.3389/fpsyg.2020.01605
License
Copyright (c) 2025 Syukrul Hamdi, Lia Yuliana, Anisa Dwi Oktarina, Kana Hidayati, Elly Arliyani, Nurul Mu'minin Mz

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





