Authentic Mathematics Assessment Using an Integrated Deep Learning and Adiwiyata Testlet Model for Elementary Schools and Madrasah Ibtidaiyah

Authors

  • Syukrul Hamdi, Universitas Negeri Yogyakarta
  • Lia Yuliana, Universitas Negeri Yogyakarta
  • Anisa Dwi Oktarina, Universitas Negeri Yogyakarta
  • Kana Hidayati, Universitas Negeri Yogyakarta
  • Elly Arliani, Universitas Negeri Yogyakarta
  • Nurul Mu'minin Mz, Sekolah Dasar Negeri Sosrowijayan, Yogyakarta

DOI:

https://doi.org/10.31538/nzh.v9i1.418

Keywords:

Adiwiyata, Authentic Mathematics Assessment, Deep Learning, Islamic Education, Testlet

Abstract

This study aimed to develop an authentic mathematics assessment for elementary schools in the form of a testlet-based instrument integrated with deep learning principles and the Adiwiyata context within the framework of transformative Islamic education in the Special Region of Yogyakarta. Employing basic research with an embedded mixed-methods design, the development process adapted the Plomp model and the instrument development framework of Oreondo and Antonio. Data were collected through teacher needs surveys, expert validation, readability testing, and field trials involving 462 fifth-grade students from elementary schools and Islamic elementary schools. Quantitative analyses included Content Validity Index (CVI), Aiken’s V, Cronbach’s Alpha, Classical Test Theory (CTT), and Item Response Theory (IRT) using the 2-PL Graded Response Model. The results indicate that the developed instruments meet acceptable psychometric standards, with Aiken’s V values ranging from 0.75 to 1.00 and high internal consistency for both the testlet instrument (Cronbach’s Alpha = 0.845) and the environmental awareness questionnaire (Cronbach’s Alpha = 0.850). Item analysis shows adequate discrimination and a structured progression of difficulty, although one item exhibited low discrimination in the 2-PL GRM, highlighting the importance of IRT-based diagnostics for testlet refinement. Descriptive findings reveal that students demonstrate high levels of environmental awareness, particularly in the knowledge and attitude dimensions, while mathematical achievement remains low on non-routine items. Correlation analysis shows no significant relationship between environmental awareness and mathematical ability. Methodologically, this study contributes a validated and contextually grounded assessment framework that integrates expert judgment, reliability analysis, and complementary CTT–IRT procedures. 
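As a minimal sketch of the classical statistics listed above, the following Python functions compute Aiken's V (Aiken, 1985), an item-level content validity index (I-CVI), and Cronbach's alpha from raw ratings and scores. The function names and toy data are hypothetical and do not come from the study. The I-CVI assumes the common convention of a 4-point relevance scale on which ratings of 3 or 4 count as relevant, which the abstract does not specify.

```python
from statistics import variance


def aikens_v(ratings, lo, hi):
    """Aiken's (1985) V for one item rated by n experts on an integer scale lo..hi.

    V = sum(r_i - lo) / (n * (c - 1)), where c = hi - lo + 1 is the number of
    rating categories. V runs from 0 (all experts chose lo) to 1 (all chose hi).
    """
    n = len(ratings)
    c = hi - lo + 1
    s = sum(r - lo for r in ratings)
    return s / (n * (c - 1))


def item_cvi(ratings, relevant_min=3):
    """Item-level CVI: share of experts whose rating is >= relevant_min.

    Assumption: a 4-point relevance scale on which 3 and 4 count as 'relevant'.
    """
    return sum(1 for r in ratings if r >= relevant_min) / len(ratings)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows (one score per item).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Sample variances (n - 1 denominator) are used for both terms.
    """
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not the study's data.
    print(aikens_v([5, 4, 5, 4, 5], lo=1, hi=5))  # 18 / 20 = 0.9
    print(item_cvi([4, 3, 4, 2, 4]))               # 4 / 5 = 0.8
    toy = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 5, 5]]
    print(round(cronbach_alpha(toy), 3))           # 0.957
```

Aiken's V rescales the summed distance from the lowest category by its maximum possible value, so a value of 0.75 means the experts' ratings sit, on average, three quarters of the way from the lowest to the highest category.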
Theoretically, the findings reconceptualize authentic assessment as a diagnostic bridge rather than a direct causal link between affective values and cognitive performance, suggesting that environmental concern functions as a potential cognitive resource only when explicitly activated within mathematical tasks.
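The abstract notes that one item showed low discrimination under the 2-PL GRM. As a minimal sketch of what the discrimination parameter governs, the Python code below evaluates category probabilities for a single polytomous item under Samejima's graded response model, in its logistic form without the D = 1.7 scaling constant. The item parameters, function names, and the 0 to 3 score range are hypothetical; the abstract does not report estimated parameters or the estimation software used.

```python
import math


def boundary_prob(theta, a, b):
    """P*(X >= k | theta) for one category boundary b_k (logistic, no D = 1.7)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def grm_category_probs(theta, a, thresholds):
    """P(X = k | theta) for k = 0..m under Samejima's graded response model.

    a is the item's single discrimination parameter; thresholds are the
    ordered category boundaries b_1 < ... < b_m.
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities, padded with P*(X >= 0) = 1 and P*(X >= m + 1) = 0.
    cum = [1.0] + [boundary_prob(theta, a, b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(theta, a, thresholds):
    """Expected item score E[X | theta] = sum over k of k * P(X = k | theta)."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical item scored 0-3; parameters are invented for illustration.
    thresholds = [-1.0, 0.0, 1.0]
    for a in (2.0, 0.3):  # a well-discriminating item vs. a weakly discriminating one
        low = expected_score(-2.0, a, thresholds)
        high = expected_score(2.0, a, thresholds)
        print(f"a={a}: E[X|theta=-2]={low:.2f}, E[X|theta=2]={high:.2f}")
```

With these made-up parameters, the expected score moves from about 0.14 to 2.86 between theta = -2 and theta = 2 when a = 2.0, but only from about 1.07 to 1.93 when a = 0.3. A weakly discriminating item therefore separates students of different ability far less sharply, which is why IRT diagnostics flag it for revision.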

References

Acharya, B. R. (2017). Factors affecting difficulties in learning mathematics by mathematics learners. International Journal of Elementary Education, 6(2), 8–15. https://doi.org/10.11648/j.ijeedu.20170602.11

Achmad, G. H., & Prastowo, A. (2022). Authentic assessment techniques on cognitive aspects in Islamic religious education learning at elementary school level. Jurnal Ilmiah Sekolah Dasar, 6(1), 75–84. https://doi.org/10.23887/jisd.v6i1.43470

Ahdhianto, E., & Santi, N. N. (2020). The Effect of Metacognitive-Based Contextual Learning Model on Fifth-Grade Students’ Problem-Solving and Mathematical Communication Skills. European Journal of Educational Research, 9(2), 753–764. https://doi.org/10.12973/eu-jer.9.2.753

Aiken, L. R. (1985). Three coefficients for analyzing the reliability and validity of ratings. Educational and Psychological Measurement, 45(1), 131–142. https://doi.org/10.1177/0013164485451012

Alqarni, A. M. (2019). A Design for Comparing CTT and IRT in Test Assembly, Scoring, and Argumentation: Differences Among Reliability, Information, and Validation. I-Manager’s Journal on Educational Psychology, 13(2), 1. https://doi.org/10.26634/jpsy.13.2.16084

Araújo, D., Davids, K., & Renshaw, I. (2020). Cognition, emotion and action in sport: an ecological dynamics perspective. In Handbook of sport psychology (pp. 535–555). Wiley. https://doi.org/10.1002/9781119568124.ch25

Bates, A. (2021). Moral emotions and human interdependence in character education: Beyond the one-dimensional self. Routledge.

Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability (Formerly: Journal of Personnel Evaluation in Education), 21(1), 5–31. https://doi.org/10.1007/s11092-008-9068-5

Bock, R. D., & Gibbons, R. D. (2021). Item response theory. John Wiley & Sons.

Brookhart, S. M. (2018). Appropriate criteria: Key to effective rubrics. Frontiers in Education, 3, 22. https://doi.org/10.3389/feduc.2018.00022

Darling-Hammond, L., & Hyler, M. E. (2020). Preparing educators for the time of COVID… and beyond. European Journal of Teacher Education, 43(4), 457–465. https://doi.org/10.1080/02619768.2020.1816961

de Ayala, R. J. (2022). The Theory and Practice of Item Response Theory. Guilford Publications.

DeMars, C. E. (2018). Classical test theory and item response theory. In The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development (pp. 49–73). Wiley. https://doi.org/10.1002/9781118489772.ch2

DeVellis, R. F. (2006). Classical test theory. Medical Care, 44(11), S50–S59. https://doi.org/10.1097/01.mlr.0000245426.10853.30

Gravett, K. (2025). Authentic assessment as relational pedagogy. Teaching in Higher Education, 30(3), 608–622. https://doi.org/10.1080/13562517.2024.2380997

Griffin, P. (2017). Assessment for teaching (2nd ed.). Cambridge University Press.

Guo, L., Zhou, W., & Li, X. (2024). Cognitive Diagnosis Testlet Model for Multiple-Choice Items. Journal of Educational and Behavioral Statistics, 49(1), 32–60. https://doi.org/10.3102/10769986231165622

Hagger, M. S., Cheung, M. W.-L., Ajzen, I., & Hamilton, K. (2022). Perceived behavioral control moderating effects in the theory of planned behavior: A meta-analysis. Health Psychology, 41(2), 155. https://doi.org/10.1037/hea0001153

Hagger, M. S., & Hamilton, K. (2019). Grit and self‐discipline as predictors of effort and academic attainment. British Journal of Educational Psychology, 89(2), 324–342. https://doi.org/10.1111/bjep.12241

Hajar, A. (2024). Transforming Islamic Education for Environmental and Social Sustainability. Sinergi International Journal of Islamic Studies, 2(2), 82–95. https://doi.org/10.61194/ijis.v2i2.601

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory (Vol. 2). Sage.

Hernandez-Martinez, P., & Vos, P. (2018). “Why do I have to learn this?” A case study on students’ experiences of the relevance of mathematical modelling activities. ZDM, 50(1), 245–257. https://doi.org/10.1007/s11858-017-0904-2

Holland, J., & Stevens, N. (2021). Guidelines for the development of multiple choice items & assessments. Royal College of Surgeons in Ireland. https://doi.org/10.25419/rcsi.13947164.v1

Jiang, R. (2022). Understanding, Investigating, and promoting deep learning in language education: A survey on Chinese college students’ deep learning in the online EFL teaching context. Frontiers in Psychology, 13, 955565. https://doi.org/10.3389/fpsyg.2022.955565

Kablan, Z., & Uğur, S. S. (2021). The relationship between routine and non-routine problem solving and learning styles. Educational Studies, 47(3), 328–343. https://doi.org/10.1080/03055698.2019.1701993

Koh, K. H. (2017). Authentic assessment. In Oxford research encyclopedia of education.

Krajcik, J., & Shin, N. (2023). Student conceptions, conceptual change, and learning progressions. In Handbook of research on science education (pp. 121–157). Routledge.

Li, J.-B., Bi, S.-S., Willems, Y. E., & Finkenauer, C. (2021). The association between school discipline and self-control from preschoolers to high school students: a three-level meta-analysis. Review of Educational Research, 91(1), 73–111. https://doi.org/10.3102/0034654320979160

Lickona, T. (1996). Eleven principles of effective character education. Journal of Moral Education, 25(1), 93–100. https://doi.org/10.1080/0305724960250110

Lin, C.-L. (2018). The development of an instrument to measure the project competences of college students in online project-based learning. Journal of Science Education and Technology, 27(1), 57–69. https://doi.org/10.1007/s10956-017-9708-y

Lukitasari, M., Hasan, R., Sukri, A., & Handhika, J. (2021). Developing Student’s Metacognitive Ability in Science through Project-Based Learning with E-Portfolio. International Journal of Evaluation and Research in Education, 10(3), 948–955. https://doi.org/10.11591/ijere.v10i3.21370

Ma, W., Wang, C., & Xiao, J. (2023). A testlet diagnostic classification model with attribute hierarchies. Applied Psychological Measurement, 47(3), 183–199. https://doi.org/10.1177/01466216231165315

Maxwell, A. E., Warner, T. A., & Guillén, L. A. (2021). Accuracy assessment in convolutional neural network-based deep learning remote sensing studies—Part 1: Literature review. Remote Sensing, 13(13), 2450. https://doi.org/10.3390/rs13132450

Morris, R., Perry, T., & Wardle, L. (2021). Formative assessment and feedback for learning in higher education: A systematic review. Review of Education, 9(3), e3292. https://doi.org/10.1002/rev3.3292

Morse, J. M. (2016). Mixed method design: Principles and procedures. Routledge.

Moss, C. M., & Brookhart, S. M. (2019). Advancing formative assessment in every classroom: A guide for instructional leaders. ASCD.

Mundofi, A. A. (2025). Integration of Deep Learning Approach in Transforming Islamic Religious Education Learning in Schools: A Pedagogical and Technological Study. Journal of Asian Primary Education (JoAPE), 2(1), 79–90. https://doi.org/10.59966/joape.v2i1.1787

Olfos, R., & Zulantay, H. (2007). Reliability and validity of authentic assessment in a web based course. Journal of Educational Technology & Society, 10(4), 156–173. https://www.jstor.org/stable/jeductechsoci.10.4.156

Oreondo, L. L., & Antonio, E. M. D. (1984). Evaluating Educational Outcomes. Rex Book Store. https://books.google.co.id/books?id=8xNMBn4bn8oC

Plomp, T. (2013). Educational design research: An introduction. In T. Plomp & N. Nieveen (Eds.), Educational design research (Vol. 1).

Prayitno, S. H., & Jaedun, M. P. D. (2018). Authentic assessment competence of building construction teachers in Indonesian vocational schools. Journal of Technical Education and Training, 10(1). https://doi.org/10.30880/jtet.2018.10.01.008

Rahman, N. A. (2025). Competency-Based and Ethical Assessment Models in Contemporary Islamic Pedagogy. Sinergi International Journal of Islamic Studies, 3(1), 57–69. https://doi.org/10.61194/ijis.v3i1.710

Saadah, L., Rusnaini, R., & Muchtarom, M. (2023). The internalization of school environmental care through Adiwiyata program. Jurnal Civics: Media Kajian Kewarganegaraan, 20(2), 205–213. https://doi.org/10.21831/jc.v20i2.56549

Sabri, M., & Retnawati, H. (2019). The implementation of authentic assessment in mathematics learning. Journal of Physics: Conference Series, 1200(1), 012006. https://doi.org/10.1088/1742-6596/1200/1/012006

Sachdeva, S., & Eggen, P.-O. (2021). Learners’ critical thinking about learning mathematics. International Electronic Journal of Mathematics Education, 16(3), em0644. https://doi.org/10.29333/iejme/11003

Safitri, D. I., Mudzanata, M., & Putri, A. D. S. (2020). The Implementation of Authentic Assessment in Thematic Learning in Elementary Schools. International Journal of Elementary Education, 4(2), 255–260. https://doi.org/10.23887/ijee.v4i2.25551

Sahin, A. (2018). Critical issues in Islamic education studies: Rethinking Islamic and Western liberal secular values of education. Religions, 9(11), 335. https://doi.org/10.3390/rel9110335

Schmucker, R., & Moore, S. (2025). The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory. arXiv preprint arXiv:2503.10533. https://doi.org/10.48550/arXiv.2503.10533

Seah, R., & Horne, M. (2020). The construction and validation of a geometric reasoning test item to support the development of learning progression. Mathematics Education Research Journal, 32(4), 607–628. https://doi.org/10.1007/s13394-019-00273-2

Selvaraj, A. M., Azman, H., & Wahi, W. (2021). Teachers’ Feedback Practice and Students’ Academic Achievement: A Systematic Literature Review. International Journal of Learning, Teaching and Educational Research, 20(1), 308–322. https://doi.org/10.26803/ijlter.20.1.17

Sewagegn, A. A. (2020). Learning Objective and Assessment Linkage: Its Contribution to Meaningful Student Learning. Universal Journal of Educational Research, 8(11), 5044–5052. https://doi.org/10.13189/ujer.2020.081104

Tavakol, M., & Wetzel, A. (2020). Factor Analysis: a means for theory and instrument development in support of construct validity. International Journal of Medical Education, 11, 245. https://doi.org/10.5116/ijme.5f96.0f4a

Torshizi, M. D., & Bahraman, M. (2019). I explain, therefore I learn: Improving students’ assessment literacy and deep learning by teaching. Studies in Educational Evaluation, 61, 66–73. https://doi.org/10.1016/j.stueduc.2019.03.002

Triyandana, A., Ibrohim, I., Yanuwiyadi, B., Amin, M., & Hajar, M. U. (2024). Strategies to Enhance Eco-Friendly Culture and Environmental Awareness by Green Curriculum Integration in Indonesian Elementary Science Classroom. International Electronic Journal of Elementary Education, 17(1), 217–232. https://doi.org/10.26822/iejee.2024.374

Utaya, S., & Wafaretta, V. (2021). The vision, mission, and implementation of environmental education of Adiwiyata elementary school in Malang City. IOP Conference Series: Earth and Environmental Science, 802(1), 012048. https://doi.org/10.1088/1755-1315/802/1/012048

Vos, P. (2018). “How real people really need mathematics in the real world”—Authenticity in mathematics education. Education Sciences, 8(4), 195. https://doi.org/10.3390/educsci8040195

Wijaya, A., Van den Heuvel-Panhuizen, M., Doorman, M., & Veldhuis, M. (2018). Opportunity-to-learn to solve context-based mathematics tasks and students’ performance in solving these tasks – lessons from Indonesia. Eurasia Journal of Mathematics, Science and Technology Education. https://doi.org/10.29333/ejmste/93420

Zhang, D., Wang, C., Yuan, T., Li, X., Yang, L., Huang, A., Li, J., Liu, M., Lei, Y., & Sun, L. (2023). Psychometric properties of the Coronavirus Anxiety Scale based on Classical Test Theory (CTT) and Item Response Theory (IRT) models among Chinese front-line healthcare workers. BMC Psychology, 11(1), 224. https://doi.org/10.1186/s40359-023-01251-x

Zhang, J.-L. (2020). The application of human comprehensive development theory and deep learning in innovation education in higher education. Frontiers in Psychology, 11, 1605. https://doi.org/10.3389/fpsyg.2020.01605

Published

2026-02-07

How to Cite

Hamdi, S., Yuliana, L., Oktarina, A. D., Hidayati, K., Arliani, E., & Mu'minin Mz, N. (2026). Authentic Mathematics Assessment Using an Integrated Deep Learning and Adiwiyata Testlet Model for Elementary Schools and Madrasah Ibtidaiyah. Nazhruna: Jurnal Pendidikan Islam, 9(1), 187–206. https://doi.org/10.31538/nzh.v9i1.418