In the post-pandemic landscape, discussions of educational and assessment policy increasingly center on how far and how deep digitalization should go. Assessment organizations typically prioritize integrity first, because of limited resources and public pressure, and gradually shift toward improving assessment quality and personalization. Seen through this evolutionary lens, assessment organizations can be grouped by the primary focus that guides their selection of test delivery models within the assessment cycle: integrity (fairness and security), quality, or personalization.
In this article, the term 'test delivery model' describes a specific approach to computerized testing: randomly shuffling items within a test (linear test delivery model); organizing items into subsets based on type or difficulty (testlet-based test delivery model); dynamically selecting items directly from an item bank, a method known as Linear-On-The-Fly Testing (LOFT), along with its variations; and adaptive testing, a separate approach with its own distinct methods. Carefully balancing decisions across each aspect of the chosen test delivery model can help create a secure, valid, and student-centered testing environment within a jurisdiction's education system.
This article presents a simplified overview of how assessment organizations can strategically prioritize or balance integrity, quality, and personalization through their choice of test delivery model.
To support an evolving culture of decision-making about test delivery models, particularly when digitizing the assessment cycle and advancing the functionality of digital platforms, a digital review can help determine the appropriate scope and depth of the reform. Read more about how to build a roadmap for modernizing the assessment cycle.
The first aspect, integrity, concerns having a digital platform and procedures in place to prevent item exposure during the assessment cycle, a topic discussed more broadly in the article Securing the Assessment Cycle. The second aspect, quality, focuses primarily on supporting the validity and reliability of assessments. The final aspect, personalization, focuses on student-centeredness by prioritizing assessments that are user-friendly and adaptable to individual student needs. The following table illustrates how the classic test delivery models, namely Linear, Testlet-based, Linear-On-The-Fly Testing (LOFT), and Adaptive Testing, affect assessment integrity, quality, and personalization:
Test Delivery Model Impact Table
| Test Delivery Model | Impact on Integrity | Impact on Quality | Impact on Personalization | 
|---|---|---|---|
| Linear Test (Randomly shuffling items within a test) | Reduces exposure risk by varying item order across test-takers. | Maintains consistency in item difficulty and content coverage. | Limited; provides uniform content for all students. | 
| Testlet-based Test (Items organized into subsets by type or difficulty) | Improves security by minimizing predictable patterns, reducing exposure. | Improves reliability by controlling difficulty and content distribution within subsets. | Moderate; allows targeted assessment based on student ability groups or content categories. | 
| Linear-On-The-Fly Test (LOFT) or Testlet-based LOFT (tLOFT) (Dynamic item selection directly from an item bank) | Increases integrity by dynamically assembling unique tests for each student, with LOFT selecting individual items and tLOFT selecting structured testlets, minimizing predictability and item exposure. | LOFT maximizes adaptability through individual item selection, while tLOFT balances adaptability with structured content via predefined testlets. | High; LOFT provides highly individualized assessments, whereas tLOFT allows personalization within structured content clusters. | 
| Adaptive Test (Items selected based on test-taker responses in real time) | Further improves integrity through unique, responsive item selection. | Greatly improves accuracy, reliability, and validity by adapting to test-taker ability. | Very high; precisely tailors item difficulty and content to individual student performance and needs. |
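
To make the adaptive row of the table more concrete, here is a minimal sketch of response-driven item selection. It assumes a two-parameter logistic (2PL) IRT model and a maximum-information selection rule, which is one common approach and not necessarily what any given platform uses. The bank layout (item id mapped to discrimination `a` and difficulty `b`) and the fixed-step ability update are hypothetical simplifications; operational systems typically estimate ability with maximum-likelihood or Bayesian methods.

```python
import math


def p_correct(theta, a, b):
    """2PL probability that a test-taker with ability theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Choose the not-yet-administered item that is most informative at theta."""
    candidates = [item_id for item_id in bank if item_id not in administered]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda item_id: item_information(theta, *bank[item_id]))


def update_theta(theta, correct, step=0.5):
    """Crude fixed-step update, used here only to keep the sketch short."""
    return theta + step if correct else theta - step


# Hypothetical bank: item id -> (discrimination a, difficulty b).
bank = {"easy": (1.0, -1.5), "medium": (1.0, 0.0), "hard": (1.0, 1.5)}

theta, administered = 0.0, set()
first = next_item(theta, bank, administered)
administered.add(first)
theta = update_theta(theta, correct=True)
second = next_item(theta, bank, administered)
print(first, second)
```

With equal discriminations, the most informative item is the one whose difficulty is closest to the current ability estimate, which is how the test comes to track the individual test-taker.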
Because each test delivery model reflects a different evolutionary phase, its impact varies: each improves specific aspects of assessment practice and outcomes depending on the jurisdiction's context and priorities. Thus, an assessment organization's choice of delivery model can indicate which assessment focus it treats as most urgent, and the type of assessment culture being promoted within a jurisdiction's assessment system.
Integrating AI into test delivery models can improve item selection, test assembly, and real-time adaptability through machine learning algorithms that predict student performance, automated scoring for faster evaluations, and predictive analytics that dynamically adjust test difficulty. These advancements support the evolution of linear, multistage, and fully adaptive test delivery models, setting a new benchmark for precision, personalization, and scalability in educational assessment.
In practice, accurate, ability-based assessments may emerge from a newer test delivery model, On-the-Fly Assembled Multistage Computer Adaptive Testing (Li et al., 2025), which combines adaptive testing principles, multidimensional analysis based on Multidimensional Item Response Theory (MIRT), and on-the-fly assembly. This model would enable assessment organizations to assemble tests dynamically in real time and to use multidimensional models to assess multiple abilities simultaneously, improving measurement precision, test security, and efficiency.
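For readers unfamiliar with MIRT, one widely used formulation is the compensatory multidimensional two-parameter logistic model, shown here only as an illustration; it is an assumption that this resembles the model used by Li et al. (2025), which may differ. The symbols are introduced for this example: \(\boldsymbol{\theta}_i\) is the vector of abilities for test-taker \(i\), \(\mathbf{a}_j\) is the vector of discriminations for item \(j\), and \(d_j\) is the item's intercept.

```latex
P\left(X_{ij} = 1 \mid \boldsymbol{\theta}_i\right)
  = \frac{1}{1 + \exp\left[-\left(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_j\right)\right]}
```

Because the abilities enter through the weighted sum \(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i\), a single item can contribute evidence about several abilities at once, which is what allows a multidimensional test to measure multiple abilities simultaneously.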
Hopefully, decision-makers in assessment organizations and educational ministries will be better informed about the uses of each test delivery model and the message it conveys to the industry and stakeholders, ensuring that their choice of technical solutions effectively balances integrity, quality, and personalization in the education system. This balancing act, or the decision to prioritize one aspect over others, is shaped by contextual factors that may justify the choice without judgment. However, understanding all technical options, including advanced test delivery models, enables organizations to develop a strategic growth roadmap while learning and applying technical solutions as a key asset in balancing priorities.
Vali Huseyn is an educational assessment specialist, recognized for his expertise in development projects covering various aspects of the assessment cycle. His ability to advise on the improvement of assessment delivery models, the administration of different levels of assessments, innovation in data analytics, and the creation of quick, secure reporting techniques sets him apart in the field. His work, expanded by collaborations with leading assessment technology firms and certification bodies, has greatly advanced his community's assessment practices. At The State Examination Centre of Azerbaijan, Vali contributed significantly to the transformation of local assessments and led key regional projects, such as a unified registration and tracking platform for international testing programs, reviews of CEFR-aligned language assessments, PISA-supported assessment literacy trainings, and the institutional audit project, all aimed at improving the assessment culture across the country and the former USSR region.
Vali has received two prestigious scholarships for his studies: he completed an MA in Education Policy Planning and Administration at Boston University on a Fulbright Scholarship and also studied Educational Assessment at Durham University on a Chevening Scholarship.
Discover guided practices in modernizing assessments and gain insights into the future of educational assessments by connecting with Vali on LinkedIn.