A structural diagnosis of how generative AI has collapsed the evidentiary basis of academic evaluation in Indian higher education — and a proposal to rebuild it from the ground up.
A student enrolled in a law programme is assigned a research paper on constitutional interpretation, due in seventy-two hours. They open an AI model, type the assignment question, and receive a structured, well-argued response within seconds. After minor edits, the submission is uploaded. Turnitin finds no red flags — the content was generated, not copied — and a first-division grade is awarded.
The student has demonstrated no independent legal reasoning, engaged with no primary sources, and formed no original argument. The credential, however, reflects otherwise.
AI detection produces probabilistic estimates that do not meet the evidentiary standard required for academic misconduct proceedings. Paraphrasing attacks can bypass most detectors, and tools specifically designed to make AI text undetectable are widely available.
More fundamentally, detection is in a permanent race it cannot win. Each model update makes generated text harder to flag. The assignment becomes an AI detection evasion exercise rather than genuine engagement with the material assigned.
"The problem is not that AI exists, or even that students use it, but that the system provides no mechanism to distinguish between demonstrated understanding and generated output." (JGU 5 Initial Contribution · Inria AI Grand Challenge 2026)
Academic assessment serves two simultaneous purposes: it produces a grade and it produces information — about what the student knows, where comprehension has failed, and what instruction must still do. These functions break down simultaneously when AI-generated work is submitted.
In that case, the system verifies only that a student has submitted a document. The grade is awarded, but the information is absent. Faculty proceed on false data, unable to identify gaps in understanding, adjust instruction, or meaningfully gauge the effectiveness of their teaching.
Access to premium AI tools tracks closely with socioeconomic advantage. This is not a digital divide — it is an evaluation regime that systematically advantages those already ahead, while presenting itself as a meritocracy.
The UGC's 2018 plagiarism regulations remain the governing standard. No formal update addresses AI-generated content. AICTE has moved faster — creating a fragmented regulatory landscape with inconsistent accountability standards.
Law firms and consulting firms have already introduced parallel assessment exercises precisely because they no longer treat transcripts as sufficient evidence of competence. The burden of evaluation is migrating beyond universities.
A submitted essay is supposed to show that a student can construct an argument. When a machine can produce that same output in minutes, the instrument stops measuring what it was designed to measure. Policy that tells students how much AI they may use does not fix the underlying problem.
AI did not directly cause the erosion of Socratic learning — tutorial-based dialogue was progressively sidelined as institutions shifted toward scalable written assessments. But AI has made the cost of that erosion visible and urgent. The Socratic method is the most resilient form of assessment precisely because it cannot be delegated to a machine.
AI tools are often described as democratising because they are widely available, but availability is not the same as parity. A student at a well-resourced institution with a premium AI model is not using the same tool as a student at a state college on a free-tier platform. The gap is not just one of access; it is also one of capacity to use AI well, which tracks closely with the quality of prior schooling.
The adoption of AI tools in academic settings is entry into a commercial relationship with companies whose business models depend on data accumulation. Student submissions and reasoning patterns are commercially valuable training data. The DPDPA 2023 is directly relevant, but subordinate regulations for educational AI have not been developed.
When grades increasingly reflect AI-assisted output rather than a student's own reasoning, employers, postgraduate institutions, and professional bodies lose the ability to trust what a degree represents. Faculty have already begun resorting to informal viva-style conversations to verify understanding — that informal workaround points toward the structural reform the system needs.
When any assignment can be resolved in ten minutes with the right prompt, the incentive to sit with a problem and build understanding from the struggle disappears. This is not laziness — it is a rational response to a broken incentive structure. A student who consistently outsources academic thinking will graduate with a credential but without the intellectual muscle that credential is supposed to represent.
The Situated Learning Assessment Model (SLAM) is a governance framework built on a single premise the current system has never adequately operationalised: what a credential certifies is not the ability to produce a document, but the capacity to demonstrate genuine understanding. SLAM does not modify how submissions are staged or disclosed; it moves evaluation away from submissions entirely.
Most evaluation reform designs a universal framework, then inserts flexibility. SLAM inverts this — beginning from the learner's year of study, subject, and chosen direction, building the evaluation method upward from there. The result is a principled architecture within which different subjects and years are assessed through mechanisms genuinely suited to what is being demonstrated.
The unit of evaluation shifts from what a student produces outside the room to what they can demonstrate inside it. Structured application tasks, Socratic dialogue, and performance-based evaluation make understanding visible through action. A student cannot delegate a live advocacy performance. A student cannot outsource reasoning a clinical scenario demands in real time.
Two students assessed through different formats are not being held to different standards. The format is variable and context-responsive. The standard — the cognitive depth, analytical rigour, and disciplinary competence expected at a given year level — is fixed and institutionally defined.
Differentiation by Subject Type
| Subject Type | Evaluation Method | Rationale | Tags |
|---|---|---|---|
| Foundational & Theory | In-classroom application tasks with unseen problems requiring real-time reasoning | Tests whether a student can identify what principle applies, why it applies, and what it requires in a specific situation. This reasoning happens in the student's mind, in the present moment, and cannot be produced in advance. | No AI substitution possible; Design-based resistance |
| Applied & Clinical | Moot court, case simulations, structured problem-solving under observation, live client scenarios | Professional competence demands evaluation through performance. A student cannot delegate a live advocacy exercise. The assessment instrument is the performance itself, which by definition cannot be generated in advance. | Performance-based; Professional alignment |
| Elective & Specialisation | Structured faculty-led inquiry in the student's chosen domain, directed intellectual exchange | A student who has elected to focus on technology law should be assessed on their capacity to reason within that field, through a directed exchange in which the faculty member probes the depth and independence of thinking in real time. | Domain-specific; Intellectual ownership |
Differentiation by Year of Study
| Stage | Evaluative Priority | Assessment Focus |
|---|---|---|
| Early Years | Conceptual grounding | Can this student identify the relevant principle, apply it correctly, and explain why it leads to the outcome reached? The question is not complex — the test is whether the answer is genuinely theirs. |
| Middle Years | Analytical development | Students must explain why one position is stronger than another, what their favoured argument cannot account for, and where reasoning breaks down under pressure. These are precisely the questions whose answers AI can simulate in a document but cannot supply when a faculty member pursues them in real time. |
| Final Years | Intellectual ownership | Dual-track model: shared disciplinary competence (core knowledge every graduate must demonstrate) + learner-specific track (the student's elected specialisation). Both tracks use in-context, real-time demonstration as the evaluation instrument. |
Language equity. Before enrolment, institutions map the medium of prior schooling for each student. In the first semester, all students participate in academic discourse coaching through credited or co-curricular workshops — a universal preparation, not a deficit intervention. Students may initially articulate reasoning in their first language before self-translating; the argument, not the language of its first expression, is what is being assessed.
After every viva, faculty are required to give feedback that separates content understanding from expression quality — so students can distinguish what they know from how fluently they currently articulate it.
Process anxiety. In-classroom application tasks can disadvantage students who process more slowly or who experience anxiety in high-pressure academic settings. SLAM does not treat these as edge cases. Faculty development must include explicit training in designing tasks that allow for variation in response pace and style without reducing the rigour of what is being tested.
Where language anxiety remains significant, viva questions are made available thirty minutes before the session — removing vocabulary retrieval pressure without reducing the spontaneity of reasoning being tested.
The conversation began not in a seminar room but in an informal exchange among five law students who had noticed something unsettling: assignments were being graded on output that no longer reliably reflected the understanding of the person submitting it. Not hypothetically — in our own institution, in our own cohort.
We started from the student's perspective rather than the institution's. Our first instinct was to ask what the assessment was actually trying to do before asking how AI disrupted it. That reorientation — diagnosis before prescription — shaped everything that followed. We read through UGC frameworks, BCI guidelines, and NAAC accreditation criteria, and found the same gap everywhere: the instruments were designed for a world in which written output was evidence of the reasoning that produced it. That assumption had quietly collapsed.
We argued about detection first — whether smarter tools could close the gap. The consensus, after working through the literature, was no. Detection is a probabilistic exercise that cannot meet the evidentiary standards required for misconduct proceedings. We needed to think structurally, not technologically.
Our initial contribution to the challenge laid out the structural failure of detection-based approaches and proposed SLAM as an alternative architecture. The core argument was that the credential crisis was not primarily a technology problem but an evidentiary one: institutions had lost the mechanism for distinguishing demonstrated understanding from generated output.
We proposed three interlocking principles. Assessment should start from the learner rather than from an abstract universal standard. Evaluation should move toward real-time demonstration — what a student can show inside the room — rather than what they produce outside it. And format should be understood as variable while standard remains fixed.
We mapped these principles across subject types and year levels, and addressed the fairness questions directly: on language equity, on processing anxiety, and on what it means to prepare students for a form of evaluation that is, in many institutional contexts, unfamiliar.
The peer review process surfaced questions we had anticipated and several we had not. The scalability challenge came up repeatedly — how do you run viva-style assessments at the scale of a public university with hundreds of students in a cohort and faculty already stretched thin? We had addressed this in our framework but not with sufficient concreteness, and that was a legitimate criticism.
We also heard a sharper version of the equity argument. Our proposal assumes that integration — formally recognising AI use and building assessment around it — is preferable to prohibition. The counterargument was that prohibition might be more equitable precisely because it levels access to zero. Working through that argument changed how we articulated our position: prohibition does not in fact eliminate unequal advantage, it merely pushes it underground, where disparities continue to shape outcomes without any institutional mechanism to address them.
A third line of feedback pushed us toward something we had not adequately considered: AI literacy as an assessable competency in its own right. Our framework had largely treated AI as something to render irrelevant to evaluation. The question — can you assess whether a student uses AI well, not just whether they can think without it — opened a parallel track we intend to develop further.
The discussion within the team was structured around two competing paradigms — integration as regulation and prohibition as equalisation. The deliberation focused on which framework more accurately responds to the structural transformation introduced by generative AI.
Generative AI has already entered the epistemic core of academic work. Prohibition attempts to regulate a practice that is already widespread yet difficult to detect or enforce. Integration enables reconstruction of assessment frameworks — shifting evaluation toward process-based and performance-based verification that directly tests understanding rather than relying on written artefacts AI can replicate.
Access to AI is stratified across three axes: tool quality, AI literacy, and institutional capacity. Integration — particularly when paired with disclosure requirements — was seen as insufficient because it renders inequality visible without correcting it. Formalising AI use risks entrenching advantage for already well-resourced students, while overburdened faculty in large public institutions lack capacity to meaningfully assess disclosures.
Prohibition does not, in fact, eliminate unequal advantage — it pushes AI use into informal and unregulated domains, where differences in access and capability continue to shape outcomes without institutional oversight. Integration, while not eliminating inequality, creates the only viable framework within which disparities can be addressed through standardised access, AI literacy training, and redesigned assessment methods. Regulatory leverage is only possible under conditions of formal integration — the DPDPA 2023 can only be operationalised when institutions explicitly recognise and govern AI use.
Five reviewers engaged with our initial contribution between April and May 2026 — from France, Mexico, Senegal, and India. Their questions sharpened the proposal in places, challenged it in others, and in one case opened a line of thinking we had not yet pursued. What follows is the exchange in full.
SLAM is not a suggestion to replace all assessments with individual viva-style evaluations. It is a phased and flexible framework where assessment formats differ based on subject type, year of study, and institutional capacity. Scalability is addressed through supervised in-class application tasks, structured group discussions, simulations, and debate-based assessments that allow multiple students to be evaluated simultaneously. Real-time oral assessments are intended to supplement, not entirely replace, existing evaluation systems.
The point about the trustworthiness of diplomas in the age of generative AI is highly relevant. More broadly, this reflects a systemic issue across education: students are incentivised to prioritise short-term performance over long-term learning. AI brings this to the forefront. Some systems don't rely on papers much — in France, written in-person exams require real-time reasoning around a "problématique." However, both written and oral exams have limits: students often learn for the test and forget afterward. AI is an opportunity to rethink assessments in favour of deeper, long-term understanding.
Your observation about AI exposing a deeper structural weakness within education is something we strongly agree with. Many assessment systems were already rewarding short-term performance over durable understanding, and generative AI has simply made that gap more visible. We agree that formats centred around real-time reasoning are significantly more resilient than conventional take-home submissions. That is precisely why our proposal attempts to rethink not just the format of assessment, but what assessment is actually meant to verify.
The format-vs-standard distinction in SLAM is genuinely useful. Two questions: SLAM shifts evaluation to live faculty judgment — what about structural safeguards inside the viva itself? Blind rubrics, cross-faculty calibration, audit trails — to ensure fluency doesn't get conflated with reasoning quality at scale. Also, the proposal largely treats AI as something to render irrelevant to assessment. Do you see a parallel track where AI literacy becomes its own assessable competency — "can you use it well" — not just "can you think without it"?
Mechanisms such as structured rubrics, calibration standards, and audit trails are exactly the kinds of safeguards we believe would become necessary as frameworks like SLAM scale institutionally. We would also clarify that the proposal is not attempting to render AI irrelevant to assessment broadly. Our argument is narrower: AI should not substitute for the evidentiary function of assessment itself. AI literacy would sit within SLAM, but not as a substitute for demonstrated cognition. The ability to use AI well cannot replace the need to demonstrate independent reasoning.
We question the idea that paid AI tools are necessarily better, since both paid and free versions can make similar errors or biases. Your point about alignment with societal and employer expectations is very relevant, especially as AI reshapes other sectors beyond education. The equity concern already existed before AI — and while inequalities are real, access to knowledge is still broadly possible through the internet and open resources.
Our concern is that AI risks formalising and amplifying existing inequalities within evaluation systems that still assume comparable conditions of access, literacy, and institutional support. Our point regarding paid and free AI tools was not about correctness alone. Students with access to premium models benefit from better contextual memory, longer interactions, multimodal capabilities, and greater institutional exposure to how these systems can be used effectively. The advantage lies not only in the tool itself, but in the conditions under which students learn to engage with it.
The proposal reframes AI not just as a plagiarism issue, but as a deeper challenge to how universities evaluate genuine understanding. The focus on viva-based assessments and real-time reasoning directly addresses AI-generated submissions. Connecting assessment failures with equity, faculty preparedness, and declining credential credibility makes the proposal realistic and well-researched. One area to improve: simplifying the framework and making implementation more practical. The ideas are strong, but large-scale adoption may be difficult for institutions with limited resources. Adding clearer pilot strategies and measurable outcomes could strengthen it further.
We have attempted to account for implementation through phased rollouts, pilot institutions, mandatory outcome reporting, faculty development mechanisms, and equity audits — precisely because we recognise that large-scale adoption cannot happen uniformly across institutions with differing levels of resources. We completely agree that clearer measurable outcomes and implementation indicators would strengthen the framework further, and that is something we intend to develop more concretely as the proposal evolves through the next stages of the challenge.
Pilot institutions implement subject-type differentiated evaluation in two disciplines. Faculty development workshops on application-based assessment design. Mandatory outcome reporting to surface early constraints. Language equity mapping for incoming cohorts.
Framework extends across all year levels at pilot institutions. Equity audits conducted before further expansion — identifying where differentiated formats are producing unintended disadvantage. Cross-faculty calibration standards developed and tested.
Regulatory guidance enables national rollout with institutional adaptation flexibility built in. The framework's logic scales — but its specific mechanisms must be calibrated to disciplinary and institutional context. The framework, not the format, is what is standardised.
We are not approaching this challenge as outside observers of a system we have studied. We are inside it. We have sat in examinations designed for a pre-AI world. We have submitted assignments assessed by faculty who have no institutional mechanism to distinguish genuine understanding from generated output.
"What we stand to gain from this opportunity is specific and cumulative. A practical insight into designing AI interventions that are not only conceptually sound but institutionally scalable, capable of functioning within the resource constraints of mid-tier Indian universities, not just the well-funded exceptions. The students who stand to lose most from the collapse of meaningful evaluation are not those with the resources to navigate it strategically. They are the ones for whom a credible degree was the primary instrument of social mobility." (JGU 5 Team · Motivation Statement)
The proposal presents a compelling shift from evaluating polished written outputs to assessing genuine student understanding through real-time methods. However, the proposal would benefit from a more concrete implementation strategy, particularly considering under-resourced institutions and faculty managing large student cohorts. How can these personalised, real-time assessments be conducted at scale without significantly increasing teacher workload?