A Maryland teenager says she has been falsely accused of using artificial intelligence to write class assignments after a teacher relied on an AI detection program that flagged her work. "It's mentally exhausting because it's like I know this is my work," said Ailsa Ostovitz, 17, a junior at Eleanor Roosevelt High School, who shared a screenshot showing the tool gave a 30.76% probability her piece was AI-generated. According to the original report, the teacher docked her grade and later, after a meeting with the student's mother, said they no longer believed the work was AI-produced. [1]

The incident is part of a wider trend: more than 40% of surveyed 6th- to 12th-grade teachers used AI detection tools during the last school year, and school districts from Utah to Ohio to Alabama are buying licences for products that promise quick signals about suspected AI use. Industry data shows some large districts are investing heavily: near Miami, Broward County Public Schools disclosed a more-than-$550,000, three-year contract with Turnitin, a company that added an AI-detection feature to its longstanding plagiarism service. The company itself cautions that "our AI writing detection may not always be accurate … so it should not be used as the sole basis for adverse actions against a student." District officials say the tools are provided to support teacher judgement, not replace it. [1]

Researchers and some educators warn those caveats are not merely theoretical. Multiple studies and institutional reviews identify troubling rates of false positives and inconsistent accuracy across popular detectors. "It's now fairly well established in the academic integrity field that these tools are not fit for purpose," said Mike Perkins, a researcher on academic integrity and AI. University and peer-reviewed analyses report accuracy sometimes below 70%, with individual tools performing worse, and find that small changes to text can make AI-produced content look more human to detectors. According to published research, many tools lack precision and dependability, which risks misclassifying student-authored work as AI-generated. [1][2][3]

Teachers who use these systems describe them as starting points for conversations rather than definitive proof. John Grady, a language and literature teacher in Ohio, said he runs essays through GPTZero and treats a probability score over 50% as a prompt to investigate revision history and speak with the student. "It's certainly not foolproof," he said. GPTZero's CEO, Edward Tian, similarly framed the tool as "a tool in the toolkit and not the final smoking gun." Yet others recount troubling false alarms: one high-school teacher uploaded a chapter of her own Ph.D. dissertation to a detector, which judged it 89% to 91% likely to be AI-written. [1]

Practical harms extend beyond individual disputes. Students with different linguistic backgrounds or those who use grammar and editing aids can be disproportionately flagged. A Shaker Heights junior, whose first language is Mandarin, said his style and use of tools such as Grammarly appeared to trigger a GPTZero alert; he warned detectors can feel like "a false alarm" and questioned whether districts should spend on licences rather than on teacher development. Institutional guidance from several universities and task forces advises that detection results should not be the sole basis for sanctions and highlights potential bias against non-native English speakers. [1][7][2]

Cost and convenience are part of the appeal for districts: some administrators say detectors save teacher time and assist with externally moderated programmes that require teacher authentication of student work. Yet critics argue money spent on detection might be better invested in professional development, assessment redesign and clearer academic integrity policies. Recent scholarship proposes shifting emphasis from trying to catch AI use to designing AI-resilient assessments that foster critical thinking and make cheating less feasible, including task designs that require process evidence, iterative drafts, oral components or in-person demonstrations. A web-based tool developed by researchers aims to evaluate an assignment's "AI-solvability" and encourage proactive assessment design rather than reactive detection. [1][5][3]

A growing consensus among ethicists and educators is that human oversight, transparent local policies and procedural safeguards are essential if detection software is used at all. Brandeis University's AI task force and other academic bodies recommend that institutions explain detection limits to students, avoid hard thresholds for punitive action, and combine any automated signal with teacher knowledge of a student's skill and the student's own explanation of their process. According to the original reporting, Prince George's County Public Schools advised staff not to rely on such tools and said the district does not purchase the software used in the Maryland case. [7][1][2]

For students like Ostovitz, the practical consequence has been extra labour: she now runs all her assignments through multiple detectors and revises any lines flagged as possibly AI-generated, a process that adds about half an hour to each task. The result, she said, is a heightened vigilance to present work that "is mine and not AI." Academics and policy groups argue a more sustainable approach would reduce reliance on imperfect detection and focus instead on assessment practices and teacher training that preserve fairness and trust. [1][5][2]

## Reference Map

  • [1] (NPR) - Paragraph 1, Paragraph 2, Paragraph 3, Paragraph 4, Paragraph 6, Paragraph 7, Paragraph 8
  • [2] (University of Nebraska at Kearney) - Paragraph 3, Paragraph 5, Paragraph 8
  • [3] (Information journal / MDPI) - Paragraph 3, Paragraph 6
  • [5] (arXiv preprint) - Paragraph 6, Paragraph 8
  • [7] (Brandeis University) - Paragraph 5, Paragraph 7

Source: Noah Wire Services