A treasure trove was hidden in plain sight at Stanford Medicine Children’s Health.
Precious data for preventing harm lay buried within the weekly 30-page precursor and near-miss safety reports. Unearthing it required manually sifting through dense PDF exports, performing ad-hoc keyword searches, and subjectively interpreting trends. Staff spent hours poring over the summaries to find patterns. Some built their own Excel spreadsheets to sort the disparate information. All of them found it laborious and inefficient.
"There’s a ton of good information in there, but we just couldn’t tap into it,” said Kiley Rogers, director of Patient Safety and Professional Practice Evaluation at Stanford Children’s. “Our quality teams were spending hours every week trying to digest the reports and figure out where their big issues are.”
So they turned to an expert prospector: AI.
They used GPT-4o (running within PHI-compliant Microsoft Azure) to automatically analyze the reports, categorize risks, and surface meaningful patterns across all harm levels.
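The article doesn’t detail the implementation, but a minimal sketch of the extraction call, assuming the openai Python SDK pointed at an Azure OpenAI deployment (the endpoint, deployment name, and prompt wording here are illustrative, not the team’s actual configuration), might look like this:

```python
# Minimal sketch: GPT-4o on Azure OpenAI reads one weekly report and returns
# a list of discrete safety problems. Endpoint, deployment name, and prompt
# are illustrative assumptions.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # PHI-compliant tenant
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def extract_issues(report_text: str) -> str:
    """Return a concise, one-per-line list of discrete problems in the report."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed Azure deployment name
        temperature=0,   # deterministic output makes validation repeatable
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a patient-safety analyst. List each discrete "
                    "problem described in the report, one per line. Use only "
                    "what the text states; do not speculate."
                ),
            },
            {"role": "user", "content": report_text},
        ],
    )
    return response.choices[0].message.content
```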
Building and validating this pipeline took considerable effort: around 100 hours from core team members and around 85 hours for validation. But that upfront investment yielded better insight into safety trends, reduced staff burden, and ultimately freed up substantial staff time every week.
3 steps for successful AI safety reports
The team trained AI in three steps to develop insightful and reliable safety-trend reports for executives and frontline staff alike.
1. Identify the issues
Working with AI specialists, the team trained the model to analyze each report and generate a concise list of discrete problems, an exercise that doubled or even tripled the number of issues identified compared to manual review. In pilot testing, patient safety officers validated the AI’s choice of issues 93% of the time.
Getting there required extensive prompt engineering and clinician feedback. For example, safety reports generally follow the Situation, Background, Assessment, Recommendation (SBAR) format, and initially the AI would mistake recommendations for facts and sometimes focus on the wrong information. After several iterations, the model learned to focus on explicit root-cause statements, ignore recommendations, and avoid inventing problems that weren’t in the report.
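The team’s final prompt isn’t published, but the constraints described above could be encoded along these lines (the wording is hypothetical):

```python
# Hypothetical prompt text encoding the lessons above: reports follow SBAR,
# so the model is steered toward stated facts and explicit root causes and
# away from the Recommendation section and invented problems.
SBAR_EXTRACTION_PROMPT = """\
The report below follows SBAR format (Situation, Background, Assessment,
Recommendation).

Rules:
1. State each discrete problem as a short, standalone sentence.
2. Draw only on explicit facts and root-cause statements in the Situation,
   Background, and Assessment sections.
3. Treat the Recommendation section as a proposal, never as a fact or as
   evidence of a problem.
4. Do not infer or invent problems that the text does not state.

Return one problem per line.
"""
```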
2. Create a taxonomy
Next, they trained the AI to categorize the problems into two tiers: broad “parent” categories and detailed “child” subcategories. Early versions produced overlapping and inconsistent categories, so the safety team and clinicians examined multiple versions side by side and selected the categories that were most applicable and appeared most often.
The result was a human-augmented taxonomy that the AI could apply with 91% accuracy for parent categories and 83% for subcategories.
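A sketch of how such two-tier classification might work, reusing the Azure client from the earlier sketch (the category names below are invented for illustration; the real taxonomy was built by the safety team and clinicians):

```python
# Sketch of the two-tier classification step. Constraining the model to a
# closed, human-curated list is what eliminates the overlapping categories
# seen in early versions. Category names are hypothetical examples.
import json

TAXONOMY = {  # parent category -> child subcategories (hypothetical)
    "Medication": ["Wrong dose", "Wrong route", "Omitted dose"],
    "Communication": ["Handoff gap", "Unclear order"],
    "Equipment": ["Device malfunction", "Supply unavailable"],
}

def classify_issue(issue: str) -> dict:
    """Assign exactly one parent and one child category from TAXONOMY."""
    prompt = (
        "Classify this safety issue into exactly one parent category and one "
        "child subcategory chosen from the taxonomy. Reply as JSON with keys "
        "'parent' and 'child'.\n"
        f"Taxonomy: {json.dumps(TAXONOMY)}\n"
        f"Issue: {issue}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        response_format={"type": "json_object"},  # guarantees parseable JSON
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)
```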
3. Visualize trends
The team partnered with analytics and clinical colleagues to visualize the AI output in Tableau dashboards. Leaders can compare the frequency of single-issue versus multi-issue reports by unit, drill down into parent and child category counts, and click through to the original report text, among other uses.
Quality managers and nursing leaders have praised the visualizations for their clarity and ease of use, replacing hours of manual PDF review with intuitive charts. They see value not only for system-level executives but also for frontline staff, who submit these reports daily yet rarely see how their input shapes organizational learning.
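How the classified data reaches Tableau isn’t described; one plausible shape for the extract, assuming pandas and CSV files with hypothetical column names, is:

```python
# Sketch of shaping classified issues into a Tableau data source. File and
# column names are hypothetical; the real pipeline is not described.
import pandas as pd

# One row per (report, issue): report_id, unit, parent, child
issues = pd.read_csv("classified_issues.csv")

# Label each report single- vs. multi-issue, mirroring the dashboard view.
issues["issue_count"] = issues.groupby("report_id")["parent"].transform("size")
issues["report_type"] = issues["issue_count"].map(
    lambda n: "single-issue" if n == 1 else "multi-issue"
)

# One row per report, counted by unit and type, ready for Tableau.
by_unit = (
    issues.drop_duplicates("report_id")
          .groupby(["unit", "report_type"])
          .size()
          .reset_index(name="reports")
)
by_unit.to_csv("tableau_extract.csv", index=False)
```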
Tips for success
- Assign a team. “It truly does take a dedicated team,” Rogers said. “It was really important to have the right people on the team throughout the whole process to ensure we were building what was actually needed and that it could integrate into our software, processes, and structures.”
- Invest heavily in validation. “You have to make sure what you’re building can be trusted, and to what degree it can be trusted. How many times do we need to run this model before we can say it will do this consistently? How many individual events and problem lists do we need to look at until we can trust it? How many people need to independently review it?”
- Stay focused on core objectives. “Even with this narrow scope of identifying safety trends, we had a parking lot of ideas at every meeting. Working in this large, brand-new arena, we had to be very disciplined so we could stay focused on our main goal.”
Next, the team plans to include AI-generated content in daily safety reports, refine dashboards based on frontline feedback, launch an education campaign for stakeholders, and use the AI model as a template for additional areas like complaints, grievances, health equity, and workforce safety.
“We’re hoping that this can be a single source of truth for our committees and improvement teams to prioritize projects and initiatives,” Rogers said. “We’re seeing a lot of excitement.”
This article is based on a presentation at the Children’s Hospital Association’s 2025 Transforming Quality Conference.