A common theme in discussions of agentic AI systems for complex technical domains is that frontier foundation models need domain knowledge beyond what exists in their training data to improve performance and outcomes.
One problem area ripe for AI systems is the rapid retrieval of contextually relevant information to aid in troubleshooting and debugging critical systems. Tool and equipment downtime in semiconductor manufacturing is extremely costly, with production lines idled while waiting for experienced technicians. Accessing documented and structured knowledge has been a long-standing challenge in this process, and chat assistants have reduced the time it takes. The labor of crafting search terms that surface the right information without cluttering results with irrelevant noise has shifted from the user to the tool; the user simply needs to supply a richer description of the problem context in natural language. The labor of scanning the surfaced resources for relevance and integrating the findings into a usable form has likewise shifted from user to tool. These innovations have had a striking impact on the time and cost of accessing expertise and domain knowledge, much as Google did in its early days relative to the status quo of the time.
Although chat assistants with access to manuals can surface documented information, they still lack the context and experiential knowledge needed to significantly reduce downtime. Domain experts retain tribal knowledge that goes undocumented in formal knowledge management systems yet is often the key ingredient in solving problems in these domains. Worse yet, as AI systems lower the barriers to accessing expertise, the already weak incentive to contribute expertise to these systems weakens further. This dynamic has a foundational implication for how AI knowledge management systems must be designed to avoid catastrophic failure over time.
This presentation discusses how agentic AI systems can function as effective troubleshooting partners by systematically learning from past troubleshooting and debugging records and structuring this knowledge as they operate, so that it can be accessed more easily in the future. This approach reduces dependence on tribal domain knowledge and creates a self-sustaining knowledge management system that overcomes the shortcomings of traditional ones.
I'll show how these AI partners can capture troubleshooting processes, identify underlying patterns, and detect early warning signs of future failures. Attendees will discover how these systems preserve critical institutional knowledge, empower less experienced technicians, and ultimately create more resilient operations with significantly reduced equipment downtime.