In semiconductor manufacturing, modern tools generate thousands of heterogeneous sensor streams during wafer processing, making manual review of fault-detection (FD) data both time-consuming and prone to oversight. Early identification of equipment anomalies is critical: undetected deviations can propagate through subsequent process steps, reducing yield, increasing scrap rates, and causing unplanned tool downtime that can cost hundreds of thousands of dollars per hour. We therefore propose a generalized, high-throughput framework that uses unsupervised subspace learning to capture nominal equipment behavior and flag outliers without requiring large labeled datasets or extensive parameter tuning.
Our approach projects multi-modal FD data into a compact representation using lightweight dimensionality-reduction techniques to learn flexible embeddings. The subspace construction is entirely unsupervised and yields a basis aligned with the directions of maximal variation in the original sensor space. By modeling the distribution of in-subspace projections with simple multivariate statistics (e.g., Gaussian envelopes and Mahalanobis-style distances), we define quantitative deviation scores at both the chamber (sensor-level) and wafer (batch-level) scales. These scores enable immediate ranking of the highest-risk wafers and sensors, guiding rapid screening and resource allocation for in-depth diagnostics.
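To make the scoring pipeline concrete, the sketch below assumes PCA as the lightweight reduction step and an empirical Gaussian envelope over the in-subspace projections; the function names, component count, and use of a pseudo-inverse are illustrative choices, not the exact implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_subspace_scorer(X_nominal, n_components=5):
    """Fit a PCA subspace to nominal runs and a Gaussian envelope over
    the in-subspace projections. X_nominal: (n_runs, n_sensors)."""
    pca = PCA(n_components=n_components).fit(X_nominal)
    Z = pca.transform(X_nominal)              # in-subspace coordinates
    mu = Z.mean(axis=0)                       # envelope center
    cov_inv = np.linalg.pinv(np.cov(Z, rowvar=False))  # pseudo-inverse for stability
    return pca, mu, cov_inv

def deviation_scores(X, pca, mu, cov_inv):
    """Mahalanobis-style distance of each run's projection from the
    nominal envelope; larger scores flag stronger deviation."""
    d = pca.transform(X) - mu
    # sum_jk d[i,j] * cov_inv[j,k] * d[i,k] is the squared distance per run
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))
```

Sorting runs by these scores in descending order then surfaces the highest-risk wafers first for triage.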
Our approach also addresses the challenge of building a model for each of thousands of sensors: it scales cheaply and remains effective with as few as 10 runs. Unlike deep learning models, which typically require large datasets, our method is designed to work with limited data, making it well suited to development phases and initial tool deployment, when the number of available samples is often very small (one way to stabilize the fit in this regime is sketched below).
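With roughly ten nominal runs, a practical way to keep the Gaussian envelope well conditioned is to cap the subspace rank and shrink the covariance estimate; the Ledoit-Wolf estimator used below is a standard choice adopted here for illustration, not a stated component of the method.

```python
import numpy as np
from sklearn.covariance import LedoitWolf
from sklearn.decomposition import PCA

def fit_small_sample_scorer(X_nominal, n_components=3):
    """With few runs, keep the subspace small and shrink the covariance
    so that Mahalanobis-style scores stay numerically stable."""
    k = min(n_components, X_nominal.shape[0] - 1)    # rank cannot exceed n_runs - 1
    pca = PCA(n_components=k).fit(X_nominal)
    lw = LedoitWolf().fit(pca.transform(X_nominal))  # shrunk Gaussian envelope
    return pca, lw

def small_sample_scores(X, pca, lw):
    # LedoitWolf.mahalanobis() returns squared distances to the fitted mean
    return np.sqrt(lw.mahalanobis(pca.transform(X)))
```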
Furthermore, a built-in sensor down-selection module reduces dimensionality by ranking channels according to their importance and impact on the deviation scores, so that follow-up investigations focus on the most informative sensors. This streamlines data triage and gives users clear direction on which channels warrant deeper analysis. By improving process control, tool monitoring, and yield diagnostics, our dimensionality-reduction-driven anomaly detection framework enhances overall processing efficiency and reliability.
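Since the exact down-selection criterion is not spelled out here, the sketch below illustrates one plausible rule: attribute a flagged run's deviation score back to individual sensors through the subspace loadings and rank channels by that contribution. The function and its inputs are hypothetical and build on the scorer sketched earlier.

```python
import numpy as np

def rank_sensors(x_run, pca, mu, cov_inv, sensor_names):
    """Attribute one run's deviation back to raw sensors (hypothetical rule).
    x_run: (n_sensors,); pca, mu, cov_inv come from the fitted scorer above."""
    z = pca.transform(x_run.reshape(1, -1))[0] - mu  # centered projection
    w = cov_inv @ z                                  # weight of each subspace axis
    # pca.components_ is (n_components, n_sensors); map axis weights back
    contrib = np.abs(pca.components_.T @ w)          # per-sensor magnitude
    order = np.argsort(contrib)[::-1]                # most informative first
    return [(sensor_names[i], float(contrib[i])) for i in order]
```

Channels with persistently high contributions across flagged wafers would be retained for detailed analysis, while the remainder can be pruned from routine monitoring.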