Update to the AI Challenge: Using LLMs for Regulatory Compliance
- David Connor
- May 30
- 5 min read
Originally presented as part of the second article in a six-part series on CE marking
The AI Challenge

In July of 2024, I was writing a series of articles based on Unlocking the European Market: A Step-by-Step Guide to CE Marking. In the article on Navigating EU Directives and Standards, I posed a challenge:
"Before continuing, take the product description above and give it to your favorite AI. Ask the AI to provide you with the appropriate European directives and standards to use for product CE marking."
I'm a fan of Large Language Models (LLMs) and their ability to distill vast amounts of complicated information into simple, coherent ideas. It would seem a perfect fit for the safety and regulatory space, where there are complex rules and regulations. However, general LLMs have limitations.
In July 2024 I tried three major free LLMs: Google Gemini, ChatGPT-3.5, and Microsoft Copilot. After giving each the exact product description from the article, I simply asked it to provide the appropriate European directives and standards used for CE marking. The results at the time were incomplete at best and, in some cases, incorrect.
So, knowing that LLMs would only get better, I opened up the comments to the AI Challenge.
Background: LLMs and Regulatory Compliance

Large Language Models (LLMs) offer significant potential for assisting with regulatory compliance, but their effectiveness is limited by several training challenges. Here are a few examples:
Data Quality and Specificity of Training Data:
Lack of Domain-Specific Data: LLMs are pre-trained on vast datasets from the internet, which may not contain sufficient high-quality, highly specific regulatory text. Regulatory language is often highly nuanced, precise, and uses specialized terminology.
Outdated Information: Regulatory landscapes are constantly evolving. LLMs trained on historical data may provide outdated information, leading to non-compliance. Keeping the training data consistently up-to-date with the latest amendments, new directives, and withdrawn standards is a massive and ongoing challenge.
Copyrighted, Proprietary, and Confidential Data: Training LLMs on vast datasets raises concerns about the potential inclusion of copyrighted material, leading to intellectual property infringement issues. Much of the practical knowledge for regulatory compliance resides within companies (e.g., internal risk assessments, test reports, clinical evaluations). This proprietary and often confidential data cannot be easily used for general LLM training.
Bias in Training Data:
If the training data contains biases (e.g., towards certain industries, interpretations, or geographical regions), the LLM may perpetuate these biases in its responses, leading to potentially discriminatory or non-compliant advice.
Hallucinations and Factual Accuracy:
LLMs can generate plausible-sounding but factually incorrect or nonsensical information (hallucinations). In regulatory compliance, where precision is paramount, a single hallucination could lead to severe legal and financial consequences (e.g., incorrect CE marking, product recalls, fines).
Text Chunking
Text chunking is the process of breaking down long or complex documents into smaller, manageable segments (chunks) before feeding them into an LLM. This is necessary because LLMs have a "context window" limit, meaning they can only process a certain amount of text at one time. Regulatory documents are highly interconnected. A definition in one section might be crucial for understanding requirements in another. When a document is chunked, important contextual information can be separated, leading to the LLM misinterpreting requirements or missing critical details. For example, a standard might refer to a specific test method described in detail pages away. If these are in different chunks, the LLM might struggle to link them correctly, leading to incomplete or incorrect guidance.
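To make the mechanics concrete, here is a minimal sketch of fixed-size chunking with overlap. The chunk size, overlap, and sample text are illustrative assumptions, not taken from any real pipeline:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size characters.

    The overlap preserves some shared context between adjacent chunks, but a
    definition more than `overlap` characters away from the requirement that
    depends on it still ends up in a separate chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Invented regulatory-style text for illustration only.
document = (
    "Clause 3.1 defines 'containment' as described in Annex B. " * 10
    + "Clause 7.4: containment shall be verified using the Annex B method. " * 10
)
chunks = chunk_text(document)
```

With settings like these, a cross-reference such as "Annex B" can easily land in a different chunk from the clause that cites it, which is exactly the failure mode described above.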
Do We Have an AI Challenge Winner?
I was recently contacted by a colleague in the regulatory world who had used ChatGPT o3 with Deep Research (pro), giving it the information from the AI Challenge. Deep Research does a good job of prompting the user for additional information to produce a more accurate response:

To ensure I find the most relevant CE marking directives and standards for your lab fume hood, could you please clarify:
Is the product intended for use in professional laboratories only, or also in educational or residential settings?
Does the product involve any flammable, explosive, or hazardous substances during operation?
Are there any specific safety features (like alarms, sensors, automatic shut-off)?
Will the product be marketed in specific EU countries, or broadly across the EU?
These details will help me determine the precise directives and harmonized standards applicable to your product.
After providing details about the product in response to the prompts, Deep Research was able to provide what appears to be a comprehensive and accurate list of EU Directives and Standards applicable to the described product, including a list of references for the data.
The full text was too long for the comment section, so I included it here:
However...

An important aspect of ChatGPT o3 with Deep Research is that it gives specific references for the information it provides, so you can follow up on them to determine whether each source is credible and appropriate.
For the AI Challenge, Deep Research provided 43 source quotations. Twenty (20) of the 43 link directly to Striper Solutions LLC, and most reference "Unlocking the European Market: Navigating EU Directives and Standards", the same article that issued the AI Challenge...
Including the EUR-Lex links that also came directly from the article, half of the sources Deep Research referenced were Striper Solutions LLC and the article itself. While being treated as an expert is flattering, it indicates a significant reliance on a narrow set of sources, possibly including the foundational material initially provided to the AI. This could affect the perceived scope and independence of the regulatory information presented.
In Conclusion
The AI Challenge is in no way intended to be a definitive test of AI in regulatory compliance. Dedicated systems, trained on specific proprietary data and using Retrieval-Augmented Generation (RAG) designed for regulatory work, could potentially do an excellent job of answering targeted questions. Rather, this was meant to be an exercise in understanding the current state of the common AI tools a typical product designer or regulatory specialist would employ when investigating product specifications.
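To illustrate what Retrieval-Augmented Generation means in practice, here is a toy sketch: relevant passages are retrieved from a curated document store and supplied to the LLM as context along with the question. The keyword-overlap scoring is a deliberate simplification (production systems use vector embeddings), and the passages are illustrative snippets, not authoritative regulatory text:

```python
def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k passages with the most words in common with the query."""
    query_words = set(query.lower().split())
    return sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )[:top_k]

# A tiny, illustrative document store (real systems index full regulatory texts).
passages = [
    "The Machinery Directive 2006/42/EC covers the safety of machinery.",
    "EN 14175 specifies requirements for laboratory fume hoods.",
    "The Low Voltage Directive applies to certain electrical equipment.",
]
context = retrieve("laboratory fume hood standard", passages)
# The retrieved context would then be prepended to the prompt sent to the LLM,
# grounding its answer in the curated sources rather than its training data.
```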
Like any tool, AI requires an understanding of its operation and its limitations for optimal use. This exercise illustrates, at least for now, how important it is to understand both the capabilities and the limitations of these tools in regulatory compliance work.
LLMs are becoming essential to efficient business operations. They provide insight and organization in a world of messy information, but they still need expert guidance on complex issues that require experience. They can help you get there, but you still need to know where you are going before you start.
Striper Solutions LLC, Your Regulatory Partner

With over 30 years development experience and more than a decade providing consulting services, Striper Solutions has demonstrated a commitment to excellence and integrity in developing new products. Embracing a shift to regulatory compliance allows us to focus on a critical area of product development with the dedication and quality our customers need.
See more at www.striperllc.com