Exploitable result: Methodology for evaluating duplicate question detection in a GenAI-based chatbot service

The method assesses question caching: each new query is matched against a question database to determine whether a validated answer can be safely reused. Duplicate detection is strict and requires exact intent equivalence; any change in intent or detail is treated as a non-match to protect patient safety. Evaluation uses a synthetic dataset derived from existing questions, with positive and negative duplicate candidates generated by multiple methods. Performance is measured with mean reciprocal rank (MRR), Hit@K, and precision/recall, supplemented by a review of retrieval quality and reliability. Subjectivity in the labels is treated as a source of noise.
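
The metrics named above can be illustrated with a minimal sketch. It assumes that each query comes with a ranked list of retrieved question IDs and a set of IDs labelled as true duplicates; the function names and data layout are illustrative assumptions, not the evaluation code used in the project.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Return 1/rank of the first relevant ID, or 0.0 if none is retrieved."""
    for rank, question_id in enumerate(ranked_ids, start=1):
        if question_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    all_ranked: List[Sequence[str]], all_relevant: List[Set[str]]
) -> float:
    """Average the reciprocal rank over all queries."""
    scores = [reciprocal_rank(r, rel) for r, rel in zip(all_ranked, all_relevant)]
    return sum(scores) / len(scores) if scores else 0.0


def hit_at_k(
    all_ranked: List[Sequence[str]], all_relevant: List[Set[str]], k: int
) -> float:
    """Fraction of queries with at least one relevant ID in the top k results."""
    hits = [
        1.0 if any(q in rel for q in ranked[:k]) else 0.0
        for ranked, rel in zip(all_ranked, all_relevant)
    ]
    return sum(hits) / len(hits) if hits else 0.0


def precision_recall(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Precision and recall for binary duplicate / non-duplicate decisions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

In a safety-oriented setting such as this one, precision on the accepted pairs is likely the figure to watch most closely, since a false match means reusing an answer for a question it was not validated for.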

Research area(s)

Generative AI and LLMs

Technical features

The system uses a two-stage pipeline. First, a bi-encoder semantic search using all-MiniLM-L6-v2 retrieves candidate questions by computing cosine similarity between embeddings. Second, an LLM prompt performs the final duplicate detection by assigning a discrete score (1–5) to each pair of questions. Only exact matches (score 5) are accepted for answer reuse. In our internal evaluation, this prompt-based approach outperforms baselines that apply a fixed cosine-similarity threshold, which is particularly important from a safety standpoint.
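
The two-stage flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `embed` is a hypothetical stand-in for the all-MiniLM-L6-v2 encoder, `score_pair` is a hypothetical stand-in for the LLM prompt that returns a score from 1 to 5, and `top_k` is an assumed parameter that the source does not specify.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_candidates(
    query: str,
    stored_questions: Sequence[str],
    embed: Callable[[str], Vector],  # hypothetical: wraps the bi-encoder
    top_k: int = 5,  # assumed value, not taken from the source
) -> List[str]:
    """Stage 1: rank stored questions by embedding similarity to the query."""
    query_vec = embed(query)
    scored = [
        (cosine_similarity(query_vec, embed(question)), question)
        for question in stored_questions
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [question for _, question in scored[:top_k]]


def find_reusable_question(
    query: str,
    stored_questions: Sequence[str],
    embed: Callable[[str], Vector],
    score_pair: Callable[[str, str], int],  # hypothetical: wraps the LLM prompt
    top_k: int = 5,
    accept_score: int = 5,  # only exact matches are accepted, as stated above
) -> Optional[str]:
    """Stage 2: return the first candidate the scorer rates as an exact match."""
    for candidate in retrieve_candidates(query, stored_questions, embed, top_k):
        if score_pair(query, candidate) == accept_score:
            return candidate
    return None  # no safe match: the answer should not be reused
```

Returning `None` when no candidate reaches the acceptance score reflects the strict policy described above: a near match is treated the same as no match.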

Integration constraints

Solutions that use LLMs

Targeted customer(s)

Philips and any industry developing chatbots

Conditions for reuse

Originally intended for internal use; licensing to be considered

Contact

Martijn Krans

Email: martijn.krans@philips.com