Enhancing Expert Assessments with an AI-powered Agent

Our custom AI agent structures expert assessments, automates the audit process, and transforms raw expert feedback into clear, actionable insights for the company's leadership and management.

Solution Highlights

  • Developed an AI agent to structure Valency’s expert assessments and enhance audit clarity
  • Enhanced decision-making by transforming raw expert feedback into clear, structured insights
  • Improved assessment quality by aligning output with industry-specific language, tone, and terminology
  • Reduced processing time from 5 minutes per query to a few seconds

About the Project

Customer:

A global leader in project assurance, Valency helps businesses improve capital project outcomes through audits, risk assessments, and expert-driven recommendations. By analyzing organizational structures, financial management, leadership, and operational workflows, the company delivers data-backed insights that enhance decision-making and drive greater predictability.

Challenges & Project Goals:

Valency reached out to Intelliarts to streamline the traditional audit process it conducts for clients. The assessment relied heavily on domain experts manually analyzing key areas such as finances, operations, and organizational structure. While thorough, this approach often resulted in complex, unstructured reports that were difficult for management and leadership to interpret.

By using LLMs, our partner aimed to enhance the clarity of these insights and make data-driven recommendations more accessible for decision-making.

Solution:

To improve the consulting process, Intelliarts developed an AI-powered agent that structures domain expert assessments. Using LLM technology, our solution transforms raw consultant feedback into a structured, easy-to-digest format for company management and leadership, i.e., Valency’s clients. 

Key capabilities include:

  • Summarization of expert assessments, including scores for key assessment areas and feedback across multiple categories and subcategories
  • Comparative analysis and benchmarking against industry standards
  • Structuring audit results into an intuitive format for improved decision-making

Location: Canada
Industry: Construction
Partnership period: Sep 2024 — Now
Expertise: ML Development, Data Analysis, Cloud Services, Business Intelligence & Analytics, Data Science, SaaS Development

Technologies used: Docker, Uvicorn, asyncio, Pydantic, LangChain, Claude, ChatGPT, Llama, AWS Bedrock, AWS NOVA

Technology Solution

The AI agent created by the Intelliarts ML team structures domain expert assessments, summarizes them, and conducts comparative analysis and benchmarking against industry standards. Our solution also increases clarity by converting raw consultant feedback into a structured, easy-to-interpret format.

Valency System Design

Here’s how we built this AI-powered agent:

Step 1: Initial AI agent development

We started by developing and testing a basic version of the AI agent using a GPT-based model. Early in the process, we encountered a significant challenge: executing queries sequentially to structure the assessments resulted in processing times of up to 5 minutes per query. This delay was unacceptable, as speed and efficiency were key to the solution’s success.

To solve this performance issue, our ML engineers parallelized the queries by using asyncio. This optimization cut assessment generation time to a few seconds per query, making the solution practical for real-time use.
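
To illustrate the pattern, here is a minimal sketch assuming an OpenAI-style async client; the helper names and model ID are illustrative, not the production code:

```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def run_query(prompt: str) -> str:
    # One structuring query; the model ID here is illustrative.
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def structure_assessment(section_prompts: list[str]) -> list[str]:
    # Fire all section queries concurrently: total latency drops from
    # the sum of all calls to roughly the slowest single call.
    return await asyncio.gather(*(run_query(p) for p in section_prompts))

# results = asyncio.run(structure_assessment(prompts_for_each_section))
```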

Step 2: Model selection

After resolving the performance issue, we explored top LLMs beyond ChatGPT to find the best-performing option. We integrated AWS Bedrock, Amazon’s managed LLM service, and tested 6 different models, including Claude, GPT, Llama (up to Llama 3.3 with 70B parameters), and AWS Nova. The evaluation criteria included:

  • Text generation quality
  • Cost efficiency
  • Latency (processing time per query)

Our final selection was AWS Bedrock with Llama, as it provided the best balance of quality, cost, and efficiency, meeting our partner’s specific needs.
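
As a sketch of how such a comparison can be wired up against the Bedrock Converse API, the loop below measures latency and collects token usage for a cost estimate. The candidate model IDs are assumptions: actual IDs vary by AWS region and account, and some models may require inference-profile IDs instead.

```python
import time

import boto3

# Illustrative candidate model IDs, not a definitive list.
CANDIDATES = [
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "meta.llama3-3-70b-instruct-v1:0",
    "amazon.nova-pro-v1:0",
]

client = boto3.client("bedrock-runtime")

def benchmark(prompt: str) -> None:
    for model_id in CANDIDATES:
        start = time.perf_counter()
        response = client.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        latency = time.perf_counter() - start
        usage = response["usage"]  # token counts drive the per-query cost estimate
        print(f"{model_id}: {latency:.2f}s, "
              f"{usage['inputTokens']} in / {usage['outputTokens']} out")
```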

Step 3: Assessments with metadata

Initially, our AI agent generated general assessments only. Then we realized that providing assessments with metadata could bring more value to the customer. The metadata included detailed statistics on the generation process, which allowed for assessing the quality of the generated content beyond just the textual feedback. This enhancement added more transparency to the process, enabling decision-makers to evaluate the model’s output with greater confidence.
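
A simplified sketch of what such a metadata-enriched result could look like as Pydantic models; the field names are our assumptions, not Valency’s actual schema:

```python
from datetime import datetime

from pydantic import BaseModel

class GenerationMetadata(BaseModel):
    # Statistics on the generation process, attached to every assessment.
    model_id: str
    latency_seconds: float
    input_tokens: int
    output_tokens: int
    generated_at: datetime

class AssessmentResult(BaseModel):
    category: str
    score: float
    summary: str
    metadata: GenerationMetadata  # lets reviewers judge output quality beyond the text
```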

Step 4: Text quality improvements

Text quality was a top priority for our customer, as AI-generated reports needed to align with industry-specific language, tone, abbreviations, terminology, and sentence structure. However, LLMs often introduce errors (AI hallucinations) and inconsistencies, making structured output difficult to achieve.

To ensure high-quality and reliable text generation, we:

  • Focused heavily on prompt engineering and iterative refinement
  • Built 7 core prompts and fine-tuned them iteratively over 15 rounds of optimization
  • Split prompts into smaller components, refining each separately before feeding them into the LLM
  • Integrated quality engines using Pydantic and LangChain to eliminate inconsistencies and ensure that the AI agent produces well-structured, contextually accurate assessments (see the sketch after this list)
  • Collaborated closely with Subject Matter Experts (SMEs) to refine terminology and ensure the generated content meets the industry standards
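
To make the quality-engine idea concrete, here is a minimal sketch of schema-validated generation with LangChain and Pydantic; the schema fields and model ID are illustrative assumptions:

```python
from langchain_aws import ChatBedrockConverse
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

class SectionAssessment(BaseModel):
    category: str = Field(description="Assessment area, e.g. budget management")
    score: int = Field(ge=1, le=5, description="Expert score for the area")
    summary: str = Field(description="Structured summary of the expert feedback")

parser = PydanticOutputParser(pydantic_object=SectionAssessment)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You structure audit assessments.\n{format_instructions}"),
    ("human", "{expert_feedback}"),
]).partial(format_instructions=parser.get_format_instructions())

llm = ChatBedrockConverse(model="meta.llama3-3-70b-instruct-v1:0")

# The parser rejects output that does not match the schema, which is what
# turns free-form LLM text into a validated, consistently structured record.
chain = prompt | llm | parser
# result = chain.invoke({"expert_feedback": raw_consultant_notes})
```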

Step 5: Improved categorization

Our team started with summary generation for individual elements based on domain expert feedback. Now we’re enhancing the system to generate more comprehensive summaries tailored to each assessment category: for instance, aggregating all comments on budget management and producing a single summary for that whole block. This new version of the AI agent will also include references, so that if there is any ambiguity in the generated content, users can trace it back to the source.
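
A small sketch of how category-level grouping with traceable references might be modeled; all names here are illustrative, not the real schema:

```python
from collections import defaultdict

from pydantic import BaseModel

class ExpertComment(BaseModel):
    comment_id: str
    category: str  # e.g. "budget management"
    text: str

class CategorySummary(BaseModel):
    category: str
    summary: str
    references: list[str]  # comment_ids the summary was derived from

def group_by_category(comments: list[ExpertComment]) -> dict[str, list[ExpertComment]]:
    # Each category's comments become one summarization block; keeping the
    # comment IDs alongside lets the final summary cite its sources.
    grouped: dict[str, list[ExpertComment]] = defaultdict(list)
    for comment in comments:
        grouped[comment.category].append(comment)
    return dict(grouped)
```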

Step 6: Further improvements

The solution is yet to be deployed and integrated, but it’s worth noting that our ML engineers built this AI agent as a microservice. This allows seamless integration into our partner’s custom cloud solution, which runs on Heroku. We also containerized the microservice with Docker, ensuring easy deployment across environments.
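
For illustration, a microservice of this shape could be exposed as a small FastAPI app served by Uvicorn inside the Docker container; the endpoint path and request/response schemas below are assumptions, not the actual service contract:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Assessment Structuring Service")

class StructureRequest(BaseModel):
    expert_feedback: str  # raw consultant notes

class StructureResponse(BaseModel):
    structured_assessment: str

@app.post("/assessments/structure", response_model=StructureResponse)
async def structure(req: StructureRequest) -> StructureResponse:
    # Placeholder: the real handler would invoke the LLM pipeline above.
    return StructureResponse(structured_assessment=req.expert_feedback)

# Local run / Docker container entrypoint:
#   uvicorn main:app --host 0.0.0.0 --port 8000
```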

Business Outcomes

The Intelliarts team built an AI-powered agent to structure expert assessments and streamline the audit process. Some of the key achievements include:

  • Improved assessment quality – Through iterative prompt tuning, we ensured that AI-generated assessments mirror expert-written ones in language, tone, and terminology
  • Faster and more efficient processing – By implementing parallelized query execution, we reduced the time required to generate assessments from 5 minutes per query to a few seconds
  • Enhanced decision-making – The AI agent transforms raw consultant feedback into a structured, digestible format, making it easier for decision-makers to extract key insights and act upon them
  • Seamless integration – Developed as a microservice, the solution is ready for deployment into the partner’s cloud system