Solution Highlights
- Proved the feasibility of the GPT-driven solution to analyze delivery contract discounts
- Selected the best approach to implement the solution and improve pipelines
- Implemented the PoC using a mix of OCR and LLM technologies
- Wrapped the PoC as a user-friendly application for contract management and cost optimization
About the Project
Customer:
Our partner, currently under NDA, was about to launch a startup project and wanted to test a business idea that leverages AI for accurate discount evaluation in shipping contracts.
Challenges & Project Goals:
Through market analysis, the customer identified that the two largest U.S. shipping companies, UPS and FedEx, deliberately offer various discounts, deals, package types, and payment options. However, small- and medium-sized businesses — our partner’s target audience — often lack the resources to thoroughly explore these options and select the most cost-effective shipping services.
To close this gap, the customer asked us to validate their business idea by developing a proof of concept (PoC) for a machine learning-powered solution that analyzes contract discounts in delivery services. The application would then recommend the contract option with the best pricing.
Solution:
Collaborating closely with the customer, the Intelliarts team created the PoC: an AI-powered solution that tracks discounts in delivery agreements and helps users choose the optimal contract. We proved the feasibility of building this AI solution and recommended the best approach to implement it.
Technology Consulting, AI Development, Software Engineering, R&D, Data Engineering
Technology Solution
Data exploration
A series of meetings with the customer gave us a deeper understanding of their business needs. The partner also provided us with around 30 PDF contracts with different structures for data exploration.
Our analysis revealed that the two key US players, UPS and FedEx, deliberately make contracts unstable and inconsistent, which makes them hard to compare directly. The same table might be presented differently in two contracts, and various abbreviations could be used to denote the same term, for example, “3P” and “TP” for third parties. The chaotic nature of these contracts made it clear that a fully deterministic, rule-based program could not cover every variation.
After our data scientists investigated the input contracts, they outlined the output schema, that is, the structure of the results the solution should produce. The idea was to use generative AI, such as ChatGPT or other large language models (LLMs), to extract the necessary information from the contracts as an industry expert would.
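The source does not list the actual schema, so the following is only a minimal sketch of what such an output record might look like. The class name `DiscountRecord`, all field names, and the `parse_record` helper are assumptions made for illustration; they loosely follow the details the application is said to highlight (weights, zones, billing options).

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscountRecord:
    """One discount line extracted from a contract (hypothetical schema)."""
    carrier: str                    # e.g. "UPS" or "FedEx"
    service_level: str              # e.g. "Ground"
    billing_option: str             # e.g. "Third Party"
    zone: Optional[str]             # shipping zone, if the table gives one
    weight_min_lb: Optional[float]  # lower bound of the weight band
    weight_max_lb: Optional[float]  # upper bound of the weight band
    discount_pct: float             # discount as a percentage, 0 to 100


def parse_record(raw_json: str) -> DiscountRecord:
    """Parse one JSON object returned by a model and validate it."""
    data = json.loads(raw_json)
    record = DiscountRecord(**data)  # TypeError on missing or extra keys
    if not 0 <= record.discount_pct <= 100:
        raise ValueError(f"discount out of range: {record.discount_pct}")
    return record
```

Validating every model response against a fixed schema like this keeps malformed or implausible answers from silently reaching the comparison step.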
Approach selection
Given the diversity and unstructured nature of the contracts, which also varied significantly in format, the Intelliarts team assumed that LLMs were the most suitable solution for several reasons:
- They’re flexible across formats — LLMs are good at interpreting and standardizing diverse document structures and layouts. Working with multiple tables with similar information, LLMs can convert them into a consistent, unified format.
- They adapt as contract formats evolve. LLMs can identify and interpret new clauses, terms, or table structures. For example, if a new billing option is introduced, the LLM can recognize it without code changes.
- They grasp the context of contract language, which helps to interpret complex clauses and terms accurately. By analyzing context and language patterns, LLMs can extract details that are implied rather than directly stated. For instance, UPS usually combines information about service levels and billing options in one paragraph, yet LLMs can still extract both correctly.
At this stage, we conducted a series of hypothesis-driven experiments to find the best-performing models for the task. Our data scientists analyzed the performance of several AI models, with CloudAI and ChatGPT delivering the best results. We chose ChatGPT as our primary tool, given the customer’s existing subscription and the minimal performance difference.
The Intelliarts team also experimented with various vision models. The results were mixed: ChatGPT Vision excelled with certain table types, Vision AI outperformed on others, and LLaVA worked better for a third type. No single model handled every table layout best.
At the same time, the biggest problem with LLMs is their instability. They can sometimes return different answers to the same query or even crash. To mitigate this, our team balanced the use of LLMs with deterministic methods whenever possible, though this was challenging due to the contracts’ varied structures.
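One common way to dampen this instability is to ask the same question several times and keep the most frequent answer. The source does not say whether the team used this technique, so treat the sketch below as an illustration only. The `ask` callable is a hypothetical wrapper around whatever LLM client is in use, and the `samples` and `max_attempts` defaults are arbitrary.

```python
from collections import Counter
from typing import Callable


def stable_answer(ask: Callable[[str], str], prompt: str,
                  samples: int = 3, max_attempts: int = 6) -> str:
    """Query a model several times and return the most common answer.

    `ask` is a hypothetical callable wrapping an LLM client. Calls that
    raise are skipped, up to `max_attempts` calls in total.
    """
    answers = []
    attempts = 0
    while len(answers) < samples and attempts < max_attempts:
        attempts += 1
        try:
            answers.append(ask(prompt).strip())
        except Exception:
            continue  # treat a crash or timeout as a missing sample
    if not answers:
        raise RuntimeError("model returned no usable answer")
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```

Voting costs extra calls, so it is best reserved for fields where a wrong value matters most, such as the discount percentage itself.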
PoC development
Our next step was to build the PoC: a GPT-based solution that analyzes contract discounts in delivery services and helps users select the optimal contract. Key milestones and challenges included:
- Our team worked with two types of documents, scanned and text. For the text data, we utilized the Camelot library for data extraction, configuring it to handle the text correctly before uploading this information to ChatGPT.
- Some text documents were encrypted, leading to discrepancies when copying and pasting them into ChatGPT. For example, the number “1” could appear as the letter “L” or the symbol “%”. To address this, we had to convert and process these documents as scanned images.
- For the scanned documents, our data scientists employed Python-tesseract, an Optical Character Recognition (OCR) tool, to extract and “read” the text embedded in images before uploading it to ChatGPT for analysis.
- Many scanned PDFs were old and of low image quality. Our team applied various techniques to sharpen the images and extract the text.
- Also, some tables were split across pages, making it difficult for the model to recognize them as a single entity. We developed methods to address this challenge too.
- Abbreviations and varying discount ranges posed additional challenges. We successfully standardized these elements to ensure accurate interpretation by the model.
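The source does not describe how split tables were rejoined or how abbreviations were standardized, so the following is a minimal sketch of one possible deterministic pre-processing step. It assumes tables arrive as lists of rows, that consecutive fragments with the same column count belong to one table, and that the alias map contains only the “3P” and “TP” pair mentioned earlier; `ALIASES`, `normalize_cell`, and `merge_fragments` are hypothetical names.

```python
import re

# Hypothetical alias map; the source only mentions "3P" and "TP".
ALIASES = {"3P": "Third Party", "TP": "Third Party"}


def normalize_cell(cell: str) -> str:
    """Collapse whitespace and expand known abbreviations in one cell."""
    text = re.sub(r"\s+", " ", cell).strip()
    return ALIASES.get(text.upper(), text)


def merge_fragments(fragments):
    """Join table fragments from consecutive pages into whole tables.

    Each fragment is a list of rows. A fragment counts as a continuation
    when its column count matches the previous table's (an assumption);
    a header row repeated on the new page is dropped.
    """
    tables = []
    for frag in fragments:
        rows = [[normalize_cell(c) for c in row] for row in frag]
        if not rows:
            continue
        if tables and len(rows[0]) == len(tables[-1][0]):
            if rows[0] == tables[-1][0]:
                rows = rows[1:]  # same header repeated after the page break
            tables[-1].extend(rows)
        else:
            tables.append(rows)
    return tables
```

Doing this clean-up before the text reaches the model means the LLM sees one complete table with consistent terms, which reduces the number of decisions left to a non-deterministic component.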
As a result, the PoC model achieved an accuracy rate of 75% to 80%, depending on the parameter. With further fine-tuning, we anticipate reaching even higher accuracy in the final solution.
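Accuracy "depending on the parameter" suggests that each extracted field was scored separately against hand-labeled contracts. The source does not give the evaluation procedure, so this is only a sketch of how such a per-field score could be computed; the function name and the record layout (one dictionary per extracted row) are assumptions.

```python
def accuracy_by_field(predicted, expected):
    """Return {field: share of records where prediction matches the label}.

    `predicted` and `expected` are aligned lists of dictionaries; the
    fields to score are taken from the first expected record.
    """
    if not expected:
        raise ValueError("need at least one labeled record")
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must align record by record")
    scores = {}
    for field in expected[0]:
        hits = sum(p.get(field) == e[field]
                   for p, e in zip(predicted, expected))
        scores[field] = hits / len(expected)
    return scores
```

Reporting one score per field shows which parameters the pipeline already handles reliably and which ones need further tuning.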
Business Outcomes
The Intelliarts team proved the feasibility of the business idea: a GPT-driven solution for contract discount analysis in shipping services. The customer received the PoC wrapped as an application where a user can upload contracts and the model helps to choose the optimal one. The application highlights key information like weights, zones, packaging types, and billing options, enabling users to make informed comparisons between contracts.
For this project, our team performed thorough research to select the best approach to implement the solution and improve pipelines. We then implemented the PoC using Python-tesseract for OCR, Camelot for table extraction, and ChatGPT for contract analysis.
The project is still in progress, with the customer currently testing the solution. Following this phase, we will proceed with full-scale solution implementation. Ultimately, the end users will benefit from a cutting-edge ML-driven solution designed to streamline the contract selection process and optimize shipping costs.
This solution represents a significant advancement in the way businesses approach contract management and cost optimization in the logistics sector, and GPT-based contract analysis can also help reduce legal risks. The end users can benefit from:
- More informed decisions, with a clear comparison of contract options
- Reduced time and effort required to evaluate various shipping options
- Increased transparency thanks to presented key contract details and discount structures