Extract text based instructions into a structured format
Unformatted text to structured data with GenAI
Project overview
Technologies
- GPT-4 with Vision
- Azure Document Intelligence
Tools
- Text recognition
- Python
Key features
- Covert instructions from various formatting into consistent, structured data
Contributors
Why the client wanted this
- A reliable and accurate way of digitising a vast and diverse library of knitting instuctions from a variety of sources like images, magazines and documents - at scale
- Get the instructions into a structured format using a large language model so they could be archived and distributed.
- As far as we're aware, this AI-led approach to digitising knitting instructions had not been attempted before.
Methodology
Collecting documents
To begin this process, it is key to collect all the documents and images that need to be digitised. These can be in any readable format such as PDF and images, as long as the text is visible.
Extracting the text
Next came the question of how we were going to extract all the text verbatim into raw text. Enter Azure's Document Intelligence (DI) tool, backed by OpenAI's GPT-4.
DI was fairly reliable for getting the text into a good structured format so that is easily parsable for any AI to then convert into any other desired format.
Using GPT-4 to power the web app's UI
Now that we had the text in a digital format all that was needed was to get the instructions in a consistent, structured format so that the web app's UI had reliable data to be powered from.
For this we used GPT-4's chat completion feature and primed it to return only JSON (the desired output format for the UI) with the API's JSON mode, and gave a system prompt to ensure consistent output with our requirements for how we wanted the instructions presented.
Tech stack
Upload documents
The knitting instructions
The knitting instructions
Azure document intelligence
Text extraction
Text extraction
GPT-4 Turbo
Formatting to JSON
Formatting to JSON
The prototype
Not just needles and threads
Mass analysis of legal and financial documents
Consider the legal industry, where contracts and case files often exist in a blend of typed and handwritten formats. Applying our techniques could streamline the analysis and organisation of these critical documents. The same could be said for the financial sector, awash with bank statements, invoices, and receipts, often a mix of digital and handwritten entries.
Educational resource compilation
Think of schools where educators compile varied teaching materials, including handwritten notes. Our solution could assist in creating a unified, digital repository of educational resources.