Five

AI LAB – AI assisted data structuring

Written by Steve Milne | Sep 5, 2023 11:00:00 PM

Although chatGPT steals most of the headlines, it’s far from being the only tool you need to consider when adopting an AI chat sidekick for your workflows.

Claude.ai recently released version 2 of their competitor to chatGPT, and it has some distinct advantages over chatGPT thanks to accepting file attachments without the need for plugins.

This makes it particularly suited to turning unstructured data into a structured tabular form that can be used elsewhere.

To demonstrate, I grabbed a sale catalogue from a local auction house which usually lists hundreds of items in the order that they are added to the sale, with minimal information. This week is a general tools and house clearance sale with a particularly mixed selection:

 

 

This can obviously make finding what you want difficult. Step one is to get an overview of the types of lots featured. The prompt is simple. Upload the file and request “create a table grouping the auction listings by category"

 

Excellent. Twenty seconds later, we have a comprehensive categorisation of the listings. Remember, the catalogue didn’t include a category, Claude created the categorisation having reviewed the whole document.

Now let’s quickly check that we agree with the categorisations:

 

 

Another 20 seconds, and we can confirm that this looks good.

What about the listings themselves, though, those numbers are useful, but it would be better to have the descriptions in place to make finding things easy.

 

 

This is an excellent example of why asking the right question, often grandly called “prompt engineering” is so critical. We asked for “listing number” and that’s what we got, the number on *this* list. We need the number from the original catalogue.

Try again.

 

 

Excellent. Within around 90 seconds, we took a Word document that included a randomly ordered listing of hundreds of items with inconsistent descriptions and turned it into a structured table of only the category of items that we are interested in. A table that can be cut and paste for use anywhere.

This is where it gets interesting, and the second key tip relating to using an AI assistant comes in. A technique that is particularly effective with Claude 2. Treat your requests as a conversation, not as individual questions. Trust that Claude will remember what’s happening and build upon it.

Instead of having to repeat this for all categories, just ask Claude to do it for you.

 

 

We now have a set of tables that are far more useful than the original catalogue. We could go further and introduce sub-categories, alphabetise, or list only those items whose catalogue number is prime. The speed of delivery means that these are low-cost additions, so even if they turn out to be of little value, they can be experimented with quickly.

Going from an unstructured Word document to structured tables took no more than three minutes. The same task manually would be significantly longer, not least due to the need to select a category for each item.

This is a trivial example with a single document, but it illustrates a key power of Claude 2.

  • Upload unstructured data

  • Ask a sequence of questions to add layers of structure

  • Sense check as you go

  • Get specific with one ‘slice’ of the data until you are happy with the results

  • Ask a final ‘do that for everything’ question

Currently, Claude 2 is free to use and has no waiting list. Just create an account, and you’re ready to turn all that unstructured mess into valuable structured data.