Case Studies

Skip Manual Labeling: How You Can Automatically Caption Images with Spatial Awareness for Any Product

Have you ever stared at thousands of product images, dreading the manual labor of tagging each one for AI training?

Capturing every nuance by hand is a daunting (and expensive) task.

Yet structured annotations are the lifeblood of machine learning.

The rule is simple: garbage in, garbage out.

A high quality image caption needs to capture:

  • Exact object locations in complex scenes
  • Relationships with surrounding elements
  • Environmental context and lighting conditions
  • Consistent descriptions at scale

That's exactly where our client found themselves - facing 10,000+ images of custom textured walls that needed precise labeling for fine-tuning a diffusion model.

You'll see how to combine Florence 2, GPT-4o Vision, and the Instructor library to build a reliable system that:

  • Automatically detects and localizes objects
  • Generates structured, validated descriptions
  • Handles spatial relationships systematically
  • Scales from 50 to 50,000+ images without compromising quality

Best of all? We did it without any custom models or infrastructure.

Here's the complete technical breakdown of how we turned a month-long manual process into an automated pipeline that runs in hours.

How You Can Save 20,000+ Hours a Year with a Secure, GPT-Driven Meeting to Email Workflow

Your team is wasting thousands of hours manually writing follow-up emails after Zoom meetings.

Every day, they:

  • Battle with meeting recordings
  • Miss capturing action items
  • Triple-check that sensitive data hasn't been exposed

For a mid-sized organization, this adds up to tens of thousands of wasted hours annually.

What if you could transform every Zoom transcript into a perfectly structured follow-up email in under 60 seconds, while keeping your sensitive data completely secure?

This post will show you how to:

  • Build a GPT-powered system that automatically converts meetings into action-ready emails
  • Protect sensitive data by keeping everything in your control
  • Save your organization 20,000+ hours annually on email drafting
  • Ensure 100% accuracy with domain-specific terminology correction
  • Create traceable links between action items and meeting timestamps

See It In Action

In this demo, you'll see:

  • A real meeting transcript being processed in under 60 seconds
  • The automated extraction of key points and action items
  • How sensitive data is handled securely
  • The final formatted email output ready to send

This automated workflow reduces a 30-minute manual process to just a few clicks while maintaining complete data security and accuracy.


The Real Cost of Manual Meeting Follow-ups

For a team of 50 people averaging just two client calls per week, manual follow-up emails waste 12,500 hours annually.

Here's what your team currently spends 30 minutes doing after every call:

The Secret to Better LLM Outputs: Multiple Structured Reasoning Steps

Traditional chain-of-thought prompting is leaving performance on the table.

While working on a recent client project, we A/B tested different prompting approaches.

Breaking LLM reasoning into multiple structured steps was preferred 80% of the time over traditional methods.

Instead of one meandering thought stream, we can greatly boost reliability by using a tightly controlled response model that forces the LLM to:

  • Analyze the example structure
  • Analyze the example style
  • Generate the output based on the previous steps

I'll show you exactly how to implement this approach using the Instructor library, with real examples you can use today.

You Don't Need to Fine-Tune to Clone YOUR Report Style

"This doesn't sound like us at all."

It's the all-too-familiar frustration when organizations try using AI to generate reports and documentation.

While AI can produce grammatically perfect content, it often fails at the crucial task of matching an organization's voice - turning what should be a productivity boost into a major bottleneck.

I'll show you how we solved this using a novel two-step approach that separates style from data.

By breaking down what seemed like an AI fine-tuning problem into a careful prompt engineering solution, we achieved something remarkable:

AI-generated reports that practitioners couldn't distinguish from their own writing.

Here's what we delivered:

  • Style matching so accurate that practitioners consistently approved the outputs
  • Complete elimination of data contamination from example reports
  • A solution that scales effortlessly from 10 to 1000 users
  • Zero need for expensive fine-tuning or ML expertise

Best of all? You can implement this approach yourself using prompt engineering alone - no complex ML infrastructure required.