GenAI In Action

Sep 18, 2024

On September 20th, 2024, I gave the talk GenAI In Action: Real-World Applications. The goal of the talk was to show GenAI applications that go beyond the chatbot, even though they may still be built on Large Language Models.

The talk weaves theory with demos. Every demo is supposed to run locally, on a laptop. Because I have a peculiar setup, not all of the models use the GPU. Still, all demos ran reasonably fast.

Demo application

The demo application is a simple TODO app supercharged with a number of AI-powered features: OCR, audio/video transcription, semantic search across notes, images, and audio/video, summarizing notes, and extracting action items from notes.

Find the demo application on GitHub.

OCR

In the first demo we use EasyOCR to extract the text from an image. As a bonus, and this has little to do with AI, we overlay the text on the image so that it can be selected, copied, etc.
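A minimal sketch of that idea, assuming an image file called `note.png`: EasyOCR's `readtext` returns a list of `(bbox, text, confidence)` tuples, and the bounding boxes can be turned into absolutely positioned HTML spans for the selectable overlay. The helper and the HTML output format are my own illustration, not code from the demo app.

```python
def bbox_to_css(bbox):
    """Convert an EasyOCR quad bbox (four [x, y] points) into an
    absolutely positioned CSS style, so the recognized text can be
    overlaid on the image as selectable HTML."""
    xs = [p[0] for p in bbox]
    ys = [p[1] for p in bbox]
    left, top = min(xs), min(ys)
    width, height = max(xs) - left, max(ys) - top
    return (f"position:absolute; left:{left}px; top:{top}px; "
            f"width:{width}px; height:{height}px;")

if __name__ == "__main__":
    import easyocr  # pip install easyocr

    reader = easyocr.Reader(["en"])        # downloads models on first use
    results = reader.readtext("note.png")  # hypothetical input image
    for bbox, text, conf in results:
        print(f'<span style="{bbox_to_css(bbox)}">{text}</span>')
```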

Transcribe A/V

Whisper by OpenAI is a great audio transcription model, but it has the downside that it can only take 30 seconds of audio as input at a time. Silero VAD can scan an audio file for parts that contain speech and divide it into chunks, which we can then feed to Whisper.
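The chunking step can be sketched like this: a pure helper greedily merges the speech segments Silero VAD found into chunks of at most 30 seconds, and each chunk is handed to Whisper. The file name `meeting.wav` and the merge strategy are my own assumptions (and the helper assumes no single speech segment exceeds the limit); the library calls follow the Silero VAD and openai-whisper READMEs.

```python
def merge_segments(segments, max_len=30.0):
    """Greedily merge VAD speech segments (dicts with 'start'/'end' in
    seconds) into chunks no longer than max_len seconds, so each chunk
    fits Whisper's 30-second input window."""
    chunks = []
    for seg in segments:
        if chunks and seg["end"] - chunks[-1]["start"] <= max_len:
            chunks[-1]["end"] = seg["end"]  # extend the current chunk
        else:
            chunks.append({"start": seg["start"], "end": seg["end"]})
    return chunks

if __name__ == "__main__":
    import torch
    import whisper  # pip install openai-whisper

    vad, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
    get_speech_timestamps, _, read_audio, *_ = utils

    wav = read_audio("meeting.wav", sampling_rate=16000)  # hypothetical file
    segments = get_speech_timestamps(
        wav, vad, sampling_rate=16000, return_seconds=True)

    model = whisper.load_model("base")
    for chunk in merge_segments(segments):
        audio = wav[int(chunk["start"] * 16000):int(chunk["end"] * 16000)]
        print(model.transcribe(audio.numpy())["text"])
```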

Semantic Search

To enable semantic search, including within the videos, we use faiss, an open-source similarity-search library, as an in-memory vector database. MiniLM is used to encode our notes, as well as the search query, into vectors.

Summary

For summarization I wanted to use something like phi3, but I couldn't get it to recognize my GPU, which made it unbearably slow.

I ended up using bart-large-cnn, a fine-tuned version of bart-large trained on CNN Daily Mail, a huge set of (text, summary) pairs.
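With Hugging Face transformers this comes down to a one-line pipeline. Since BART's input is limited to roughly 1024 tokens, I've added a naive paragraph-based splitter for long notes; the splitter, the character budget, and the generation parameters are my own assumptions, not taken from the demo app.

```python
def split_note(text, max_chars=3000):
    """Split a long note on paragraph boundaries into pieces under
    max_chars, since BART's input length is limited (~1024 tokens)."""
    pieces, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            pieces.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    pieces.append(current)
    return pieces

if __name__ == "__main__":
    from transformers import pipeline  # pip install transformers

    # facebook/bart-large-cnn: bart-large fine-tuned on CNN / Daily Mail
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    note = open("note.txt").read()  # hypothetical long note
    for piece in split_note(note):
        result = summarizer(piece, max_length=60, min_length=15, do_sample=False)
        print(result[0]["summary_text"])
```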

Generate Actions

Similarly, for scanning our notes for action items I ended up using a BART model. This time it was bart-large-mnli, which is strong at natural language inference.
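An NLI model can be repurposed for this via transformers' zero-shot classification pipeline: split the note into sentences and ask, per sentence, whether it reads more like an action item or a plain statement. The label names, the naive sentence splitter, and the example note are my own assumptions, sketched for illustration.

```python
import re

def sentences(text):
    """Naive sentence splitter; each sentence is classified separately."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

if __name__ == "__main__":
    from transformers import pipeline  # pip install transformers

    # bart-large-mnli repurposed for zero-shot classification
    classifier = pipeline("zero-shot-classification",
                          model="facebook/bart-large-mnli")
    labels = ["action item", "statement"]  # label wording is an assumption

    note = "Met with the team. Send the report to Alice by Friday."
    for sent in sentences(note):
        result = classifier(sent, labels)
        if result["labels"][0] == "action item":  # top-ranked label
            print("TODO:", sent)
```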

Sheets