Chat with Louis Rossmann | Navigating all of Louis's videos with Semantic Search and RAG

Posted on: October 9, 2023 at 10:00 AM
By: skeptrune @ arguflow

At Arguflow, we provide an open source toolkit for quickly standing up semantic search and retrieval-augmented generation (RAG) on your data sources.

We’ve been hard at work on demos showcasing how you can use Arguflow to build Search and RAG products. Last week, we released our Enron Demo, allowing you to Search and Chat over the entirety of the Enron Corpus.

Now, we’re really excited to kick off our new series of demos with popular online influencers. Those of you who are active in the right-to-repair scene may appreciate this one. Introducing Search and RAG over the entirety of Louis Rossmann’s video archive!

For those who don’t know Louis, he is an American independent repair technician, right-to-repair activist, and YouTuber with over 1.8 million subscribers. He is the owner and operator of Rossmann Repair Group in Austin, Texas, and a partner at FUTO, where we are participating in the Summer Fellows Class.

Chat with and Search Transcripts of all of Louis’s Videos Yourself

This was a unique dataset to explore. As open-source advocates ourselves, we found it both funny and interesting to see some of Louis’s most common topics, in particular his takes on Google.

The demo search and chat experiences are publicly available for you to try out and use at louis-search.arguflow.ai and louis-chat.arguflow.ai.

How to self-host and deploy a mirror yourself

Follow our self-hosting guide on docs.arguflow.ai to stand up the REST API and frontends. Then use the YouTube API to extract the video IDs for his channel, iterate over each video, convert its audio into a transcript with an audio-to-text model such as Whisper, and upload the result. The code will look something like this:

import math

import pandas as pd
import requests
from pytube import YouTube

# Assumed to exist already: `keys` (queued video keys), `r` (a Redis-style
# client holding that queue), and `whisper_model` (a loaded Whisper model).
for key in keys:
    # Each key stores the video URL after an 8-character prefix.
    video_url = key[8:]
    try:
        video = YouTube(video_url)
        # Download the audio-only stream and transcribe it.
        audio_file = (
            video.streams.filter(only_audio=True).first().download(
                filename="audio.mp4")
        )
        transcription = whisper_model.transcribe(audio_file, verbose=False)
        df = pd.DataFrame(transcription["segments"], columns=[
                          "start", "end", "text"])
        # Group every 8 transcript segments into one card.
        for start in range(0, len(df), 8):
            end = min(start + 8, len(df))
            chunk = df.iloc[start:end]
            text = chunk["text"].astype(str).str.cat(sep="")
            # Deep link to the second at which this chunk starts.
            link = video_url + f"&t={math.floor(chunk.iloc[0]['start'])}"
            data = {
                "card_html": text,
                "link": link,
                "private": False,
                "metadata": {
                    "Title": video.title,
                    "Description": video.description,
                    "Thumbnail": video.thumbnail_url,
                    "Channel": video.author,
                    "Duration": video.length,
                    "Uploaded At": video.publish_date.strftime("%Y-%m-%d %H:%M:%S"),
                },
            }
            print(data)
            # `json=` serializes the dict itself; wrapping it in json.dumps
            # first would send a double-encoded string.
            response = requests.post(
                "http://localhost:8090/api/card", json=data
            )
            print(link)
            if response.status_code != 200:
                print(f"Error: {response.text}")
                # Record the failed chunk so it can be retried later.
                r.set("Error: " + link, "Error")
        # The whole video was processed, so remove it from the queue.
        r.delete(key)
    except Exception as e:
        print("Error: " + str(e))
        r.set("Error: " + video_url, "Error")
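
The chunking step in the loop above can also be looked at in isolation. The following is a minimal sketch, not the code we run in production: `chunk_segments` is a hypothetical helper name, and the sample segments are made up to mimic the `start`, `end`, and `text` fields used above. It groups segments eight at a time and keeps the floored start time of each group, which is the value that ends up in the `&t=` part of the link.

```python
import math
from typing import Dict, List, Tuple


def chunk_segments(segments: List[Dict], size: int = 8) -> List[Tuple[str, int]]:
    """Group transcript segments into chunks of `size`.

    Returns one (text, start_second) pair per chunk, where start_second is
    the floored start time of the chunk's first segment.
    """
    chunks = []
    for i in range(0, len(segments), size):
        group = segments[i:i + size]
        text = "".join(str(seg["text"]) for seg in group)
        chunks.append((text, math.floor(group[0]["start"])))
    return chunks


# Made-up segments shaped like the transcription output used above.
segments = [
    {"start": 2.5 * n, "end": 2.5 * n + 2.5, "text": f" segment {n}."}
    for n in range(10)
]
for text, start_second in chunk_segments(segments):
    print(start_second, text)
```

Smaller chunks give more precise timestamps, while larger chunks give each card more context to search against. Eight segments is simply the value used in the script above.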

Uploading data to server

Send a video transcription to the server using Python:

data = {
    "card_html": text,
    "link": video_url + f"&t={math.floor(chunk.iloc[0]['start'])}",
    "private": False,
    "metadata": {
        "Title": video.title,
        "Description": video.description,
        "Thumbnail": video.thumbnail_url,
        "Channel": video.author,
        "Duration": video.length,
        "Uploaded At": video.publish_date.strftime("%Y-%m-%d %H:%M:%S"),
    },
}
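# `json=` serializes the dict itself, so pass it directly rather than
# wrapping it in json.dumps, which would double-encode the body.
response = requests.post(
    "https://your-api-example.com/api/card",
    json=data
)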
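
If you are adapting this to another channel, it can help to build and inspect the request body before sending anything. Below is a rough sketch under a few assumptions: `build_card` is a hypothetical helper (not part of the Arguflow API), and the URL and metadata values are placeholders. It assembles the same fields as the snippet above and checks that the body survives a JSON round trip.

```python
import json
from typing import Any, Dict


def build_card(text: str, video_url: str, start_second: int,
               metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble a card body with the same fields as the snippet above."""
    return {
        "card_html": text,
        # Appending "&t=" assumes the URL already has a query string.
        "link": f"{video_url}&t={start_second}",
        "private": False,
        "metadata": metadata,
    }


card = build_card(
    text=" example transcript text.",
    video_url="https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder URL
    start_second=42,
    metadata={"Title": "Example title", "Duration": 600},  # placeholder values
)
# Serializing once and parsing back should give the same dictionary,
# which is a cheap check that every value is JSON-compatible.
body = json.dumps(card)
assert json.loads(body) == card
print(body)
```

Values such as `datetime` objects are not JSON-serializable, which is why the upload date is formatted to a string before it goes into the metadata.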

The full example can be seen in our repo.

Favorite Searches from the Dataset

Using semantic search, you can pull some unique results from the data. We even included timestamps in the card links so you can jump to the exact moment in the video when something was said.
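
For intuition about what semantic search is doing, here is a toy sketch of ranking cards by cosine similarity between embedding vectors. This is not how Arguflow implements search internally: the three-dimensional vectors, the link strings, and the `search` helper are all made up for illustration, and a real system would get its vectors from an embedding model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: Sequence[float], cards: Dict[str, List[float]],
           top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k (link, score) pairs, most similar first."""
    scored = [(link, cosine_similarity(query_vec, vec))
              for link, vec in cards.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up embeddings keyed by made-up timestamped links.
cards = {
    "video_a&t=0": [1.0, 0.0, 0.0],
    "video_b&t=16": [0.0, 1.0, 0.0],
    "video_c&t=40": [0.6, 0.8, 0.0],
}
results = search([1.0, 0.0, 0.0], cards)
for link, score in results:
    print(f"{score:.2f} {link}")
```

Because each card carries its own timestamped link, the best-scoring result points straight at the matching moment in the video.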

Some of our favorite searches to run include

Conclusion!

This is the first of our upcoming influencer demos, and given our admiration for Louis we had a lot of fun compiling this one. We hope you have fun playing with it or standing up your own version on a different persona you follow or enjoy.

If you found this interesting, or got some use out of the self-hosting tutorial, please star us on GitHub.
