
Chat with Enron's Past | Navigating the Enron Email Corpus with RAG and semantic search

Posted on: September 28, 2023 at 10:00 AM
By: skeptrune @ arguflow

Chat with Enron’s Past: Navigating the Enron Email Corpus with RAG and semantic search

At Arguflow, we provide an open source toolkit for quickly standing up semantic search and retrieval-augmented generation (RAG) on your data sources.

To that end, we are launching several demos showing how you can use Arguflow to build search and RAG products. Today, we are excited to unveil search and RAG for the Enron Email Corpus!

If you find this interesting or helpful, then please star our GitHub project!

PCs pointing at github star meme

Chat with Enron and explore the email corpus yourself

Enron can cure male loneliness

The demo search and chat experiences are publicly available for you to try out and use at and

Chat is especially fun. You can literally talk to Enron.

How to self-host and deploy a mirror yourself

We did not do anything to clean the dataset, so the search results are frequently very noisy. By cleaning the dataset before upload, you could stand up a much higher-quality search experience.
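As a rough idea of what a cleaning pass could look like, here is a minimal sketch. It is not something we ran for the demo. The function name `clean_body` and the rules it applies (cut the message at the first "Original Message" or "Forwarded by" separator, drop `>`-quoted lines, collapse runs of blank lines) are assumptions about what counts as noise in email text. Tune them to your data.

```python
import re

# Separator lines such as "-----Original Message-----" or
# "------ Forwarded by ..." usually introduce an older, quoted thread.
THREAD_MARKER = re.compile(r"^-+\s*(Original Message|Forwarded by)", re.IGNORECASE)


def clean_body(text: str) -> str:
    """Return the email body with quoted and forwarded material removed."""
    kept = []
    for line in text.splitlines():
        if THREAD_MARKER.match(line.strip()):
            break  # everything after the marker is treated as quoted history
        if line.lstrip().startswith(">"):
            continue  # skip individually quoted reply lines
        kept.append(line.rstrip())
    cleaned = "\n".join(kept)
    # Collapse three or more consecutive newlines into a single blank line.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip()
```

You would call something like this on each message body before sending it as a card, so that long quoted threads do not dominate the search results.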

  1. Follow our self-hosting guide on to stand up the REST API and frontends.
  2. Download the dataset from the CMU page that hosts it.
  3. Iterate over the CSV and create cards however you like. The code for doing so will look roughly as follows:
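import csv
import json

import requests

headers = {
    "Content-Type": "application/json",
    # Replace with your own API key.
    "Authorization": "af-pEJaygALr3ony0WkVv18JtOKccwCn7sj",
}

# "emails.csv" is a placeholder name; point it at your CSV of the dataset.
# Skip the header row first if your CSV has one.
with open("emails.csv", newline="") as f:
    for row in csv.reader(f):
        data = {
            "card_html": row[-2],
            "link": row[2],
            "private": False,
            "metadata": {
                "Message-ID": row[1],
                "Date": row[3],
                "From": row[4][12:-2],
                "To": row[5][12:-2],
                "Subject": row[6],
                "X-From": row[7],
                "X-To": row[8],
                "X-CC": row[9],
                "X-BCC": row[10],
                "X-Folder": row[11],
                "X-Origin": row[12],
                "X-FileName": row[13],
                "User": row[15],
            },
        }
        response = requests.post("http://localhost:8090/api/card", data=json.dumps(data), headers=headers)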

You can look at the full code implementation including EDA here.
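The snippet above assumes the emails are already rows in a CSV. If your copy of the dataset is a tree of raw message files instead, a sketch like the following could build the same card payload straight from a message, using only Python's standard `email` module. The function name `message_to_card`, the `link` parameter, and the choice to leave missing headers as empty strings are assumptions for illustration, not part of our pipeline.

```python
from email import message_from_string

# Headers copied into the card metadata, matching the field names used above.
METADATA_HEADERS = [
    "Message-ID", "Date", "From", "To", "Subject",
    "X-From", "X-To", "X-CC", "X-BCC",
    "X-Folder", "X-Origin", "X-FileName",
]


def message_to_card(raw_message: str, link: str = "") -> dict:
    """Turn one raw RFC 822 style email into a card payload dictionary."""
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart messages return a list of parts; this sketch skips them.
        body = ""
    # Missing headers become empty strings rather than None.
    metadata = {name: str(msg.get(name, "")) for name in METADATA_HEADERS}
    return {
        "card_html": body,
        "link": link,
        "private": False,
        "metadata": metadata,
    }
```

The returned dictionary can be passed to `json.dumps` and posted to the card endpoint in the same way as `data` in the CSV version.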

Our favorite themes from the dataset

Enron was an absolutely wild company. The dataset is insane. There are all kinds of fun bits you can find while searching. We compiled a few of our favorite themes as groupings of documents you can check out below:


This was a fun project and will help us explain what we do to legal firms. If you are interested in self-hosting Arguflow and standing up something similar for your dataset, then please do get in touch!

If you find this helpful or interesting, then please also make sure to star us on GitHub!

the squad starring our github meme