Introduction to LLM-Powered Email Classification
Since the introduction of AI functions on Databricks, large language models (LLMs) can be integrated into any data workflow with little effort. Analysts and business users who may not know Python or ML/AI infrastructure can complete advanced AI tasks directly from SQL queries.
Part 1: AI Functions
To classify emails, we will use ai_query(), part of Databricks AI functions. The call takes the following arguments:
- endpoint: the name of the model endpoint we intend to use.
- request: the prompt, which includes the “Email_body”.
- modelParameters: additional parameters that we can pass to the LLM.
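As a rough illustration of how these three pieces fit together, the sketch below builds the SQL text for a batch ai_query() call from Python. The helper build_ai_query_sql, the table name, and the endpoint name are hypothetical. The instruction text is passed through a named parameter marker (:prompt), which assumes a runtime where spark.sql accepts an args dictionary.

```python
import re


def build_ai_query_sql(table: str, endpoint: str,
                       max_tokens: int = 10, temperature: float = 0.0) -> str:
    """Return a SQL statement that labels every row of `table` with ai_query().

    `table` and `endpoint` are hypothetical names supplied by the caller.
    """
    # Both names are interpolated into the SQL text, so only allow safe characters.
    if not re.fullmatch(r"[A-Za-z0-9_.]+", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"[A-Za-z0-9_.\-]+", endpoint):
        raise ValueError(f"unexpected endpoint name: {endpoint!r}")

    return (
        "SELECT\n"
        "  Email_body,\n"
        "  ai_query(\n"
        f"    '{endpoint}',\n"                      # endpoint
        "    concat(:prompt, Email_body),\n"        # request: prompt + email text
        "    modelParameters => named_struct(\n"    # extra LLM parameters
        f"      'max_tokens', {max_tokens},\n"
        f"      'temperature', {temperature}\n"
        "    )\n"
        "  ) AS label\n"
        f"FROM {table}"
    )


query = build_ai_query_sql("main.demo.emails", "my-llm-endpoint")
print(query)
```

In a notebook, the statement could then be run with something like spark.sql(query, args={"prompt": "Classify this email: "}), where the prompt text is again only a placeholder.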
Implementing Email Classification
The prompt template used in this example is based on the research of Si et al. (2024), who designed and tested a few-shot prompt template for email spam detection. We can finally combine all the elements seen above in a single SQL query, running batch inference on all the emails, and generating the labels.
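To make the idea of a few-shot prompt concrete, here is a minimal sketch of how such a prompt could be assembled and how the model's answer could be normalized into a label. It is not the template from Si et al. (2024): the instruction wording, the two example emails, the label names, and the helpers build_prompt and normalize_label are all invented for illustration.

```python
# Hypothetical label set and demonstrations; not taken from the cited paper.
LABELS = ("spam", "not spam")
FEW_SHOT_EXAMPLES = [
    ("Congratulations! You won a free cruise. Click here to claim.", "spam"),
    ("Hi team, attaching the agenda for Monday's meeting.", "not spam"),
]


def build_prompt(email_body: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Assemble instruction + labeled examples + the email to classify."""
    lines = [
        f"Classify the email as one of: {', '.join(LABELS)}.",
        "Answer with the label only.",
        "",
    ]
    for body, label in examples:
        lines += [f"Email: {body}", f"Label: {label}", ""]
    # The final block is left open so the model completes the label.
    lines += [f"Email: {email_body}", "Label:"]
    return "\n".join(lines)


def normalize_label(raw: str) -> str:
    """Map a raw model answer onto a known label, or 'unknown'."""
    cleaned = raw.strip().strip(".").lower()
    return cleaned if cleaned in LABELS else "unknown"


print(build_prompt("Your invoice for March is attached."))
```

Normalizing the answer is a design choice rather than a requirement: models sometimes add punctuation or change capitalization, and mapping anything unexpected to "unknown" keeps the label column clean for downstream queries.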
Part 2: Access to Gmail APIs
To ingest emails automatically, we will need to access Gmail APIs. Here is a step-by-step guide on how to use Gmail APIs:
- Configure your Gmail account to work with APIs.
- Access Gmail Mailbox from Databricks Notebooks.
Configuring Gmail Account
The recommended approach to enable Google APIs on your account is to use Service Accounts. However, for this demo, we are using a dummy Gmail account, so we will follow a more manual approach to authenticate to Gmail.
Accessing Gmail Mailbox
To authenticate to Gmail from a Databricks Notebook, we can use a function implemented in the repo. The function requires:
- For first-time access, the credentials JSON file, which can be saved in a volume.
- For later access, a token file, where the active credentials are stored so they can be reused.
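The function in the repo is not reproduced here, but the credentials-file and token-file flow described above can be sketched with the Google client libraries (google-auth and google-auth-oauthlib). The function name get_gmail_credentials, the read-only scope, and the file paths are assumptions, and the interactive consent step may need adapting in a hosted notebook.

```python
import os

# Read-only access is assumed to be enough for classification.
SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]


def get_gmail_credentials(credentials_path: str, token_path: str):
    """Return Gmail credentials, reusing the token file when possible."""
    # Imported lazily so this module loads even without the Google libraries.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    # Later runs: load the stored token instead of asking for consent again.
    if os.path.exists(token_path):
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)

    if creds is None or not creds.valid:
        if creds is not None and creds.expired and creds.refresh_token:
            # The stored token expired but can be refreshed silently.
            creds.refresh(Request())
        else:
            # First run: start the consent flow from the credentials JSON file.
            flow = InstalledAppFlow.from_client_secrets_file(
                credentials_path, SCOPES
            )
            creds = flow.run_local_server(port=0)
        # Persist the active credentials for the next run.
        with open(token_path, "w") as fh:
            fh.write(creds.to_json())
    return creds
```

Both paths would typically point into a volume, so that the token written on the first run is still available to later notebook sessions.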
Reading Emails from Gmail
Once we have authenticated, we can read emails from Gmail using a function, load the email information into a Spark DataFrame, and eventually write it to a Delta table.
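As a sketch of the parsing step, the helper below turns one message, in the shape the Gmail API returns for a full-format message (headers as name/value pairs, body data base64url-encoded), into a flat row. The function name message_to_row and the output column names are hypothetical; only Email_body matches the field used earlier in this post.

```python
import base64


def _decode(data: str) -> str:
    """Decode base64url body data, restoring padding if it was stripped."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def _find_plain_text(payload: dict) -> str:
    """Return the first text/plain body found, searching nested parts."""
    if payload.get("mimeType") == "text/plain":
        data = payload.get("body", {}).get("data")
        if data:
            return _decode(data)
    for part in payload.get("parts", []):
        text = _find_plain_text(part)
        if text:
            return text
    return ""


def message_to_row(message: dict) -> dict:
    """Flatten one Gmail message resource into a row for a DataFrame."""
    payload = message.get("payload", {})
    headers = {h["name"].lower(): h["value"] for h in payload.get("headers", [])}
    return {
        "id": message.get("id"),
        "sender": headers.get("from", ""),
        "subject": headers.get("subject", ""),
        "Email_body": _find_plain_text(payload),
    }
```

A list of such rows could then be passed to spark.createDataFrame(rows) and written out with the DataFrame writer, for example df.write.mode("append").saveAsTable(...), with the table name chosen by the reader.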
Conclusions
In summary, this post demonstrated how straightforward it is to set up AI Functions and leverage LLMs to automate workflows across your organization. We shared a practical prompt template designed for effective email classification using few-shot learning, and we walked through integrating Gmail APIs directly within Databricks Notebooks.
FAQs
Q: What is LLM-powered email classification?
A: LLM-powered email classification is a method of using Large Language Models to classify emails based on their content.
Q: What is the purpose of AI functions in Databricks?
A: The purpose of AI functions in Databricks is to integrate LLMs into any data workflow, allowing analysts and business users to complete advanced AI tasks directly from SQL queries.
Q: How do I access Gmail APIs?
A: To access Gmail APIs, you need to configure your Gmail account to work with APIs and use a service account or authenticate manually.
Q: What is the benefit of using Service Accounts?
A: The benefit of using Service Accounts is that they eliminate the need for manual authentication and allow for automated access to Gmail APIs.