Introduction to NLP to SQL
With the availability of powerful large language models, we now have the capability to convert natural language into accurate SQL (NLP to SQL) with a single callout, enabling users to express their information needs naturally and efficiently.
In this blog, we will explore the creation of a Data Questioning and Answering web application.
Technology Stack
Python
Azure OpenAI
Sqlite3
Flow
ER Diagram
Prompt
While making a call to the Azure OpenAI model, we need to carefully craft our prompt. Precision is crucial, as an ambiguous prompt may result in unintended outcomes.
We have set up the following prompt, where we replace the user’s NLP query.
prompt = “””
Given the following SQL tables, your job is to write queries given a user’s request.
CREATE TABLE Branches (
BranchID INTEGER PRIMARY KEY,
City TEXT,
Country TEXT,
EmployeeNumber INTEGER,
GM TEXT
)
CREATE TABLE IF NOT EXISTS Departments (
DepartmentID INTEGER PRIMARY KEY,
Name TEXT
)
CREATE TABLE IF NOT EXISTS Employees (
EmployeeID INTEGER PRIMARY KEY,
Name TEXT,
Position TEXT,
Salary REAL,
DepartmentID INTEGER,
BranchID INTEGER,
FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID),
FOREIGN KEY (BranchID) REFERENCES Branches(BranchID)
)
”’)
Desired Format: Return valid sql query strictly for above tables otherwise return undefined
Translate the following natural language query into SQL: {}
SQL Query:
“””.format(user_nlp_query)
Azure OpenAI
NLP to SQL is one of the advantages of the Azure OpenAI model we can leverage for our Data Questioning and Answering application. In the code below, we have used the text-davinci-003 model, which converts human language to SQL
# Using Azure OpenAI API
from openai import AzureOpenAI
api_key = config[“AZURE_OPENAI_KEY”]
client = AzureOpenAI(
azure_endpoint=”https://vc-test-openai.openai.azure.com/”,
api_key = api_key,
api_version=”2023-09-15-preview”,
)
response = client.completions.create(
model=”vc-text-davinci-003″,
prompt=prompt,
max_tokens=1024
)
HTML
NLP to SQL Examples
Below are some human language queries I tried to input through an HTML form. I validated their corresponding SQL queries and their actual responses against the database. There are situations where we receive an SQL query that results in an error when executed on the actual database. We handle this scenario by asking the user a different or more appropriate question
/* 1. List all employees with their corresponding department names. */
T Employees.Name, Departments.Name
FROM Employees
LEFT JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;
/* 2. Find the total number of employees in each department. */
SELECT DepartmentID, COUNT(*) AS Total_Employees
FROM Employees
GROUP BY DepartmentID;
/* 3. Identify employees who are managers along with their department names. */
SELECT e.Name, e.Position, d.Name
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID
WHERE e.Position = ‘Manager’
/* 4. List all employees in the IT department with their salaries. */
SELECT EmployeeID, Name, Position, Salary FROM Employees
WHERE DepartmentID = (SELECT DepartmentID FROM Departments WHERE Name = ‘IT’)
/* 5. Find the average salary for employees in each department. */
SELECT DepartmentID, AVG(Salary)
FROM Employees
GROUP BY DepartmentID;
/* 6. Identify the department with the highest total salary expense. */
SELECT Departments.Name, SUM(Employees.Salary) AS TotalExpense
FROM Employees
INNER JOIN Departments
On Employees.DepartmentID = Departments.DepartmentID
GROUP BY Departments.Name
ORDER BY TotalExpense DESC
LIMIT 1;
/* 7. List employees who do not belong to any department. */
SELECT * FROM Employees WHERE DepartmentID IS NULL;
/* 8. Find the department with the least number of employees. */
SELECT Departments.Name
FROM Employees
LEFT JOIN Departments
ON Departments.DepartmentID = Employees.DepartmentID
GROUP BY Departments.Name
ORDER BY COUNT(Employees.EmployeeID) ASC
LIMIT 1;
/* 9. List employees who belong to the Finance department and have a salary above 1000 */
SELECT *
FROM Employees
WHERE DepartmentID = (SELECT DepartmentID FROM Departments WHERE Name = ‘Finance’)
AND Salary > 1000;
/* 10. What is the salary of employee ID 101?*/
“SELECT Salary FROM Employees WHERE EmployeeID = 101;
Hallucination
Hallucination refer to instances where the model generates or predicts SQL queries that are not accurate or relevant to the user’s intent. This might result in incorrect or unintended database operations. I have tried all the human language queries above, but it didn’t generate invalid SQL
Challenges
Users can manipulate data either through a user form or via a prompt. We should handle all scenarios where the end user can only retrieve data from the application. In the above prompt section, I have included the desired format to constrain the model to generate a valid query; otherwise, it returns ‘undefined’ to handle invalid user queries. User can ask anything which is not relevant to a perticualar database.
@app.route(“/qa”, methods=[“POST”])
def prompt_to_qa():
nlp_query = request.form.get(“query”)
sql_query = get_sql_query(nlp_query)
print(sql_query)
if(sql_query.strip().lower() == “undefined”):
output = “Not a valid question”
else:
result = str(execute_sql_query(sql_query))
check_json = is_json(result)
if(check_json):
output = json.loads(result)
else:
output = result
return { “nlp_query”: nlp_query, “sql_query”: sql_query, “result”: output}
Conclusion
You can modify the technology stack mentioned above to streamline the development of applications, regardless of UI, LLM, or database. Additionally, you can leverage libraries such as pandas. By experimenting with different combinations of the technology stack, we can enhance the capabilities of the application.
Important Links
AI Blogs by Perficient
https://platform.openai.com/examples/default-sql-translate
Leave A Comment