Skip to content

πŸ“‘ Index

siddhant-mi edited this page Mar 11, 2024 · 4 revisions

The process of indexing involves incorporating context embeddings into the vector store. MindSQL provides a index() method, allowing users to store pertinent contexts such as table schemas (DDL statements), question-SQL pairs, and miscellaneous documentation associated with the database, business, or industry. These stored contexts are essential for the Large Language Model (LLM) to grasp the context of a user's question effectively. The index() method plays a pivotal role in training the system, enabling it to interpret and respond proficiently to a diverse array of inquiries.

πŸ–ŠοΈ Adding Question-SQL Pair:

When adding a question-SQL pair, you are directly instructing the system on how to interpret a specific user query. This is achieved by associating a question with its corresponding SQL query, allowing the system to understand the user's intent and retrieve relevant data accurately.

index(
    question="What is the average salary?", 
    sql="SELECT AVG(salary) FROM employees"
)

In this example, the user's question "What is the average salary?" is associated with the SQL query "SELECT AVG(salary) FROM employees". This pairing provides the system with a direct mapping between user queries and SQL queries, enabling it to respond appropriately to similar queries in the future.

πŸ’» Adding DDL Statement:

When adding a Data Definition Language (DDL) statement, you are defining the structure of database objects such as tables, columns, and data types. By incorporating DDL statements into the system's knowledge base, you provide essential schema information that enhances the system's ability to interpret and respond to database-related queries effectively.

index(ddl="CREATE TABLE employees (id INT, name VARCHAR(50), salary FLOAT)")

In this example, a DDL statement is used to create a table named "employees" with three columns: "id" of type INT, "name" of type VARCHAR(50), and "salary" of type FLOAT. By adding this DDL statement to the system's vectorstore, the system gains a deeper understanding of the database schema, enabling it to handle queries related to the "employees" table more effectively.

πŸ“ Adding Documentation String:

Adding a documentation string involves providing context-specific information crucial for the system to understand the domain it operates in. This documentation can include details about business logic, industry-specific terminology, or any other relevant information necessary for accurate interpretation of user queries.

index(documentation="The salaries of employees are in USD($)")

πŸ’½ Bulk Data Addition: To efficiently integrate multiple question-SQL pairs from an external source, use the index method with the bulk=True flag and specify the path to the JSON file (data.json). This allows for quick ingestion and integration of data into the system's knowledge base.

index(bulk=True, path="data.json")

Below is an example of the structure of the JSON file (data.json) used for bulk data addition:

[
  {
    "Question": "What is the average salary?",
    "SQLQuery": "SELECT AVG(salary) FROM employees"
  },
  {
    "Question": "Retrieve the latest order for each customer, including customer details and order date.",
    "SQLQuery": "SELECT Customers.CustomerID, Customers.ContactName, Orders.OrderID, MAX(Orders.OrderDate) AS LatestOrderDate FROM Customers LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID GROUP BY Customers.CustomerID, Customers.ContactName, Orders.OrderID;"
  },
  {
    "Question": "List suppliers who supply both beverages and dairy products.",
    "SQLQuery": "SELECT DISTINCT Suppliers.SupplierID, Suppliers.SupplierName FROM Suppliers INNER JOIN Products ON Suppliers.SupplierID = Products.SupplierID INNER JOIN Categories ON Products.CategoryID = Categories.CategoryID WHERE Categories.CategoryName IN ('Beverages', 'Dairy Products') GROUP BY Suppliers.SupplierID, Suppliers.SupplierName HAVING COUNT(DISTINCT Categories.CategoryID) = 2;"
  }
]

In this JSON file, each entry represents a question-SQL pair. The "Question" field contains the user's query, while the "SQLQuery" field contains the corresponding SQL query to retrieve the relevant data.

βš™οΈ Parameters:

  • question (str): Specifies the question or query to be added to the vectorstore.
  • sql (str): Specifies the corresponding SQL query associated with the provided question.
  • ddl (str): Defines a Data Definition Language (DDL) statement to structure database objects.
  • documentation (str): Provides additional context or documentation relevant to the system's understanding.
  • bulk (bool): Indicates whether to process data in bulk from an external JSON file.
  • path (str): Specifies the file path pointing to the JSON file containing bulk data for ingestion.
Clone this wiki locally