Skip to content

KB Search

lean_automator.kb.search

Generates embeddings and performs semantic search in the Knowledge Base.

This module provides functions for generating text embeddings using a Gemini client and performing semantic search within the Knowledge Base by comparing vector similarity (cosine similarity) between a query embedding and stored embeddings.

Classes

Functions

generate_embedding(text: str, task_type: str, client: GeminiClient) -> Optional[np.ndarray] async

Generates an embedding for the given text.

Uses the configured Gemini client to create a vector representation of the input text suitable for the specified task type.

Parameters:

Name Type Description Default
text str

The text content to embed.

required
task_type str

The task type for the embedding (e.g., "RETRIEVAL_DOCUMENT", "RETRIEVAL_QUERY").

required
client GeminiClient

An initialized GeminiClient instance.

required

Returns:

Type Description
Optional[np.ndarray]

A numpy array representing the embedding vector, or None if generation fails or the client is unavailable.

Source code in lean_automator/kb/search.py
async def generate_embedding(
    text: str, task_type: str, client: GeminiClient
) -> Optional[np.ndarray]:
    """Generates an embedding for the given text.

    Uses the configured Gemini client to create a vector representation of the
    input text suitable for the specified task type.

    Args:
        text (str): The text content to embed.
        task_type (str): The task type for the embedding (e.g.,
            "RETRIEVAL_DOCUMENT", "RETRIEVAL_QUERY").
        client (GeminiClient): An initialized GeminiClient instance.

    Returns:
        Optional[np.ndarray]: A numpy array representing the embedding vector,
        or None if generation fails or the client is unavailable.
    """
    # Guard clauses: no client or empty input means nothing to embed.
    if not client:
        warnings.warn("GeminiClient not available for embedding generation.")
        return None
    if not text:
        warnings.warn("Attempted to generate embedding for empty text.")
        return None

    try:
        # The client yields a list of embeddings even for a single input;
        # only the first entry is meaningful here.
        result = await client.embed_content(contents=text, task_type=task_type)
        if not result or not result[0]:
            warnings.warn(
                f"Embedding generation returned empty result for task '{task_type}'."
            )
            return None
        return np.array(result[0], dtype=EMBEDDING_DTYPE)
    except Exception as e:
        warnings.warn(f"Error generating embedding: {e}")
        return None

find_similar_items(query_text: str, search_field: str, client: GeminiClient, *, task_type_query: str = 'RETRIEVAL_QUERY', db_path: Optional[str] = None, top_n: int = 5) -> List[Tuple[KBItem, float]] async

Finds KBItems with embeddings similar to the query text.

Generates an embedding for the query text and performs a brute-force cosine similarity search across all items in the database that have a pre-computed embedding for the specified field ('nl' or 'latex').

Parameters:

Name Type Description Default
query_text str

The natural language query.

required
search_field str

Which embedding field to search against ('nl' or 'latex').

required
client GeminiClient

An initialized GeminiClient instance used for generating the query embedding.

required
task_type_query str

The task type for embedding the query. Defaults to "RETRIEVAL_QUERY".

'RETRIEVAL_QUERY'
db_path Optional[str]

Path to the database file. If None, uses DEFAULT_DB_PATH. Defaults to None.

None
top_n int

The maximum number of similar items to return. Defaults to 5.

5

Returns:

Type Description
List[Tuple[KBItem, float]]

A list of tuples, each containing a matching KBItem object and its similarity score (float between -1 and 1). The list is sorted by similarity score in descending order. Returns an empty list if the client is unavailable, embedding generation fails, database access fails, no items have embeddings, or no matches are found.

Raises:

Type Description
ValueError

If search_field is not 'nl' or 'latex'.

Source code in lean_automator/kb/search.py
async def find_similar_items(
    query_text: str,
    search_field: str,  # 'nl' or 'latex'
    client: GeminiClient,
    *,  # Make subsequent arguments keyword-only
    task_type_query: str = "RETRIEVAL_QUERY",
    db_path: Optional[str] = None,
    top_n: int = 5,
) -> List[Tuple[KBItem, float]]:
    """Finds KBItems with embeddings similar to the query text.

    Generates an embedding for the query text and performs a brute-force
    cosine similarity search across all items in the database that have a
    pre-computed embedding for the specified field ('nl' or 'latex').

    Args:
        query_text (str): The natural language query.
        search_field (str): Which embedding field to search against ('nl' or
            'latex').
        client (GeminiClient): An initialized GeminiClient instance used for
            generating the query embedding.
        task_type_query (str, optional): The task type for embedding the query.
            Defaults to "RETRIEVAL_QUERY".
        db_path (Optional[str], optional): Path to the database file. If None,
            uses the configured path or DEFAULT_DB_PATH. Defaults to None.
        top_n (int, optional): The maximum number of similar items to return.
            Non-positive values yield an empty result. Defaults to 5.

    Returns:
        List[Tuple[KBItem, float]]: A list of tuples, each containing a
        matching KBItem object and its similarity score (float between -1 and 1).
        The list is sorted by similarity score in descending order. Returns an
        empty list if the client is unavailable, embedding generation fails,
        database access fails, no items have embeddings, or no matches are found.

    Raises:
        ValueError: If `search_field` is not 'nl' or 'latex'.
    """
    if search_field not in ("nl", "latex"):
        raise ValueError("search_field must be 'nl' or 'latex'")
    if not client:
        warnings.warn("GeminiClient not available for embedding search.")
        return []
    if get_all_items_with_embedding is None or get_kb_item_by_name is None:
        warnings.warn("kb_storage functions not available for search.")
        return []

    embedding_column = f"embedding_{search_field}"
    # Resolution order: explicit argument > configured path > module default.
    config_db_path = APP_CONFIG.get("database", {}).get("kb_db_path")
    effective_db_path = db_path or config_db_path or DEFAULT_DB_PATH

    # 1. Generate query embedding.
    query_vector = await generate_embedding(query_text, task_type_query, client)
    if query_vector is None:
        warnings.warn(f"Failed to generate query embedding for: '{query_text[:50]}...'")
        return []

    # 2. Fetch all existing document embeddings for the target field.
    try:
        all_embeddings_data = get_all_items_with_embedding(
            embedding_column, effective_db_path
        )
    except Exception as e:
        warnings.warn(f"Failed to retrieve embeddings from database: {e}")
        return []

    if not all_embeddings_data:
        # Fix: report via warnings like every other diagnostic path in this
        # function, instead of printing to stdout.
        warnings.warn(f"No items found with embeddings in field '{embedding_column}'.")
        return []

    # 3. Calculate similarities against every stored embedding.
    similarities: List[Tuple[str, float]] = []  # Store (unique_name, score)
    for item_id, unique_name, embedding_blob in all_embeddings_data:
        doc_vector = _bytes_to_vector(embedding_blob)
        if doc_vector is None:
            warnings.warn(
                f"Could not decode embedding for item '{unique_name}' "
                f"(ID: {item_id}). Skipping."
            )
            continue
        # Skip stored vectors whose dimensionality does not match the query;
        # cosine similarity is undefined across mismatched shapes.
        if query_vector.shape != doc_vector.shape:
            warnings.warn(
                f"Dimension mismatch for item '{unique_name}': "
                f"Query={query_vector.shape}, Doc={doc_vector.shape}. Skipping."
            )
            continue
        similarities.append((unique_name, _cosine_similarity(query_vector, doc_vector)))

    # 4. Sort by similarity (descending).
    similarities.sort(key=lambda pair: pair[1], reverse=True)

    # 5. Get top N results and retrieve full KBItem objects.
    # Fix: clamp top_n so a negative value cannot silently select a tail
    # slice of the ranked list.
    top_results: List[Tuple[KBItem, float]] = []
    for unique_name, score in similarities[: max(top_n, 0)]:
        item = get_kb_item_by_name(unique_name, effective_db_path)
        if item:
            top_results.append((item, score))
        else:
            # Should ideally not happen if unique_name came from the DB.
            warnings.warn(
                f"Could not retrieve KBItem '{unique_name}' after similarity search."
            )

    return top_results