
LaTeX Processor

lean_automator.latex.processor

Handles LaTeX generation and review for Knowledge Base items using an LLM.

This module orchestrates the process of generating LaTeX representations (both statement and optional informal proof) for mathematical items stored in the Knowledge Base (KBItem). It uses a provided LLM client (GeminiClient) to generate and subsequently review the LaTeX content based on predefined prompts and the item's context (description, dependencies). The process involves iterative refinement cycles until the LaTeX is accepted by the LLM reviewer or a maximum number of cycles is reached. It updates the status and content of the KBItem in the database accordingly.
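
For orientation, here is a minimal sketch of driving the module for a single item. The import path and construction of `GeminiClient` are assumptions (adjust them to however your project builds its LLM client), and the `unique_name` is a made-up placeholder.

```python
import asyncio

from lean_automator.latex.processor import generate_and_review_latex
# Assumed import path for the LLM client; replace with your project's actual module.
from lean_automator.llm.caller import GeminiClient  # hypothetical path


async def main() -> None:
    client = GeminiClient()  # assumed no-argument construction; configure as needed
    accepted = await generate_and_review_latex(
        "example.item.unique_name",  # placeholder unique_name
        client,
        db_path=None,  # None falls back to DEFAULT_DB_PATH
    )
    print("LaTeX accepted" if accepted else "LaTeX rejected or errored")


asyncio.run(main())
```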

Classes

Functions

generate_and_review_latex(unique_name: str, client: GeminiClient, db_path: Optional[str] = None) -> bool async

Generates, reviews, and refines LaTeX for a KBItem iteratively.

This function orchestrates the main workflow for obtaining accepted LaTeX for a given KBItem identified by its unique_name.

  1. Fetches the item and its dependencies from the database.

  2. Checks if the item's current status requires LaTeX processing.

  3. Enters a loop (up to MAX_REVIEW_CYCLES):

    a. Calls the LLM generator (_call_latex_statement_and_proof_generator) to produce/refine the LaTeX statement and proof (if applicable), providing previous feedback if it's not the first cycle.

    b. Parses the generator's output using _parse_combined_latex.

    c. Calls the LLM reviewer (_call_latex_statement_and_proof_reviewer) to evaluate the generated LaTeX.

    d. Parses the reviewer's judgment and feedback using _parse_combined_review.

    e. If accepted, updates the KBItem with the accepted LaTeX, sets status to LATEX_ACCEPTED, saves it (triggering embedding generation), and returns True.

    f. If rejected, stores the feedback, updates status, saves, and continues the loop.

  4. If the loop finishes without acceptance, sets status to LATEX_REJECTED_FINAL and returns False.

Handles errors during the process, updating the item status to ERROR.
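
As a sketch of how a caller might act on the boolean result, the snippet below re-fetches the item after processing to inspect the stored LaTeX or the last reviewer feedback. The function and attribute names come from the source code below; the storage and client module paths are assumptions.

```python
# Sketch only: the kb.storage and llm.caller module paths are assumptions;
# the function and attribute names appear in the source code below.
from lean_automator.latex.processor import generate_and_review_latex
from lean_automator.kb.storage import get_kb_item_by_name  # assumed path
from lean_automator.llm.caller import GeminiClient  # assumed path


async def process_one(unique_name: str, client: GeminiClient, db_path: str) -> None:
    accepted = await generate_and_review_latex(unique_name, client, db_path)
    item = get_kb_item_by_name(unique_name, db_path)
    if accepted:
        # Status is now LATEX_ACCEPTED; latex_proof is None for types without proofs.
        print(item.latex_statement)
        print(item.latex_proof)
    else:
        # On failure, the last reviewer feedback (if any) remains on the item.
        print(item.status.name, item.latex_review_feedback)
```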

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `unique_name` | `str` | The unique name of the KBItem to process. | *required* |
| `client` | `GeminiClient` | An initialized LLM client instance. | *required* |
| `db_path` | `Optional[str]` | Path to the database file. If `None`, uses `DEFAULT_DB_PATH`. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `bool` | `True` if LaTeX was successfully generated and accepted within the allowed review cycles, `False` otherwise (including errors or final rejection). |

Source code in lean_automator/latex/processor.py
async def generate_and_review_latex(
    unique_name: str, client: GeminiClient, db_path: Optional[str] = None
) -> bool:
    """Generates, reviews, and refines LaTeX for a KBItem iteratively.

    This function orchestrates the main workflow for obtaining accepted LaTeX
    for a given `KBItem` identified by its `unique_name`.

    1. Fetches the item and its dependencies from the database.

    2. Checks if the item's current status requires LaTeX processing.

    3. Enters a loop (up to `MAX_REVIEW_CYCLES`):

        a. Calls the LLM generator (`_call_latex_statement_and_proof_generator`)
           to produce/refine the LaTeX statement and proof (if applicable),
           providing previous feedback if it's not the first cycle.

        b. Parses the generator's output using `_parse_combined_latex`.

        c. Calls the LLM reviewer (`_call_latex_statement_and_proof_reviewer`)
           to evaluate the generated LaTeX.

        d. Parses the reviewer's judgment and feedback using `_parse_combined_review`.

        e. If accepted, updates the KBItem with the accepted LaTeX, sets status to
           `LATEX_ACCEPTED`, saves it (triggering embedding generation), and
           returns True.

        f. If rejected, stores the feedback, updates status, saves, and continues
           the loop.

    4. If the loop finishes without acceptance, sets status to `LATEX_REJECTED_FINAL`
       and returns False.

    Handles errors during the process, updating the item status to `ERROR`.

    Args:
        unique_name (str): The unique name of the KBItem to process.
        client (GeminiClient): An initialized LLM client instance.
        db_path (Optional[str]): Path to the database file. If None, uses
            `DEFAULT_DB_PATH`.

    Returns:
        bool: True if LaTeX was successfully generated and accepted within the
        allowed review cycles, False otherwise (including errors or final rejection).
    """
    # Check if necessary storage functions are available
    if not all([KBItem, ItemStatus, ItemType, get_kb_item_by_name, save_kb_item]):
        logger.critical(
            "KB Storage module components not loaded correctly. Cannot process LaTeX."
        )
        return False
    if not client:
        logger.error(
            f"GeminiClient not provided for LaTeX processing of {unique_name}."
        )
        return False

    effective_db_path = db_path or DEFAULT_DB_PATH
    item = get_kb_item_by_name(unique_name, effective_db_path)

    if not item:
        logger.error(f"LaTeX Processing: Item not found: {unique_name}")
        return False

    # --- Status Checks ---
    # Define statuses that trigger processing
    trigger_statuses = {
        ItemStatus.PENDING,
        ItemStatus.PENDING_LATEX,
        ItemStatus.LATEX_REJECTED_FINAL,
        ItemStatus.PENDING_LATEX_REVIEW,
    }  # Added review status
    # Define statuses that mean LaTeX is already acceptable
    already_accepted_statuses = {
        ItemStatus.LATEX_ACCEPTED,
        ItemStatus.PROVEN,
        ItemStatus.DEFINITION_ADDED,
        ItemStatus.AXIOM_ACCEPTED,
    }

    if item.status in already_accepted_statuses:
        logger.info(
            f"LaTeX Processing: Item {unique_name} status ({item.status.name})"
            f" indicates LaTeX is already acceptable. Skipping."
        )
        return True

    # Ensure item is in a state where LaTeX generation should start/resume
    if item.status not in trigger_statuses:
        logger.warning(
            f"LaTeX Processing: Item {unique_name} not in a trigger status "
            f"({ {s.name for s in trigger_statuses} }). Current: "
            f"{item.status.name}. Skipping."
        )
        return False

    original_status = item.status
    logger.info(
        f"Starting combined LaTeX processing for {unique_name}"
        f" (Status: {original_status.name})"
    )

    # --- Fetch Dependencies ---
    dependency_items: List[KBItem] = []
    logger.debug(f"Fetching dependencies for {unique_name}: {item.plan_dependencies}")
    if item.plan_dependencies:
        for dep_name in item.plan_dependencies:
            dep_item = get_kb_item_by_name(dep_name, effective_db_path)
            if dep_item:
                # Check if dependency has acceptable LaTeX statement
                if (
                    dep_item.latex_statement
                    and dep_item.status in already_accepted_statuses
                ):
                    dependency_items.append(dep_item)
                    logger.debug(
                        f"  - Including valid dependency: {dep_name}"
                        f" ({dep_item.status.name})"
                    )
                else:
                    logger.warning(
                        f"  - Dependency '{dep_name}' for '{unique_name}' excluded:"
                        f" Lacks accepted LaTeX statement (Status: "
                        f"{dep_item.status.name if dep_item.status else 'N/A'})."
                    )
            else:
                logger.error(
                    f"  - Dependency '{dep_name}' not found in KB for target"
                    f" '{unique_name}'."
                )
    else:
        logger.debug(f"No plan dependencies listed for {unique_name}.")

    # --- Start Processing Cycle ---
    try:
        # Initial status update to indicate processing has started
        item.update_status(ItemStatus.LATEX_GENERATION_IN_PROGRESS)
        await save_kb_item(
            item, client=None, db_path=effective_db_path
        )  # Save status update without generating embeddings yet

        # Initialize variables for the loop
        current_statement: Optional[str] = (
            item.latex_statement
        )  # Use existing LaTeX as starting point if available
        current_proof: Optional[str] = (
            item.latex_proof
        )  # Use existing proof if available
        review_feedback: Optional[str] = (
            item.latex_review_feedback
        )  # Use previous feedback if resuming
        accepted = False
        proof_expected = item.item_type.requires_proof()

        # --- Generation and Review Loop ---
        for cycle in range(MAX_REVIEW_CYCLES):
            logger.info(
                f"Combined LaTeX Cycle {cycle + 1}/{MAX_REVIEW_CYCLES} for"
                f" {unique_name}"
            )

            # --- Step (a): Generate / Refine Combined LaTeX ---
            item.update_status(
                ItemStatus.LATEX_GENERATION_IN_PROGRESS
            )  # Mark as generating
            await save_kb_item(
                item, client=None, db_path=effective_db_path
            )  # Save status

            raw_generator_response = await _call_latex_statement_and_proof_generator(
                item=item,
                dependencies=dependency_items,
                current_statement=current_statement,
                current_proof=current_proof,
                review_feedback=review_feedback,
                client=client,
            )

            item = get_kb_item_by_name(
                unique_name, effective_db_path
            )  # Refresh item state after potentially long LLM call
            if not item:
                raise Exception(f"Item {unique_name} vanished during LaTeX generation.")
            # Placeholder, actual prompt is complex
            item.generation_prompt = "See _call_latex_statement_and_proof_generator"
            item.raw_ai_response = raw_generator_response  # Store raw response

            if not raw_generator_response:
                logger.warning(
                    f"Combined LaTeX generator failed or returned empty response "
                    f"for {unique_name} in cycle {cycle + 1}."
                )
                item.update_status(
                    ItemStatus.ERROR,
                    f"LaTeX generator failed/empty in cycle {cycle + 1}",
                )
                await save_kb_item(item, client=None, db_path=effective_db_path)
                return False  # Critical failure

            # --- Step (a.2): Parse Generator Output ---
            parsed_statement, parsed_proof = _parse_combined_latex(
                raw_generator_response, item.item_type
            )

            if (
                parsed_statement is None
            ):  # Only statement parsing failure is always critical
                logger.warning(
                    f"Failed to parse statement from generator output for "
                    f"{unique_name} in cycle {cycle + 1}. "
                    f"Raw: {raw_generator_response[:500]}"
                )
                item.update_status(
                    ItemStatus.ERROR,
                    (
                        f"LaTeX generator output parsing failed (statement) in cycle "
                        f"{cycle + 1}"
                    ),
                )
                await save_kb_item(item, client=None, db_path=effective_db_path)
                return False  # Exit on critical parsing failure

            if proof_expected and parsed_proof is None:
                # Proof was expected but could not be parsed. Log a warning and
                # proceed to review; the reviewer is expected to reject the
                # missing proof rather than failing here.
                logger.warning(
                    f"Expected proof for {unique_name} (type: "
                    f"{item.item_type.name}) but failed to parse from generator "
                    f"output in cycle {cycle + 1}."
                )
                # Proceed with parsed_statement and parsed_proof=None

            current_statement = parsed_statement  # Update working versions for review
            current_proof = parsed_proof

            # --- Step (b): Review Combined LaTeX ---
            item.update_status(ItemStatus.LATEX_REVIEW_IN_PROGRESS)  # Mark as reviewing
            # Save state before review (includes raw generator response)
            await save_kb_item(item, client=None, db_path=effective_db_path)

            raw_reviewer_response = await _call_latex_statement_and_proof_reviewer(
                item_type=item.item_type,
                latex_statement=current_statement,
                latex_proof=current_proof,  # Pass None if not expected/parsed
                unique_name=unique_name,
                nl_description=item.description_nl,
                dependencies=dependency_items,
                client=client,
            )

            item = get_kb_item_by_name(
                unique_name, effective_db_path
            )  # Refresh item state
            if not item:
                raise Exception(f"Item {unique_name} vanished during LaTeX review.")
            # Could store reviewer prompt/response too if needed
            item.raw_ai_response = (
                raw_reviewer_response  # Overwrite raw response with reviewer's output
            )

            if not raw_reviewer_response:
                logger.warning(
                    f"LaTeX reviewer failed or returned empty response for "
                    f"{unique_name} in cycle {cycle + 1}."
                )
                item.update_status(
                    ItemStatus.ERROR,
                    f"LaTeX reviewer failed/empty in cycle {cycle + 1}",
                )
                await save_kb_item(item, client=None, db_path=effective_db_path)
                return False  # Critical failure

            # --- Step (c): Parse and Process Review ---
            judgment, feedback, error_loc = _parse_combined_review(
                raw_reviewer_response, proof_expected
            )

            if judgment == "Accepted":
                logger.info(
                    f"Combined LaTeX for {unique_name} ACCEPTED by reviewer in "
                    f"cycle {cycle + 1}."
                )
                accepted = True
                item.latex_statement = current_statement  # Store accepted content
                item.latex_proof = current_proof  # Store accepted proof (or None)
                item.update_status(
                    ItemStatus.LATEX_ACCEPTED, review_feedback=None
                )  # Clear feedback on acceptance
                # Save final accepted state, potentially triggering embedding
                # generation in save_kb_item
                await save_kb_item(item, client=client, db_path=effective_db_path)
                break  # Exit review loop successfully
            else:
                # Rejected case
                logger.warning(
                    f"Combined LaTeX for {unique_name} REJECTED by reviewer in "
                    f"cycle {cycle + 1}. Location: {error_loc or 'N/A'}. "
                    f"Feedback: {feedback[:200]}..."
                )
                review_feedback = (
                    feedback  # Store feedback for the next generation attempt
                )
                item.latex_review_feedback = (
                    review_feedback  # Save feedback to item DB field
                )
                item.update_status(
                    ItemStatus.PENDING_LATEX_REVIEW
                )  # Set status indicating rejection needs rework
                # Save feedback and status update (no embedding generation needed here)
                await save_kb_item(item, client=None, db_path=effective_db_path)
                # Continue to the next cycle

        # --- After Loop ---
        if not accepted:
            logger.error(
                f"Combined LaTeX for {unique_name} was REJECTED after "
                f"{MAX_REVIEW_CYCLES} cycles."
            )
            # Get final item state before updating status
            item = get_kb_item_by_name(unique_name, effective_db_path)
            if item:
                # Set final rejected status, keeping the last feedback
                item.update_status(
                    ItemStatus.LATEX_REJECTED_FINAL, review_feedback=review_feedback
                )
                await save_kb_item(item, client=None, db_path=effective_db_path)
            return False  # Return False indicating failure after max cycles
        else:
            return True  # Return True as it was accepted within the loop

    except Exception as e:
        # Catch any unexpected errors during the process
        logger.exception(
            f"Unhandled exception during combined LaTeX processing for "
            f"{unique_name}: {e}"
        )
        try:
            # Attempt to set the item status to ERROR
            item_err = get_kb_item_by_name(unique_name, effective_db_path)
            if (
                item_err
                and item_err.status not in already_accepted_statuses
                and item_err.status != ItemStatus.LATEX_REJECTED_FINAL
            ):
                item_err.update_status(
                    ItemStatus.ERROR,
                    f"Unhandled LaTeX processor exception: {str(e)[:500]}",
                )
                await save_kb_item(item_err, client=None, db_path=effective_db_path)
        except Exception as final_err:
            # Log if even saving the error state fails
            logger.error(
                f"Failed to save final error state for {unique_name} after exception:"
                f" {final_err}"
            )
        return False

process_pending_latex_items(client: GeminiClient, db_path: Optional[str] = None, limit: Optional[int] = None, process_statuses: Optional[List[ItemStatus]] = None) async

Processes multiple items requiring LaTeX generation and review.

Queries the database for items in eligible statuses (by default PENDING_LATEX, PENDING, LATEX_REJECTED_FINAL, and PENDING_LATEX_REVIEW), then iterates through them, calling generate_and_review_latex for each one up to an optional limit. Logs summary statistics upon completion.
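
A sketch of batch usage, capping the run at a few items and narrowing the statuses considered. The `ItemStatus` members are the ones referenced in the source code below; the client and storage import paths are assumptions.

```python
import asyncio

from lean_automator.latex.processor import process_pending_latex_items
from lean_automator.kb.storage import ItemStatus  # assumed module path
from lean_automator.llm.caller import GeminiClient  # assumed module path


async def run_batch() -> None:
    client = GeminiClient()  # assumed no-argument construction; configure as needed
    await process_pending_latex_items(
        client,
        db_path=None,  # None falls back to DEFAULT_DB_PATH
        limit=5,  # process at most five items this run
        process_statuses=[  # override the default status set
            ItemStatus.PENDING_LATEX,
            ItemStatus.LATEX_REJECTED_FINAL,
        ],
    )


asyncio.run(run_batch())
```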

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client` | `GeminiClient` | An initialized LLM client instance passed to the processing function for each item. | *required* |
| `db_path` | `Optional[str]` | Path to the database file. If `None`, uses `DEFAULT_DB_PATH`. | `None` |
| `limit` | `Optional[int]` | The maximum number of items to process in this batch. If `None`, processes all found items. | `None` |
| `process_statuses` | `Optional[List[ItemStatus]]` | A list of `ItemStatus` enums indicating which items should be processed. If `None`, defaults to `[PENDING_LATEX, PENDING, LATEX_REJECTED_FINAL, PENDING_LATEX_REVIEW]`. | `None` |
Source code in lean_automator/latex/processor.py
async def process_pending_latex_items(
    client: GeminiClient,
    db_path: Optional[str] = None,
    limit: Optional[int] = None,
    process_statuses: Optional[List[ItemStatus]] = None,
):
    """Processes multiple items requiring LaTeX generation and review.

    Queries the database for items in specified statuses (defaulting to
    PENDING_LATEX, PENDING, LATEX_REJECTED_FINAL, and PENDING_LATEX_REVIEW),
    then iterates through them,
    calling `generate_and_review_latex` for each one up to an optional limit.
    Logs summary statistics upon completion.

    Args:
        client (GeminiClient): An initialized LLM client instance passed to the
            processing function for each item.
        db_path (Optional[str]): Path to the database file. If None, uses
            `DEFAULT_DB_PATH`.
        limit (Optional[int]): The maximum number of items to process in this batch.
            If None, processes all found items.
        process_statuses (Optional[List[ItemStatus]]): A list of `ItemStatus` enums
            indicating which items should be processed. If None, defaults to
            `[PENDING_LATEX, PENDING, LATEX_REJECTED_FINAL, PENDING_LATEX_REVIEW]`.
    """
    # Check if necessary components are available
    if not all([ItemStatus, get_items_by_status, get_kb_item_by_name, save_kb_item]):
        logger.critical(
            "KB Storage or required components not loaded correctly. "
            "Cannot batch process LaTeX items."
        )
        return

    effective_db_path = db_path or DEFAULT_DB_PATH
    # Default statuses to process if none provided
    if process_statuses is None:
        process_statuses = [
            ItemStatus.PENDING_LATEX,
            ItemStatus.PENDING,
            ItemStatus.LATEX_REJECTED_FINAL,
            ItemStatus.PENDING_LATEX_REVIEW,
        ]  # Added review

    processed_count = 0
    success_count = 0
    fail_count = 0
    items_to_process = []

    logger.info(
        f"Starting LaTeX batch processing. Querying for items with statuses: "
        f"{[s.name for s in process_statuses]}"
    )
    # Collect items to process from all specified statuses
    for status in process_statuses:
        if limit is not None and len(items_to_process) >= limit:
            break  # Stop querying if limit already reached
        try:
            # Use the generator function to get items for the current status
            items_gen = get_items_by_status(status, effective_db_path)
            count_for_status = 0
            for item in items_gen:
                if limit is not None and len(items_to_process) >= limit:
                    break  # Stop adding if limit reached
                items_to_process.append(item)
                count_for_status += 1
            logger.debug(f"Found {count_for_status} items with status {status.name}")
        except Exception as e:
            logger.error(f"Failed to retrieve items with status {status.name}: {e}")
            # Continue to next status even if one fails

    if not items_to_process:
        logger.info(
            "No items found requiring LaTeX processing in the specified statuses."
        )
        return

    logger.info(
        f"Found {len(items_to_process)} items matching criteria for LaTeX processing."
    )

    # Process collected items
    for item in items_to_process:
        # Check item status again before processing, in case it changed concurrently
        try:
            current_item_state = get_kb_item_by_name(
                item.unique_name, effective_db_path
            )
            # Ensure item exists and is still in one of the target processing statuses
            if (
                not current_item_state
                or current_item_state.status not in process_statuses
            ):
                status_text = (
                    current_item_state.status.name if current_item_state else "deleted"
                )

                logger.info(
                    f"Skipping {item.unique_name} as its status changed "
                    f"({status_text})"
                    f" or item disappeared before processing."
                )
                continue
        except Exception as fetch_err:
            logger.error(
                f"Error fetching current state for {item.unique_name} before "
                f"processing: {fetch_err}. Skipping."
            )
            continue

        logger.info(
            f"--- Processing LaTeX for: {item.unique_name} (ID: {item.id}, "
            f"Status: {current_item_state.status.name}) ---"
        )
        processed_count += 1
        try:
            # Call the main combined processing function for the item
            success = await generate_and_review_latex(
                item.unique_name, client, effective_db_path
            )
            if success:
                success_count += 1
                logger.info(f"Successfully processed LaTeX for {item.unique_name}.")
            else:
                fail_count += 1
                logger.warning(
                    f"Failed to process LaTeX for {item.unique_name} "
                    f"(check previous logs for details)."
                )
        except Exception as e:
            # Catch unexpected errors from generate_and_review_latex itself
            logger.error(
                f"Unhandled error processing item {item.unique_name} in batch: {e}"
            )
            fail_count += 1
            # Attempt to mark the item as ERROR if processing failed critically
            try:
                err_item = get_kb_item_by_name(item.unique_name, effective_db_path)
                # Avoid overwriting already accepted/rejected statuses with ERROR
                if err_item and err_item.status not in {
                    ItemStatus.PROVEN,
                    ItemStatus.LATEX_ACCEPTED,
                    ItemStatus.LATEX_REJECTED_FINAL,
                }:
                    err_item.update_status(
                        ItemStatus.ERROR, f"Batch processing error: {str(e)[:500]}"
                    )
                    await save_kb_item(err_item, client=None, db_path=effective_db_path)
            except Exception as save_err:
                logger.error(
                    f"Failed to save error status for {item.unique_name} during "
                    f"batch exception handling: {save_err}"
                )

        logger.info(f"--- Finished processing {item.unique_name} ---")
        # Optional delay between items if needed
        # await asyncio.sleep(0.5)

    logger.info(
        f"LaTeX Batch Processing Complete. Total Attempted: {processed_count}, "
        f"Succeeded: {success_count}, Failed: {fail_count}"
    )