
PDF Agent Tool

🔑 Key Concepts

The PDF Agent Tool lets your agents read, index, and query content from PDF files stored in your project’s file system. It's ideal for use cases like:

  • Uploading reports, whitepapers, contracts, or manuals
  • Asking questions over academic literature
  • Creating knowledge agents that extract answers directly from documents

Once configured, the tool builds a semantic index over the provided PDFs so that LLM-powered agents can query them in natural language.

📘 Key Definitions

Term             Description
PDF Files        The .pdf documents you upload to your workspace for indexing.
Vector Index     A semantic structure used to embed and retrieve content via similarity search.
QueryEngineTool  A LlamaIndex interface that routes natural language queries to indexed PDF data.
Agent Tool       A callable module used by INTELLITHING agents to process specific query types.

βš™οΈ Setup Guide: Using the PDF Agent Tool

To configure and use the PDF Agent Tool in your workflow:

1. Upload and Reference Files

  • Upload PDF files through the file upload interface in the UI, or drag and drop them.
  • After uploading, enter the file names in the files field during configuration (names only, no path).
  • Example: ["report_q3.pdf", "case_study.pdf"]

πŸ“ PDF files are expected to reside in the /data directory.

2. Configure Tool Parameters

Field        Purpose                                                Example
name         Internal name of the tool                              "Report Reader"
description  Used by the agent router to match the tool to queries  "Answers questions from the uploaded quarterly report"
files        List of PDF file names (must match uploads)            ["report_q3.pdf"]
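It can help to treat the three fields as one small config object. The dataclass below is a hypothetical sketch: PdfToolConfig and its validation rules are not part of the documented tool. It checks the constraints the table implies, namely a non-empty name and description and at least one .pdf file name.

```python
from dataclasses import dataclass, field


@dataclass
class PdfToolConfig:
    """Hypothetical container for the name, description, and files fields."""

    name: str
    description: str
    files: list[str] = field(default_factory=list)

    def validate(self) -> None:
        if not self.name.strip():
            raise ValueError("name must not be empty")
        # The agent router matches queries against this text, so it cannot be blank.
        if not self.description.strip():
            raise ValueError("description must not be empty")
        if not self.files:
            raise ValueError("files must list at least one PDF")
        for f in self.files:
            if not f.lower().endswith(".pdf"):
                raise ValueError(f"{f!r} is not a .pdf file name")
        if len(set(self.files)) != len(self.files):
            raise ValueError("files contains duplicate names")
```

For example, `PdfToolConfig("Report Reader", "Answers questions from the uploaded quarterly report", ["report_q3.pdf"]).validate()` passes, while an empty files list raises ValueError.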

πŸ”„ How It Works

  1. The tool resolves each name in the files list to a PDF in the /data directory.
  2. It loads the content with LlamaIndex's SimpleDirectoryReader, which can read several PDFs in one pass.
  3. A VectorStoreIndex is built from the parsed content.
  4. The index's query engine is wrapped in a QueryEngineTool and returned, so LLM agents can search the indexed documents.
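The indexing and retrieval steps above can be illustrated with a dependency-free toy. This is not how the tool works internally: it swaps LlamaIndex's VectorStoreIndex and model embeddings for bag-of-words vectors with cosine similarity, assumes the PDF text has already been extracted, and stops at retrieval without LLM answer synthesis. ToyVectorIndex, embed, and cosine are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (a real index uses a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    def __init__(self, chunks: list[tuple[str, str]]):
        # Step 3: embed every (source file, text chunk) pair once, at build time.
        self.entries = [(source, text, embed(text)) for source, text in chunks]

    def query(self, question: str, top_k: int = 1) -> list[tuple[str, str]]:
        # Step 4: embed the question and return the most similar chunks.
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(source, text) for source, text, _ in ranked[:top_k]]


index = ToyVectorIndex([
    ("handbook.pdf", "Employees receive 25 days of annual leave per year."),
    ("vendors.pdf", "Vendor onboarding requires a signed contract and a security review."),
])
print(index.query("What's the annual leave policy?"))
```

Here the leave question retrieves the handbook chunk because it shares the words "annual" and "leave". A model embedding captures meaning beyond exact word overlap, which is why the real tool uses a vector index rather than keyword counts.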

This allows the agent to answer questions like:

  • “What’s the annual leave policy?”
  • “Summarize the main points of the employee handbook.”
  • “How do we handle vendor onboarding?”

✅ Best Practices

  • Name descriptively: Use a meaningful name and description so the agent router can match queries to the right tool.
  • Limit file size: For better performance, avoid uploading excessively large PDFs. Also avoid scanned, image-only PDFs, which have no text layer to index.
  • Scope content: Use focused PDFs (e.g., one document per topic) for higher-quality retrieval.
  • Use multi-file config when necessary: You can include multiple PDFs in the files list.
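Some of these practices can be checked before upload. The hypothetical helper below flags a file that lacks a .pdf extension, does not begin with the %PDF- header that PDF files normally start with, or exceeds a size limit. The 20 MB threshold is an arbitrary assumption, since no limit is documented here. The check cannot detect scanned, image-only PDFs; that would require attempting text extraction.

```python
from pathlib import Path

# Hypothetical threshold: no limit is documented, so tune this for your workspace.
MAX_BYTES = 20 * 1024 * 1024


def preflight_pdf(path: Path, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return warnings for a file before upload; an empty list means no issues found."""
    warnings = []
    if path.suffix.lower() != ".pdf":
        warnings.append("file does not have a .pdf extension")
    with path.open("rb") as fh:
        # PDF files normally begin with a "%PDF-" header line.
        if fh.read(5) != b"%PDF-":
            warnings.append("file does not start with a %PDF- header")
    size = path.stat().st_size
    if size > max_bytes:
        warnings.append(f"file is {size} bytes, above the {max_bytes}-byte limit")
    return warnings
```

Returning a list of warnings, rather than raising on the first problem, lets a batch upload report every issue in one pass.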

πŸ“Œ Example Use Case

To create a tool that answers policy questions from HR PDFs:

  1. Upload hr_policy.pdf and leave_policy.pdf via the UI.
  2. Configure the tool:
      • Name: "HR Docs Reader"
      • Description: "Fetches answers from internal HR policy documents."
      • Files: ["hr_policy.pdf", "leave_policy.pdf"]
  3. Ask your agent:
      • “What’s our maternity leave policy?”
      • “Do we offer sabbatical leave?”
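The description field matters because the agent router uses it to choose a tool. The toy router below replaces that LLM-based matching with simple word overlap, purely to make the idea concrete; route, words, STOPWORDS, and the scoring rule are all hypothetical and not part of INTELLITHING.

```python
import re
from typing import Optional

# Hypothetical stopword list so filler words do not count as matches.
STOPWORDS = {"a", "an", "the", "from", "do", "we", "our", "what", "s", "of", "to"}


def words(text: str) -> set[str]:
    """Lowercased content words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def route(query: str, tools: dict[str, str]) -> Optional[str]:
    """Return the tool whose description shares the most words with the query, or None."""
    q = words(query)
    best, best_score = None, 0
    for name, description in tools.items():
        score = len(q & words(description))
        if score > best_score:
            best, best_score = name, score
    return best


tools = {
    "HR Docs Reader": "Fetches answers from internal HR policy documents.",
    "Report Reader": "Answers questions from the uploaded quarterly report",
}
print(route("What's our maternity leave policy?", tools))
```

The maternity question reaches "HR Docs Reader" through the shared word "policy", but "Do we offer sabbatical leave?" matches neither description and returns None. A real LLM-based router does not depend on exact word matches, yet the toy result still illustrates the best practice above: a description that names the topics it covers (for example, leave and sabbaticals) gives any router more to match on.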