ForgeClient API

The ForgeClient class is the main interface for interacting with the Glyph Forge API.

class glyph_forge.core.client.forge_client.ForgeClient(api_key=None, base_url=None, *, timeout=30.0)[source]

Bases: object

Local SDK-based client for Glyph Forge.

Uses the Glyph SDK directly to build and run schemas locally. No API key is required; all processing happens on your machine.

Parameters:
  • api_key (Optional[str]) – Deprecated. No longer used (kept for backwards compatibility).

  • base_url (Optional[str]) – Deprecated. No longer used (kept for backwards compatibility).

  • timeout (float) – Deprecated. No longer used (kept for backwards compatibility).

Example

>>> from glyph_forge import ForgeClient, create_workspace
>>> ws = create_workspace()
>>> client = ForgeClient()
>>> schema = client.build_schema_from_docx(ws, docx_path="sample.docx")
__init__(api_key=None, base_url=None, *, timeout=30.0)[source]

Initialize ForgeClient.

All parameters are deprecated no-ops; see the class documentation above.

__enter__()[source]

Enter the context manager, returning the client.

__exit__(exc_type, exc_val, exc_tb)[source]

Exit the context manager and close the client.

close()[source]

Close the client and clean up resources.

build_schema_from_docx(ws, *, docx_path, save_as=None, include_artifacts=False)[source]

Build a schema from a DOCX file using the local SDK.

Parameters:
  • ws (Any) – Workspace instance for saving artifacts

  • docx_path (str) – Path to DOCX file (absolute or CWD-relative)

  • save_as (Optional[str]) – Optional name to save schema JSON (without .json extension)

  • include_artifacts (bool) – If True, save tagged DOCX + unzipped files (default: False)

Return type:

Dict[str, Any]

Returns:

Schema dict

Raises:

ForgeClientError – File not found or processing error

Example

>>> schema = client.build_schema_from_docx(
...     ws,
...     docx_path="sample.docx",
...     save_as="my_schema"
... )
build_glyph_from_docx(ws, *, docx_path, save_as=None, include_artifacts=False)[source]

Build both schema and markup from a DOCX file in one call.

Uses the local SDK to intake the DOCX, build the schema via GlyphSchemaBuilder, then produce a .glyph.txt markup file via GlyphMarkupBuilder (coordinated mode, no re-parsing).

Parameters:
  • ws (Any) – Workspace instance for saving artifacts

  • docx_path (str) – Path to DOCX file (absolute or CWD-relative)

  • save_as (Optional[str]) – Optional base name to save schema JSON and markup (e.g. “my_doc” -> my_doc.json + my_doc.glyph.txt)

  • include_artifacts (bool) – If True, save tagged DOCX + unzipped files (default: False)

Returns:

Dict with keys:

  • schema: The generated schema dict

  • markup: The generated markup string

  • schema_path: Path to saved schema JSON (if save_as provided)

  • markup_path: Path to saved markup file (if save_as provided)

Return type:

Dict[str, Any]

Raises:

ForgeClientError – File not found or processing error

Example

>>> result = client.build_glyph_from_docx(
...     ws,
...     docx_path="sample.docx",
...     save_as="my_schema"
... )
>>> print(len(result["markup"]), "chars of markup")
>>> print(len(result["schema"].get("pattern_descriptors", [])), "descriptors")
run_schema(ws, *, schema, plaintext, dest_name='assembled_output.docx')[source]

Run a schema with plaintext to generate a DOCX using the local SDK.

Parameters:
  • ws (Any) – Workspace instance

  • schema (Dict[str, Any]) – Schema dict (from build_schema_from_docx or loaded JSON)

  • plaintext (str) – Input text content

  • dest_name (str) – Name for output DOCX file (saved in output_docx directory)

Return type:

str

Returns:

Local path to saved DOCX file

Raises:

ForgeClientError – Failed to run schema or save DOCX

Example

>>> docx_path = client.run_schema(
...     ws,
...     schema=schema,
...     plaintext="Sample text...",
...     dest_name="output.docx"
... )
run_schema_bulk(ws, *, schema, plaintexts, max_concurrent=5, dest_name_pattern='output_{index}.docx')[source]

Run a schema with multiple plaintexts to generate multiple DOCX files.

Parameters:
  • ws (Any) – Workspace instance

  • schema (Dict[str, Any]) – Schema dict (from build_schema_from_docx or loaded JSON)

  • plaintexts (list[str]) – List of plaintext strings to process

  • max_concurrent (int) – Ignored in local SDK mode (processed sequentially)

  • dest_name_pattern (str) – Pattern for output filenames. Use {index} placeholder

Return type:

Dict[str, Any]

Returns:

Dict containing results with status, paths, and timing info

Example

>>> result = client.run_schema_bulk(
...     ws,
...     schema=schema,
...     plaintexts=["Text 1...", "Text 2...", "Text 3..."],
...     dest_name_pattern="invoice_{index}.docx"
... )
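The {index} placeholder in dest_name_pattern behaves like a str.format replacement field. A minimal sketch of how output names are presumably derived (the zero-based indexing and the helper name are assumptions, not SDK internals):

```python
def expand_pattern(pattern: str, count: int) -> list:
    # Substitute the {index} placeholder once per input, zero-based.
    return [pattern.format(index=i) for i in range(count)]

names = expand_pattern("invoice_{index}.docx", 3)
# One filename per plaintext, in input order.
```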
compress_schema(ws, *, schema, save_as=None)[source]

Compress a schema by deduplicating redundant pattern descriptors.

Parameters:
  • ws (Any) – Workspace instance

  • schema (Dict[str, Any]) – Schema dict to compress

  • save_as (Optional[str]) – Optional name to save compressed schema JSON

Return type:

Dict[str, Any]

Returns:

Dict containing compressed_schema and stats

Example

>>> result = client.compress_schema(
...     ws,
...     schema=schema,
...     save_as="compressed_schema"
... )
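Deduplication can be pictured as collapsing descriptors that serialize to the same canonical form. An illustrative sketch only; the actual compression logic and the descriptor fields shown here are assumptions:

```python
import json

def dedupe_descriptors(descriptors):
    # Keep the first descriptor for each canonical JSON representation;
    # later exact duplicates are dropped.
    seen, kept = set(), []
    for d in descriptors:
        key = json.dumps(d, sort_keys=True)
        if key not in seen:
            seen.add(key)
            kept.append(d)
    return kept

before = [{"style": "H1"}, {"style": "body"}, {"style": "H1"}]
after = dedupe_descriptors(before)
```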
intake_plaintext_text(ws, *, text, classify=False, save_as=None, **opts)[source]

Intake plaintext via text string (local processing).

Parameters:
  • ws (Any) – Workspace instance

  • text (str) – Plaintext content to intake

  • classify (bool) – If True, run heuristic classification on each line (adds classifications key to the result)

  • save_as (Optional[str]) – Optional name to save intake result JSON

  • **opts (Any) – Additional options (unicode_form, strip_zero_width, etc.)

Return type:

Dict[str, Any]

Returns:

Intake result dict

Example

>>> result = client.intake_plaintext_text(
...     ws,
...     text="Sample text...",
...     classify=True,
...     save_as="intake_result"
... )
intake_plaintext_file(ws, *, file_path, save_as=None, **opts)[source]

Intake plaintext from file (local processing).

Parameters:
  • ws (Any) – Workspace instance

  • file_path (str) – Path to plaintext file

  • save_as (Optional[str]) – Optional name to save intake result JSON

  • **opts (Any) – Additional options

Return type:

Dict[str, Any]

Returns:

Intake result dict

Example

>>> result = client.intake_plaintext_file(
...     ws,
...     file_path="sample.txt",
...     save_as="intake_result"
... )
detect_forms(ws, *, text, forms=None, threshold=0.55, use_context=True, save_as=None)[source]

Detect heuristic forms (headings, lists, paragraphs, etc.) in plaintext.

Runs the SDK’s line classifier against each line and returns classifications filtered by form type and confidence threshold.

Parameters:
  • ws (Any) – Workspace instance

  • text (str) – Plaintext content to classify

  • forms (Optional[List[str]]) – Optional list of form codes to keep (e.g. ["H-SHORT", "L-BULLET"]). None returns all.

  • threshold (float) – Minimum confidence score (0.0–1.0, default 0.55)

  • use_context (bool) – Use surrounding-line context for better accuracy

  • save_as (Optional[str]) – Optional name to save result JSON in workspace

Return type:

Dict[str, Any]

Returns:

Dict with keys classifications, total_lines, matched_lines, forms_filter, threshold.

Example

>>> result = client.detect_forms(
...     ws,
...     text=open("doc.txt").read(),
...     forms=["H-SHORT", "L-BULLET"],
... )
>>> for c in result["classifications"]:
...     print(c["pattern_type"], c["text"][:60])
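The form filter and confidence threshold compose as a simple two-condition filter over the classifier output. A sketch under assumed key names (the confidence key is not shown in the example above and is an assumption):

```python
def filter_classifications(classifications, forms=None, threshold=0.55):
    # Keep lines whose form is in the filter (or all forms, if None)
    # and whose confidence meets the threshold.
    return [
        c for c in classifications
        if (forms is None or c["pattern_type"] in forms)
        and c["confidence"] >= threshold
    ]

lines = [
    {"pattern_type": "H-SHORT", "confidence": 0.9, "text": "Intro"},
    {"pattern_type": "P-BODY", "confidence": 0.8, "text": "Some prose."},
    {"pattern_type": "L-BULLET", "confidence": 0.4, "text": "- item"},
]
hits = filter_classifications(lines, forms=["H-SHORT", "L-BULLET"])
# The bullet line is dropped: right form, but below threshold.
```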
detect_forms_file(ws, *, file_path, forms=None, threshold=0.55, use_context=True, save_as=None)[source]

Detect heuristic forms in a plaintext file.

Reads the file and delegates to detect_forms().

Parameters:
  • ws (Any) – Workspace instance

  • file_path (str) – Path to plaintext file

  • forms (Optional[List[str]]) – Optional form-code filter list

  • threshold (float) – Minimum confidence score

  • use_context (bool) – Use surrounding-line context

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Same dict as detect_forms()

chunk_plaintext_text(ws, *, text, threshold=0.55, heading_forms=None, save_as=None)[source]

Split plaintext into heading-bounded chunks.

Runs heading detection on each line and splits the text at heading boundaries so each chunk can be processed independently (e.g. fed to an LLM one section at a time).

Parameters:
  • ws (Any) – Workspace instance

  • text (str) – Plaintext content to chunk

  • threshold (float) – Heading-detection confidence threshold (default 0.55)

  • heading_forms (Optional[List[str]]) – Optional list of heading forms to split on (e.g. ["H-SHORT", "H-SECTION-N"]). None uses all.

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Dict with keys chunks (list), total_chunks, total_lines, headings_detected.

Example

>>> result = client.chunk_plaintext_text(
...     ws, text=open("doc.txt").read()
... )
>>> for chunk in result["chunks"]:
...     print(chunk["heading_text"], "->", len(chunk["plaintext"]), "chars")
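Heading-bounded chunking can be sketched as a single pass that starts a new chunk at each detected heading. This is an illustration of the splitting idea, not the SDK's classifier; the trivial is_heading heuristic here stands in for the real heading detection:

```python
def chunk_by_headings(lines, is_heading):
    # Start a new chunk at every detected heading; a chunk carries its
    # heading text plus the body lines up to the next heading.
    chunks, current = [], {"heading_text": None, "lines": []}
    for line in lines:
        if is_heading(line):
            if current["lines"] or current["heading_text"] is not None:
                chunks.append(current)
            current = {"heading_text": line, "lines": []}
        else:
            current["lines"].append(line)
    chunks.append(current)
    return chunks

text = ["Overview", "intro line", "Details", "more text", "even more"]
# Toy heuristic: any line containing an uppercase letter is a heading.
chunks = chunk_by_headings(text, is_heading=lambda l: not l.islower())
```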
chunk_plaintext_file(ws, *, file_path, threshold=0.55, heading_forms=None, save_as=None)[source]

Split a plaintext file into heading-bounded chunks.

Reads the file and delegates to chunk_plaintext_text().

Parameters:
  • ws (Any) – Workspace instance

  • file_path (str) – Path to plaintext file

  • threshold (float) – Heading-detection confidence threshold

  • heading_forms (Optional[List[str]]) – Optional heading-form filter list

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Same dict as chunk_plaintext_text()

target_chunks(ws, *, prompt, text=None, chunks=None, threshold=0.3, save_as=None)[source]

Run prompt-based targeting on chunks to select only relevant sections.

Given a user modify request (prompt) and either raw text or pre-built chunks, classifies the prompt intent and scores each chunk for relevance. Returns the selected subset plus scoring metadata.

Parameters:
  • ws (Any) – Workspace instance

  • prompt (str) – The user’s modify request to classify

  • text (Optional[str]) – Raw plaintext to auto-chunk first (mutually exclusive with chunks)

  • chunks (Optional[List[Dict[str, Any]]]) – Pre-built chunk dicts with plaintext or content key (mutually exclusive with text)

  • threshold (float) – Minimum relevance score for selection (0.0–1.0, default 0.3)

  • save_as (Optional[str]) – Optional name to save result JSON in workspace

Return type:

Dict[str, Any]

Returns:

Dict with keys analysis, selected_chunks, all_scores, strategy, chunks_total, chunks_selected, token_savings.

Raises:

ForgeClientError – If neither text nor chunks is provided, or processing fails

Example

>>> result = client.target_chunks(
...     ws,
...     prompt="Format the abstract as a block quote",
...     text=open("doc.txt").read(),
... )
>>> print(f"Selected {result['chunks_selected']}/{result['chunks_total']} chunks")
target_chunks_file(ws, *, prompt, file_path, threshold=0.3, save_as=None)[source]

Run prompt-based targeting on a plaintext file.

Reads the file and delegates to target_chunks().

Parameters:
  • ws (Any) – Workspace instance

  • prompt (str) – The user’s modify request to classify

  • file_path (str) – Path to plaintext file

  • threshold (float) – Minimum relevance score for selection

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Same dict as target_chunks()

chunk_docx(ws, *, docx_path, threshold=0.55, save_as=None)[source]

Chunk a DOCX document into heading-bounded sections.

Extracts paragraph text from the DOCX, detects headings via heuristics, and splits into chunks. Each chunk includes the heading metadata and the plaintext content of that section.

Parameters:
  • ws (Any) – Workspace instance

  • docx_path (str) – Path to DOCX file

  • threshold (float) – Heading-detection confidence threshold (default 0.55)

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Dict with keys chunks, total_chunks, total_paragraphs, headings_detected.

Example

>>> result = client.chunk_docx(ws, docx_path="report.docx")
>>> for chunk in result["chunks"]:
...     print(chunk["heading_text"], "->", len(chunk["plaintext"]), "chars")
index_document(ws, *, text, section_forms=None, annotate_forms=None, threshold=0.55, use_context=True, save_as=None)[source]

Build a structured document index with heading-bounded sections and optional form-annotated segments.

Combines heading detection (for section boundaries) with line classification (for segment annotation) to produce an index that lets you request specific form types AND get the content between reference points.

Parameters:
  • ws (Any) – Workspace instance

  • text (str) – Plaintext content to index

  • section_forms (Optional[List[str]]) – Heading form codes that define section boundaries (e.g. ["H-SHORT", "H-SECTION-N"]). None uses all heading forms.

  • annotate_forms (Optional[List[str]]) – Form codes to annotate as segments within sections (e.g. ["L-BULLET", "T-ROW"]). None skips classification entirely (faster). [] runs classification but matches nothing.

  • threshold (float) – Minimum confidence score (0.0–1.0, default 0.55)

  • use_context (bool) – Use surrounding-line context for classification

  • save_as (Optional[str]) – Optional name to save result JSON in workspace

Return type:

Dict[str, Any]

Returns:

Dict with keys sections, preamble, total_sections, total_lines, headings_detected, section_forms, annotate_forms.

Example

>>> result = client.index_document(
...     ws,
...     text=open("doc.txt").read(),
...     annotate_forms=["L-BULLET", "T-ROW"],
... )
>>> for sec in result["sections"]:
...     print(sec["heading"]["text"], len(sec["segments"]), "segments")
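The shape of the returned index can be navigated with plain dict access. The nesting below is inferred from the example above (heading.text, segments[].form); field names beyond those shown are assumptions:

```python
# A hand-built result mirroring the documented keys, to show navigation.
index = {
    "preamble": {"lines": ["Title page text"]},
    "total_sections": 2,
    "sections": [
        {"heading": {"text": "Overview"},
         "segments": [{"form": "L-BULLET", "count": 3}]},
        {"heading": {"text": "Results"},
         "segments": []},
    ],
}

# Find sections that contain at least one bullet-list segment.
bulleted = [
    s["heading"]["text"]
    for s in index["sections"]
    if any(seg["form"] == "L-BULLET" for seg in s["segments"])
]
```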
index_document_file(ws, *, file_path, section_forms=None, annotate_forms=None, threshold=0.55, use_context=True, save_as=None)[source]

Build a structured document index from a plaintext file.

Reads the file and delegates to index_document().

Parameters:
  • ws (Any) – Workspace instance

  • file_path (str) – Path to plaintext file

  • section_forms (Optional[List[str]]) – Heading form codes for section boundaries

  • annotate_forms (Optional[List[str]]) – Form codes to annotate as segments

  • threshold (float) – Minimum confidence score

  • use_context (bool) – Use surrounding-line context

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Same dict as index_document()

index_docx(ws, *, docx_path, section_forms=None, annotate_forms=None, threshold=0.55, use_context=True, save_as=None)[source]

Build a structured document index from a DOCX file.

Extracts paragraph text from the DOCX, detects headings for section boundaries, and optionally annotates segments within each section.

Parameters:
  • ws (Any) – Workspace instance

  • docx_path (str) – Path to DOCX file

  • section_forms (Optional[List[str]]) – Heading form codes for section boundaries

  • annotate_forms (Optional[List[str]]) – Form codes to annotate as segments

  • threshold (float) – Minimum confidence score (default 0.55)

  • use_context (bool) – Use surrounding-line context

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Dict with keys sections, preamble, total_sections, total_paragraphs, headings_detected, section_forms, annotate_forms.

Example

>>> result = client.index_docx(
...     ws,
...     docx_path="report.docx",
...     annotate_forms=["L-BULLET", "T-ROW"],
... )
ask(*, message, tenant_id=None, user_id=None, conversation_id=None, conversation_history=None, current_schema=None, current_plaintext=None, current_document=None, real_time=False, strict_validation=False, enable_targeting=False)[source]

Send a message to the Glyph Agent multi-agent system via API.

This endpoint orchestrates:

  1. Intent classification

  2. Agent routing (schema, plaintext, validation, conversation)

  3. Multi-step workflows

  4. Markup application

  5. Conversation state management

Parameters:
  • message (str) – The message to send to the agent (required)

  • tenant_id (Optional[str]) – Tenant identifier for rate limiting

  • user_id (Optional[str]) – User identifier for rate limiting

  • conversation_id (Optional[str]) – Conversation ID for context tracking

  • conversation_history (Optional[List[Dict[str, str]]]) – Previous conversation messages for context; a list of dicts with ‘role’ and ‘content’ keys

  • current_schema (Optional[Dict[str, Any]]) – Current schema state (for incremental modifications)

  • current_plaintext (Optional[str]) – Current plaintext content (for incremental modifications)

  • current_document (Optional[Dict[str, Any]]) – Legacy combined document state

  • real_time (bool) – Enable real-time sandbox updates

  • strict_validation (bool) – Enable strict validation mode

  • enable_targeting (bool) – If True and current_plaintext is provided, run local chunk targeting to reduce payload size before sending to the API (default: False)

Returns:

Dict with keys:

  • response: The agent’s response message

  • document: Generated or modified document (if applicable)

  • schema/document_schema: Document schema (if schema request)

  • plaintext: Generated plaintext content

  • validation_result: Validation results (if validation request)

  • metadata: Additional metadata (intent, routing, etc.)

  • usage: Token usage information

  • conversation_id: Conversation ID for tracking

Return type:

Dict[str, Any]

Example

>>> client = ForgeClient()
>>> response = client.ask(
...     message="Create a schema for a quarterly report",
...     user_id="user123"
... )
>>> print(response['response'])
>>> if 'schema' in response:
...     print(f"Schema generated: {len(response['schema']['pattern_descriptors'])} descriptors")
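Across turns, conversation_history is carried forward as a growing list of role/content dicts. A minimal sketch of maintaining it between ask() calls (the helper name is illustrative, not part of the SDK):

```python
def append_turn(history, role, content):
    # conversation_history is a list of {'role', 'content'} dicts;
    # pass the accumulated list back in on each ask() call.
    return history + [{"role": role, "content": content}]

history = []
history = append_turn(history, "user", "Create a schema for a quarterly report")
history = append_turn(history, "assistant", "Schema generated with 12 descriptors.")
# history is now ready to pass as conversation_history on the next turn.
```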

Core Methods

Schema Building

ForgeClient.build_schema_from_docx(ws, *, docx_path, save_as=None, include_artifacts=False)[source]

Build a schema from a DOCX file using the local SDK.

Parameters:
  • ws (Any) – Workspace instance for saving artifacts

  • docx_path (str) – Path to DOCX file (absolute or CWD-relative)

  • save_as (Optional[str]) – Optional name to save schema JSON (without .json extension)

  • include_artifacts (bool) – If True, save tagged DOCX + unzipped files (default: False)

Return type:

Dict[str, Any]

Returns:

Schema dict

Raises:

ForgeClientError – File not found or processing error

Example

>>> schema = client.build_schema_from_docx(
...     ws,
...     docx_path="sample.docx",
...     save_as="my_schema"
... )

Schema Running

ForgeClient.run_schema(ws, *, schema, plaintext, dest_name='assembled_output.docx')[source]

Run a schema with plaintext to generate a DOCX using the local SDK.

Parameters:
  • ws (Any) – Workspace instance

  • schema (Dict[str, Any]) – Schema dict (from build_schema_from_docx or loaded JSON)

  • plaintext (str) – Input text content

  • dest_name (str) – Name for output DOCX file (saved in output_docx directory)

Return type:

str

Returns:

Local path to saved DOCX file

Raises:

ForgeClientError – Failed to run schema or save DOCX

Example

>>> docx_path = client.run_schema(
...     ws,
...     schema=schema,
...     plaintext="Sample text...",
...     dest_name="output.docx"
... )

Bulk Processing

ForgeClient.run_schema_bulk(ws, *, schema, plaintexts, max_concurrent=5, dest_name_pattern='output_{index}.docx')[source]

Run a schema with multiple plaintexts to generate multiple DOCX files.

Parameters:
  • ws (Any) – Workspace instance

  • schema (Dict[str, Any]) – Schema dict (from build_schema_from_docx or loaded JSON)

  • plaintexts (list[str]) – List of plaintext strings to process

  • max_concurrent (int) – Ignored in local SDK mode (processed sequentially)

  • dest_name_pattern (str) – Pattern for output filenames. Use {index} placeholder

Return type:

Dict[str, Any]

Returns:

Dict containing results with status, paths, and timing info

Example

>>> result = client.run_schema_bulk(
...     ws,
...     schema=schema,
...     plaintexts=["Text 1...", "Text 2...", "Text 3..."],
...     dest_name_pattern="invoice_{index}.docx"
... )

Schema Compression

ForgeClient.compress_schema(ws, *, schema, save_as=None)[source]

Compress a schema by deduplicating redundant pattern descriptors.

Parameters:
  • ws (Any) – Workspace instance

  • schema (Dict[str, Any]) – Schema dict to compress

  • save_as (Optional[str]) – Optional name to save compressed schema JSON

Return type:

Dict[str, Any]

Returns:

Dict containing compressed_schema and stats

Example

>>> result = client.compress_schema(
...     ws,
...     schema=schema,
...     save_as="compressed_schema"
... )

Plaintext Intake

ForgeClient.intake_plaintext_text(ws, *, text, classify=False, save_as=None, **opts)[source]

Intake plaintext via text string (local processing).

Parameters:
  • ws (Any) – Workspace instance

  • text (str) – Plaintext content to intake

  • classify (bool) – If True, run heuristic classification on each line (adds classifications key to the result)

  • save_as (Optional[str]) – Optional name to save intake result JSON

  • **opts (Any) – Additional options (unicode_form, strip_zero_width, etc.)

Return type:

Dict[str, Any]

Returns:

Intake result dict

Example

>>> result = client.intake_plaintext_text(
...     ws,
...     text="Sample text...",
...     classify=True,
...     save_as="intake_result"
... )
ForgeClient.intake_plaintext_file(ws, *, file_path, save_as=None, **opts)[source]

Intake plaintext from file (local processing).

Parameters:
  • ws (Any) – Workspace instance

  • file_path (str) – Path to plaintext file

  • save_as (Optional[str]) – Optional name to save intake result JSON

  • **opts (Any) – Additional options

Return type:

Dict[str, Any]

Returns:

Intake result dict

Example

>>> result = client.intake_plaintext_file(
...     ws,
...     file_path="sample.txt",
...     save_as="intake_result"
... )

Form Detection

ForgeClient.detect_forms(ws, *, text, forms=None, threshold=0.55, use_context=True, save_as=None)[source]

Detect heuristic forms (headings, lists, paragraphs, etc.) in plaintext.

Runs the SDK’s line classifier against each line and returns classifications filtered by form type and confidence threshold.

Parameters:
  • ws (Any) – Workspace instance

  • text (str) – Plaintext content to classify

  • forms (Optional[List[str]]) – Optional list of form codes to keep (e.g. ["H-SHORT", "L-BULLET"]). None returns all.

  • threshold (float) – Minimum confidence score (0.0–1.0, default 0.55)

  • use_context (bool) – Use surrounding-line context for better accuracy

  • save_as (Optional[str]) – Optional name to save result JSON in workspace

Return type:

Dict[str, Any]

Returns:

Dict with keys classifications, total_lines, matched_lines, forms_filter, threshold.

Example

>>> result = client.detect_forms(
...     ws,
...     text=open("doc.txt").read(),
...     forms=["H-SHORT", "L-BULLET"],
... )
>>> for c in result["classifications"]:
...     print(c["pattern_type"], c["text"][:60])
ForgeClient.detect_forms_file(ws, *, file_path, forms=None, threshold=0.55, use_context=True, save_as=None)[source]

Detect heuristic forms in a plaintext file.

Reads the file and delegates to detect_forms().

Parameters:
  • ws (Any) – Workspace instance

  • file_path (str) – Path to plaintext file

  • forms (Optional[List[str]]) – Optional form-code filter list

  • threshold (float) – Minimum confidence score

  • use_context (bool) – Use surrounding-line context

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Same dict as detect_forms()

Chunking

ForgeClient.chunk_plaintext_text(ws, *, text, threshold=0.55, heading_forms=None, save_as=None)[source]

Split plaintext into heading-bounded chunks.

Runs heading detection on each line and splits the text at heading boundaries so each chunk can be processed independently (e.g. fed to an LLM one section at a time).

Parameters:
  • ws (Any) – Workspace instance

  • text (str) – Plaintext content to chunk

  • threshold (float) – Heading-detection confidence threshold (default 0.55)

  • heading_forms (Optional[List[str]]) – Optional list of heading forms to split on (e.g. ["H-SHORT", "H-SECTION-N"]). None uses all.

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Dict with keys chunks (list), total_chunks, total_lines, headings_detected.

Example

>>> result = client.chunk_plaintext_text(
...     ws, text=open("doc.txt").read()
... )
>>> for chunk in result["chunks"]:
...     print(chunk["heading_text"], "->", len(chunk["plaintext"]), "chars")
ForgeClient.chunk_plaintext_file(ws, *, file_path, threshold=0.55, heading_forms=None, save_as=None)[source]

Split a plaintext file into heading-bounded chunks.

Reads the file and delegates to chunk_plaintext_text().

Parameters:
  • ws (Any) – Workspace instance

  • file_path (str) – Path to plaintext file

  • threshold (float) – Heading-detection confidence threshold

  • heading_forms (Optional[List[str]]) – Optional heading-form filter list

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Same dict as chunk_plaintext_text()

ForgeClient.chunk_docx(ws, *, docx_path, threshold=0.55, save_as=None)[source]

Chunk a DOCX document into heading-bounded sections.

Extracts paragraph text from the DOCX, detects headings via heuristics, and splits into chunks. Each chunk includes the heading metadata and the plaintext content of that section.

Parameters:
  • ws (Any) – Workspace instance

  • docx_path (str) – Path to DOCX file

  • threshold (float) – Heading-detection confidence threshold (default 0.55)

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Dict with keys chunks, total_chunks, total_paragraphs, headings_detected.

Example

>>> result = client.chunk_docx(ws, docx_path="report.docx")
>>> for chunk in result["chunks"]:
...     print(chunk["heading_text"], "->", len(chunk["plaintext"]), "chars")

Document Indexing

ForgeClient.index_document(ws, *, text, section_forms=None, annotate_forms=None, threshold=0.55, use_context=True, save_as=None)[source]

Build a structured document index with heading-bounded sections and optional form-annotated segments.

Combines heading detection (for section boundaries) with line classification (for segment annotation) to produce an index that lets you request specific form types AND get the content between reference points.

Parameters:
  • ws (Any) – Workspace instance

  • text (str) – Plaintext content to index

  • section_forms (Optional[List[str]]) – Heading form codes that define section boundaries (e.g. ["H-SHORT", "H-SECTION-N"]). None uses all heading forms.

  • annotate_forms (Optional[List[str]]) – Form codes to annotate as segments within sections (e.g. ["L-BULLET", "T-ROW"]). None skips classification entirely (faster). [] runs classification but matches nothing.

  • threshold (float) – Minimum confidence score (0.0–1.0, default 0.55)

  • use_context (bool) – Use surrounding-line context for classification

  • save_as (Optional[str]) – Optional name to save result JSON in workspace

Return type:

Dict[str, Any]

Returns:

Dict with keys sections, preamble, total_sections, total_lines, headings_detected, section_forms, annotate_forms.

Example

>>> result = client.index_document(
...     ws,
...     text=open("doc.txt").read(),
...     annotate_forms=["L-BULLET", "T-ROW"],
... )
>>> for sec in result["sections"]:
...     print(sec["heading"]["text"], len(sec["segments"]), "segments")
ForgeClient.index_document_file(ws, *, file_path, section_forms=None, annotate_forms=None, threshold=0.55, use_context=True, save_as=None)[source]

Build a structured document index from a plaintext file.

Reads the file and delegates to index_document().

Parameters:
  • ws (Any) – Workspace instance

  • file_path (str) – Path to plaintext file

  • section_forms (Optional[List[str]]) – Heading form codes for section boundaries

  • annotate_forms (Optional[List[str]]) – Form codes to annotate as segments

  • threshold (float) – Minimum confidence score

  • use_context (bool) – Use surrounding-line context

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Same dict as index_document()

ForgeClient.index_docx(ws, *, docx_path, section_forms=None, annotate_forms=None, threshold=0.55, use_context=True, save_as=None)[source]

Build a structured document index from a DOCX file.

Extracts paragraph text from the DOCX, detects headings for section boundaries, and optionally annotates segments within each section.

Parameters:
  • ws (Any) – Workspace instance

  • docx_path (str) – Path to DOCX file

  • section_forms (Optional[List[str]]) – Heading form codes for section boundaries

  • annotate_forms (Optional[List[str]]) – Form codes to annotate as segments

  • threshold (float) – Minimum confidence score (default 0.55)

  • use_context (bool) – Use surrounding-line context

  • save_as (Optional[str]) – Optional name to save result JSON

Return type:

Dict[str, Any]

Returns:

Dict with keys sections, preamble, total_sections, total_paragraphs, headings_detected, section_forms, annotate_forms.

Example

>>> result = client.index_docx(
...     ws,
...     docx_path="report.docx",
...     annotate_forms=["L-BULLET", "T-ROW"],
... )

Agent API

ForgeClient.ask(*, message, tenant_id=None, user_id=None, conversation_id=None, conversation_history=None, current_schema=None, current_plaintext=None, current_document=None, real_time=False, strict_validation=False, enable_targeting=False)[source]

Send a message to the Glyph Agent multi-agent system via API.

This endpoint orchestrates:

  1. Intent classification

  2. Agent routing (schema, plaintext, validation, conversation)

  3. Multi-step workflows

  4. Markup application

  5. Conversation state management

Parameters:
  • message (str) – The message to send to the agent (required)

  • tenant_id (Optional[str]) – Tenant identifier for rate limiting

  • user_id (Optional[str]) – User identifier for rate limiting

  • conversation_id (Optional[str]) – Conversation ID for context tracking

  • conversation_history (Optional[List[Dict[str, str]]]) – Previous conversation messages for context; a list of dicts with ‘role’ and ‘content’ keys

  • current_schema (Optional[Dict[str, Any]]) – Current schema state (for incremental modifications)

  • current_plaintext (Optional[str]) – Current plaintext content (for incremental modifications)

  • current_document (Optional[Dict[str, Any]]) – Legacy combined document state

  • real_time (bool) – Enable real-time sandbox updates

  • strict_validation (bool) – Enable strict validation mode

  • enable_targeting (bool) – If True and current_plaintext is provided, run local chunk targeting to reduce payload size before sending to the API (default: False)

Returns:

Dict with keys:

  • response: The agent’s response message

  • document: Generated or modified document (if applicable)

  • schema/document_schema: Document schema (if schema request)

  • plaintext: Generated plaintext content

  • validation_result: Validation results (if validation request)

  • metadata: Additional metadata (intent, routing, etc.)

  • usage: Token usage information

  • conversation_id: Conversation ID for tracking

Return type:

Dict[str, Any]

Example

>>> client = ForgeClient()
>>> response = client.ask(
...     message="Create a schema for a quarterly report",
...     user_id="user123"
... )
>>> print(response['response'])
>>> if 'schema' in response:
...     print(f"Schema generated: {len(response['schema']['pattern_descriptors'])} descriptors")

Client Management

ForgeClient.close()[source]

Close the client and clean up resources.

Usage Examples

Basic Schema Build and Run

from glyph_forge import ForgeClient, create_workspace

# Initialize (no API key needed; all processing is local)
client = ForgeClient()
ws = create_workspace()

# Build schema
schema = client.build_schema_from_docx(
    ws,
    docx_path="template.docx",
    save_as="my_schema"
)

# Run schema
output = client.run_schema(
    ws,
    schema=schema,
    plaintext="Content here...",
    dest_name="output.docx"
)

With Context Manager

from glyph_forge import ForgeClient, create_workspace

ws = create_workspace()

with ForgeClient() as client:
    schema = client.build_schema_from_docx(
        ws,
        docx_path="template.docx"
    )

Bulk Processing

# Process multiple documents at once
plaintexts = ["Text 1...", "Text 2...", "Text 3..."]

result = client.run_schema_bulk(
    ws,
    schema=schema,
    plaintexts=plaintexts,
    max_concurrent=5,
    dest_name_pattern="output_{index}.docx"
)

print(f"Processed {result['successful']} of {result['total']}")

Schema Compression

# Compress schema to optimize size
result = client.compress_schema(
    ws,
    schema=schema,
    save_as="compressed_schema"
)

print(f"Reduced from {result['stats']['original_count']} "
      f"to {result['stats']['compressed_count']} pattern descriptors")

Document Indexing

# Index a document with section structure only
result = client.index_document(ws, text=open("report.txt").read())

for sec in result["sections"]:
    print(f"{sec['heading']['text']} (lines {sec['span']['start']}-{sec['span']['end']})")

# Index with segment annotations for specific form types
result = client.index_document(
    ws,
    text=open("report.txt").read(),
    annotate_forms=["L-BULLET", "T-ROW"],
)

for sec in result["sections"]:
    for seg in sec["segments"]:
        print(f"  {seg['form']}: {seg['count']} lines at {seg['span']}")

# Index a DOCX file
result = client.index_docx(
    ws,
    docx_path="report.docx",
    annotate_forms=["L-BULLET"],
    save_as="report_index",
)