Ontology YAML Schema - Create Context Graph

Every domain in Create Context Graph is defined by a single YAML file. This file is the source of truth for everything the tool generates: Neo4j schema and constraints, Pydantic models, agent tools, NVL visualization config, and demo scenarios. Domain YAMLs live in `src/create_context_graph/domains/` in the source tree and are copied into generated projects at `data/ontology.yaml`.

Top-level structure

inherits: _base          # Required: merge POLE+O base types

domain:                  # Required
  id: my-domain
  name: My Domain
  description: ...
  tagline: ...
  emoji: ...

entity_types: [...]      # Required: list of entity type definitions
relationships: [...]     # Required: list of relationship definitions
document_templates: [...] # Optional: templates for synthetic documents
decision_traces: [...]   # Optional: reasoning trace scenarios
demo_scenarios: [...]    # Optional: pre-built chat prompts
agent_tools: [...]       # Required: domain-specific agent tools
system_prompt: |         # Required: multi-line agent system prompt
  ...
visualization:           # Optional: NVL visualization overrides
  node_colors: {}
  node_sizes: {}
  default_cypher: ...

`inherits`

inherits: _base

Every domain ontology must declare `inherits: _base`. This instructs the ontology loader to merge the base POLE+O entity types (Person, Organization, Location, Event, Object) and their standard relationships into the domain. Base types are prepended to the entity list unless the domain explicitly redefines an entity with the same label. The three base relationships added automatically are:

WORKS_FOR: Person → Organization
LOCATED_AT: Organization → Location
PARTICIPATED_IN: Person → Event
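
The merge rule described above (base types are prepended, and a domain entity with the same label wins) can be sketched in a few lines of Python. This is an illustration only, not the loader's actual code: the function name `merge_entity_types` and the dictionary shapes are assumptions made for the example.

```python
def merge_entity_types(base_types, domain_types):
    """Prepend base entity types, skipping any label the domain redefines."""
    domain_labels = {entity["label"] for entity in domain_types}
    inherited = [e for e in base_types if e["label"] not in domain_labels]
    return inherited + domain_types


# Hypothetical, shortened stand-ins for the parsed _base and domain YAML.
base = [
    {"label": "Person", "pole_type": "PERSON"},
    {"label": "Organization", "pole_type": "ORGANIZATION"},
    {"label": "Location", "pole_type": "LOCATION"},
]
domain = [
    {"label": "Book", "pole_type": "OBJECT"},
    # Redefines the base Person type, so the base version is dropped.
    {"label": "Person", "pole_type": "PERSON", "color": "#06b6d4"},
]

merged = merge_entity_types(base, domain)
print([e["label"] for e in merged])
# ['Organization', 'Location', 'Book', 'Person']
```

In this sketch the redefined `Person` keeps its position in the domain list; whether the real loader keeps that position or moves the override to the front is not specified here.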

`domain`

Domain metadata used in the generated project’s README, UI header, and configuration.

Field	Type	Required	Description
`id`	string	yes	Kebab-case identifier (e.g., `financial-services`). Must match the YAML filename.
`name`	string	yes	Human-readable display name.
`description`	string	no	One-sentence domain description.
`tagline`	string	no	Short tagline shown in the generated app’s UI.
`emoji`	string	no	Emoji displayed alongside the domain name. Use a literal emoji or a Unicode escape such as `"\U0001F3E5"`; YAML only interprets escapes inside double-quoted strings.

domain:
  id: healthcare
  name: Healthcare
  description: Patient care, clinical encounters, diagnoses, treatments, and provider networks
  tagline: "AI-powered Clinical Intelligence"
  emoji: "\U0001F3E5"
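
The two constraints on `id` (kebab-case, and matching the YAML filename) are easy to check before running the generator. The helper below is a hypothetical pre-flight check, not part of the tool; the function name and the exact kebab-case pattern are assumptions.

```python
import re
from pathlib import Path

# Assumed definition of kebab-case: lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_domain_id(domain_id: str, yaml_path: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the id looks fine."""
    problems = []
    if not KEBAB_CASE.match(domain_id):
        problems.append(f"id {domain_id!r} is not kebab-case")
    stem = Path(yaml_path).stem  # filename without directory or extension
    if stem != domain_id:
        problems.append(f"id {domain_id!r} does not match filename {stem!r}")
    return problems


print(check_domain_id("financial-services", "domains/financial-services.yaml"))
print(check_domain_id("Healthcare", "domains/healthcare.yaml"))
```

The first call returns an empty list; the second reports both a casing problem and a filename mismatch.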

`entity_types`

A list of entity type definitions. Each entry maps to a Neo4j node label and a generated Pydantic model.

Field	Type	Required	Description
`label`	string	yes	Neo4j node label (PascalCase).
`pole_type`	string	yes	POLE+O classification. One of: `PERSON`, `ORGANIZATION`, `LOCATION`, `EVENT`, `OBJECT`.
`subtype`	string	no	More specific classification within the POLE+O type (e.g., `PATIENT`, `CLINICAL_ENCOUNTER`).
`color`	string	no	Hex color for NVL visualization. Default: `#6366f1`.
`icon`	string	no	Icon identifier for the frontend node renderer. Default: `circle`.
`properties`	list	no	List of property definitions (see Property definitions below).

POLE+O types

The POLE+O model classifies every entity in the knowledge graph into one of five semantic categories. This drives how the agent tools are generated, how the graph is visualized, and how embeddings are indexed.

PERSON

Individuals — patients, employees, customers, researchers, players.

ORGANIZATION

Companies, teams, departments, agencies, institutions.

LOCATION

Physical places, addresses, regions, facilities, habitats.

EVENT

Time-bound occurrences — transactions, encounters, appointments, sightings.

OBJECT

Everything else — accounts, documents, products, equipment, medications.

Property definitions

Each property within an entity_types or relationships entry accepts:

Field	Type	Required	Description
`name`	string	yes	Property name (snake_case).
`type`	string	no	Data type. Default: `string`. See Supported property types.
`required`	boolean	no	Whether the property is mandatory. Default: `false`.
`unique`	boolean	no	Whether to create a uniqueness constraint in Neo4j. Default: `false`.
`enum`	list	no	Allowed values. Generates a Python `Enum` class.
`default`	any	no	Default value used during data generation.
`description`	string	no	Human-readable description passed to the LLM data generator.
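
To make the effect of `unique: true` concrete, the sketch below turns property definitions into Neo4j uniqueness constraint statements. The Cypher form `CREATE CONSTRAINT ... IF NOT EXISTS FOR (n:Label) REQUIRE n.prop IS UNIQUE` is standard Neo4j syntax, but the function name, the constraint naming scheme, and the assumption that the generator emits one statement per property are illustrative guesses rather than the tool's documented behavior.

```python
def uniqueness_constraints(entity_types):
    """Build one CREATE CONSTRAINT statement per property marked unique: true."""
    statements = []
    for entity in entity_types:
        label = entity["label"]
        for prop in entity.get("properties", []):
            if prop.get("unique", False):  # `unique` defaults to false
                # Hypothetical naming scheme: <label>_<property>_unique
                name = f"{label.lower()}_{prop['name']}_unique"
                statements.append(
                    f"CREATE CONSTRAINT {name} IF NOT EXISTS "
                    f"FOR (n:{label}) REQUIRE n.{prop['name']} IS UNIQUE"
                )
    return statements


entity_types = [
    {
        "label": "Patient",
        "properties": [
            {"name": "patient_id", "type": "string", "required": True, "unique": True},
            {"name": "name", "type": "string", "required": True},
        ],
    }
]

for statement in uniqueness_constraints(entity_types):
    print(statement)
```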

Supported property types

Type	Python type	Neo4j type	Description
`string`	`str`	`STRING`	Text values
`integer`	`int`	`INTEGER`	Whole numbers
`float`	`float`	`FLOAT`	Decimal numbers
`boolean`	`bool`	`BOOLEAN`	True/false
`date`	`date`	`DATE`	Calendar date (no time component)
`datetime`	`datetime`	`DATETIME`	Date and time
`point`	`str`	`POINT`	Geographic coordinates (WGS-84)
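
The Python column of the table can be read as a simple lookup. The sketch below mirrors the table, including the `string` default for properties that omit `type`; the names `PYTHON_TYPES` and `python_type_for` are made up for this example and are not taken from the generator.

```python
from datetime import date, datetime

# Mirrors the "Python type" column of the table above.
PYTHON_TYPES = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": bool,
    "date": date,
    "datetime": datetime,
    "point": str,  # the table maps point to str on the Python side
}


def python_type_for(prop):
    """Resolve a property definition to a Python type; `type` defaults to string."""
    return PYTHON_TYPES[prop.get("type", "string")]


print(python_type_for({"name": "bed_count", "type": "integer"}))
print(python_type_for({"name": "allergies"}))
```

An unknown type name raises `KeyError` in this sketch, which is one simple way to surface typos such as `int` or `str` in a domain file.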

When using boolean-like values in enum lists, you must quote them. Unquoted true and false are parsed by YAML as actual booleans, not strings.

# Wrong — YAML parses these as boolean true/false
enum: [true, false]

# Correct — quoted strings
enum: ["true", "false"]
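
A quick way to catch this mistake is to scan the parsed properties for enum entries that are not strings. The helper below is a hypothetical lint check that operates on already-parsed data; the two sample properties are written as the Python structures a YAML parser would produce for the wrong and correct forms above.

```python
def non_string_enum_values(properties):
    """Return (property name, value) pairs for enum entries that are not strings."""
    return [
        (prop["name"], value)
        for prop in properties
        for value in prop.get("enum", [])
        if not isinstance(value, str)
    ]


properties = [
    # Parsed from `enum: [true, false]`: the values arrive as Python booleans.
    {"name": "is_active", "type": "string", "enum": [True, False]},
    # Parsed from `enum: ["true", "false"]`: the values stay strings.
    {"name": "is_verified", "type": "string", "enum": ["true", "false"]},
]

print(non_string_enum_values(properties))
# [('is_active', True), ('is_active', False)]
```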

Example — healthcare entity types

The following excerpt is taken directly from the healthcare.yaml domain:

entity_types:
  - label: Patient
    pole_type: PERSON
    subtype: PATIENT
    color: "#06b6d4"
    icon: user
    properties:
      - name: patient_id
        type: string
        required: true
        unique: true
      - name: name
        type: string
        required: true
      - name: date_of_birth
        type: date
      - name: blood_type
        type: string
        enum: ["A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"]
      - name: allergies
        type: string

  - label: Provider
    pole_type: PERSON
    subtype: PROVIDER
    color: "#14b8a6"
    icon: stethoscope
    properties:
      - name: provider_id
        type: string
        required: true
        unique: true
      - name: name
        type: string
        required: true
      - name: specialty
        type: string
        enum: [general_practice, cardiology, oncology, neurology, orthopedics,
               pediatrics, radiology, surgery, emergency, psychiatry]
      - name: license_number
        type: string

  - label: Diagnosis
    pole_type: OBJECT
    subtype: DIAGNOSIS
    color: "#ef4444"
    icon: heart-pulse
    properties:
      - name: icd_code
        type: string
        required: true
        unique: true
      - name: name
        type: string
        required: true
      - name: category
        type: string
        enum: [chronic, acute, infectious, autoimmune, genetic, mental_health]
      - name: severity
        type: string
        enum: [mild, moderate, severe, critical]

  - label: Encounter
    pole_type: EVENT
    subtype: CLINICAL_ENCOUNTER
    color: "#f59e0b"
    icon: calendar
    properties:
      - name: encounter_id
        type: string
        required: true
        unique: true
      - name: encounter_type
        type: string
        enum: [inpatient, outpatient, emergency, telehealth]
      - name: date
        type: datetime
        required: true
      - name: chief_complaint
        type: string
      - name: disposition
        type: string

  - label: Facility
    pole_type: LOCATION
    subtype: HEALTHCARE_FACILITY
    color: "#6366f1"
    icon: building
    properties:
      - name: facility_id
        type: string
        required: true
        unique: true
      - name: name
        type: string
        required: true
      - name: facility_type
        type: string
        enum: [hospital, clinic, urgent_care, lab, pharmacy, rehab_center]
      - name: bed_count
        type: integer

`relationships`

A list of relationship type definitions. Each entry maps to a Neo4j relationship type.

Field	Type	Required	Description
`type`	string	yes	Relationship type (UPPER_SNAKE_CASE).
`source`	string	yes	Source entity label.
`target`	string	yes	Target entity label.
`properties`	list	no	List of property definitions (same schema as entity properties).

relationships:
  - type: DIAGNOSED_WITH
    source: Patient
    target: Diagnosis
  - type: TREATED_BY
    source: Patient
    target: Provider
  - type: HAD_ENCOUNTER
    source: Patient
    target: Encounter
  - type: OCCURRED_AT
    source: Encounter
    target: Facility
  - type: AFFILIATED_WITH
    source: Provider
    target: Facility
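
Because `source` and `target` are plain label strings, a typo only shows up later as an empty query result. The sketch below checks that every endpoint names either a base POLE+O type or an entity defined in the domain. It is a hypothetical helper; whether the real loader performs an equivalent check is an assumption not confirmed here.

```python
# Base labels merged in by `inherits: _base`.
BASE_LABELS = {"Person", "Organization", "Location", "Event", "Object"}


def undefined_endpoints(entity_types, relationships):
    """Return (relationship type, label) pairs whose label is not a known entity."""
    known = BASE_LABELS | {entity["label"] for entity in entity_types}
    return [
        (rel["type"], label)
        for rel in relationships
        for label in (rel["source"], rel["target"])
        if label not in known
    ]


entity_types = [{"label": "Patient"}, {"label": "Diagnosis"}]
relationships = [
    {"type": "DIAGNOSED_WITH", "source": "Patient", "target": "Diagnosis"},
    {"type": "TREATED_BY", "source": "Patient", "target": "Provider"},  # Provider missing
]

print(undefined_endpoints(entity_types, relationships))
# [('TREATED_BY', 'Provider')]
```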

`document_templates`

Templates that guide synthetic document generation. Each template produces a batch of documents when --demo-data is used with an LLM API key. Documents are stored as :Document nodes in Neo4j with :MENTIONS edges to the entities they reference.

Field	Type	Required	Description
`id`	string	yes	Template identifier.
`name`	string	yes	Human-readable template name.
`description`	string	no	What this document type represents.
`count`	integer	no	Number of documents to generate. Default: `5`.
`prompt_template`	string	no	LLM prompt template for generation. May reference entity field values via `{{entity.field}}` placeholders.
`required_entities`	list	no	Entity labels that must exist before documents of this type can be generated.

document_templates:
  - id: discharge_summary
    name: Discharge Summary
    description: Patient discharge summaries after hospital stays
    count: 8
    prompt_template: |
      Write a discharge summary for patient {{patient.name}} (ID: {{patient.patient_id}})
      discharged from {{facility.name}} on {{date}}.
      Attending physician: Dr. {{provider.name}} ({{provider.specialty}}).
      Diagnoses: {{diagnosis_list}}. Treatments received: {{treatment_list}}.
      Include discharge instructions and follow-up plans.
    required_entities: [Patient, Provider, Facility, Diagnosis, Treatment]

  - id: referral_letter
    name: Referral Letter
    description: Provider-to-provider referral communications
    count: 6
    prompt_template: |
      Write a referral letter from Dr. {{referring.name}} ({{referring.specialty}})
      to Dr. {{specialist.name}} ({{specialist.specialty}}) at {{facility.name}}
      for patient {{patient.name}}.
      Reason for referral: {{diagnosis.name}}. Include relevant history and findings.
    required_entities: [Provider, Patient, Diagnosis, Facility]
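
To show how `{{entity.field}}` placeholders resolve, here is a minimal substitution sketch. The generator's actual templating engine is not described in this document, so the regular expression, the `render` function, and the nested-dictionary context are all assumptions; the patient data is made up for the example.

```python
import re

# Matches {{name}} or {{entity.field}}, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render(template, context):
    """Replace each placeholder with the value found by walking the dotted path."""
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError if the path is missing
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


context = {
    "patient": {"name": "Ana Ruiz", "patient_id": "P-001"},
    "date": "2024-03-01",
}
template = "Patient {{patient.name}} (ID: {{patient.patient_id}}) discharged on {{date}}."
print(render(template, context))
# Patient Ana Ruiz (ID: P-001) discharged on 2024-03-01.
```

Raising on a missing path, rather than leaving the placeholder in place, makes an incomplete `required_entities` list visible early.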

`decision_traces`

Decision trace scenarios define multi-step reasoning patterns for agent memory. Each trace records the thought process an agent follows for a given task, stored as :DecisionTrace → :HAS_STEP → :TraceStep chains in Neo4j.

Field	Type	Required	Description
`id`	string	yes	Trace identifier.
`task`	string	yes	The task or question being reasoned about. May reference entities via `{{entity.field}}` placeholders.
`steps`	list	no	Ordered list of reasoning steps.
`outcome_template`	string	no	Template for the final outcome string.

Each step contains:

Field	Type	Required	Description
`thought`	string	yes	The agent’s internal reasoning at this step.
`action`	string	yes	The action taken (tool call, Cypher query, external lookup, etc.).
`observation`	string	no	The result of the action. Populated at generation time.

decision_traces:
  - id: readmission_risk
    task: "Assess readmission risk for {{patient.name}} being discharged after {{diagnosis.name}} treatment"
    steps:
      - thought: Review patient's hospitalization and readmission history
        action: Query past encounters and discharge outcomes
      - thought: Evaluate completeness of discharge plan
        action: Check for scheduled follow-ups, medication reconciliation, and support services
      - thought: Identify risk factors from similar patients
        action: Find patients with similar diagnoses and demographics, check their readmission rates
    outcome_template: "Risk level: {{risk_level}}. Mitigation: {{actions}}"

`demo_scenarios`

Pre-built chat scenarios displayed in the generated frontend. Each scenario provides a sequence of prompts the user can click to demo the agent without typing.

Field	Type	Required	Description
`name`	string	yes	Scenario display name.
`prompts`	list	yes	Ordered list of chat messages to send.

demo_scenarios:
  - name: Patient Lookup
    prompts:
      - "Show me all patients with a chronic diagnosis"
      - "What medications are currently prescribed to patients in the cardiology department?"
      - "Find all recent patient encounters in the last 6 months"

  - name: Clinical Decision Support
    prompts:
      - "Are there any potential drug interactions in current prescriptions?"
      - "What treatments have been most effective for patients with heart failure?"
      - "Show me the most recent decision traces for treatment plans"

`agent_tools`

Domain-specific tools the AI agent can call. Each tool maps to a parameterized Cypher query executed against Neo4j. The `description` field is passed directly to the LLM as the tool’s description, so write it as you would a docstring.

All agent tools must return their results as a JSON-serialized string (`json.dumps(result, default=str)`). The `default=str` handler converts values that JSON cannot represent natively, such as Neo4j datetime and spatial values, to strings instead of raising a `TypeError`.

Field	Type	Required	Description
`name`	string	yes	Tool function name (snake_case).
`description`	string	yes	What the tool does. Passed to the LLM as the tool description.
`cypher`	string	no	Cypher query to execute. Use `$param_name` for parameters.
`parameters`	list	no	List of parameter definitions (same schema as entity properties).

agent_tools:
  - name: search_patient
    description: Search for patients by name or ID
    cypher: |
      MATCH (p:Patient)
      WHERE toLower(p.name) CONTAINS toLower($query)
         OR p.patient_id = $query
      OPTIONAL MATCH (p)-[r]-(related)
      RETURN p, type(r) AS rel_type, related
      LIMIT 20
    parameters:
      - name: query
        type: string
        description: Patient name or ID

  - name: get_patient_history
    description: Get comprehensive history for a patient
    cypher: |
      MATCH (p:Patient {patient_id: $patient_id})
      OPTIONAL MATCH (p)-[:HAD_ENCOUNTER]->(e:Encounter)-[:OCCURRED_AT]->(f:Facility)
      OPTIONAL MATCH (p)-[:DIAGNOSED_WITH]->(d:Diagnosis)
      OPTIONAL MATCH (e)-[:INCLUDES]->(t:Treatment)
      RETURN p, collect(DISTINCT e) AS encounters,
             collect(DISTINCT d) AS diagnoses,
             collect(DISTINCT t) AS treatments,
             collect(DISTINCT f) AS facilities
    parameters:
      - name: patient_id
        type: string
        description: Patient ID

  - name: list_patients
    description: "List Patient records with optional limit"
    cypher: |
      MATCH (n:Patient)
      RETURN n
      ORDER BY n.name
      LIMIT toInteger($limit)
    parameters:
      - name: limit
        type: string
        description: "Maximum number of results to return (default: 10)"

Every domain should include at least one list_* tool and one get_*_by_id tool so the agent can enumerate and drill into records. The built-in domains each provide 7–8 tools following this pattern.
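
The JSON serialization requirement for tool results can be demonstrated with the standard library alone. The sketch below uses Python's own `datetime` and `date` as stand-ins for Neo4j temporal values (an assumption made so the example runs without a database); `serialize_tool_result` is a hypothetical wrapper name.

```python
import json
from datetime import date, datetime


def serialize_tool_result(result):
    """Serialize query rows to a JSON string, stringifying non-JSON types."""
    return json.dumps(result, default=str)


rows = [
    {
        "encounter_id": "E-1",
        "date": datetime(2024, 3, 1, 9, 30),
        "date_of_birth": date(1980, 5, 17),
    }
]

# Without default=str, json.dumps cannot handle datetime values.
try:
    json.dumps(rows)
except TypeError as error:
    print(f"plain json.dumps failed: {error}")

payload = serialize_tool_result(rows)
print(payload)
# [{"encounter_id": "E-1", "date": "2024-03-01 09:30:00", "date_of_birth": "1980-05-17"}]
```

Note that `default=str` produces whatever `str()` returns for the value, so the exact string format depends on the type being serialized.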

`system_prompt`

A multi-line string that becomes the agent’s system prompt. It should describe the agent’s role, capabilities, and behavioral guidelines for the domain. Keep it grounded in what the agent tools can actually do.

The generated agent templates automatically append a tool-use emphasis suffix: "IMPORTANT: You MUST use the available tools to query the knowledge graph before answering any question about the data." You do not need to add this yourself.

system_prompt: |
  You are an AI clinical intelligence assistant with access to a comprehensive
  knowledge graph of healthcare data. You help clinicians, care coordinators,
  and medical staff analyze patient records, diagnoses, treatments, and
  provider networks.

  Your capabilities include:
  - Searching patient records and clinical history
  - Checking medication contraindications and interactions
  - Finding similar past cases for clinical decision support
  - Analyzing provider referral networks
  - Tracing treatment decisions and outcomes

  Always prioritize patient safety. Flag potential contraindications immediately.
  Provide evidence-based insights grounded in the clinical data available.

`visualization`

Configuration for the NVL (Neo4j Visualization Library) graph view in the frontend. All fields are optional — sensible defaults are derived from entity_types colors automatically.

Field	Type	Required	Description
`node_colors`	map	no	Label → hex color mapping. Overrides the `color` field from `entity_types`.
`node_sizes`	map	no	Label → pixel size mapping. Default: `20`.
`default_cypher`	string	no	Initial Cypher query for the graph view. Default: `MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 100`.

If node_colors is not specified for a label, the color field from the corresponding entity_types entry is used automatically.

visualization:
  node_colors:
    Patient: "#06b6d4"
    Provider: "#14b8a6"
    Diagnosis: "#ef4444"
    Treatment: "#10b981"
    Encounter: "#f59e0b"
    Facility: "#6366f1"
    Medication: "#8b5cf6"
  node_sizes:
    Patient: 25
    Provider: 25
    Diagnosis: 20
    Treatment: 15
    Encounter: 15
    Facility: 30
    Medication: 15
  default_cypher: "MATCH (p:Patient)-[r]-(n) RETURN p, r, n LIMIT 100"
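
The fallback order for node styling (an explicit `node_colors` entry, then the entity's `color`, then the documented defaults) can be sketched as follows. The function name `node_style` and the dictionary shapes are assumptions for illustration; the default values are the ones listed in the tables in this document.

```python
DEFAULT_COLOR = "#6366f1"  # default `color` for entity types
DEFAULT_SIZE = 20          # default `node_sizes` value


def node_style(label, entity_types, visualization):
    """Resolve the color and size for a label using the documented fallbacks."""
    entity = next((e for e in entity_types if e["label"] == label), {})
    override = visualization.get("node_colors", {}).get(label)
    color = override or entity.get("color", DEFAULT_COLOR)
    size = visualization.get("node_sizes", {}).get(label, DEFAULT_SIZE)
    return {"color": color, "size": size}


entity_types = [
    {"label": "Patient", "color": "#06b6d4"},
    {"label": "Facility"},
]
visualization = {"node_colors": {"Patient": "#ff0000"}, "node_sizes": {"Facility": 30}}

print(node_style("Patient", entity_types, visualization))   # override wins
print(node_style("Facility", entity_types, visualization))  # default color, custom size
print(node_style("Patient", entity_types, {}))              # falls back to entity color
```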

Complete minimal example

The following is a valid minimal domain YAML — enough to scaffold a working project with a single entity type and one agent tool:

inherits: _base

domain:
  id: bookstore
  name: Bookstore
  description: Book inventory, customers, and sales
  tagline: "AI-powered Book Recommendations"
  emoji: "\U0001F4DA"

entity_types:
  - label: Book
    pole_type: OBJECT
    color: "#8b5cf6"
    icon: book
    properties:
      - name: title
        type: string
        required: true
      - name: isbn
        type: string
        unique: true
      - name: genre
        type: string
      - name: price
        type: float

relationships:
  - type: PURCHASED
    source: Person
    target: Book
  - type: AUTHORED
    source: Person
    target: Book

agent_tools:
  - name: search_books
    description: Search for books by title or genre
    cypher: |
      MATCH (b:Book)
      WHERE b.title CONTAINS $query OR b.genre = $query
      RETURN b
    parameters:
      - name: query
        type: string
        required: true

system_prompt: |
  You are a bookstore assistant with access to the inventory
  and customer purchase history.

demo_scenarios:
  - name: Book Recommendation
    prompts:
      - "What science fiction books do we have in stock?"
      - "Who has purchased the most books this month?"

Adding a custom domain

You can generate a complete domain YAML from a plain English description using the --custom-domain flag. The LLM uses _base.yaml and two reference domain YAMLs as few-shot examples, then validates the output against the DomainOntology Pydantic model (up to 3 retry attempts).

uvx create-context-graph my-app \
  --custom-domain "veterinary clinic management" \
  --framework pydanticai \
  --anthropic-api-key $ANTHROPIC_API_KEY \
  --demo-data
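
The generate-then-validate loop mentioned above can be pictured roughly as follows. This is a sketch under stated assumptions, not the tool's implementation: `generate` and `validate` are hypothetical callables standing in for the LLM call and the `DomainOntology` validation, the limit is read as three attempts in total, and feeding the previous error back into the next attempt is a guess about how retries might work.

```python
def generate_with_retries(generate, validate, max_attempts=3):
    """Call generate until validate accepts the result or attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        candidate = generate(last_error)  # previous error is passed back as feedback
        try:
            return validate(candidate)
        except ValueError as error:
            last_error = error
    raise RuntimeError(f"no valid ontology after {max_attempts} attempts") from last_error


# Stand-ins for the example: the fake generator succeeds on its third call.
attempts = []


def fake_generate(previous_error):
    attempts.append(previous_error)
    return {"domain": {"id": "vet-clinic"}} if len(attempts) == 3 else {}


def fake_validate(candidate):
    if "domain" not in candidate:
        raise ValueError("missing required key: domain")
    return candidate


ontology = generate_with_retries(fake_generate, fake_validate)
print(ontology, len(attempts))
```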

Generated YAMLs are saved to `~/.create-context-graph/custom-domains/` so that a custom domain can be reused. For the full domain list and CLI flag values, see the Domain Catalog. For framework selection, see Framework Comparison.
