Dataset Schemas

Strict JSON and JSONL schemas for submitting datasets to the Acadify SME network for Supervised Fine-Tuning (SFT), RLHF, and DPO grading.

SFT (ChatML)

Multi-turn conversational schemas used for behavior cloning and style alignment.

RLHF Preference

Submitting multiple model completions for rigorous human preference ranking.

DPO Triplets

Direct Preference Optimization formatting (Prompt, Chosen, Rejected).

Multimodal JSON

Structuring base64 image strings and bounding box coordinates for Vision models.

1. SFT (ChatML) Conversational Schema

Supervised Fine-Tuning requires exceptionally high-quality demonstration data. Acadify accepts strictly formatted JSONL payloads adhering to the ChatML / OpenAI message format.

The JSONL Structure

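Each line in your submitted .jsonl file must be a self-contained JSON object with a messages array. Each message must contain a role (system, user, or assistant) and content. The example below is pretty-printed for readability; in the file itself, each object must occupy a single line.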

{
  "messages": [
    {"role": "system", "content": "You are an expert offensive security engineer."},
    {"role": "user", "content": "Write a bash script to scan for open ports."},
    {"role": "assistant", "content": "Here is a safe wrapper around nmap...\n\n```bash\n#!/bin/bash\nnmap -p- $1\n```"}
  ]
}
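A minimal sketch of a pre-submission check for the structure described above. The function name validate_sft_line and the wording of the error messages are illustrative assumptions, not part of any Acadify tooling; the only rules enforced are the ones stated in this section (a messages array, a role of system, user, or assistant, and a content field).

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_sft_line(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none).

    Hypothetical helper: it only checks the rules stated in this section.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value must be a JSON object"]

    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty array"]

    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"messages[{i}] is not an object")
            continue
        role = message.get("role")
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            problems.append(f"messages[{i}] has an invalid or missing role")
        if "content" not in message:
            problems.append(f"messages[{i}] is missing content")
    return problems


good = json.dumps({"messages": [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
]})
bad = json.dumps({"messages": [{"role": "tool", "content": "x"}]})

print(validate_sft_line(good))
print(validate_sft_line(bad))
```

Because json.dumps writes each object on a single line by default, joining the dumped records with newlines yields a valid .jsonl file.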

SME Verification Constraints

When our experts review SFT data, they grade each assistant response on a 0-10 rubric covering Factuality, Tone Alignment, and Safety. If a response scores below 7, our experts rewrite the content string directly and flag the row with "modified": true.
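Once reviewed data comes back, it can be useful to separate rows that experts rewrote from rows that passed as submitted. The sketch below assumes the "modified" flag is a top-level key on each returned row, which this section implies but does not spell out; split_by_modified is a hypothetical helper name.

```python
import json


def split_by_modified(jsonl_text: str) -> tuple[list[dict], list[dict]]:
    """Split reviewed rows into (unmodified, rewritten_by_sme).

    Assumption: a rewritten row carries a top-level "modified": true key.
    """
    unmodified, rewritten = [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        row = json.loads(line)
        if row.get("modified") is True:
            rewritten.append(row)
        else:
            unmodified.append(row)
    return unmodified, rewritten


reviewed = "\n".join([
    json.dumps({"messages": [{"role": "assistant", "content": "A"}]}),
    json.dumps({"messages": [{"role": "assistant", "content": "B"}], "modified": True}),
])
unmodified, rewritten = split_by_modified(reviewed)
print(len(unmodified), len(rewritten))
```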

2. RLHF Preference Ranking Schema

Reinforcement Learning from Human Feedback (RLHF) requires training a reward model. Each record you submit must provide a prompt and 2-4 candidate completions. Our SMEs rank these completions and calculate a reward variance score.

Submission Payload

{
  "id": "prompt_alpha_092",
  "prompt": "Explain quantum entanglement to a 5-year-old.",
  "candidates": [
    {"model_id": "v1-instruct", "text": "It is when two particles share a state vector..."},
    {"model_id": "v2-instruct", "text": "Imagine you have two magic dice..."}
  ],
  "metadata": {
    "domain": "Physics_STEM"
  }
}
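A short sketch of how such a record might be assembled in code, enforcing the 2-4 candidate limit stated above. The helper build_rlhf_record and its argument names are illustrative assumptions; only the output keys come from the payload shown here.

```python
import json


def build_rlhf_record(record_id: str, prompt: str,
                      candidates: list[tuple[str, str]], domain: str) -> dict:
    """Build one submission record from (model_id, text) pairs.

    Hypothetical helper; raises ValueError outside the 2-4 candidate range.
    """
    if not 2 <= len(candidates) <= 4:
        raise ValueError(f"expected 2-4 candidates, got {len(candidates)}")
    return {
        "id": record_id,
        "prompt": prompt,
        "candidates": [
            {"model_id": model_id, "text": text} for model_id, text in candidates
        ],
        "metadata": {"domain": domain},
    }


record = build_rlhf_record(
    "prompt_alpha_092",
    "Explain quantum entanglement to a 5-year-old.",
    [
        ("v1-instruct", "It is when two particles share a state vector..."),
        ("v2-instruct", "Imagine you have two magic dice..."),
    ],
    "Physics_STEM",
)
print(json.dumps(record))  # one record per line in the .jsonl file
```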

Verified Output Payload

After SME processing, Acadify returns the JSONL with a rankings array appended to each record, ordered from best (index 0) to worst (index n-1 for n candidates).
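Reward models are commonly trained on pairwise comparisons, so a ranked record is often expanded into (chosen, rejected) pairs. The sketch below assumes the rankings array holds model_id strings in best-first order; the exact element format is not specified in this section, so treat that as an assumption and adapt it to the payload you actually receive.

```python
from itertools import combinations


def ranking_to_pairs(record: dict) -> list[dict]:
    """Expand a best-first ranking into pairwise preference examples.

    Assumption: record["rankings"] is a list of model_id strings, best first.
    """
    text_by_model = {c["model_id"]: c["text"] for c in record["candidates"]}
    pairs = []
    # combinations() preserves input order, so `better` always outranks `worse`.
    for better, worse in combinations(record["rankings"], 2):
        pairs.append({
            "prompt": record["prompt"],
            "chosen": text_by_model[better],
            "rejected": text_by_model[worse],
        })
    return pairs


ranked = {
    "prompt": "Explain quantum entanglement to a 5-year-old.",
    "candidates": [
        {"model_id": "v1-instruct", "text": "It is when two particles share a state vector..."},
        {"model_id": "v2-instruct", "text": "Imagine you have two magic dice..."},
    ],
    "rankings": ["v2-instruct", "v1-instruct"],  # illustrative values only
}
pairs = ranking_to_pairs(ranked)
print(len(pairs))
```

A ranking of k candidates yields k(k-1)/2 pairs, so a four-candidate record produces six comparisons.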

3. DPO Triplet Schema

Direct Preference Optimization (DPO) bypasses the need for a separate reward model. It does, however, require strictly formatted (Prompt, Chosen, Rejected) triplets. Acadify allows you to upload raw prompt/completion datasets, and our SMEs will classify the completions directly into DPO triplets.

{
  "prompt": "Write a Python function to sort a list.",
  "chosen": [
    {"role": "assistant", "content": "def sort_list(l):\n    return sorted(l)"}
  ],
  "rejected": [
    {"role": "assistant", "content": "def sort_list(l):\n    l.sort()\n    return l"}
  ],
  "sme_notes": "Rejected completion mutates the input list, violating pure function expectations."
}
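A minimal structural check for a triplet like the one above. It assumes, based only on this example, that prompt is a string and that chosen and rejected are arrays of assistant messages; check_dpo_triplet is a hypothetical helper, not an Acadify API.

```python
def check_dpo_triplet(record: dict) -> list[str]:
    """Return a list of structural problems in one DPO triplet (empty if none)."""
    problems = []
    prompt = record.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("'prompt' must be a non-empty string")

    for key in ("chosen", "rejected"):
        messages = record.get(key)
        if not isinstance(messages, list) or not messages:
            problems.append(f"'{key}' must be a non-empty array of messages")
            continue
        for message in messages:
            # Assumption: every message here is an assistant turn with content.
            if (not isinstance(message, dict)
                    or message.get("role") != "assistant"
                    or "content" not in message):
                problems.append(f"'{key}' contains a malformed assistant message")
                break

    # A triplet carries no preference signal if both sides are the same.
    if not problems and record["chosen"] == record["rejected"]:
        problems.append("'chosen' and 'rejected' are identical")
    return problems


triplet = {
    "prompt": "Write a Python function to sort a list.",
    "chosen": [{"role": "assistant", "content": "def sort_list(l):\n    return sorted(l)"}],
    "rejected": [{"role": "assistant", "content": "def sort_list(l):\n    l.sort()\n    return l"}],
}
print(check_dpo_triplet(triplet))
```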

4. Multimodal JSON Schema (Vision Models)

Evaluating Vision-Language Models (VLMs) requires both text instructions and image inputs. Acadify expects each image to be passed as a base64-encoded data URL inside the content array.

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What architectural style is this building?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ..."
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": "This is an example of Brutalist architecture, characterized by exposed concrete."
    }
  ]
}
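The following sketch shows one way to produce the data URL and the surrounding record from raw image bytes. The helper names image_part and vision_record are illustrative assumptions, and the JPEG default simply mirrors the example above; pass the MIME type that matches your files.

```python
import base64
import json


def image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an image_url content part with a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_record(question: str, image_bytes: bytes, answer: str) -> dict:
    """Build one user/assistant exchange with a text part and an image part."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [{"type": "text", "text": question}, image_part(image_bytes)],
            },
            {"role": "assistant", "content": answer},
        ]
    }


# Stand-in bytes (the JPEG start-of-image marker); in practice, read a real file:
# image_bytes = open("building.jpg", "rb").read()
image_bytes = b"\xff\xd8\xff\xe0"
record = vision_record(
    "What architectural style is this building?",
    image_bytes,
    "This is an example of Brutalist architecture, characterized by exposed concrete.",
)
print(json.dumps(record)[:80])
```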

Bounding Box Validations

If your model is performing object detection, our SMEs can verify bounding box coordinates. You must pass the coordinates as an array of normalized floats [ymin, xmin, ymax, xmax] in the model's text response.
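A small sketch of a sanity check for boxes in this layout, plus a conversion to pixel coordinates. It assumes "normalized" means each value lies in the range 0.0 to 1.0 relative to the image height (y) or width (x), and that min must be strictly less than max; both helper names are hypothetical.

```python
def validate_box(box: list[float]) -> None:
    """Raise ValueError unless box is a valid [ymin, xmin, ymax, xmax] list.

    Assumption: coordinates are normalized to the 0.0-1.0 range.
    """
    if len(box) != 4:
        raise ValueError("box must have exactly four values")
    ymin, xmin, ymax, xmax = box
    if not all(0.0 <= value <= 1.0 for value in box):
        raise ValueError("coordinates must be normalized to [0.0, 1.0]")
    if ymin >= ymax or xmin >= xmax:
        raise ValueError("min coordinates must be smaller than max coordinates")


def to_pixels(box: list[float], width: int, height: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, right, bottom) in pixels."""
    validate_box(box)
    ymin, xmin, ymax, xmax = box
    return (
        round(xmin * width),
        round(ymin * height),
        round(xmax * width),
        round(ymax * height),
    )


print(to_pixels([0.25, 0.5, 0.75, 1.0], width=640, height=480))
# → (320, 120, 640, 360)
```

Note that the input order is y-first while the pixel tuple returned here is x-first; mixing the two up is an easy mistake to make when passing boxes between tools.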