Strict JSON and JSONL schemas for submitting datasets to the Acadify SME network for Supervised Fine-Tuning, RLHF, and DPO grading.
Supervised Fine-Tuning (SFT) requires exceptionally high-quality demonstration data. Acadify accepts strictly formatted JSONL payloads that follow the ChatML / OpenAI chat message format.
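Each line of the submitted .jsonl file must be a self-contained JSON object with a messages array. Each message must contain a role (system, user, or assistant) and a content field. The example below is pretty-printed for readability; in the file itself, each object must be serialized on a single line.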
{
  "messages": [
    {"role": "system", "content": "You are an expert offensive security engineer."},
    {"role": "user", "content": "Write a bash script to scan for open ports."},
    {"role": "assistant", "content": "Here is a safe wrapper around nmap...\n\n```bash\n#!/bin/bash\nnmap -p- \"$1\"\n```"}
  ]
}
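Because a pretty-printed object spans several lines, it is easy to submit a file that breaks the one-object-per-line rule. The sketch below is a hypothetical pre-submission check (not an Acadify tool) that parses each line independently and verifies the messages array, the allowed roles, and the presence of content. It accepts content as either a string or an array of parts, on the assumption that multimodal rows follow the content-array layout shown for VLMs.

```python
import json

# Roles permitted in a "messages" array, per the format described above.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_sft_line(line: str) -> list[str]:
    """Return the problems found in one JSONL line; an empty list means OK."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' array"]
    errors = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            errors.append(f"message {i}: not an object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            errors.append(f"message {i}: invalid role {role!r}")
        # Text rows use a string; multimodal rows (assumed) use an array of parts.
        if not isinstance(msg.get("content"), (str, list)):
            errors.append(f"message {i}: 'content' must be a string or array")
    return errors


def validate_sft_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems, skipping blank lines."""
    problems = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if line.strip():
                errors = validate_sft_line(line)
                if errors:
                    problems[lineno] = errors
    return problems
```

Serializing each record with json.dumps(record) (no indent argument) produces the single-line form the file needs, because newlines inside strings are escaped as \n.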
When our experts review SFT data, they grade each assistant response on a 0-10 rubric covering Factuality, Tone Alignment, and Safety. If a response scores below 7, the expert rewrites its content string directly and flags the row with "modified": true.
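Once reviewed data comes back, you may want to separate expert-rewritten rows from untouched ones, for example to audit the rewrites. The sketch below assumes that "modified" is a top-level boolean key on each returned row and treats rows without it as unmodified; both points are assumptions, since the exact placement of the flag is not specified here.

```python
import json


def split_by_modified(lines):
    """Split returned JSONL lines into (untouched, rewritten) lists of records.

    ASSUMPTION: "modified" is a top-level boolean on each row; rows that lack
    the key are treated as unmodified.
    """
    untouched, rewritten = [], []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        if record.get("modified") is True:
            rewritten.append(record)
        else:
            untouched.append(record)
    return untouched, rewritten
```

Passing an open file object works directly, since iterating over it yields one line at a time.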
Reinforcement Learning from Human Feedback (RLHF) requires training a reward model. Each record must provide a prompt and 2-4 candidate completions; our SMEs rank these completions and calculate a reward variance score.
{
  "id": "prompt_alpha_092",
  "prompt": "Explain quantum entanglement to a 5-year-old.",
  "candidates": [
    {"model_id": "v1-instruct", "text": "It is when two particles share a state vector..."},
    {"model_id": "v2-instruct", "text": "Imagine you have two magic dice..."}
  ],
  "metadata": {
    "domain": "Physics_STEM"
  }
}
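A request record can be checked against these constraints before upload. This hypothetical validator enforces the 2-4 candidate range and requires string prompt, model_id, and text fields. It also treats id as required and metadata as an optional object; both are assumptions drawn from the example rather than documented rules.

```python
MIN_CANDIDATES, MAX_CANDIDATES = 2, 4  # the 2-4 range stated above


def validate_rlhf_record(record: dict) -> list[str]:
    """Return the problems found in one RLHF ranking request; empty means OK."""
    errors = []
    # "id" is treated as required because the example includes one (assumption).
    for key in ("id", "prompt"):
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"'{key}' must be a non-empty string")
    candidates = record.get("candidates")
    if not isinstance(candidates, list):
        return errors + ["'candidates' must be an array"]
    if not MIN_CANDIDATES <= len(candidates) <= MAX_CANDIDATES:
        errors.append(
            f"expected {MIN_CANDIDATES}-{MAX_CANDIDATES} candidates, got {len(candidates)}"
        )
    for i, cand in enumerate(candidates):
        if not isinstance(cand, dict):
            errors.append(f"candidate {i}: not an object")
            continue
        for key in ("model_id", "text"):
            if not isinstance(cand.get(key), str):
                errors.append(f"candidate {i}: '{key}' must be a string")
    # "metadata" is optional here (assumption), but must be an object when present.
    if "metadata" in record and not isinstance(record["metadata"], dict):
        errors.append("'metadata' must be an object")
    return errors
```

Apply it to json.loads(line) for each line of the request file.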
After SME processing, Acadify returns the JSONL with a rankings array appended to each record, ordered from best (0) to worst (n-1 for n candidates).
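A K-way ranking is commonly expanded into K(K-1)/2 pairwise comparisons for reward-model training, each pairing a higher-ranked completion with a lower-ranked one. The sketch below assumes that rankings lists candidate indices from best to worst (so [1, 0] means candidates[1] beat candidates[0]). If the returned array instead holds a rank for each candidate, invert it first with sorted(range(n), key=lambda i: rankings[i]).

```python
from itertools import combinations


def ranking_to_pairs(record: dict) -> list[tuple[str, str]]:
    """Expand a ranked record into (preferred, dispreferred) text pairs.

    ASSUMPTION: record["rankings"] lists candidate indices ordered from best
    to worst.
    """
    order = record["rankings"]
    texts = [c["text"] for c in record["candidates"]]
    if sorted(order) != list(range(len(texts))):
        raise ValueError("rankings must be a permutation of candidate indices")
    # combinations() keeps input order, so `better` always outranks `worse`.
    return [(texts[better], texts[worse]) for better, worse in combinations(order, 2)]
```

A four-candidate record therefore contributes six training pairs.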
Direct Preference Optimization (DPO) removes the need for a separate reward model, but it requires strictly formatted (prompt, chosen, rejected) triplets. You can upload raw prompt/completion datasets, and our SMEs will label the completions as chosen or rejected to form DPO triplets.
{
  "prompt": "Write a Python function to sort a list.",
  "chosen": [
    {"role": "assistant", "content": "def sort_list(items):\n    return sorted(items)"}
  ],
  "rejected": [
    {"role": "assistant", "content": "def sort_list(items):\n    items.sort()\n    return items"}
  ],
  "sme_notes": "Rejected completion mutates the input list in place, violating pure-function expectations."
}
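The triplet format exists because DPO trains directly on preference pairs. Its standard per-example loss needs only the summed log-probabilities of the chosen and rejected completions under the policy being trained and under a frozen reference model, so no reward model is involved. The function below is a minimal sketch of that objective for illustration; it is not part of Acadify's pipeline, and beta=0.1 is a common default rather than a prescribed value.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full completion given
    the prompt, under either the trained policy or the frozen reference model.
    """
    # How much more the policy prefers "chosen" over "rejected" than the reference does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = -beta * margin
    # -log(sigmoid(z)) == softplus(-z); this form avoids overflow in exp().
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

Training lowers this loss by widening the margin in favor of the chosen completion, which is why the chosen and rejected answers in a triplet must respond to the same prompt.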
Evaluating Vision-Language Models (VLMs) requires passing both text instructions and images. Acadify expects each image as a base64-encoded data URL (for example, data:image/jpeg;base64,...) inside an image_url part of the message's content array.
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What architectural style is this building?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ..."
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": "This is an example of Brutalist architecture, characterized by exposed concrete."
    }
  ]
}
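The data URL in the example can be produced with Python's standard library. The helpers below are a hypothetical sketch: the MIME type is guessed from the file extension with mimetypes, which assumes your files carry accurate extensions, and the message layout mirrors the example above.

```python
import base64
import mimetypes


def bytes_to_data_url(data: bytes, mime: str) -> str:
    """Wrap raw image bytes as a data URL: data:<mime>;base64,<payload>."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def image_to_data_url(path: str) -> str:
    """Read an image file and encode it, inferring the MIME type from its extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    with open(path, "rb") as fh:
        return bytes_to_data_url(fh.read(), mime)


def vlm_user_message(question: str, data_url: str) -> dict:
    """Build a user message whose content array holds a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

A JPEG's first bytes (FF D8 FF) encode to /9j/, which is why the sample payload above begins that way.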
If your model performs object detection, our SMEs can verify its bounding box coordinates. Pass the coordinates in the model's text response as an array of normalized floats (fractions of the image height and width, between 0 and 1) in the order [ymin, xmin, ymax, xmax].
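Detectors often report boxes in pixel coordinates as (x_min, y_min, x_max, y_max). The sketch below assumes that input convention and normalization by image width and height; it reorders such a box into the [ymin, xmin, ymax, xmax] layout and serializes it for the text response. Both function names are hypothetical.

```python
import json


def normalize_box(x_min: float, y_min: float, x_max: float, y_max: float,
                  width: float, height: float) -> list[float]:
    """Convert a pixel-space box to normalized [ymin, xmin, ymax, xmax] floats."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    if not (0 <= x_min <= x_max <= width and 0 <= y_min <= y_max <= height):
        raise ValueError("box must lie inside the image with min <= max")
    # y values are divided by height and x values by width, so all land in [0, 1].
    return [y_min / height, x_min / width, y_max / height, x_max / width]


def box_to_text(box: list[float]) -> str:
    """Serialize a normalized box as a JSON array string for the model's text response."""
    return json.dumps([round(v, 4) for v in box])
```

Rounding to four decimal places is an arbitrary choice to keep responses compact; drop it if your evaluation needs full precision.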