Training SWE-agents
Now the fun part - we provide details on how to operationalize SWE-smith for training SWE-agents!
Specifically, we'll cover the workflow for Rejection Sampling Fine Tuning.
SWE-agent
The documentation in this section is heavily grounded in the SWE-agent library. We do not plan to explicitly support non SWE-agent scaffolds, but it should not be difficult - the main adaptations would just be how you generate expert trajectories and predictions for evaluation.
There are several steps we'll cover:
- Creating a subset of SWE-smith task instances.
- Generating expert trajectories for those task instances.
- Training a model on the expert trajectories.
- Evaluating the model on SWE-bench (Lite/Verified/Multimodal).
Creating SWE-smith Subset
The full SWE-smith dataset is quite large, so we usually recommend training on a subset. To curate a subset, you might use the following logic.
import json
import os

from datasets import load_dataset

swesmith = load_dataset("SWE-bench/SWE-smith", split="train")
subset_name = "subset0"

def criteria(task_instance):
    # Keep instances whose ID contains ".pr_" and that have 2 to 5 FAIL_TO_PASS tests
    return (
        ".pr_" in task_instance["instance_id"]
        and 2 <= len(task_instance["FAIL_TO_PASS"]) <= 5
    )

bugs = [x for x in swesmith if criteria(x)]
print(f"Found {len(bugs)} bugs that match criteria")

os.makedirs("logs/experiments", exist_ok=True)  # make sure the output folder exists
with open(f"logs/experiments/{subset_name}.json", "w") as f:
    json.dump(bugs, fp=f, indent=2)
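Before generating trajectories, it can help to see how the subset is spread across repositories. The following is a minimal sketch, not part of SWE-smith: the helper name `count_by_repo` is hypothetical, and it assumes each instance ID starts with the repository name followed by a period (for example `owner__repo.<commit>.pr_<number>`).

```python
from collections import Counter


def count_by_repo(task_instances):
    """Count task instances per repository.

    Assumption: each instance_id starts with the repository name,
    followed by a period (e.g. "owner__repo.<commit>.pr_123").
    """
    return Counter(x["instance_id"].split(".")[0] for x in task_instances)
```

Loading `logs/experiments/subset0.json` with `json.load` and passing the resulting list to `count_by_repo` gives a `Counter` whose `most_common()` method lists the repositories that dominate the subset.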
Generate Expert Trajectories
- Clone SWE-agent. Make sure to follow its installation instructions.
- Create a soft link to the agent/ folder inside SWE-agent. That is, from the SWE-agent directory, run:
ln -s path/to/SWE-smith/agent/ .
- In SWE-agent, run expert trajectory generation:
./agent/_gen_trajs.sh
Check the file to see how the script works. You'll need to adjust the --instances.path argument to point to the subset you created in the previous step.
Train Model
The previous step will generate individual trajectories per task instance under the SWE-agent/trajectories/<username>/<run ID>/ folder.
We'll now determine which trajectories correspond to resolved instances, convert them to a format that can be used for SFT, and then train a model with them.
- (From SWE-smith) Run evaluation on training task instances.
python -m swesmith.harness.eval \
--dataset_path path/to/subset0.json \
--predictions_path path/to/trajectories/<username>/<run ID>/preds.json \
--run_id <run ID> \
--max_workers 10 \
--timeout 240
preds.json
If there is no preds.json, run sweagent merge-preds trajectories/<username>/<run ID>/.
This evaluation will generate a logs/run_evaluation/<run ID>/ folder with a report.json file indicating which instance IDs were successfully resolved.
- (From SWE-smith) Convert trajectories into SFT format.
python -m swesmith.train.traj_mgr.transform_to_ft \
--traj_dir path/to/trajectories/<username>/<run ID>/ \
--eval_dir logs/run_evaluation/<run ID>/ \
--only_resolved
This will produce an ft_xml_*.jsonl file under the trajectories_sft/ folder.
This dataset can be used directly for SFT.
- Run training. First, upload the file to Modal:
modal volume put <volume> trajectories_sft/ft_xml_*.jsonl
Then, modify config/train/full_ft_qwen_7b.yml to point to the file in Modal.
Finally, run the training script:
./scripts/train.run_ft_torchtune.py
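Before uploading the ft_xml_*.jsonl file to Modal, it may be worth a quick sanity check that every record parsed and is non-empty. The sketch below is illustrative only: `summarize_sft_file` is a hypothetical helper, and the `messages` key is an assumption about the file layout (one JSON object per line, with the conversation stored under that key). Adjust `messages_key` if your file differs.

```python
import json


def summarize_sft_file(path, messages_key="messages"):
    """Return (num_records, num_missing) for a JSONL file.

    Assumption: each non-blank line is a JSON object holding one
    trajectory under `messages_key`. A record counts as "missing"
    when that key is absent or its value is empty.
    """
    total, missing = 0, 0
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            total += 1
            if not record.get(messages_key):
                missing += 1
    return total, missing
```

A nonzero second value suggests that some trajectories were written without any conversation content and are worth inspecting before training.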
Evaluation
Run inference with SWE-agent and your SFT'ed model on SWE-bench (Lite/Verified/Multimodal).
- (From SWE-smith) Update scripts/train.serve_sglang.sh to point at the SFT'ed model, then run it.
- (From SWE-agent) Run inference:
./agent/_infer_model.sh
Make sure the Modal URL is correct and change the evaluation dataset as desired.
- When inference finishes, run evaluation on the model's predictions. (Check out sb-cli for more information on how to conveniently run evaluation for SWE-bench-* datasets.)
sb-cli submit swe-bench_verified test \
--predictions_path trajectories/<username>/<run ID>/preds.json \
--run_id <run ID>
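Before submitting, you may want to check how many predictions contain no patch at all, since those cannot resolve an instance. This is a minimal sketch under an assumption: that preds.json is a JSON object mapping each instance ID to a record with a `model_patch` field, as in the SWE-bench prediction format. `find_empty_patches` is a hypothetical helper, not part of SWE-agent or sb-cli.

```python
def find_empty_patches(preds):
    """Return sorted instance IDs whose prediction has no patch.

    Assumption: `preds` maps instance_id -> {"model_patch": str or None, ...}.
    A missing, None, or whitespace-only patch counts as empty.
    """
    return sorted(
        instance_id
        for instance_id, pred in preds.items()
        if not (pred.get("model_patch") or "").strip()
    )
```

Pass it the dictionary you get from loading preds.json with `json.load`. A long list of returned IDs usually means the agent ran out of steps or failed to submit on many instances.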