GRPO works in Unsloth if you disable fast vLLM inference and use Unsloth inference instead. Follow our Vision RL notebook examples.
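As a rough sketch of what disabling fast vLLM inference might look like when loading a model, the snippet below builds the loader arguments with `fast_inference=False`. It assumes the `FastVisionModel.from_pretrained` loader used in the vision notebooks; the helper names `grpo_model_kwargs` and `load_for_grpo` are hypothetical, the 4-bit setting is an assumption, and the model name is left to the caller. The notebooks remain the reference for the exact settings.

```python
def grpo_model_kwargs(model_name: str) -> dict:
    """Build loader keyword arguments with fast vLLM inference disabled.

    Hypothetical helper for illustration; not part of Unsloth.
    """
    return {
        "model_name": model_name,
        "load_in_4bit": True,     # assumption: 4-bit loading to reduce memory use
        "fast_inference": False,  # disable fast vLLM inference; use Unsloth inference
    }


def load_for_grpo(model_name: str):
    """Load a model and tokenizer for GRPO using Unsloth inference."""
    # Imported lazily so the settings above can be inspected without a GPU
    # or an Unsloth installation.
    from unsloth import FastVisionModel

    return FastVisionModel.from_pretrained(**grpo_model_kwargs(model_name))
```

Keeping the settings in a plain dictionary makes it easy to check that `fast_inference` is off before starting a long training run.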
Platon Shchukin (Editor, Economics section)
Otherwise, the argument is treated as a pattern and matched against the