An independent AI engineer has successfully merged two powerful reasoning models into an 18-billion-parameter system that surpasses Alibaba’s latest large-scale release.
Developer Kyle Hessling created this efficient model by combining Qwopus, which distills the reasoning style of Claude Opus into a Qwen base, with a second model fine-tuned on GLM data.
The model runs comfortably on a budget-friendly graphics card like the Nvidia RTX 3060, using just over 9 GB of memory, yet it recently outperformed Alibaba’s 35-billion-parameter model across 40 capability tests.
To achieve this, Hessling used a technique known as a passthrough frankenmerge. Instead of averaging weights, as conventional merges do, he stacked 64 raw layers drawn from two different fine-tunes originally developed by Jackrong.
The first half offers structured planning from Opus, while the second provides problem decomposition from GLM. Since existing tools couldn’t support this hybrid architecture, Hessling wrote a custom merge script and applied a targeted fine-tune to prevent the system from generating garbled code.
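The core idea of a passthrough merge can be sketched in a few lines: rather than averaging parameters, whole transformer blocks from two donor checkpoints are concatenated into one deeper model, with the second donor's layer indices renumbered to continue the stack. The function below is a minimal illustration under assumed key names (`model.layers.N.`); it is not Hessling's actual script, and real checkpoints would also need embeddings, norms, and a compatible config.

```python
def passthrough_merge(state_a, state_b, layers_a, layers_b):
    """Stack the first `layers_a` transformer blocks of model A on top of
    the first `layers_b` blocks of model B, renumbering B's blocks so the
    merged checkpoint has consecutive layer indices.

    Key format `model.layers.N.` is an assumption for illustration.
    """
    merged = {}
    # Copy model A's lower stack unchanged.
    for i in range(layers_a):
        prefix = f"model.layers.{i}."
        for key, tensor in state_a.items():
            if key.startswith(prefix):
                merged[key] = tensor
    # Append model B's blocks, shifted to continue the numbering.
    for j in range(layers_b):
        src = f"model.layers.{j}."
        dst = f"model.layers.{layers_a + j}."
        for key, tensor in state_b.items():
            if key.startswith(src):
                merged[dst + key[len(src):]] = tensor
    return merged

# Toy state dicts standing in for real checkpoints.
a = {f"model.layers.{i}.weight": f"A{i}" for i in range(2)}
b = {f"model.layers.{i}.weight": f"B{i}" for i in range(2)}
merged = passthrough_merge(a, b, layers_a=2, layers_b=2)
print(sorted(merged))  # blocks 0-1 come from A, blocks 2-3 from B
```

Because the stacked layers were never trained to feed into each other, the raw merge produces incoherent activations at the seam, which is why a post-merge fine-tune (as the article describes) is typically required.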
Although the model posts impressive benchmark results, real-world testing reveals a major usability problem for everyday use: its strong reasoning ability causes it to overthink simple prompts.
During testing on an Apple MacBook, asking the system to code a basic Snake game resulted in a lengthy forty-minute reasoning process.
The model reached its maximum token limit without producing working code, exposing a known software problem in which stacked reasoning layers induce repetitive internal loops.
While the overthinking issue remains a practical blocker for daily use, the broader implications for the open-source community are significant.
This project demonstrates that independent developers, using targeted layer-level merging techniques, can build lightweight models that rival frontier deployments from the largest corporate laboratories.
The model has generated considerable interest, garnering over 3,000 downloads within its first two weeks as the community actively works to address its repetitive reasoning issues and enhance its core functionality.