AI Stock-Picking Portfolios Are Managing $150 Million in Real Money. Here Is What the Early Results Actually Tell Us
-

The Autopilot mirror-trading platform, which hosts both the Grok and Claude AI portfolios, manages roughly $150 million in mirrored capital across a wider lineup of AI-managed strategies. That makes it one of the most significant real-money tests of autonomous AI investment decision-making currently running in public. Anyone following or mirroring these portfolios is making genuine financial decisions based on AI agent outputs, so the gap between Grok's 59% nine-month return and Claude's 2.6% two-month return is more than an interesting experiment: it has direct financial consequences for the people whose capital is allocated to each strategy. The risks that market professionals flag are worth stating clearly. Both portfolios carry concentration risk from their thematic tilts, fees on mirrored strategies reduce net returns, and strong recent performance in a specific market environment does not predict future results when that environment changes. Grok's semiconductor and infrastructure bet has worked in a period defined by AI capex spending and mega-cap dominance; whether it holds up through a sector rotation or an earnings disappointment is unknown.
The broader question the experiment raises is whether AI models can consistently outperform human portfolio managers or passive index investing over meaningful time horizons rather than favorable short windows. Raullen.eth, an AI builder with a significant following on X, expressed skepticism directly: "They lack real intelligence, so expecting them to trade and consistently beat humans in a reasonable timescale doesn't make sense." That critique applies equally to both portfolios and points at the fundamental challenge of AI-driven investing. Models that are trained on historical data and equipped with real-time research capabilities can identify patterns and execute disciplined processes, but the market's forward-looking randomness, driven by geopolitical events, earnings surprises, and sentiment shifts, does not reliably reward pattern recognition in the short term. The two portfolios offer a genuinely useful public dataset as the experiment continues, but investors considering mirroring either strategy should weight the nine-month and two-month track records accordingly rather than extrapolating headline returns into forward expectations.
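Because the two headline figures cover different windows, one rough way to put them on a common footing is to convert each into the constant monthly compound rate that would have produced it. The Python sketch below does that for the two returns quoted above. The helper name `monthly_equivalent` is an illustrative assumption, the figures are treated as gross of mirroring fees, and the result describes past performance only; it is not a forecast.

```python
def monthly_equivalent(total_return: float, months: float) -> float:
    """Constant monthly compound rate that yields total_return over `months`."""
    if months <= 0:
        raise ValueError("months must be positive")
    return (1.0 + total_return) ** (1.0 / months) - 1.0


# Headline figures quoted in the article (assumed gross of mirroring fees).
track_records = {
    "Grok": (0.59, 9),     # 59% over nine months
    "Claude": (0.026, 2),  # 2.6% over two months
}

for name, (total, months) in track_records.items():
    rate = monthly_equivalent(total, months)
    print(f"{name}: {total:.1%} over {months} months ~ {rate:.2%} per month")
```

On these inputs the monthly-equivalent rates come out at roughly 5.3% for Grok and 1.3% for Claude. The gap remains large after normalization, but a two-month window is too short to separate strategy from noise, and neither figure says anything about how the portfolios behave once the market environment changes.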