
At OpenAI, we believe that AI can meaningfully accelerate life science innovation. To test this belief, we collaborated with the Applied AI team at Retro Bio(opens in a new window), a longevity biotech startup, to create and research the impact of GPT‑4b micro, a miniature version of GPT‑4o specialized for protein engineering.
We are excited to share that we’ve successfully leveraged GPT‑4b micro to design novel and significantly enhanced variants of the Yamanaka factors, a set of proteins which led to a Nobel Prize for their role in generating induced pluripotent stem cells (iPSCs) and rejuvenating cells. They have also been used to develop therapeutics to combat blindness(opens in a new window), reverse diabetes(opens in a new window), treat infertility(opens in a new window), and address organ shortages(opens in a new window).
In vitro, these redesigned proteins achieved greater than a 50-fold higher expression of stem cell reprogramming markers than wild-type controls. They also demonstrated enhanced DNA damage repair capabilities, indicating higher rejuvenation potential compared to baseline. This finding, made in early 2025, has now been validated by replication across multiple donors, cell types, and delivery methods, with confirmation of full pluripotency and genomic stability in derived iPSC lines. To ensure the findings are discoverable and replicable to benefit the life sciences industry, we are now sharing insights into the research and development of GPT‑4b micro.

Fibroblasts (day 1)

Cells reprogrammed with
SOX2, KLF4,
OCT4, MYC (day 10)

Cells reprogrammed with
RetroSOX, RetroKLF,
OCT4, MYC (day 10)
Figure 1: Left is initial conditions; center is standard proteins; right is using our novel proteins. The increase in round, white cells indicate an increase in reprogramming. More specifically: phase-contrast images of human fibroblasts before induction (left) and 10 days after transduction with wild‑type Yamanaka factors (middle), and our re-engineered variants (right). The engineered cocktail promotes the emergence of colonies with the compact, round morphology typically seen during progression toward an iPSC‑like state.
To test our belief that AI can be used to accelerate life sciences research, we designed and trained a custom model—GPT‑4b micro—to possess a broad base of knowledge and skills across biology, with a particular focus on steerability and flexibility to enable advanced use cases such as protein engineering. We initialized it from a scaled-down version of GPT‑4o to take advantage of GPT models’ existing knowledge, then further trained it on a dataset composed mostly of protein sequences, along with biological text and tokenized 3D structure data, elements most protein language models omit.
A large portion of the data was enriched to contain additional contextual information about the proteins in the form of textual descriptions, co-evolutionary homologous sequences, and groups of proteins that are known to interact. This context allows GPT‑4b micro to be prompted to generate sequences with specific desired properties and, since most of the data is structure-free, the model handles proteins with intrinsically disordered regions just as well as structured proteins. This is particularly useful for targets like the Yamanaka factors, whose activity depends on forming numerous transient interactions with a diverse array of binding partners, rather than adopting a single stable structure (Figure 2).
Figure 2: Visualization of the 3D structure of the Yamanaka factors KLF4 (left) and SOX2 (right). Notice that the majority of these proteins are unstructured, with flexible arms that attach to other proteins.
Source: AlphaFold Protein Structure Database (left(opens in a new window), right(opens in a new window))
By training on proteins enriched with additional evolutionary and functional context, we substantially increased the effective context length of our training examples beyond that of standalone sequences. Consequently, we found that we could run prompts as large as 64,000 tokens during inference and continue to observe gains in controllability and output quality. While common in text LLMs, this context size is unprecedented in protein sequence models.
During development we observed the emergence of scaling laws similar to those seen in language models—larger models trained on larger datasets yielded predictable gains in perplexity and downstream protein benchmarks, allowing us to iterate at small scale before training the final GPT‑4b micro model. However, in silico evals for protein AI models are often of limited value, as it is unclear if such improvements translate to increased utility in the real world. To demonstrate that the model is capable of accelerating therapeutic development, we worked with Retro’s scientists, who used the model to re-engineer proteins relevant to their cell-reprogramming research program.
The Yamanaka factors—OCT4, SOX2, KLF4, and MYC (OSKM)—are some of the most important proteins in regenerative biology today, and are named after Shinya Yamanaka, who discovered their ability to reprogram adult cells into pluripotent stem cells, a breakthrough that earned him the Nobel Prize in Physiology or Medicine in 2012. Unfortunately, they suffer from poor efficiency: typically less than 0.1% of cells convert during treatment, and the process can take three weeks or more. Efficiency drops further in cells from aged or diseased donors(opens in a new window), so finding more efficient variants remains an active and important research focus.
Directly optimizing the protein sequences is hard. SOX2 contains 317 amino acids and KLF4 513; the number of possible variants is on the order of 10^1000, so traditional “directed-evolution” screens that mutate a handful of residues at a time are able to explore only a miniscule fraction of the design space. A leading academic effort(opens in a new window) tested a few thousand SOX2 mutants and found a handful of triple-mutants with a modest gain, while 15 years of work on chimeric SOX(opens in a new window) proteins has yielded variants that differ from natural SOX constituents by only five residues.
The team at Retro built a wet lab screening platform using human fibroblast (skin and connective tissue) cells, initially validating it with baseline OSKM and SOX2 variants manually designed by Retro’s scientists as part of their pilot screen (Figure 3). Then, they asked GPT‑4b micro to propose a diverse set of “RetroSOX” sequences. Over 30% of the model’s suggestions in the screen outperformed wild‑type SOX2 at expressing key pluripotency markers, even though they differed by more than 100 amino acids on average. For comparison, in traditional screens(opens in a new window), hit rates below 10% are typical.
The team next tackled reengineering KLF4, the largest of the Yamanaka factors. KLF4 is known to be replaceable with other KLF-family factors(opens in a new window) but without an increase in reprogramming efficacy. Prior attempts to improve KLF4 by expert-guided single amino acid substitutions(opens in a new window) produced a single hit out of 19. As with RetroSOX, we prompted the model to generate a set of enhanced RetroKLF variants. Overall, 14 model-generated variants were superior to the best cocktails from the RetroSOX screen—a hit rate of nearly 50% (Figure 4).
Figure 3: Percentage of cells in our wet-lab experiments that express early (SSEA4) pluripotency markers (left), and fraction of those cells that express late (TRA-1-60) markers (right) when reprogrammed with either standard OSKM or a pool of engineered variants. The pilot screen (green) uses standard methods, which have poor efficacy rates of around 0.1%, explaining the 0% expression rate of TRA-1-60.
Figure 4: Screen hit rate, i.e. fraction of protein candidates that elicited reprogramming‑associated markers at higher rates than the baseline (left), percent of protein sequence that was changed relative to the wild-type variant found in humans (right). “Conventional” methods refer to few-amino-acid mutation screens.
Combining the top RetroSOX and RetroKLF variants produced the largest gains. Across three independent experiments, fibroblasts showed a dramatic rise in both early (SSEA-4) and late (TRA-1-60, NANOG) markers, with late markers appearing several days sooner than under the wild-type OSKM cocktail (Figure 5).
Figure 5: Two late-stage markers (TRA-1-60 & NANOG) appear strongly enriched at day 10 whereas at this timepoint, no expression can be detected in cells reprogrammed with wild-type OSKM. RK1 through RK4 correspond to different RetroSOX and RetroKLF variants.
In addition, the RetroSOX and RetroKLF variants were validated by alkaline phosphatase (AP) staining at day 10, confirming that the colonies not only express late-stage pluripotency markers but also exhibit robust AP activity indicative of pluripotency (Figure 6).
Figure 6: Purple colonies indicate successful stem cell reprogramming; more intense and numerous colonies indicate greater efficiency. Alkaline phosphatase (AP) staining of fibroblast-derived colonies at day 10 after transduction with RetroSOX and RetroKLF variants demonstrates robust AP activity, suggesting that the engineered variants have induced pluripotency in the colonies.
To further confirm the improved reprogramming efficiency and explore clinical potential, we tested a different delivery method (mRNA instead of viral vectors) and another cell type—mesenchymal stromal cells (MSCs)—derived from three middle-aged human donors (over 50 years old). Within just 7 days, more than 30% of the cells began expressing key pluripotency markers (SSEA4 and TRA-1-60), and by day 12, numerous colonies appeared with morphology similar to typical iPSCs (Figure 7, left and center). Over 85% of these cells activated endogenous expression of critical stem cell markers, including OCT4, NANOG, SOX2, and TRA-1-60.
We then verified that these RetroFactor-derived iPSCs could successfully differentiate into all three primary germ layers (endoderm, ectoderm, and mesoderm). Additionally, we expanded multiple monoclonal iPSC lines over several passages, confirming healthy karyotypes (Figure 7, right) and genomic stability suitable for cell therapies. These results consistently surpassed benchmarks obtained from conventional iPSC lines generated by contract research organizations using standard factors, further supporting the robustness of our engineered variants. Moreover, they provide evidence of enhanced iPSC generation across different delivery modalities and cell types.



Figure 7: The three panels show key markers of successful stem cell reprogramming: colonies with the characteristic rounded, tightly-packed appearance of stem cells (left), positive staining for TRA-1-60 stem cell marker shown in green (middle), and normal chromosome structure (right)—together confirming healthy, fully reprogrammed stem cells.
Taken together, the high hit rates, deep sequence edits, accelerated marker onset, and AP+ colony formation provide early evidence that AI-guided protein design can substantially accelerate progress in stem cell reprogramming research.
Motivated by these results, we next investigated the rejuvenation potential of our re-engineered variants, specifically examining their ability to restore youthful characteristics to aged cells. We focus on DNA damage, which causes impaired cellular function and is a canonical hallmark of aging(opens in a new window). Earlier work(opens in a new window) has demonstrated that Yamanaka factors can erase DNA damage-related aging markers in cells derived from mice without fully reverting cell identity. We sought to find out whether our variants showed enhanced rejuvenation capabilities relative to baseline OSKM.
Figure 8: DNA‑damage assay following doxorubicin stress, lower is better. Human fibroblasts were treated with doxorubicin to induce double‑strand breaks and then reprogrammed with either GFP (negative control), wild‑type OSKM, or the engineered RetroSOX + RetroKLF variants along with the remaining reprogramming factors (O, K/S, M). γ‑H2AX immunostaining was used to quantify DNA damage: lower fluorescence intensity means fewer breaks and better repair. Cells expressing the Retro variants show a marked drop in γ‑H2AX signal relative to both controls (GFP vs RS4: p=0.03; GFP vs RS5: p=0.01; OSKM vs RS4: p=0.04), demonstrating more effective repair after the same genotoxic challenge.
In our DNA‑damage assay, cells treated with the RetroSOX/KLF cocktail showed visibly less γ‑H2AX intensity—a marker of double‑strand breaks—than cells reprogrammed with standard OSKM or a fluorescent control (Figure 8).
These results suggest that the RetroSOX/KLF cocktail reduces DNA damage more effectively than the original Yamanaka factors. By ameliorating one of the core hallmarks of cellular aging, the engineered variants offer a potential path toward improved cell rejuvenation and use in future therapies.
To OpenAI, this work is an illustration of how quickly a domain-specific model can deliver breakthrough results on a focused scientific problem. “When researchers bring deep domain insight to our language-model tooling, problems that once took years can shift in days,” says Boris Power, who leads research partnerships at OpenAI. “We look forward to seeing what other advances emerge as more teams pair their expertise with the models we’re building.”
OpenAI and Retro Contributors
Model development leads: John Hallman, Aaron Jaech, Rico Meinl
Science, dry lab: Madison Ueland (PhD student), Andrei Tarkhov (lead)
Science, wet lab: Kevin Joseph (PhD student), Jacqueline Larouche (lead)
Core contributors: Swathi Karthikeyan, Darren Yu, Sonja Mihailovic, Jonathan Hales, Tomas Matteson
Leadership: Boris Power, Joe Betts-LaCroix
Special thanks to: Chris Koch, Morgan Griffiths, Raul Arora, Tobias Fehlmann, Jared Tumiel, Bry Nguyen, Shelly Huynh
—
Notes: GPT-4b micro was developed for research purposes and is not broadly available. Sam Altman is an investor in Retro Biosciences. OpenAI’s research team formed the collaboration.


