Switching embedding models in production invalidates all existing vectors because embeddings from different models are incompatible; migration strategies include gradual rollover with dual-write and A/B testing, collection versioning, and background incremental re-embedding to avoid downtime.
When you switch embedding models, vectors generated by the old and new models are not comparable: cosine similarity between them is meaningless[reference:12]. This is because each embedding model maps text into its own vector space. To migrate without downtime, use a phased approach: create a new collection or namespace for the new embeddings, dual-write new and updated documents to both the old and new collections (each embedded with its own model), and re-index existing documents in the background. A gradual rollover over 1–2 weeks leaves room for A/B testing and a safe fallback[reference:13].
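The dual-write step can be sketched as below. This is a minimal in-memory illustration, not a particular vector database's API: `DualWriteIndex`, `embed_v1`, `embed_v2`, and the `cut_over` method are hypothetical names, and the two embedders are toy stand-ins for real models.

```python
from typing import Callable, Dict, List

Vector = List[float]
Embedder = Callable[[str], Vector]


# Hypothetical stand-ins for an old and a new embedding model.
# Real models would return dense vectors of different sizes.
def embed_v1(text: str) -> Vector:
    return [float(len(text)), 1.0]


def embed_v2(text: str) -> Vector:
    return [float(len(text)), 2.0, float(text.count(" "))]


class DualWriteIndex:
    """Keeps one collection per model version and writes to all of them."""

    def __init__(self, embedders: Dict[str, Embedder], read_from: str) -> None:
        self.embedders = embedders
        self.read_from = read_from
        # One separate collection per model, so vectors are never mixed.
        self.collections: Dict[str, Dict[str, Vector]] = {
            name: {} for name in embedders
        }

    def upsert(self, doc_id: str, text: str) -> None:
        # Dual-write: embed the document once per model and store each
        # vector in that model's own collection.
        for name, embed in self.embedders.items():
            self.collections[name][doc_id] = embed(text)

    def get(self, doc_id: str) -> Vector:
        # Reads are served from a single collection at a time.
        return self.collections[self.read_from][doc_id]

    def cut_over(self, name: str) -> None:
        # Switch reads to another collection; the old one stays in place
        # so rolling back is just another cut_over call.
        if name not in self.collections:
            raise KeyError(name)
        self.read_from = name


index = DualWriteIndex(
    {"embeddings_v1": embed_v1, "embeddings_v2": embed_v2},
    read_from="embeddings_v1",
)
index.upsert("doc-1", "hello world")
```

Because both collections receive every write, reads can move to `embeddings_v2` with `index.cut_over("embeddings_v2")` once the background re-index has caught up, and move back the same way if relevance regresses.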
Versioned collections: Name collections by model version (e.g., embeddings_v1, embeddings_v2) for easy rollback[reference:14]
Store model metadata: Add embedding_model, model_version, embedding_dims to each document's metadata[reference:15]
Incremental updates: Track last embedding timestamp per document; re-embed only outdated documents using background workers[reference:16]
Semantic versioning: Classify changes as breaking (new dimensions) or non-breaking (same dimensions)[reference:17]; a same-dimension model swap still requires re-embedding, because the vector spaces differ
Gradual rollover: Run dual-write, A/B test relevance, migrate agents over 1–2 weeks, archive old collection[reference:18]
Fallback strategy: Keep previous embeddings for agents needing stability; newer agents use latest model[reference:19]
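The metadata and incremental-update items above might look like the following sketch. It assumes a plain dictionary as the target collection and a hypothetical `fake_embed_v2` function in place of a real model; `EmbeddingRecord` and `reembed_batch` are illustrative names. For simplicity it detects outdated documents by comparing the stored model tag rather than a last-embedded timestamp; a timestamp check would slot into the same place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EmbeddingRecord:
    """A stored vector plus the metadata fields named in the list above."""
    doc_id: str
    vector: List[float]
    embedding_model: str
    model_version: str
    embedding_dims: int


# Hypothetical stand-in for the new embedding model.
def fake_embed_v2(text: str) -> List[float]:
    return [float(len(text)), 0.0, 1.0]


def reembed_batch(
    documents: Dict[str, str],
    target: Dict[str, EmbeddingRecord],
    embed: Callable[[str], List[float]],
    model: str,
    version: str,
    batch_size: int = 100,
) -> int:
    """Embed up to batch_size documents that are missing from, or outdated
    in, the target collection. Returns the number of documents written."""
    written = 0
    for doc_id, text in documents.items():
        if written >= batch_size:
            break
        record = target.get(doc_id)
        # Skip documents already embedded with the target model and version.
        if record is not None and (
            record.embedding_model,
            record.model_version,
        ) == (model, version):
            continue
        vector = embed(text)
        target[doc_id] = EmbeddingRecord(
            doc_id=doc_id,
            vector=vector,
            embedding_model=model,
            model_version=version,
            embedding_dims=len(vector),
        )
        written += 1
    return written


documents = {"a": "alpha", "b": "beta", "c": "gamma"}
embeddings_v2: Dict[str, EmbeddingRecord] = {}

# A background worker would call this repeatedly until nothing is left.
batches = 0
while reembed_batch(
    documents, embeddings_v2, fake_embed_v2, "example-model", "2.0.0", batch_size=2
):
    batches += 1
```

Writing into a separate `embeddings_v2` collection keeps the old collection untouched, so it remains available for rollback or for consumers that need stability. Small batches let the worker run alongside production traffic.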