awesome-personalized-lmms

Awesome Personalized Large Multimodal Models

📝 A curated list of resources on Personalized Multimodal Models and Personalized Representation Learning 📚


Problem Setting: Using 3-5 images of a novel concept/subject (e.g., a pet named `<bo>`), can we personalize Large Multimodal Models so that they (1) retain their original capabilities (e.g., "Describe a dog") while (2) gaining tailored capabilities for the novel concept (e.g., "Describe `<bo>`")?
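
To make the setting concrete, here is a minimal sketch of how one concept and its two probe prompts could be represented. The names `PersonalizedConcept` and `build_probe_prompts` and the image file names are hypothetical, invented for illustration; they do not come from any of the listed papers or their code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PersonalizedConcept:
    """A novel concept plus its few reference images (hypothetical structure)."""
    identifier: str          # e.g. "<bo>"
    category: str            # generic class of the concept, e.g. "dog"
    image_paths: List[str]   # the 3-5 reference images supplied by the user

    def __post_init__(self) -> None:
        # The problem setting assumes 3-5 images per concept.
        if not 3 <= len(self.image_paths) <= 5:
            raise ValueError("expected 3-5 reference images")


def build_probe_prompts(concept: PersonalizedConcept) -> Dict[str, str]:
    """Return one prompt per requirement in the problem setting."""
    return {
        # (1) General capability the model should retain.
        "general": f"Describe a {concept.category}.",
        # (2) Personalized capability for the new concept.
        "personalized": f"Describe {concept.identifier}.",
    }


# Placeholder file names, used only for illustration.
bo = PersonalizedConcept("<bo>", "dog", ["bo_1.jpg", "bo_2.jpg", "bo_3.jpg"])
prompts = build_probe_prompts(bo)
print(prompts["general"])
print(prompts["personalized"])
```

A personalized model would be expected to answer both prompts well: the first as the base model would, the second using what it learned from the reference images.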

Papers
- Personalized Large Multimodal Models
- Personalized Representation Learning
Datasets
Applications

🌱 Contributing

Please feel free to create a pull request or open an issue to add or correct anything. I really appreciate any help or clarification!

🙋‍♀️ Personalization has been extensively explored in AI/ML/CV… It’s now time to personalize Large Multimodal Models! 🙋‍♀️
Over the years, we’ve witnessed the evolution of personalization across various tasks (e.g., object segmentation, image generation). Now, with the rise of Large Multimodal Models (LMMs), we have the opportunity to personalize these generalist, large-scale AI systems. It’s time to take the leap and bring personalization into the realm of Large Multimodal Models, making them not only powerful but also user-specific!
^ The caption above was actually generated by GPT-4o: I fed it the figure and asked it to generate a caption, haha!

(This figure was created by me. If there is anything incorrect, please feel free to correct me! Thank you!)

Papers

⚠️ Minor Note: The works listed below target the setting where users provide 3-5 images and the system needs to learn those concepts. There is research on other subtopics (e.g., role-playing, persona, etc.). For these topics, this repo might provide better coverage.

Personalized Large Multimodal Models

Title	Venue	Year	Input	Output	Link/ Code
─── Unified Models ───
UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens	arXiv	2025	image, text	image, text
YoChameleon: Personalized Vision and Language Generation	CVPR	2025	image, text	image, text	Page
─── Vision Language Model ───
Training-Free Personalization via Retrieval and Reasoning on Fingerprints	arXiv	2025	image, text	text
PVChat: Personalized Video Chat with One-Shot Learning	arXiv	2025	video, text	text
Concept-as-Tree: Synthetic Data is All You Need for VLM Personalization	arXiv	2025	image, text	text
Personalization Toolkit: Training Free Personalization of Large Vision Language Models	arXiv	2025	image, text	text
Personalized Large Vision-Language Models	arXiv	2024	image, text	text
MC-LLaVA: Multi-Concept Personalized Vision-Language Model	arXiv	2024	image, text	text	Code
Personalized Visual Instruction Tuning	ICLR	2025	image, text	text
Retrieval-Augmented Personalization for Multimodal Large Language Models	CVPR	2025	image, text	text	Page, Code
MyVLM: Personalizing VLMs for user-specific queries	ECCV	2024	image, text	text	Page, Code
Yo’LLaVA: Your Personalized Language and Vision Assistant	NeurIPS	2024	image, text	text	Page, Code
─── Large Language Models ───
Personalized Large Language Models	ICDMw	2024	text	text
LaMP: When Large Language Models Meet Personalization	ACL	2024	text	text	Page, Code
Learning to Predict Persona Information for Dialogue Personalization without Explicit Persona Description	ACL	2023	text	text
Call for Customized Conversation: Customized Conversation Grounding Persona and Knowledge	AAAI	2022	text	text	Code
A Personalized Dialogue Generator with Implicit User Persona Detection	COLING	2022	text	text
Personalizing Dialogue Agents: I have a dog, do you have pets too?	ACL	2018	text	text

Personalized Representation Learning

Title	Venue	Year	Link/ Code
Personalized Representation from Personalized Generation	ICLR	2025	Code
“This is my unicorn, Fluffy”: Personalizing frozen vision-language representations	ECCV	2024	Code

Datasets

Name	Year	# Concepts	Link	Notes
ConCon-Chi	2024	20	GitHub	with ConCon-Chi
PODS	2024	100	GitHub	with personalized-rep
MC-LLaVA	2024	–	GitHub	with MC-LLaVA, multiple concepts
Yo’LLaVA	2024	40	GitHub	with Yo’LLaVA, single concept
MyVLM	2024	29	GitHub	with MyVLM, single concept

Applications

Memory and new controls for ChatGPT

And good luck with your research! 🤗✨