kmc020700@gmail.com • +82-10-8558-1695 • github.com/kmc0207
I am a self-motivated PhD student, deeply interested in the intersection of LLMs and RL. To me, reinforcement learning is essential for enabling LLMs to evolve beyond existing knowledge through interaction with the world. In particular, I believe that diversity and swarm intelligence are the keys to LLM RL.
We address the critical instability issues that arise when training LLMs with GFlowNet (GFN) via a new loss function called Contrastive Trajectory Balance. The approach stabilizes LLM training while preserving GFN's diversity, discovering 7x more adversarial prompts than the original GFN.
Examining the distillation task on a DPO-style dataset from an RL perspective, we observe a multi-reward phenomenon. We propose a method that resolves this multi-reward issue and provide a mathematical proof for it.
Creating suitable prompts manually for each task is painful. We provide a method for generating prompts fully automatically using online RL, requiring only the task description, the training dataset, the LLM that will use the prompts, and the LLM that generates them.
Advisor: Prof. Junmo Kim. Daejeon, Korea.
Advisor: Prof. Junmo Kim. Daejeon, Korea.
Minor in Math and Economics. TGPA: 3.88/4.5, Major GPA: 4.21/4.5. Gwangju, Korea.
Developed an LLM-based vision-language model for Korean and Japanese for the Naver Maps AI team. Studied model architectures with researchers and refined Japanese/Korean vision-language data using CLIP score, aesthetic score, and other tools.