Hello!

I’m Unso, a computational historian and research engineer at Hugging Face.

I’m currently a researcher at Hugging Face, a start-up that aims to make AI software and AI-based science open-source and ethically responsible. My research topics range from privacy harms in AI systems to multilingual models. At Hugging Face, I’ve worked with researchers such as Nils Reimers on retrieval and Meg Mitchell on socio-technical investigations. My best-known work to date is on data collection for socio-cultural AI systems. I am most interested in projects that have both technical rigor and humanistic depth. (last update: Dec 10, 2022)


Current Projects

Meg and I are currently finishing up a privacy project that involves creating a new benchmark for privacy in generative AI models. One of the major concerns with large language models is the presence of PII (personally identifiable information) in training data. Models could be exposed to personally sensitive information such as credit card numbers and social security numbers (paired with individuals’ names), then regenerate them given the right prompt. While this possibility poses a major risk of leakage, and of libel allegations, in the hands of malicious users, there are no benchmarks or evaluation metrics to hold models to. With a new benchmark (train/eval/test sets) measuring how much sensitive information can be extracted from AI models, we hope to establish a new norm for privacy auditing in corporate and research AI.

My latest release was with the SetFit group. I worked with collaborators at Intel and CohereAI on a few-shot learning method called SetFit (short for Sentence Transformer fine-tuning). Previous few-shot methods have been inaccessible to everyday researchers due to the cost of compute and the need for special machines. Through a two-stage training process, SetFit matches state-of-the-art text classification performance with very few (n=8) to few (n=64) labeled training examples, while requiring little compute and working with models that are orders of magnitude smaller. Paper, repo, and blog.
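For readers curious what the first of the two stages looks like, here is a rough sketch. It assumes the stage works as the SetFit paper describes it: the few labeled examples are turned into sentence pairs, labeled similar when they share a class and dissimilar otherwise, and those pairs are used to fine-tune a sentence encoder contrastively. The second stage then trains a small classification head on the resulting embeddings. The function name and toy data below are hypothetical, and this is not the SetFit library's code.

```python
import itertools
import random


def make_contrastive_pairs(examples, seed=0):
    """Turn (text, class_label) examples into (text_a, text_b, similarity) pairs.

    similarity is 1.0 when both texts share a class and 0.0 otherwise.
    Hypothetical helper for illustration only.
    """
    pairs = []
    for (text_a, class_a), (text_b, class_b) in itertools.combinations(examples, 2):
        similarity = 1.0 if class_a == class_b else 0.0
        pairs.append((text_a, text_b, similarity))
    # Shuffle deterministically so positives and negatives are interleaved.
    random.Random(seed).shuffle(pairs)
    return pairs


# Toy few-shot data: two labeled examples per class.
few_shot_examples = [
    ("great movie", "positive"),
    ("loved every minute", "positive"),
    ("terrible plot", "negative"),
    ("a waste of time", "negative"),
]

contrastive_pairs = make_contrastive_pairs(few_shot_examples)
```

Even a handful of labeled examples yields many pairs (n examples give n(n-1)/2 pairs), which is one reason this kind of pairing suits the few-shot setting.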

I am also finishing up a multilingual retrieval project, called Clover Search, that has released a multi-terabyte multilingual dataset. Monolingual retrieval models have suffered in cross-lingual settings where the query and relevant documents are not in the same language. Clover Search has been trained on about 1.5 terabytes of multilingual news, where the title and main body of each article are treated as a query and document pair, and it extends multilinguality to over a hundred languages. Releasing the dataset and trained model allows researchers to build on top of this work for various multilingual and cross-lingual downstream applications.
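As a minimal sketch of the title-as-query, body-as-document idea, the snippet below turns news articles into retrieval training pairs. The field names (`title`, `body`, `lang`) and the filtering rule are assumptions made for illustration, not the project's actual schema or pipeline.

```python
def article_to_pair(article):
    """Map one article dict to a query/document pair, or None if unusable.

    Assumes hypothetical 'title', 'body', and 'lang' fields.
    """
    title = (article.get("title") or "").strip()
    body = (article.get("body") or "").strip()
    # Skip articles missing either side of the pair.
    if not title or not body:
        return None
    return {"query": title, "document": body, "lang": article.get("lang")}


def build_pairs(articles):
    """Collect query/document pairs from an iterable of article dicts."""
    pairs = []
    for article in articles:
        pair = article_to_pair(article)
        if pair is not None:
            pairs.append(pair)
    return pairs


# Toy articles in two languages; the second has no body and is dropped.
sample_articles = [
    {"title": "Election results announced", "body": "Officials said on Monday...", "lang": "en"},
    {"title": "Titre sans corps", "body": "", "lang": "fr"},
    {"title": "서울 날씨", "body": "오늘 서울은 맑겠습니다.", "lang": "ko"},
]

retrieval_pairs = build_pairs(sample_articles)
```

Because the title and body come from the same article, pairs like these need no manual relevance labels, which is what makes the approach practical at the scale of a large news corpus.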

At Hugging Face, I’m also working with Kakao Brain on helping upload a bunch of open-source visual language models.

My past work can be seen here.

Education

I graduated last year from the Stanford History Department with a PhD thesis proposing a new approach to historical research in the era of digital abundance, which I call New Archival History. In it I demonstrate history’s transition to digital methods through an archival example, the Foreign Relations of the United States series. I had the good fortune of working with two wonderful advisers across Economics and History, Gavin Wright and Zephyr Frank. I was part of the inaugural cohort of the Stanford Data Science Institute, where we launched several inter-departmental programs to foster cross-campus collaborations.

I also have an MS in Computer Science from Stanford, where I focused on AI. I went to Brown University for college, where I studied Economics and received a senior thesis award for best thesis in international history.

F.A.Q.s

Q. How do you pronounce your name? The most internationally friendly way to pronounce it is OON-SO. I go by Unso or Eun Seo in writing. :)

Q. Are you a historian? Yes, I’m a historian by training. My first publication was on the Asian-regional economic implications of the Vietnam War.

Q. Are you American? No, I was born and raised in South Korea. But I’ve lived in the U.S. for the majority of my adult life and identify most with American culture. I’ve also lived in Singapore, where I attended Australian and international schools. I’m currently located in Seoul, South Korea.