Dr. Olga Pelloni (Sozinova)

Senior Research Engineer, NLP

Curriculum Vitae


I am currently working at Telepathy Labs as a Senior Research Engineer. Within the Speech and Language Resources Team, my role involves the collection, annotation, analysis, and maintenance of large-scale text data.
A recent achievement is the development of a sampling strategy for a domain-specific text-to-speech corpus. Lately, I've been experimenting with data generation using LLMs such as GPT-3.5 and LLaMA-2, and testing prompt engineering and fine-tuning techniques.
Previously, I received a PhD in Computational Linguistics from the University of Zurich and worked as a researcher at the University Research Priority Program "Language and Space".

Projects

Andersen | GPT-2 Stories

Project management, Flask web development
Project management for the scientific fair Scientifica 2021. Web development of a text generation robot in four languages. Backend debugging and full frontend development.
2021 University of Zurich

sBayes

Python package development
Contributed to the development of the Python package: designed the class architecture and cleaned up the code. Also designed the logo.
2020 University of Zurich

Zurich Tangram Corpus

Flask website, web development
Full stack development of a multimedia corpus.
Information about the project (Online publication of the Zurich Tangram Corpus)
Link to the corpus (available only from the UZH VPN)

2018–2019 University of Zurich

Russian Rhyme Database

Russian Rhyme Database (demo)

Django website + Neo4j database, web development
The Russian Rhyme Database is the first web resource for finding Russian rhymes with references to the actual verse lines from Russian poetry (from the 18th century to the first third of the 20th century). Full stack development.

2016 HSE University, Moscow

Guess Bayes Factor

R Shiny application, web development
A game of guessing a Bayes factor (a metric from Bayesian statistics) from a scatter plot with regression lines.
2016 University of Tübingen

HSE Thai Corpus

Crawling texts
Web crawling for a corpus of modern texts written in Thai.
2015–2016 HSE University, Moscow

Beserman Dictionary

Frontend development
2015 HSE University, Moscow

Freaky Frequency

Research on the frequency of Russian verb forms
Freaky Frequency is an information system built on a collection of Russian word forms and their frequencies.
2013 HSE University, Moscow

Data Visualization

Data visualization for my talk "Subword Geometry: Picturing Word Shapes" at the SIGTYP 2021 workshop, co-located with NAACL 2021.

2021 University of Zurich

Entropy at different BPE merges

R, ggplot & gganimate

Data visualization for the paper
Gutierrez-Vasques, X., C. Bentz, O. Sozinova and T. Samardzic (2021). From characters to words: the turning point of BPE merges. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 3454–3468.

2021 University of Zurich

TeDDi Sample corpus, overview

JavaScript based on amCharts
Dynamic overview of the number of tokens gathered for different languages and genres in TeDDi Sample (statistics as of 2020).

2020 University of Zurich

Clusters of Russian rhymes

D3.js
Dynamic visualizations of clusters of Russian rhymes. Links to the visualizations for different time periods:
18th century
19th century, 1st third
19th century, 2nd third
19th century, last third
20th century, 1st third

Related links:
Abstract for DH2016
Project description (in Russian)

2016 HSE University, Moscow, Russia