cv

Info

Name Fengwei Liu
Label M. Cog. Sc. student
Email 1 fengweiliu@cmail.carleton.ca
Email 2 zhenya.isaac@gmail.com
Url https://isaac-fengwei-liu.github.io/
Summary My goal is to develop NLP systems inspired by multilingual minds.

Education

  • 2025.09 - 2027.07

    Ottawa, Canada

    Master of Cognitive Science
    Carleton University
    Cognitive linguistics, NLP
  • 2022.09 - 2023.06

    Kazan, Russia

    Exchange Program
    Kazan Federal University
    Philology
  • 2020.09 - 2024.06

    Shanghai, PRC

    B.A.
    Shanghai International Studies University
    Russian

Publications

Projects

  • 2024.03 - Present
    Bias mining in new media with semi-supervised clustering: Discursive representations of foreign talents in Shanghai
    A collaboration with Dr. Noboru Matsuda and Dr. Yuanyuan Liu
    • Independently modified a Python web scraper that successfully bypassed anti-scraping measures, compiling a corpus of more than 10,000 articles.
    • Cleaned and preprocessed the whole corpus, including tokenization and removal of Chinese characters, headers and footers, template texts, emoticons, and URLs.
    • Created an algorithm that exhaustively selects all collocate candidates by computing a t-test over CF-ISF (Collocate Frequency – Inverse Sentence Frequency).
    • We are currently designing and experimenting with Differential Collocation Analysis, which first learns the “biased” collocates and then uses them as embeddings. I am also enhancing the collocational analysis with dependency parsers to overcome the limitations of the context window.
    • We want to explore how discursive devices may bias the representations of foreign residents. The ultimate aim is to develop unsupervised or semi-supervised ML models that detect, cluster, and flag unknown media biases, a goal that falls within the broader framework of disinformation detection.
  • 2023.11 - 2024.04
    Diachronic Changes in Dependency Distance: Evidence from Russian Corpora
    Undergraduate thesis
    • Implemented preprocessing, tokenization, and syntactic parsing on a 250-million-token diachronic corpus with spaCy.
    • Developed an algorithm that computed the Mean Dependency Distance of all sentences, then conducted ANOVA and Tukey HSD analyses in R.
    • Reviewed literature on syntactic complexity and cognitive constraints during incremental processing. Interpreted the empirical findings and proposed explanations for them.
  • 2023.07 - 2023.09
    How topicality affects pronoun interpretation
    Research assistantship under the supervision of Dr. Heeju Hwang at HKU
    • Coded two sentence-continuation experiments that manipulated the topicality of Mandarin sentences by comparing canonical and left-dislocated structures. Interpreted the referents of the tested pronouns as a function of grammatical role.
    • Refined the coding scheme by introducing a new exclusion type to assist with data cleaning.
    • Distributed experimental surveys to native Mandarin speakers and negotiated monetary incentives.
  • 2022.06 - 2023.11
    Multilingualism, language choice and identity construction: Diasporic Ukrainians in Shanghai
    An ethnographic study supervised by Dr. Yuanyuan Liu
    • Recruited 6 participants by surveying social media profiles. Conducted 6 months of netnographic observation.
    • Led or assisted in 5 semi-structured interviews. Designed guided narrative frame writings.
    • Created a coding manual and conducted thematic analysis. Contributed substantially to drafting and revising the manuscript.
    • Presented the findings at the International Symposium on Bilingualism 14, the 7th World Congress of IPSA, and other conferences.

Employment & outreach

  • 2024.04 - 2024.07

    Shanghai, PRC

    Data Processing Intern
    Shanghai Artificial Intelligence Lab
    Assisted in the construction of a 30 TB multilingual, multimodal database
    • Identified and supplied over 300 content-rich websites and datasets in 8 languages. Drafted collection instructions for these websites, which accounted for over 4 TB of data.
    • Conducted quality assessment of over 1,000 Russian text, image, and audio samples. Annotated, categorized, and reported language-contingent issues, and suggested workarounds.
    • To address the issues found, wrote a manual that guides database specialists in distinguishing Russian from other languages and handling language-specific properties.
    • Assisted in other data-processing tasks, such as those involving toxic language in Russian.
  • 2022.08 - 2023.08

    Shanghai, PRC

    Social Media Analyst Intern
    China Youth, Shanghai International Studies University
    Translated and reported monthly on the social media activities of political organizations such as IUSY, IFM-SEI, and Asianforum Peace.
  • 2022.07 - 2022.07
    Data Contributor
    Language Index, Virtual Linguistics Campus (VLC)
    Delivered audio recordings, transcription, transliteration, and translation for Wu Chinese (Shanghainese).
  • 2022.07 - 2022.12

    Shanghai, PRC

    Marketing Intern
    BeKind, Mars Wrigley
    Supported market analysis and communicated with partner agencies to maintain campaign activities.
    • Collected sales data and managed databases for market analysis. Assessed user behavior and assisted in the selection of Key Opinion Leaders (KOLs). Drafted and edited campaign materials.
    • Liaised between brand, marketing, and live-streaming agencies, as well as over eighty KOLs.

Skills

Programming Languages
Python (pandas, scikit-learn, NumPy, PyTorch, spaCy, NLTK)
R (dplyr)
LaTeX
ML & NLP
BERT & Transformers
RNN & LSTM
Neural Networks
Classification & Clustering
Word Embedding

Languages

Wu & Mandarin Chinese
Native
English
Fluent
Russian
Intermediate