cv
Info
| Name | Fengwei Liu |
| Label | M. Cog. Sc. student |
| Email 1 | fengweiliu@cmail.carleton.ca |
| Email 2 | zhenya.isaac@gmail.com |
| Url | https://isaac-fengwei-liu.github.io/ |
| Summary | My goal is to develop NLP systems inspired by multilingual minds. |
Education
- 2025.09 - 2027.07 Ottawa, Canada
- 2022.09 - 2023.06 Kazan, Russia
- 2020.09 - 2024.06 Shanghai, PRC
Publications
- 2024.04.10 Multilingualism, language choice, and identity construction: Diasporic Ukrainians in Shanghai
Journal of Sociolinguistics
This study shows that the construction of diasporic identity is, first of all, an individualistic process that articulates the affective dimension of participants' search for psychological stability; it is also a social practice that reflects diasporic people's strategies for seeking social stability.
Projects
- 2024.03 - Present
Bias mining in new media with semi-supervised clustering: Discursive representations of foreign talents in Shanghai
A collaboration with Dr. Noboru Matsuda and Dr. Yuanyuan Liu
- Independently modified a Python web scraper that successfully bypassed anti-scraping measures, compiling a corpus of more than 10,000 articles.
- Cleaned and preprocessed the entire corpus, including tokenization and removal of Chinese characters, headers and footers, boilerplate text, emoticons, and URLs.
- Created an algorithm that exhaustively identifies collocate candidates by computing t-tests over CF-ISF (Collocate Frequency – Inverse Sentence Frequency) scores.
- We are currently designing and experimenting with Differential Collocation Analysis, which first learns “biased” collocates and then uses them as embeddings. I am also enhancing collocational analysis with dependency parsing to overcome the limitations of a fixed context window.
- We aim to explore how discursive devices may bias representations of foreign residents. The ultimate goal is to develop unsupervised or semi-supervised ML models that detect, cluster, and flag previously unknown media biases, within the broader framework of disinformation detection.
- 2023.11 - 2024.04
Diachronic Changes in Dependency Distance: Evidence from Russian Corpora
Undergraduate thesis
- Implemented preprocessing, tokenization, and syntactic parsing on a 250-million-token diachronic corpus with spaCy.
- Developed an algorithm to compute the Mean Dependency Distance (MDD) of every sentence, then conducted ANOVA and Tukey HSD analyses in R.
- Reviewed literature on syntactic complexity and cognitive constraints during incremental processing. Interpreted empirical findings and deduced scientific explanations.
- 2023.07 - 2023.09
How topicality affects pronoun interpretation
Research assistantship under the supervision of Dr. Heeju Hwang at HKU
- Coded two sentence-continuation experiments that manipulated the topicality of Mandarin sentences by contrasting canonical and left-dislocated structures. Interpreted the referents of the tested pronouns as a function of grammatical role.
- Refined the coding scheme by introducing a new exclusion type to assist with data cleaning.
- Distributed experimental surveys to native Mandarin speakers and negotiated monetary incentives.
- 2022.06 - 2023.11
Multilingualism, language choice, and identity construction: Diasporic Ukrainians in Shanghai
An ethnographic study supervised by Dr. Yuanyuan Liu
- Recruited 6 participants by surveying social media profiles. Conducted 6 months of netnographic observation.
- Led or assisted with 5 semi-structured interviews. Designed guided narrative-frame writing tasks.
- Created a coding manual and conducted thematic analysis. Contributed substantially to drafting and revising the manuscript.
- Presented the findings at the 14th International Symposium on Bilingualism, the 7th World Congress of IPSA, and other conferences.
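For the "Diachronic Changes in Dependency Distance" project above, the exact implementation is not described here. The following is only a minimal sketch, assuming the common definition of dependency distance as the absolute difference between the linear positions of a word and its head, averaged over all non-root words. The function names and the head-index input format are hypothetical; punctuation handling is left open.

```python
def dependency_distances(heads: list[int]) -> list[int]:
    """Absolute head-dependent distances for one sentence.

    Assumption: heads[i] is the 0-based index of token i's head, and the
    root token points to itself (the convention spaCy uses for token.head).
    """
    return [abs(i - h) for i, h in enumerate(heads) if i != h]


def mean_dependency_distance(heads: list[int]) -> float:
    """MDD of one sentence: total distance divided by the number of dependencies."""
    distances = dependency_distances(heads)
    if not distances:  # a one-word sentence has no dependencies
        return 0.0
    return sum(distances) / len(distances)


def corpus_mdd(sentences: list[list[int]]) -> float:
    """Corpus-level MDD: pool all dependencies instead of averaging sentence means."""
    total, count = 0, 0
    for heads in sentences:
        distances = dependency_distances(heads)
        total += sum(distances)
        count += len(distances)
    return total / count if count else 0.0


# Hypothetical parse of "The cat sat on the mat" (root "sat" at index 2).
print(mean_dependency_distance([1, 2, 2, 2, 5, 3]))  # 6 / 5 dependencies = 1.2
```

With a spaCy parse, the head indices of a sentence span could be obtained as `[tok.head.i - sent.start for tok in sent]`; whether to drop punctuation before computing distances is a design choice this sketch does not make.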
Employment & outreach
- 2024.04 - 2024.07 Shanghai, PRC
Data Processing Intern
Shanghai Artificial Intelligence Lab
Assisted in the construction of a 30 TB multilingual, multimodal database.
- Investigated and provided over 300 content-rich websites and datasets in 8 languages. Drafted collection instructions for these websites, which together constituted over 4 TB of data.
- Conducted quality assessment of over 1,000 Russian text, image, and audio samples. Annotated, categorized, and reported language-specific issues, and suggested workarounds.
- To address these issues, wrote a manual guiding database specialists in distinguishing Russian from other languages and handling language-specific properties.
- Assisted with other data-processing tasks, such as those involving toxic language in Russian.
- 2022.08 - 2023.08 Shanghai, PRC
Social Media Analyst Intern
China Youth, Shanghai International Studies University
Translated and reported monthly on the social media activities of political organizations such as IUSY, IFM-SEI, and Asianforum Peace.
- 2022.07 Data Contributor
Language Index, Virtual Linguistics Campus (VLC)
Delivered audio recordings, transcription, transliteration, and translation for Wu Chinese (Shanghainese).
- 2022.07 - 2022.12 Shanghai, PRC
Marketing Intern
BeKind, Mars Wrigley
Supported market analysis and communicated with partner agencies to maintain campaign activities.
- Collected sales data and managed databases for market analysis. Assessed user behavior and assisted in selecting Key Opinion Leaders (KOLs). Drafted and edited campaign materials.
- Liaised between brand, marketing, and live-streaming agencies, as well as over eighty KOLs.
Skills
| Programming Languages | Python (pandas, scikit-learn, NumPy, PyTorch, spaCy, NLTK), R (dplyr), LaTeX |
| ML & NLP | BERT & Transformers, RNN & LSTM, Neural Networks, Classification & Clustering, Word Embedding |
Languages
| Wu & Mandarin Chinese | Native |
| English | Fluent |
| Russian | Intermediate |