This is a work related to the social movement in Hong Kong in 2019. It is also about the subtle effects of the fact that Chinese has different spoken languages and writing scripts, especially in the context of Hong Kong and Mainland China.

The source text is written by Leung Man Tao (梁文道), a writer and critic who is active in Mainland China, Hong Kong, and Taiwan. He was attacked online by people from both Hong Kong and Mainland China for his neutral stance on the social movement in 2019. He is the only public figure I know of who received attacks from both regions, and after reading posts from the attackers, I noticed that, although the people who attacked him from Hong Kong and the Mainland have very different ‘political stands’, behind those stands they share very similar logic and ways of thinking. This work, to some degree, tries to show, explore, and raise questions about these similarities.

I collected Leung’s essays published in 2019 (the ones in traditional Chinese were published in Hong Kong, and the ones in simplified Chinese were published in Mainland China), tokenized the essays into sequences of words with a customized Chinese tokenizer (based on the CC-CEDICT dictionary), and fed them into a simple Markov model to generate new paragraphs.
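The pipeline described above can be sketched roughly as follows. This is only an illustration, not the code used in the work: it assumes the tokenizer does greedy longest-match lookup against a word list and that the Markov model is first-order (each word depends only on the one before it). The tiny `VOCAB` set is a hypothetical stand-in for the CC-CEDICT word list, and the function names are made up for this example.

```python
import random
from collections import defaultdict

# Hypothetical mini-vocabulary standing in for the CC-CEDICT word list.
VOCAB = {"香港", "社会", "运动", "讨论", "语言", "文字"}
MAX_WORD_LEN = max(len(word) for word in VOCAB)


def tokenize(text, vocab=VOCAB, max_len=MAX_WORD_LEN):
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_chain(tokens):
    """First-order Markov model: map each word to the list of words seen right after it."""
    chain = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, rng=random):
    """Walk the chain from `start`, stopping early at a word that has no successor."""
    words = [start]
    while len(words) < length and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return "".join(words)
```

Because successors are stored with repetition, a word that often follows another is also more likely to be picked after it, which is what lets the output keep the local texture of the essays while losing their overall sense.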

Most of Leung’s articles in 2019 are about serious topics like politics, culture, and art, but after going through the Markov model, the generated text appears meaningless, absurd, confusing, and ridiculous. I like this contrast because it reminds me that the Internet (which is also the space where the attacks happened) is deconstructing discussions. That is not necessarily a good thing or a bad thing. It all depends on how we react. With the deconstruction, can we jump out of the existing perceptions and stereotypes? And after deconstructing, can we find a way back to the original question and settle on an answer?

The work also reflects on one of Leung’s essays in the input, which is about Cantonese and the fact that in recent years Cantonese has been given a kind of political meaning. I wrote an algorithm that randomly chooses two words in the text, replaces one of them with a word that sounds similar to it in Mandarin (in simplified Chinese), and does the same to the other but with Cantonese (in traditional Chinese), then repeats the process every few seconds. For historical reasons, Mandarin, simplified Chinese, and Mainland China are closely bound together, and so are Cantonese, traditional Chinese, and Hong Kong, although there is a population in Mainland China that speaks Cantonese and writes in simplified script, and vice versa in Hong Kong. This impression, or stereotype, of spoken languages and writing scripts creates boundaries that hinder discussion and understanding.
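One step of the replacement described above might look like the sketch below. It is an illustration under assumptions rather than the code behind the work: the two homophone tables are hypothetical and hold only a few illustrative entries (the real ones would be built from the pronunciation data), the function name is made up, and the sketch only picks among words that actually have an entry in a table.

```python
import random

# Hypothetical homophone tables with illustrative entries only.
# Mandarin table: simplified Chinese words sharing a Mandarin pronunciation.
MANDARIN_HOMOPHONES = {"城市": ["程式", "成事"], "意义": ["异议", "意译"]}
# Cantonese table: traditional Chinese words sharing a Cantonese pronunciation.
CANTONESE_HOMOPHONES = {"公事": ["工事"], "文字": ["民治"]}


def replace_two_words(tokens, mandarin=MANDARIN_HOMOPHONES,
                      cantonese=CANTONESE_HOMOPHONES, rng=random):
    """Return a copy of `tokens` with one word swapped for a Mandarin
    homophone and a different word swapped for a Cantonese homophone."""
    result = list(tokens)

    # Pick one word that has a Mandarin homophone and replace it.
    mandarin_slots = [i for i, word in enumerate(tokens) if word in mandarin]
    picked = None
    if mandarin_slots:
        picked = rng.choice(mandarin_slots)
        result[picked] = rng.choice(mandarin[tokens[picked]])

    # Pick a different word that has a Cantonese homophone and replace it.
    cantonese_slots = [j for j, word in enumerate(tokens)
                       if word in cantonese and j != picked]
    if cantonese_slots:
        j = rng.choice(cantonese_slots)
        result[j] = rng.choice(cantonese[tokens[j]])

    return result
```

To get the gradual drift, the function would be called again on its own output every few seconds, for example from a loop with `time.sleep` or from a timer in the page that displays the text. Words that have already been replaced stay as they are unless the tables also list them as keys.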

The algorithm makes the text change slightly over time. After a while, because of the replacements, the text makes less and less sense in terms of the literal meaning of the characters, but since the replaced words share the same or a similar pronunciation with the original ones, visitors can still guess which words were there before and get a fuzzy idea of the original text.

What happens to the text is a metaphor for what happened to Leung (and others) in real life: people quote, then ‘interpret’ and twist, what he (and they) wrote or said to support their opinions of him (and them), then A’s twisted quotation is quoted by B, and B’s by C… The meaning of the words in each quotation drifts further and further from the original. Thus, it becomes harder and harder for people to get to the original idea of the text.

It is also a metaphor for an increasingly divided society. With algorithms on social media and instigation from some individuals and groups, the space for neutral opinions is lost. You are either part of us or the enemy. There is no place in between, and there is nothing in common between “us” and the enemy. Thus, the possibility of discussion and negotiation is gone. Only hate and attack are left.

Pronunciation data were collected from the open Cantonese dictionary, the CC-Canto database, and the CC-CEDICT database, and pre-processed by me.

