二周一胡bot – TwoChowOneHu Bot

(should be) The first real literature bot on Weibo

John C

May 2020

Follow it on Weibo

@二周一胡bot

Concept

Weibo is the biggest (and only) social media platform in Mainland China. To be honest, before this project I didn’t even have a Weibo account, so the project is also an exploration of Weibo, which many people (including some of my friends) think of as a miniature of cyberspace in Mainland China.

At first I imagined Weibo as the Mainland version of Twitter. They are indeed very similar: both had a text limit of 140 characters (although now you can go beyond that); people post text, images, and videos, and others comment, like, dislike, and repost. But when I started to build the bot, I found Weibo more complicated than I thought. Although its structure is like Twitter’s, what people do on it is more like a mix of Twitter (posting text), Facebook (running fan clubs), and Instagram (you know, the hot girls showing their photos). Because it has no competitor in Mainland China, everyone there, from farmers to billionaires, uses it to comment on politics, Donald Trump (he’s really popular everywhere), pop stars, new digital products, and everything else. That makes Weibo a very interesting place.

As Twitter bots have become popular in recent years, many “bots” have appeared on Weibo. However, because Weibo doesn’t provide cool APIs like Twitter does, those “bots” are all people: someone hides behind the account and pretends to be a bot. (There are articles about this trend like this, but in Chinese only.) So the idea of making a real, non-human bot on Weibo is also very interesting.

Artistic

The name of this bot refers to three people. TwoChow (二周) refers to S.R. Chow (周樹人) and Z.R. Chow (周作人), who are brothers and two of the best essayists in modern Chinese history. OneHu refers to Hu XiJin (胡錫進), the editor of Global Times, who stands here as the representative of “Weibo writers”.

Since sometime in the 2010s there has been a discussion about how language in China’s cyberspace has become “flat”, “ugly”, and lacking in aesthetics, and Weibo is the hardest-hit area of this degeneration. The data I got from Weibo seems to agree: the language people use on it is highly compressed. There are only about 1,000-2,000 unique characters, which is nothing compared to the full Chinese character set (more than 80,000 characters). The words (a word normally consists of 1-4 characters) in different Weibos are highly similar too; from what I have seen, I estimate there are only a few hundred to a thousand distinct words in the Weibos I collected.
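A minimal sketch of how a unique-character count like the one above could be done, assuming the collected posts are already available as an array of strings. The function name and the sample posts are made up for illustration; they are not taken from the bot’s code or data.

```javascript
// Count distinct Chinese characters across a list of posts.
// `posts` is a hypothetical array of strings, not the bot's real data format.
function countUniqueHanCharacters(posts) {
  const seen = new Set();
  for (const post of posts) {
    // for...of walks a string by code point, so characters outside
    // the Basic Multilingual Plane are not split in half.
    for (const ch of post) {
      // \p{Script=Han} matches Chinese characters only, skipping
      // punctuation, Latin letters, digits, and emoji.
      if (/\p{Script=Han}/u.test(ch)) {
        seen.add(ch);
      }
    }
  }
  return seen.size;
}

// Two invented sample posts that share most of their characters.
const samplePosts = ["今天天气很好", "今天心情很好!"];
console.log(countUniqueHanCharacters(samplePosts)); // prints 7
```

Counting words instead of characters would need a Chinese word segmenter first, which this sketch does not attempt.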

But what does this change in language look like? How is Weibo writing different from the writing that writers published in magazines in the old days? To compare them, TwoChowOneHu Bot posts two generated fragments every hour: one from Weibo writing and one from the essays and diaries of the two Chows.

Personally, judging from the generated fragments, I think the Weibo writing is far worse than the essays of the early 20th century. I know the Chows were good writers and people on Weibo are mainly ordinary people, so the comparison is not that fair. But it still shows something: the difference in writing style reflects the difference between what we read now and what people read in the old days. And I think the degeneration of the Chinese language (in cyberspace) is an undeniable fact.

Technical

Making a bot on social media without APIs is hard. To some degree it’s hacking the system. I use puppeteer to get content from Weibo; it took me some time to figure out which elements on the page contain the Weibo text and other information. To send a Weibo, I use the https library to simulate the POST request that the Weibo website sends when you post.
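A rough sketch of what simulating a form POST with Node’s https module can look like. This is not the bot’s actual code: the host, path, cookie, and form field names are all placeholders, and the real values would have to be copied from the browser’s network tab while posting on the Weibo website.

```javascript
const https = require("https");
const querystring = require("querystring");

// Build the options and body for a form-encoded POST.
// host, path, cookie, and fields are placeholders supplied by the caller.
function buildPostRequest(host, path, cookie, fields) {
  const body = querystring.stringify(fields);
  const options = {
    hostname: host,
    path: path,
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      // Byte length, not string length, matters for non-ASCII text.
      "Content-Length": Buffer.byteLength(body),
      // Session cookie copied from an already logged-in browser.
      Cookie: cookie,
    },
  };
  return { options, body };
}

// Send the request and resolve with the status code and response text.
function sendPost({ options, body }) {
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      let data = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (data += chunk));
      res.on("end", () => resolve({ status: res.statusCode, data }));
    });
    req.on("error", reject);
    req.write(body);
    req.end();
  });
}

// Hypothetical usage (example.com stands in for the real endpoint):
// sendPost(buildPostRequest("example.com", "/post", "name=value", { text: "你好" }))
//   .then((res) => console.log(res.status));
```

Keeping the request building separate from the sending makes it possible to inspect the encoded body before anything goes over the network.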

I still haven’t fixed the login problem: since 2018 Weibo has required an “I am not a robot” test on every login. I have some ideas, such as grabbing the picture of the test and sending it to an AI website for recognition, but trying that and making it run reliably would take a long time.

Because it’s a Weibo bot, it should use Simplified Chinese. So I converted the essays from Traditional Chinese to Simplified Chinese, but I didn’t have time to look through the 300k-character text, so there might be some errors from the conversion (since automatic conversion is known to be very unreliable).
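To illustrate why automatic conversion can go wrong, here is a toy table-based converter. It is only a sketch under my own assumptions, not the tool used for the bot: the two tables are tiny and hand-written, and a real converter needs thousands of entries. The character 乾 shows the problem: it becomes 干 in 乾淨 (clean) but stays 乾 in 乾坤, so a purely per-character table gets one of them wrong unless phrase-level exceptions are added.

```javascript
// Tiny hand-written per-character table, for illustration only.
const CHAR_MAP = new Map([
  ["樹", "树"],
  ["錫", "锡"],
  ["進", "进"],
  ["乾", "干"],
  ["淨", "净"],
]);

// Phrases that must not follow the per-character rule.
const PHRASE_EXCEPTIONS = new Map([["乾坤", "乾坤"]]);

function toSimplified(text) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    // Check phrase-level exceptions first, at the current position.
    const phrase = [...PHRASE_EXCEPTIONS.keys()].find((p) =>
      text.startsWith(p, i)
    );
    if (phrase !== undefined) {
      out += PHRASE_EXCEPTIONS.get(phrase);
      i += phrase.length;
      continue;
    }
    // Otherwise convert one code point; unknown characters pass through.
    const ch = String.fromCodePoint(text.codePointAt(i));
    out += CHAR_MAP.has(ch) ? CHAR_MAP.get(ch) : ch;
    i += ch.length;
  }
  return out;
}

console.log(toSimplified("周樹人")); // prints 周树人
console.log(toSimplified("乾淨")); // prints 干净
console.log(toSimplified("乾坤")); // prints 乾坤
```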

To generate the text I use RiTa version 1 (there are some problems running RiTa v2 on Heroku and I just haven’t had time to fix them yet), with lots of (strange) rules to clean up the text. People use so many strange symbols on Weibo.
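The clean-up rules could look something like the sketch below. These particular patterns are my illustration of the idea, not the rules in the repo; they assume typical Weibo conventions such as #topic# hashtags, @mentions, short links, and bracketed emoticon codes like [doge].

```javascript
// Each rule is a [pattern, replacement] pair, applied in order.
// The patterns are illustrative assumptions, not the bot's real rules.
const CLEANUP_RULES = [
  [/https?:\/\/[\w\-./?%&=#:]+/g, ""], // links (ASCII URL characters only)
  [/#[^#\s]+#/g, ""], // #topic# hashtags
  [/@[^\s:：,，。]+/g, ""], // @mentions
  [/\[[^\[\]\s]{1,8}\]/g, ""], // bracketed emoticon codes like [doge]
  [/\s+/g, " "], // collapse runs of whitespace
];

function cleanPost(text) {
  let out = text;
  for (const [pattern, replacement] of CLEANUP_RULES) {
    out = out.replace(pattern, replacement);
  }
  return out.trim();
}

// An invented example post.
console.log(cleanPost("今天天气很好[doge] #天气# @someone http://t.cn/abc"));
// prints 今天天气很好
```

Keeping the rules in a list makes it easy to add one more whenever a new kind of symbol shows up in the scraped text, before the cleaned text is handed to the generator.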

Screenshots

thumbnail

big ones

Github Repo

here

Different versions can be tracked there.

Process Images

Here are some screenshots from the early development stage.

Getting Weibos (the full text couldn’t be retrieved at that time XD)

Generating and sending text

Reference

Uses Node.js and the libraries RiTa v1, fs, https, and puppeteer

Uses essays by 周樹人 and 周作人, and content from Weibo

More coding references are in the code comments