There are NOT millions of Twitter users in China. Here's proof

Originally posted by Jason Q. Ng at Blocked on Weibo, republished with permission.

The question of how many Chinese Twitter users there are made headlines a few months back when the market research company GlobalWebIndex published results from a survey which claimed that 35 million people in China used Twitter. Media outlets ran with the story of how there was a huge secret upswell in “free” netizens in China who climbed the Great Firewall to access blocked sites like Twitter, with the seeming implication being that revolución! was just around the corner. Social/human rights progress may still indeed take place in China in the near future, but most smart social media watchers agree it won’t be because of Twitter: Chinese folks just aren’t on the service in the same numbers that they are on local social media sites like Sina Weibo, RenRen, and even upstart mobile apps like WeChat/Weixin. People (and even companies in advertisements) don’t pass around their Twitter handles as often as they share their Weibo contact info.

Even if our eyes told us that Twitter had attracted an active but small group of activists in China (and not many others in the country), was there a possibility that we were all missing something? Was there really a secret group of Chinese Twitter users being overlooked? Fortunately, after this week, I hope we can finally dismiss GWI’s 35 million number once and for all. Inspired by an SCMP story detailing the findings of the Chinese Twitter user @ooof (h/t Steven Millward of Tech In Asia), who cleverly used data on the website Twiyia.com to conclude that roughly 18,000 people who posted a tweet in Chinese selected Beijing as their home timezone, this weekend I performed a similar test on publicly available tweets using Twitter’s API. According to the data I extracted, there are most likely tens of thousands of Twitter users in China, not millions as claimed by GWI, a result that confirms @ooof’s finding. The exact numbers @ooof and I come up with may differ, and only Twitter itself could reveal how many Chinese Twitter users there actually are, but our independent results are likely within an order of magnitude of the actual number, unlike GWI’s result, which is about 2,000 times greater than our calculations. The hard evidence backs up what our eyes are telling us.

If you’re interested in the technical details of how I performed this fairly rigorous (though certainly not at the level of an academic research paper) test, read on. (Apologies for the non-Weibo-related post; I hope it’s still relevant to those who read this blog.)

Data collection

According to the publicly available search results data from Twitter, nearly 44,000 users posted a message that Twitter classified as a Chinese language tweet during the 24 hour period between 12:38 AM EST Thursday, Jan 3rd and 12:38 AM EST Friday, Jan 4th. I arrived at this finding by utilizing Twitter’s search by language feature which you can access via their advanced search tool or simply using the search term operator “lang:zh”. Switch it over to realtime searches (if you’re more familiar with the Twitter API, essentially changing the result_type from “mixed” to “recent”) and you have a Twitter stream of all recently posted Chinese tweets—or at least what Twitter guesses is Chinese.
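As a rough illustration of the query described above, the search parameters could be assembled like this. Only the “lang:zh” operator and the result_type values come from this post; the base URL and the helper name are placeholders of mine, not Twitter’s actual endpoint.

```python
from urllib.parse import urlencode

# Placeholder base URL (an assumption): swap in whichever search endpoint you can use.
SEARCH_BASE = "https://search.example.invalid/search"


def build_search_url(language: str = "zh", result_type: str = "recent") -> str:
    """Build a search URL for tweets in one language, newest first."""
    params = {
        "q": f"lang:{language}",     # language operator named in the post
        "result_type": result_type,  # "recent" rather than the default "mixed"
    }
    return f"{SEARCH_BASE}?{urlencode(params)}"


print(build_search_url())
```

Note that urlencode percent-encodes the colon in the operator, which is what a browser would do as well.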

Twitter, like other folks (for instance, Google Chrome, which can detect if a webpage you are visiting is in a foreign language and will offer to translate it into your native language), uses an algorithm to guess what language a tweet should be classified as. The algorithm is not infallible, and I noticed that a small percentage of tweets on Chinese Twitter users’ streams were being classified as Japanese. For instance, take someone who posts primarily in Chinese, like Michael Anti. If you examine his Twitter stream via the REST API [1] and look for the key “iso_language_code” you’ll see that the large majority of his posts are labeled as “zh”, which is the code for “zhongwen,” i.e. Chinese (中文), but as of right now, 7 of his last 100 posts are marked as Japanese (80 are marked as Chinese and 11 as English).
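A minimal sketch of that tally, assuming the timeline has already been parsed into a list of dictionaries that each carry the language key mentioned above. The sample tweets and the function name are made up for illustration; they are not Michael Anti’s actual data.

```python
from collections import Counter


def language_breakdown(tweets):
    """Count tweets per language label, reading the iso_language_code key."""
    return Counter(t.get("iso_language_code", "unknown") for t in tweets)


# Hypothetical stand-in for a parsed timeline response.
sample_tweets = [
    {"text": "今天天气很好", "iso_language_code": "zh"},
    {"text": "中国人口", "iso_language_code": "ja"},  # short, kanji-only text is easy to mislabel
    {"text": "hello world", "iso_language_code": "en"},
    {"text": "我在北京", "iso_language_code": "zh"},
]

counts = language_breakdown(sample_tweets)
for code, n in counts.most_common():
    print(code, n, f"{n / len(sample_tweets):.0%}")
```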

image

Obviously, because of the overlap between Chinese characters and Japanese kanji, this is bound to happen for just about any computer-based analyzer.[2] I thought about searching for a whole host of common Chinese characters that are less commonly used in Japanese in order to get a more “pure” and inclusive list of Chinese-language tweets, but what actually gets returned is a messy mix of Japanese and Chinese posts (and not even all Chinese posts, since some don’t include these characters), and for it to be useful you’d then have to develop your own tool for separating out the Japanese posts. Thus, for my purposes (getting something like 80+ percent of all the Chinese tweets), Twitter’s internal classification of what is Chinese is good enough (I’ll verify this in a moment).

Next was how to download these tweets that were marked as Chinese (the language—not as from China itself, that requires another step to be explained in a moment). Twitter has a wonderful API and a ton of developer documentation. If you have a question while creating a Twitter app, someone probably has already asked it and gotten a good answer. It’s a great community, but due to some very valid concerns (remember what-used-to-be the ever-so-common fail whale?…), there’s some fairly extreme rate limiting on accessing the search and timeline API. You can only hit Twitter’s server a certain number of times an hour before it cuts you off. Plus, I couldn’t figure out a way to have the REST search API return a list of all Chinese tweets without including a search term (I get the error “You must enter a query” when I drop the “q=”).[3] This caused me to use the public search widget mentioned above, which according to Twitter matches what you’d get from the REST version anyway.[4] The great thing about the search widget was that I didn’t experience a rate limit like I would have with the REST search API, allowing me to simply keep scrolling endlessly as long as I wished (until the browser crashed due to memory constraints). I put a paperweight on my keyboard’s page down button,[5] had lunch, and came back to copy the many thousands of Tweets now in my browser.

How many tweets exactly? 193,940. These 193,940 tweets were all the original Chinese-language tweets (native retweets[6] as well as, according to Twitter, messages detected as spam, were filtered out from this public search) posted between 12:38 AM EST Thursday, Jan 3rd and 12:38 AM EST Friday, Jan 4th and able to be found via the Twitter search API. Due to time limitations and a burning anxiety to get cracking, I only did a 24 hour period. If this were an academic paper or such, I would have captured a full week’s worth of tweets or possibly even more, but, well, I didn’t feel like waiting. According to @ooof’s graph, he used a whole month’s worth of tweets, which explains why his number of active users is more than mine.

An important note: these 193,940 tweets do not include every possible tweet that someone in China might have posted. Users who have made their tweets private obviously don’t have their posts show up in public search, nor did my method collect tweets from people posting in non-Chinese languages from China (thus, ex-pats in China, unless they write in Chinese, are not included in this data). But otherwise, it sure looks like everything: it even includes a Chinese-language tweet that I, a self-classified English-language user in an American timezone, sent to @ooof. But to more rigorously assess the public search’s performance, I again went back to Michael Anti’s timeline and looked at all 14 original tweets he made during my observation period. Of the 14, I found 11 in my downloaded data (and 1 more as an old-school retweet by someone else). I checked the 3 missing tweets and they are all listed as Chinese, so perhaps Twitter classified them as spam or simply didn’t capture them in the search; regardless, 11 out of 14 isn’t bad for my purposes, and, if I wanted, I could check other users’ timelines to see how many of their tweets were included in my download and adjust my numbers to account for those missing tweets. However, the takeaway is that the tweets I downloaded are, if not absolutely everything, then fairly close, and though any calculations I make might be off by some percentage, they are at least within the correct order of magnitude.
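To make the “adjust my numbers” idea concrete, here is a rough sketch of what such a correction could look like. It scales the tweet count by the 11-of-14 capture rate from the single timeline checked above, which is a crude assumption for illustration and not a calculation this post actually relies on; the function names are hypothetical.

```python
def coverage_rate(found: int, expected: int) -> float:
    """Share of a user's known tweets that showed up in the downloaded data."""
    return found / expected


def adjust_for_coverage(observed: int, rate: float) -> float:
    """Scale an observed count up to compensate for tweets the search missed."""
    return observed / rate


rate = coverage_rate(found=11, expected=14)    # the one timeline checked in the post
adjusted = adjust_for_coverage(193_940, rate)  # total tweets downloaded

print(f"coverage: {rate:.1%}")
print(f"adjusted tweet count: {adjusted:,.0f}")
```

A sample of one user is far too small to trust the correction itself; checking more timelines would tighten the rate, but either way the result stays within the same order of magnitude.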

Analysis

Having the set of all tweets during this 24-hour period, it was then trivial to extract all the unique usernames (because some users posted multiple tweets during that time period), leaving us with 43,784 users who posted something in Chinese. We can then use Twitter’s GET statuses/user_timeline to look up a user’s timezone, language setting, self-described location, and geo-coordinates (here’s what mine looks like) and use a JSON parser to extract the information cleanly.
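Both steps can be sketched in a few lines. The field names below (user, time_zone, lang, location, geo) follow my recollection of the timeline payload and should be treated as assumptions, as should the sample data and the helper names.

```python
import json


def unique_users(tweets):
    """Return usernames in first-seen order, dropping repeat posters."""
    return list(dict.fromkeys(t["user"] for t in tweets))


def extract_profile(timeline_json: str) -> dict:
    """Pull the profile fields of interest out of a user_timeline response."""
    statuses = json.loads(timeline_json)
    if not statuses:
        return {}
    latest = statuses[0]
    user = latest.get("user", {})
    return {
        "time_zone": user.get("time_zone"),  # None when the user never set one
        "lang": user.get("lang"),
        "location": user.get("location"),
        "geo": latest.get("geo"),
    }


# Hypothetical data standing in for the downloaded tweets and one API response.
tweets = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]
response = json.dumps([
    {"text": "你好", "geo": None,
     "user": {"time_zone": "Beijing", "lang": "en", "location": "中国"}}
])

print(unique_users(tweets))
print(extract_profile(response))
```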

Due to rate limiting, it’s not feasible to check all 43,784 users, so I took every 73rd user (ordered by when they most recently made a post) to come up with a sample of 608 users. 165 were missing any timezone classification (two of them because they had switched to private mode, thus taking away access to their timezone info), comprising 27% of the sample, and 110 were listed as located in Beijing’s timezone,[7] 18% of the sample, numbers which largely mirror @ooof’s conclusion (see the table below).
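Taking every Nth user from an ordered list is a systematic sample. A minimal sketch, using a made-up list of usernames rather than the real one, with the sample shares computed from the counts reported above:

```python
def systematic_sample(items, step, start=0):
    """Take every `step`-th item from an ordered list, beginning at `start`."""
    return items[start::step]


def share(count: int, sample_size: int) -> float:
    """Fraction of the sample that falls into one category."""
    return count / sample_size


# Hypothetical ordered user list (most recent poster first).
users = [f"user{i}" for i in range(1000)]
sample = systematic_sample(users, step=73)
print(len(sample), sample[:3])

# Shares from the counts reported in the post.
print(f"no timezone: {share(165, 608):.0%}")
print(f"Beijing:     {share(110, 608):.0%}")
```

Because the list is ordered by recency rather than by anything related to timezone, a fixed step should behave much like a random draw here.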

image

If I extrapolate out those percentages to my total population of 43,784 users, I get roughly 12,000 missing and 8,000 in Beijing. Of course, this 8,000 is the least it could be; as mentioned, it doesn’t include those who set their accounts to private, doesn’t include folks who may have their timezone mistakenly set elsewhere, doesn’t include users who didn’t post in that 24 hour period (these 7,921 might be considered hardcore daily Tweeters), and may miss out on any users whose tweets accidentally were marked as spam or were not captured in Twitter’s search API.[8] All of those reasons explain why my number is likely an undercount of the total number of Chinese Twitter users, but as demonstrated previously, it likely isn’t off by a whole lot. The primary reason why my number is so much lower than @ooof’s is that his data collection period appears to have lasted for a month, and thus he captured the more casual Chinese Tweeter; otherwise, my percentages largely confirm his.[9] Here’s the more detailed breakdown of which timezone users reported themselves as being in:
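The extrapolation is plain proportional scaling, sketched below with the numbers from this post. The interval at the end is an addition of mine, not something computed for the original analysis: it uses the normal approximation for a proportion and treats the every-Nth-user sample as if it were a simple random sample.

```python
import math


def extrapolate(count: int, sample_size: int, population: int) -> float:
    """Scale a sample count up to the full population."""
    return population * count / sample_size


def proportion_interval(count: int, sample_size: int, z: float = 1.96):
    """Approximate 95% interval for a sample proportion (normal approximation)."""
    p = count / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    return p - z * se, p + z * se


POPULATION = 43_784  # unique users who posted in Chinese
SAMPLE = 608

print(f"Beijing timezone: {extrapolate(110, SAMPLE, POPULATION):,.0f}")
print(f"no timezone set:  {extrapolate(165, SAMPLE, POPULATION):,.0f}")

low, high = proportion_interval(110, SAMPLE)
print(f"Beijing range:    {low * POPULATION:,.0f} to {high * POPULATION:,.0f}")
```

Sampling error of this size is small next to the other sources of undercounting listed above, so it doesn’t change the order-of-magnitude conclusion.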

As for the other data I collected on this sample, location info was largely useless since it is user-specified. If folks decided to enter anything at all, it sometimes came in the form of fake locations like “In your HEAD” and “On your bed.” Of the 364 who did supply a location, 40 contained either “China” or 中国, and if I had time, I could sift through the rest and try and figure out if they might also be candidates to be China-based users.
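The location check amounts to a substring match. A small sketch follows; the case-insensitive comparison is my own assumption, and the sample values other than the two joke locations quoted above are invented.

```python
def mentions_china(location) -> bool:
    """True if a free-text location contains "China" (any case) or 中国."""
    if not location:
        return False
    return "china" in location.lower() or "中国" in location


# Hypothetical profile locations, including the joke entries quoted above.
locations = ["Beijing, China", "In your HEAD", "中国上海", None, "On your bed."]
matches = [loc for loc in locations if mentions_china(loc)]
print(matches)
```

Sifting through the rest would need a list of city and province names in both scripts, which is where the real effort would go.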

Finally, I looked at the primary language a user specified in their settings, which looks like it suffers from a much greater than expected number of English language users, likely due to Twitter defaulting to English. I’m not certain how Twitter chooses your initial language, whether it’s always English unless you manually set it, or if it takes the language of the browser or perhaps your IP address (which perhaps redirects you to a location/language-specific signup page), but this data is flawed. Regardless, here’s a pie chart of the percentage of languages specified in the 608 person sample in case you’re curious.

image

Conclusion

I can’t conclusively say whether there are 10,000 or 18,000 Twitter users in China, but based on the data I pulled and the method I used to analyze it (and without knowing more, probably a method quite similar to what @ooof used), I can say conclusively that there are NOT 35 million Twitter users in China. If there were indeed that many, you’d see it in the quantity of Chinese-language tweets.[10] Looking at the Twitter stream, there just aren’t that many Chinese language tweets. However, despite the various limitations mentioned above in my data collection process (only one day, doesn’t include private accounts, doesn’t include non-Chinese language posts from China), the number of active Twitter users in China is almost definitely between 10,000 and 100,000, several orders of magnitude less than what GlobalWebIndex calculated from their social media in China survey.


Notes

[1] Version 1, which is apparently on its way to being mothballed in favor of 1.1, which will require authentication, so this link may not work in a couple of months.

[2] Though based on what I’ve seen, Twitter’s algorithm, while serviceable, could definitely be improved.

[3] If someone knows what value to set q= to, by all means let me know on Twitter or via the contact form. Apparently if you have Firehose access, you don’t have to deal with rate limits. Also, if I’m reading things correctly, Twitter’s new streaming API supposedly lets developers hook into the public stream and just suck up tweets that match certain criteria, with a much greater range than the simple search API that I relied on, which, as Twitter warns, is not exhaustive, supposedly with spam messages and the like being filtered out (a rather good side effect of having to use the search API rather than the streaming API). As I don’t have access to the former, which is apparently very hard to come by, and lacked the time to learn the latter, I went with the quick-and-dirty approach in this investigation. If this were for a research paper or something where I needed much more precision, certainly, the streaming API would be the way to go, but as I mention later in the post, my method was for the most part good enough. Someone who has an extensive database of tweets, like the folks at Sysomos claim to have, could arrive at an even more precise number than we have.

[4] According to Twitter, this REST version of the search API is the exact same thing as what you’d get with the general search tool/widget: “The Search API (which also powers Twitter’s search widget) is an interface to this search engine.”

[5] I told you, not super scientific was I in this task, but this was by far the fastest way and didn’t sacrifice anything in the data collection.

[6] Native retweets are the ones where you just click the retweet button in Twitter and they appear instantly on your timeline with the other person’s profile photo. Old-school retweets, which are included in my set of downloaded tweets, are when you manually copy and paste a person’s tweet and prepend an RT. Excluding native retweets hopefully reduces the number of robot accounts which do nothing but aggressively retweet.

[7] My sample also had 3 users who selected Chongqing as their timezone. I grouped that into Beijing for the above pie chart, but broke it down in the table.

[8] So long as a user had even one tweet get listed in the Twitter search, they were included in my total of 43,784. If you wish to verify, take any user who made a Chinese post on Jan 3 and check to see if they are on this list. If not, do let me know.

[9] The only one where we differ greatly is Tokyo: his data concludes that under 1% reside there, while mine puts it at over 3%. This could simply be a matter of our samples or something else; otherwise, everything else matches fairly well.

[10] If you search for all the English-language posts on Twitter the same way I did for Chinese, you’d have to scroll for a very, very long time before you even go back through a single minute’s worth of tweets.

Comments

If there were so many Twitter users, they could no longer be stopped. However a kind of passive resistance is already happening on Weibo, which is why there is no real need for something like twitter yet. My thoughts, http://www.thechinamogul.com
