I’ve just submitted a revision of a paper I wrote in early September for inclusion as part of an academic book. It’s an attempt to provide a basic overview of blogging – and specifically bridgeblogging – around the world and suggest some questions other scholars may want to consider as we start trying to figure out how blogging works in communities other than the universe of US-centric political punditry blogs.
(The draft of the paper is here, for anyone who’s curious, in HTML and PDF. It’s not accepted for publication yet, so please ask me first if you want to cite it in an academic context. Blogging it is fine, though there’s at least one more major revision to it coming…)
In the process, I’m rediscovering one of the huge reasons I decided not to become a “serious” academic – i.e., why I decided not to pursue a PhD and focus on publishing in peer-reviewed academic journals: the stuff that I’m interested in is moving so fast that it’s very hard to write about in an academically-compatible way.
This isn’t the fault of the editors of this book – blogger/academics Henry Farrell and Daniel Drezner – it’s simply a function of the amount of time it takes to compile a set of academic papers, have them peer-reviewed and edited into a larger text, typeset and published. Even an aggressive timetable has a book going from inception to publication in a year. (And, indeed, the review process just saved me from making a huge error in the draft I just submitted – more on that below.)
The problem: when you’re writing about blogging, an assertion you make may be out of date in three months. Here’s something I wrote in September:
“Weblog search engine Technorati maintains a rankings page which lists the top 100 weblogs, as determined by total incoming links over that blog’s lifespan. This listing is fairly static, as the least popular page in the top hundred has 2,430 incoming links – the dozen or fewer new links created each day tends not to reorder rankings radically. (By contrast, Blogpulse’s top 40 page, which ranks blogs based on incoming links discovered in the past 30 days, tends to be significantly more dynamic.) Blogs listed on Technorati’s top 100 generally have built up a substantial audience over the course of months or years.”
This, as it happens, is (now) totally untrue. When I double-checked the September data three months later, only four of the fifteen blogs I was watching – non-English or non-US blogs in the Technorati top 100 – were still in the top 100. While 11 of the fifteen blogs I was watching fell out, 20 new non-English or clearly non-US blogs had entered the top 100 when I checked last week.
What happened was that Technorati changed its algorithm for generating the top 100 during the three months in question. (Thanks, Henry, for pointing this out in reading my draft.) The previous algorithm calculated ranks based on total incoming links over the lifetime of the blog. The new algorithm calculates ranks based on links in the past six months. The previous algorithm tended to favor long-established blogs, while the new one favors blogs that have been popular in the recent past.
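To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not Technorati's actual code: the function names, the 182-day window standing in for "six months", and the sample link data are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def _ranked(counts):
    # Sort by link count (descending), breaking ties alphabetically.
    return [blog for blog, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def rank_by_lifetime(links):
    """Old-style ranking: count every incoming link ever seen."""
    return _ranked(Counter(target for target, _ in links))


def rank_by_window(links, today, window_days=182):
    """New-style ranking: count only links made in the recent window."""
    cutoff = today - timedelta(days=window_days)
    return _ranked(Counter(target for target, when in links if when >= cutoff))


# Hypothetical data: each entry is (blog being linked to, date of the link).
links = (
    [("old-blog", date(2003, 6, 1))] * 5      # long-established, mostly old links
    + [("old-blog", date(2005, 11, 1))]
    + [("new-blog", date(2005, 11, 15))] * 3  # newcomer, all links recent
)
today = date(2005, 12, 15)

lifetime = rank_by_lifetime(links)
recent = rank_by_window(links, today)
print(lifetime)
print(recent)
```

With this toy data the established blog wins on lifetime links, while the newcomer comes out ahead once only recent links count, which is the same kind of reshuffling I saw in the real list.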
(This time I was smart enough to save a whole data set, not just the non-US/English blogs, which means in a few months I’ll be able to report on what percent of the top 100 changes in a three-month period in the new algorithm.)
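Comparing two saved snapshots is simple set arithmetic. The sketch below is only an illustration: the `churn` function and the placeholder blog names are hypothetical, and the counts are chosen to mirror the watch-list figures above (four blogs stayed, eleven dropped out, twenty entered).

```python
def churn(before, after):
    """Compare two snapshots of a ranking and summarize what changed."""
    before, after = set(before), set(after)
    dropped = before - after
    return {
        "stayed": len(before & after),
        "dropped": len(dropped),
        "entered": len(after - before),
        # Share of the earlier snapshot that is no longer listed.
        "percent_dropped": 100 * len(dropped) / len(before) if before else 0.0,
    }


# Placeholder names standing in for the blogs on the watch list.
kept = {f"kept-{i}" for i in range(4)}
september = kept | {f"gone-{i}" for i in range(11)}
december = kept | {f"new-{i}" for i in range(20)}

result = churn(september, december)
print(result)
```

Run against the full top 100 from two dates, the same function would give the three-month turnover figure for the new algorithm.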
Something else happened as well. The Chinese showed up.
The new rankings, which feature recently popular blogs, have some similarities to the old ones – especially at the top of the rankings. But they’re radically different outside the top 20. Of those 20 new blogs in the top 100, 11 are in Chinese. In the September figures, three of the 14 blogs (21%) I was watching were in Chinese (tied with Portuguese as the best-represented non-English language in the top 100). In December, none of the Portuguese blogs are ranked and 60% of the non-US/non-English blogs are in Chinese.
Basically, the Chinese language blogosphere appears to be exploding in popularity. And the 12 Chinese blogs listed in Technorati’s top 100 may be just the tip of the iceberg.
All 12 of the Chinese blogs are hosted by MSN. This isn’t entirely surprising – research by my friend Matthew Hurst on pingserver data suggested that a huge percentage of total blog posts are coming from MSN and that a substantial percentage of MSN Spaces blogs are being written by people in China. Using data from a paper Matthew is publishing in a few weeks, I estimate that MSN is hosting a minimum of 2m Chinese language blogs, including Chinese and Taiwanese bloggers. That’s an amazing figure, as Technorati and Blogpulse each index roughly 20 million blogs in total – MSN’s Chinese-language blogs alone might represent 10% of the blogosphere.
Matthew’s analysis draws from pings sent to the weblogs.com server, where many weblogging platforms send pings whenever a blog is updated. “Many weblogging platforms” doesn’t appear to include Bokee or Blogbus, two of the most popular weblogging platforms in China. I just retrieved an hour’s worth of data from weblogs.com and did some simple searches. Of the 80,880 pings registered in that hour, MSN Spaces blogs registered 9,612 pings, Blogspot registered 6,393 and Typepad registered 102… Neither Blogbus.com nor Bokee.com registered a single ping.
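The "simple searches" amount to tallying pinged URLs by hosting domain. Here is a rough Python sketch of that idea. It assumes the ping data has already been reduced to a plain list of blog URLs; the `tally_by_host` function and the sample URLs are made up for illustration and are not the real weblogs.com data.

```python
from collections import Counter
from urllib.parse import urlparse

# Hosting domains to look for (taken from the platforms discussed above).
HOSTS = ("spaces.msn.com", "blogspot.com", "typepad.com", "blogbus.com", "bokee.com")


def tally_by_host(urls, hosts=HOSTS):
    """Count how many pinged URLs belong to each hosting domain."""
    counts = Counter({host: 0 for host in hosts})
    for url in urls:
        hostname = (urlparse(url).hostname or "").lower()
        for host in hosts:
            # Match the domain itself or any subdomain of it.
            if hostname == host or hostname.endswith("." + host):
                counts[host] += 1
                break
    return counts


# Hypothetical sample of pinged URLs.
sample_pings = [
    "http://spaces.msn.com/members/example-one/",
    "http://spaces.msn.com/members/example-two/",
    "http://example.blogspot.com/",
    "http://example.typepad.com/weblog/",
    "http://www.example.org/blog/",
]

counts = tally_by_host(sample_pings)
for host in HOSTS:
    print(host, counts[host])
```

A host that never pings the server, as Blogbus and Bokee apparently don't, simply shows up with a count of zero, no matter how many blogs it actually hosts.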
Services like Technorati and Blogpulse have started sharing ping data with one another through an initiative called “FeedMesh” – if a ping is received by one of their pingservers, it gets shared with other weblog search engines. But it’s unclear to me whether any of these pingservers are seeing pings from Bokee, one of China’s largest blog hosting providers. (Recent statistics from blo.gs, a major pingserver in the FeedMesh project, don’t list any Chinese blog hosts other than MSN among the top 20 bloghosts seen on its pingserver.)
A search for the string “spaces.msn.com” on Technorati yields over 2.1 million links. A search for “bokee.com” yields fewer than 3,000 links… and they all appear to be on MSN Spaces blogs, which are linking to Bokee blogs. A search for “blogbus.com” yields roughly 62,000 links, most from Blogbus blogs, but links to other prominent Chinese hosting sites, like Tianyablog.com or Blogcn.com, only appear on MSN Spaces blogs. (A search for “blogcn.com” gives a sense of just how big that site might be – despite apparently not indexing blogcn.com blogs, Technorati returns over 50,000 links for the search.)
A recent (and controversial) study by Baidu, a leading Chinese search engine, claims that there are 37 million blogs in China, maintained by 16 million bloggers. MSN Spaces is one of 658 blog hosting companies in China, and of the top ten companies cited in the Baidu study, at most four are well indexed by Technorati. (MSN is very well indexed, and Blogbus appears to be well indexed. MBlogger.cn turns up some posts, though with a large number of errors, and cnblogs.com turns up a small number of posts.)
My point is not to beat up on Technorati, which has made great strides in indexing blogs in different character sets recently, but to point out that the 12% share of Technorati’s top 100 list held by Chinese blogs may be an undercount. If there are some prominent, well-linked blogs on other major Chinese bloghosting providers (and it seems likely that there would be), there might be 20 or more Chinese blogs in the Technorati top 100.
This raises any number of interesting questions for folks like me who spend our free time making sweeping generalizations about “the blogosphere”. The next time I write something like “An easy route to popularity in the blogosphere is to write about US politics or about technology”, please feel free to respond by asking, “Just what blogosphere are you talking about?”
It’s not clear to me that it makes sense to use the term “the blogosphere” when talking about the set of Chinese and English bloggers. While there are some efforts out there to make the Chinese blogosphere more understandable to English speakers (notably EastSouthWestNorth) and vice versa, when I look at the links to these prominent Chinese-language blogs, they’re all in Chinese. (And when I look for links to prominent English sites, very, very few are in Chinese.) If Chinese blogservers aren’t sending pings to the same servers – and consequently aren’t getting indexed by the same search engines – we can’t even say that we’re using a common toolset.
Researchers hoping to make broad statements about weblogs are going to have to start getting profoundly polylingual. The new Technorati top 100 features several blogs in Japanese (including, intriguingly, some celebrity blogs – one by a swimsuit model, another by the catcher of the Yakult Swallows), a Spanish geek blog, an Italian political comedian, and a German journalism blog.
Monolingual idiot that I am, my best bet appears to be to hang out as much as possible with Rebecca MacKinnon, who’s fluent in Mandarin from her years in China. Thanks to her, I have an eighth of a clue about what these top Chinese blogs are about. The majority seem to be trading tips, tricks and software tools that help bloggers customize their MSN Spaces blogs.
Bloggers blogging about blogging? How very 2003!
Is the presence of all these Chinese blogs in the Technorati top 100 making English-speaking bloggers curious about what’s going on in the Chinese blogosphere? I remember a wonderful post by Kevin Marks, where he remarks on his frustration at not being able to understand the Persian-language weblog that used to be in Technorati’s top 100:
However, I look at blogs like this and feel like Ginger in Gary Larson’s classic What Dogs Hear.
‘squiggle squiggle squiggle Blog squiggle squiggle squiggle Permalink’
Unfortunately, Google’s Chinese/English translation engine isn’t much help with those squiggles. Translating the most popular Chinese language page in the Top 100 into English begins with the lovely phrase “The hot spot pays attention to Msn splendid space wind and cloud announcement”. I’m guessing I might be missing something.
Okay, the translation gets better when you get into the body of the post. And it’s cool that locker2man is reading Slashdot and translating articles for his readers. But how much cooler would it be if someone were translating him into English as well? What do we miss if the blogosphere gets more and more culturally and linguistically diverse and idiots like me can only read a small fraction of what’s out there?