My Heart's in Accra

Ethan Zuckerman's musings on Africa, international development
and hacking the media.

02/26/2010 (8:05 pm)

Snowy journey, with journalism

Filed under: Media ::

I got stuck on a train between New York and Albany last night as a storm pounded the Hudson Valley. Fortunately, Amtrak’s warnings about service on the Empire Line were sufficiently dire that I used my layover in Penn Station to cache supplies: pretzels, Diet Pepsi, and enough reading material to keep me entertained until we arrived (which didn’t happen until almost 3am).

Better still, I’d stumbled onto Conor Friedersdorf’s extraordinary The Best of Journalism (2009) through Metafilter. I’m not usually a huge fan of “best of” lists – they tend to be group-curated and often reveal more about the group’s composition than about any underlying characteristics of the works chosen. But this is an idiosyncratic, personal list, and it’s clear from some of Friedersdorf’s choices that we share some tastes. (For one thing, he’s a big enough This American Life fanboy that he may already have downloaded the new Adam WarRock track, “That’s So Ira Glass”.) I found enough good reads among the stories I downloaded that I’ll now try anything Friedersdorf recommends on his Twitter feed, JournoCurator.

The gems of last night’s reading:

- I’d read about the disturbing evidence that repeated head trauma suffered by football players was leading to a disproportionate number of disabled and suicidal former NFL stars, especially offensive linemen, and that a new syndrome – Chronic Traumatic Encephalopathy – had been diagnosed. (The New York Times has had excellent coverage of the topic from Alan Schwarz.) But I hadn’t heard about the brilliant, persistent young Nigerian neuropathologist who tracked down the story and refused to give up, despite a flood of efforts to insult, ostracize and generally marginalize his work. Give it up for Dr. Bennet Omalu, and for Jeanne Marie Laskas for telling his story in “Game Brain” for GQ.

- Mark Groubert pokes through a pile of abandoned possessions on an LA street corner and finds himself drawn into the mysteries of the man who left them behind. The story is intrusive, intimate, somewhat transgressive and very moving. Groubert watches four discarded DVDs, including home movies, and follows a young Frenchman from Christmas in a Paris suburb in the 1970s through adolescence, a move to Los Angeles, and a descent into depression and drug addiction. Using the clues found in the “Box of Broken Dreams”, he identifies the man and interviews him about his decision to leave LA. I alternated between discomfort at Groubert’s voyeurism and being moved by the narrative he uncovered, more or less the same set of emotions raised by “The House on Loon Lake”, a beautiful This American Life episode that begins with teenagers trespassing in an abandoned house and ends with an odd sort of family reunion. (Both pieces have multimedia components – House on Loon Lake features a wonderful set of sepia photographs, while the LA story includes a video trailer.)

- I like Michael Lewis. I really like Iceland. (My wife and I spent our honeymoon there, and I often fly to Europe via Reykjavik, hoping to get stuck for a day through a missed flight.) I’m not sure what I think of Lewis’s take on the Icelandic financial crisis – and its connection to gender – but it was a fascinating read.

Lewis travels to Iceland to untangle the global banking and real estate crisis from the vantage point of a country that’s been hit far harder than the US. Parts of his diagnosis are insightful and compelling: by figuring out how to privatize its fish stocks, Iceland securitized cod and turned fishermen into financiers. Other aspects seem a bit simplistic and, perhaps, too neatly rooted in a couple of unpleasant encounters he had with overly aggressive Icelandic men: basically, he concludes that the crisis was the result of testosterone-laden, culturally isolated, naïve guys convincing themselves that they had a God-given talent for finance. One wonders whether he’d have reached different conclusions had he run into one less drunken idiot.

There are half a dozen articles I still need to get through, and others in Friedersdorf’s recommendations that I liked, but not enough to write up. It was a terrific reminder of the value of good curation in the sea of online writing.


02/22/2010 (7:27 pm)

Internet Freedom: Beyond Circumvention

Secretary Clinton’s recent speech on Internet Freedom has signaled a strong interest from the US State Department in using the internet to promote political reform in closed societies. It makes sense that the State Department would look to support existing projects to circumvent internet censorship. The New York Times reports that a group of senators is urging the Secretary to apply existing funding to the development and expansion of censorship circumvention programs, including Tor, Psiphon and Freegate.

I’ve spent a good part of the last couple of years studying internet circumvention systems. My colleagues Hal Roberts, John Palfrey and I released a study last year that compared the strengths and weaknesses of different circumvention tools. Some of my work at Berkman is funded by a US State Department grant to continue studying and evaluating these sorts of tools, and I spend a lot of time trying to coordinate efforts between tool developers and the people who need access to circumvention tools to publish sensitive content.

I firmly believe that we need strong, anonymized and usable censorship circumvention tools. But I also believe that we need lots more than censorship circumvention tools, and I fear that both funders and technologists may overfocus on this one particular aspect of internet freedom at the expense of other avenues. I wonder whether we’re looking closely enough at the fundamental limitations of circumvention as a strategy and asking ourselves what we’re hoping internet freedom will do for users in closed societies.

So here’s a provocation: We can’t circumvent our way around internet censorship.

I don’t mean that internet censorship circumvention systems don’t work. They do – our research tested several popular circumvention tools in censored nations and discovered that most can retrieve blocked content from behind the Chinese firewall or a similar system. (There are problems with privacy, data leakage, the rendering of certain types of content, and particularly with usability and performance, but the systems can circumvent censorship.) What I mean is this – we couldn’t afford to scale today’s existing circumvention tools to “liberate” all of China’s internet users even if they all wanted to be liberated.

Circumvention systems share a basic mode of operation – they act as proxies to let you retrieve blocked content. A user is blocked from accessing a website by her ISP or that ISP’s ISP. She wants to read a page from Human Rights Watch’s webserver, which is accessible at IP address 70.32.76.212. But that IP address is on a national blacklist, and she’s prevented from receiving any content from it. So she points her browser to a proxy server at another address – say 123.45.67.89 – and asks a program on that server to retrieve a page from the HRW server. Assuming that 123.45.67.89 isn’t on the national blacklist, she should be able to receive the HRW page via the proxy.
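A toy model can make the mechanism concrete. The Python sketch below simulates the filter and the proxy in memory rather than moving real packets; the blacklist, the `isp_fetch` and `proxy_fetch` helpers and the stored page are hypothetical, and only the two addresses come from the example above. The filter sees only the address the user connects to, so the relayed request succeeds unless the proxy's own address is blacklisted.

```python
# Toy model of proxy-based circumvention: no real networking happens here.
NATIONAL_BLACKLIST = {"70.32.76.212"}                        # HRW's address in the example
WEB = {"70.32.76.212": "<html>Human Rights Watch</html>"}    # hypothetical page content


class Blocked(Exception):
    """Raised when the national filter drops traffic to a blacklisted address."""


def filtered_connect(ip: str) -> None:
    """The filter inspects only the address the user connects to."""
    if ip in NATIONAL_BLACKLIST:
        raise Blocked(ip)


def isp_fetch(dest_ip: str) -> str:
    """Direct request from inside the filtered network."""
    filtered_connect(dest_ip)
    return WEB[dest_ip]


def proxy_fetch(proxy_ip: str, dest_ip: str) -> str:
    """Request relayed through a proxy located outside the filter."""
    filtered_connect(proxy_ip)   # the user only ever connects to the proxy
    return WEB[dest_ip]          # the proxy fetches dest_ip on the user's behalf


if __name__ == "__main__":
    try:
        isp_fetch("70.32.76.212")
    except Blocked as err:
        print("direct request blocked:", err)
    print("via proxy:", proxy_fetch("123.45.67.89", "70.32.76.212"))
```

The same model shows the weakness: once the censor adds the proxy's address to `NATIONAL_BLACKLIST`, the relay stops working too, which is why operators end up in a constant cat-and-mouse game with censors.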

During the transaction, the proxy is acting like an internet service provider. Its ability to provide reliable service to its users is constrained by bandwidth – bandwidth to access the destination site and to deliver the content to the proxy user. Bandwidth is costly in aggregate, and it costs real money to run a proxy that’s heavily used.

Some systems have tried to reduce these costs by asking volunteers to share them – Psiphon, in its first design, used home computers hosted by volunteers around the world as proxies, relying on their consumer bandwidth to access the public internet. Unfortunately, in many countries, consumer internet connections are optimized for downloading and are much slower at uploading. These proxies could fetch the homepage at hrw.org pretty quickly, but they took a very long time to deliver the page to the user behind the firewall. Psiphon is no longer primarily focused on making volunteer-hosted proxies work. Tor still is, but Tor nodes are frequently hosted by universities and companies with access to large pools of bandwidth. Even so, available bandwidth is a major constraint on the usability of the Tor system. The most usable circumvention systems today – VPN tools like Relakks or Witopia – charge users significant annual fees to defray bandwidth costs.
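The asymmetry is easy to quantify with back-of-the-envelope arithmetic. The figures below (a 1 MB page, an 8 Mbit/s downstream link and a 0.5 Mbit/s upstream link) are hypothetical, not measurements of Psiphon's volunteers, and the sketch ignores latency and protocol overhead.

```python
def transfer_seconds(size_megabytes: float, link_mbps: float) -> float:
    """Seconds to move size_megabytes over a link rated at link_mbps megabits/s."""
    return size_megabytes * 8 / link_mbps


# Hypothetical figures for one volunteer-hosted proxy on a consumer connection.
PAGE_MB = 1.0      # size of the page being relayed
DOWN_MBPS = 8.0    # volunteer's downstream rate: proxy <- hrw.org
UP_MBPS = 0.5      # volunteer's upstream rate:   proxy -> user behind the firewall

fetch = transfer_seconds(PAGE_MB, DOWN_MBPS)
deliver = transfer_seconds(PAGE_MB, UP_MBPS)
print(f"fetch from hrw.org: {fetch:.1f} s, deliver to user: {deliver:.1f} s")
```

With these assumed numbers the proxy pulls the page in 1 second but needs 16 seconds to push it out; the relay is only as fast as its slower leg, which on a consumer line is the upload.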

Let’s assume that systems like Tor, Psiphon and Freegate receive additional funding from the State Department. How much would it cost to provide proxy internet access for… well, China? China reports 384 million internet users, meaning we’re talking about running an ISP capable of serving more than 25 times as many users as the largest US ISP. According to CNNIC, China consumes 866,367 Mbps of international internet bandwidth. It’s hard to get estimates of what ISPs pay for bandwidth, though conventional wisdom suggests prices between $0.05 and $0.10 per gigabyte. Using $0.05 per gigabyte, the cost to serve the Internet to China would be $13,608,000 per month, or $163.3 million a year in pure bandwidth charges, not counting the costs of proxy servers, routers, system administrators and customer service. Faced with a bill of that magnitude, the $45 million that US senators are asking Clinton to spend quickly looks pretty paltry.
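The monthly figure can be reproduced in a few lines of Python. Two assumptions here are mine, not the post's: a 30-day month and a conversion of 1 GB = 1024 × 8 megabits. Under those assumptions the CNNIC rate is about 105.8 GB/s, and the $13,608,000 figure matches exactly if that rate is rounded down to 105 GB/s; using the unrounded rate, or decimal gigabytes, raises the total by up to about three percent without changing the conclusion.

```python
# Back-of-the-envelope bandwidth cost, using the inputs quoted in the post.
INTL_BANDWIDTH_MBPS = 866_367          # CNNIC figure for international bandwidth
COST_PER_GB = 0.05                     # low end of the quoted $0.05-$0.10 range
SECONDS_PER_MONTH = 30 * 24 * 3600     # assumption: a 30-day month
MEGABITS_PER_GB = 1024 * 8             # assumption: binary gigabytes


def monthly_cost(gb_per_second: float, cost_per_gb: float = COST_PER_GB) -> float:
    """Dollar cost of sustaining gb_per_second of traffic for one month."""
    return gb_per_second * SECONDS_PER_MONTH * cost_per_gb


gb_per_second = INTL_BANDWIDTH_MBPS / MEGABITS_PER_GB   # about 105.8 GB/s
rounded = monthly_cost(105)            # rate rounded down to 105 GB/s
unrounded = monthly_cost(gb_per_second)

print(f"rounded:     ${rounded:,.0f}/month, ${rounded * 12 / 1e6:.1f}M/year")
print(f"unrounded:   ${unrounded:,.0f}/month, ${unrounded * 12 / 1e6:.1f}M/year")
print(f"at $0.10/GB: ${monthly_cost(gb_per_second, 0.10) * 12 / 1e6:.1f}M/year")
```

The first line prints $13,608,000 per month and $163.3M per year; at the $0.10 end of the price range the annual bill roughly doubles.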

There’s an additional complication – we’re not just talking about running an ISP, we’re talking about running an ISP that’s very likely to be abused by bad actors. Spammers, fraudsters and other internet criminals use proxy servers both to protect their identities and to evade safeguards like those on free webmail providers, which prevent users from signing up for dozens of accounts by limiting each IP address to a certain number of signups in a given time period. Wikipedia found that many users used open proxies to vandalize its pages and now reserves the right to block proxy users from editing. Proxy operators have a tough balancing act – for their proxies to be useful, people need to be able to use them to access sites like Wikipedia or YouTube… but if people use those proxies to abuse those sites, the proxy will be blocked. As a result, proxy operators can find themselves at war with their own users, trying to ban bad actors to keep the tool useful for everyone else.
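To see why a shared proxy collides with per-IP safeguards, consider a sliding-window signup limiter of the general kind described above. This Python class is a hypothetical illustration, not any webmail provider's actual system, and the limit of three signups per hour is invented. Ten people signing up from their own addresses all succeed; the same ten behind one proxy address mostly fail, whether or not they are spammers.

```python
from collections import defaultdict, deque


class PerIpSignupLimiter:
    """Allow at most `limit` signups per IP address within a sliding window."""

    def __init__(self, limit: int = 3, window_seconds: float = 3600.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)   # ip -> timestamps of accepted signups

    def allow(self, ip: str, now: float) -> bool:
        recent = self._recent[ip]
        # Forget signups that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


limiter = PerIpSignupLimiter(limit=3, window_seconds=3600)
# Ten people sign up a minute apart, all through one proxy address...
via_proxy = [limiter.allow("123.45.67.89", now=60.0 * i) for i in range(10)]
# ...and ten people sign up from their own (hypothetical) home addresses.
direct = [limiter.allow(f"10.0.0.{i}", now=60.0 * i) for i in range(10)]
print(sum(via_proxy), "of 10 allowed via proxy;", sum(direct), "of 10 allowed directly")
```

The limiter cannot tell an abuser from the legitimate users who share the proxy's address, which is the bind proxy operators and the sites they reach both face.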

I’m skeptical that the US State Department can, or wants to, build or fund a free ISP that can be used by millions of simultaneous users, many of whom may be using it to commit click fraud or send spam. I know – because I’ve talked with many of them – that the people who fund blocking-resistant internet proxies don’t think of what they’re doing in these terms. Instead, they assume that proxies are used only in special circumstances, to access blocked content.

Here’s the problem. A nation like China is blocking a lot of content. As Donnie Dong notes in a recent blog post, five of the ten most popular websites worldwide are blocked in China. Those sites include YouTube and Facebook, sites that eat bandwidth through large downloads and long sessions. Perhaps it would be realistic to act as an ISP to China if we were just providing access to Human Rights Watch – it’s not realistic if we’re providing access to YouTube.

Proxy operators have dealt with this question by putting constraints on the use of their tools. Some proxy operators block access to YouTube because it’s such a bandwidth hog. Others block access to pornography, both because it uses bandwidth and to protect the sensibilities of their sponsors. Others constrain who can use their tools, limiting access to people connecting from Iranian or Chinese IP addresses to reduce bandwidth use by American high school kids whose schools block YouTube. In deciding who or what to block, proxy operators are offering their personal answers to a complicated question: What parts of the internet are we trying to open up to people in closed societies? As we’ll see in a moment, that’s not such an easy question to answer.
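The two kinds of constraint described above, on who may connect and on where they may go, can be sketched as a simple policy check. Everything here is hypothetical: the client networks are reserved documentation ranges standing in for Iranian and Chinese address space (a real operator would consult a GeoIP database), and the destination blocklist holds a single example entry.

```python
import ipaddress

# Hypothetical stand-ins for the address space of the countries a tool serves.
# These are reserved documentation ranges; a real operator would use GeoIP data.
SERVED_CLIENT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # stand-in for an Iranian range
    ipaddress.ip_network("198.51.100.0/24"),  # stand-in for a Chinese range
]
# Example destination restriction: a bandwidth-heavy site.
BLOCKED_DOMAINS = {"youtube.com"}


def client_allowed(client_ip: str) -> bool:
    """Serve only clients whose address falls inside a served network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in SERVED_CLIENT_NETWORKS)


def destination_allowed(host: str) -> bool:
    """Refuse the blocked domain itself and any of its subdomains."""
    host = host.lower().rstrip(".")
    return not any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def should_proxy(client_ip: str, host: str) -> bool:
    return client_allowed(client_ip) and destination_allowed(host)


if __name__ == "__main__":
    print(should_proxy("192.0.2.10", "hrw.org"))          # served client, open site
    print(should_proxy("192.0.2.10", "www.youtube.com"))  # served client, blocked site
    print(should_proxy("203.0.113.5", "hrw.org"))         # client outside served ranges
```

Each rule is a judgment encoded in a list, which is the point the paragraph makes: the lists embody one operator's answer about which parts of the internet are worth opening.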

Let’s imagine for a moment that we could afford to proxy the international traffic of China, Iran, Myanmar and other closed societies. We figure out how to keep these proxies unblocked and accessible (it’s not easy – the operators of heavily used proxy systems are engaged in a fast-moving cat-and-mouse game) and we determine how to mitigate the abuse challenges presented by open proxies. We’ve still got problems.

Most internet traffic is domestic. In China, we estimate (Hal’s got a paper coming out shortly) that roughly 95% of total traffic stays within the country. Domestic censorship matters a great deal, and perhaps a great deal more than censorship at national borders. As Rebecca MacKinnon documented in “China’s Censorship 2.0”, Chinese companies censor user-generated content in a complex, decentralized way. As a result, a good deal of controversial material is never published in the first place, either because it’s blocked from publication or because authors decline to publish it for fear of having their blog account locked or cancelled. We might assume that if Chinese users had unfettered access to Blogger, they’d publish there. Perhaps not – people use the tools that are easiest to use and that their friends use. A seasoned Chinese dissident might use Blogger, knowing she’s likely to be censored elsewhere – an average user, posting photos of his cat, would more likely use a domestic platform and not consider the possibility of censorship until he found himself posting controversial content.

In promoting internet freedom, we need to consider strategies to overcome censorship inside closed societies. We also need to address “soft censorship”, the co-opting of online public spaces by authoritarian regimes, who sponsor pro-government bloggers, seed sympathetic message board threads, and pay for sympathetic comments. (Evgeny Morozov offers a thoroughly dark view of authoritarian use of social media in How Dictators Watch Us On The Web.)

We also need to address a growing menace to online speech – attacks on sites that host controversial speech. When Turkey blocks YouTube to prevent Turkish citizens from seeing videos that defame Ataturk, it prevents 20 million Turkish internet users from seeing the content. When someone – the Myanmar government, patriotic Burmese, mischievous hackers – mounts a distributed denial of service attack on Irrawaddy (an online newspaper highly critical of the Myanmar government), they (temporarily) prevent everyone from seeing it.

Circumvention tools help Turks who want to see YouTube get around a government block. But they don’t help Americans, Chinese or Burmese see Irrawaddy if the site has been taken down by DDoS or hacking attacks. Publishers of controversial online content have begun to realize that they’re not just going to face censorship by national filtering systems – they’re going to face a variety of technical and legal attacks that seek to make their servers inaccessible.

There’s quite a bit publishers can do to increase the resilience of their sites to DDoS attacks and to make their sites more difficult to filter. To avoid being blocked in Turkey, YouTube could increase the number of IP addresses that lead to its webserver and use a technique called “fast-flux DNS” to give the Turkish government more IP addresses to block. It could maintain a mailing list to alert users to unblocked IP addresses where they could access YouTube, or create a custom application that disseminates unblocked IPs to YouTube users who download the app. These are all techniques employed by content sites that are frequently blocked in closed societies.
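A toy simulation illustrates the idea behind fast-flux DNS: the name resolves to a small, frequently rotated subset of a large address pool with a short TTL, so a censor who blocks every address it observes has to keep chasing new ones. This Python sketch is a simplified model under stated assumptions (random rotation, a censor that blocks by IP address only); it does not implement DNS, and it ignores the fact that a censor can also tamper with lookups of the domain name itself. The `FluxResolver` class is hypothetical, and the pool uses private 10.0.0.0/8 addresses as placeholders.

```python
import ipaddress
import random


def make_pool(size: int, base: str = "10.0.0.0") -> list[str]:
    """Placeholder address pool: `size` consecutive addresses starting at `base`."""
    start = ipaddress.IPv4Address(base)
    return [str(start + i) for i in range(size)]


class FluxResolver:
    """Each TTL window, answer with a fresh random sample from a large pool."""

    def __init__(self, pool, answers_per_query=3, ttl=60, seed=0):
        self.pool = list(pool)
        self.k = answers_per_query
        self.ttl = ttl
        self._rng = random.Random(seed)
        self._window = None
        self._answer = []

    def resolve(self, now: float) -> list[str]:
        window = int(now // self.ttl)
        if window != self._window:   # TTL expired: rotate to a new set of addresses
            self._window = window
            self._answer = self._rng.sample(self.pool, self.k)
        return list(self._answer)


def windows_until_fully_blocked(resolver, max_windows=100_000):
    """Censor blocks every address it sees; count TTL windows until the pool is exhausted."""
    blocked = set()
    for w in range(max_windows):
        blocked.update(resolver.resolve(now=w * resolver.ttl))
        if len(blocked) == len(resolver.pool):
            return w + 1
    return None


if __name__ == "__main__":
    for size in (30, 300):
        r = FluxResolver(make_pool(size), answers_per_query=3, ttl=60, seed=42)
        print(size, "addresses ->", windows_until_fully_blocked(r), "TTL windows to block all")
```

Growing the pool forces the censor to keep observing and blocking for longer, which is the sense in which fast flux gives the censor more addresses to block; it raises the censor's costs rather than defeating the block outright.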

YouTube doesn’t take these anti-blocking measures for at least two reasons. First, it has generally preferred to negotiate with nations that filter the internet to make its site reachable again, rather than working against them by fighting filtering. (This attitude may be changing now that Google has announced its intention not to cooperate with Chinese censorship.) Second, YouTube doesn’t really have an economic incentive to be unblocked in Turkey. If anything, being blocked in Turkey (and perhaps even in China) may be to its economic advantage.

Sites that host user-created content are supported by advertising. Advertisers are generally more excited about reaching users in the US (who’ve got credit cards and more disposable income, and are inclined to buy online) than users in China or Turkey. Some suspect that “lite” versions of services like Facebook are designed to serve users in the developing world at lower cost, since those users rarely generate revenue. In economic terms, it may be hard to convince Facebook, YouTube and others to continue providing services to closed societies, where they have a tough time selling ads. Yet we may need to ask even more of them – to take steps to ensure that they remain accessible and useful in censorious countries.

In short:
- Internet circumvention is hard. It’s expensive. It can make it easier for people to send spam and steal identities.
- Circumventing censorship through proxies just gives people access to international content – it doesn’t address domestic censorship, which likely affects the majority of people’s internet behavior.
- Circumventing censorship doesn’t offer a defense against DDoS or other attacks that target a publisher.

To figure out how to promote internet freedom, I believe we need to start addressing the question: “How do we think the Internet changes closed societies?” In other words, do we have a “theory of change” behind our desire to ensure people in Iran, Burma, China, etc. can access the internet? Why do we believe this is a priority for the State Department or for public diplomacy as a whole?

I think much work on internet censorship isn’t motivated by a theory of change – it’s motivated by a deeply-held conviction (one I share) that the ability to share information is a basic human right. Article 19 of the Universal Declaration of Human Rights states that “Everyone has the right to freedom of opinion and expression; this right includes freedom to hold opinions without interference and to seek, receive and impart information and ideas through any media and regardless of frontiers.” The internet is the most efficient system we’ve ever built to allow people to seek, receive and impart information and ideas, and therefore we need to ensure everyone has unfettered internet access. The problem with the Article 19 approach to censorship circumvention is that it doesn’t help us prioritize. It simply makes it imperative that we solve what may be an unsolvable problem.

If we believe that access to the internet will change closed societies in a particular way, we can prioritize access to those aspects of the internet. Our theory of change helps us figure out what we must provide access to. The three theories I list below are rarely explicitly stated, but I believe they underlie much of the work behind censorship circumvention.

The suppressed information theory: if we can provide certain suppressed information to people in closed societies, they’ll rise up, challenge their leaders and usher in a different government. We might call this the “Hungary ’56 theory” – news of struggles against communist governments around the world, broadcast into Hungary by Radio Free Europe, encouraged Hungarians to rebel against their leaders. (Unfortunately, the US didn’t support the revolutionaries militarily – as many in Hungary had expected – and the revolution was brutally quashed by a Soviet invasion.)

I generally call this the “North Korea theory”, because I think a state as closed as North Korea might be a place where un-suppressed information – about the economic success of South Korea, for instance – could provoke revolution. (Barbara Demick’s beautiful piece in the New Yorker, “The Good Cook”, gives a sense of how little information most North Koreans have about the outside world and how different the world looks from Seoul.) But even North Korea is less informationally isolated than we think – Dong-A Ilbo reports an “information belt” along the North Korea/China border where calls on smuggled mobile phones are possible from North to South Korea. Other nations are far more open – my friends in China tend to be extremely well informed about both domestic and international politics, both through using circumvention tools and because Chinese media report a great deal of domestic and international news.

It’s possible that access to information is a necessary, though not sufficient, condition for political revolution. It’s also possible that we overestimate the power and potency of suppressed information, especially as information is so difficult to suppress in a connected age.

The Twitter revolution theory: if citizens in closed societies can use the powerful communications tools made possible by the Internet, they can unite and overthrow their oppressors. This is the theory that led the State Department to urge Twitter to put off a period of scheduled downtime during the Iranian election protests. While it’s hard to make the case that technologies of connection are going to bring down the Iranian government (see Cameron Abadi’s piece in FP on the limitations of using Facebook to organize in Iran), good counterexamples exist, like the role of the mobile phone in helping to topple President Estrada in the Philippines.

There’s been a great deal of enthusiasm in the popular press for the Twitter revolution theory, but careful analysis reveals some limitations. The communications channels opened online tend to be compromised quickly, used for disinformation and for monitoring activists. And when protests get out of hand, governments of closed societies don’t hesitate to pull the plug on networks – China has blocked internet access in Xinjiang for months, and Ethiopia turned off SMS on mobile phone networks for years after they were used to organize street protests.

The public sphere theory: Communication tools may not lead to revolution immediately, but they provide a new rhetorical space where a new generation of leaders can think and speak freely. In the long run, this ability to create a new public sphere, parallel to the one controlled by the state, will empower a new generation of social actors, though perhaps not for many years.

Marc Lynch made a pretty persuasive case for this theory in a talk last year about online activism in the Middle East. It’s also possible to make this case by looking at samizdat (self-published, clandestine media) in the former Soviet Union, which was probably more important as a space for free expression than as a channel for disseminating suppressed information. The emergence of leaders like Vaclav Havel, whose authority was rooted in cultural expression as well as political power, makes the case that simply speaking out is powerful. But the long timescale of this theory makes it hard to test.

The theory we accept shapes our policy decisions. If we believe that disseminating suppressed information is critical – either to the public at large or to a small group of influencers – we might focus our efforts on spreading content from Voice of America or Radio Free Europe. Indeed, this is how many government forays into censorship circumvention began – national news services began supporting circumvention tools so their content (painstakingly created in languages like Burmese or Farsi) would be accessible in closed societies. This is a very efficient approach to anticensorship – we can ignore many of the problems associated with proxy abuse and focus on prioritizing news over other high-bandwidth uses, like the video of the cat flushing the toilet. Unfortunately, we’ve got a long track record showing that this form of anticensorship doesn’t magically open closed regimes, which suggests that increasing our bet on this strategy might be a poor idea.

If we adopt the Twitter Revolution theory, we should focus on systems that allow for rapid communication within trusted networks. This might mean tools like Twitter or Facebook, but probably means tools like LiveJournal and Yahoo! Groups, which gain their utility through exclusivity, allowing small groups to organize outside the gaze of the authorities. If we adopt the public sphere approach, we want to open any technologies that allow public communication and debate – blogs, Twitter, YouTube, and virtually anything else that fits under the banner of Web 2.0.

What does all this mean in terms of how the State Department should allocate its money to promote Internet Freedom? My goal was primarily to outline the questions it should be considering, rather than to offer specific prescriptions. But here are some possible implications of these questions:

- We need to continue supporting circumvention efforts, at least in the short term. But we need to disabuse ourselves of the idea that we can “solve” censorship through circumvention. We should support circumvention until we find better technical and policy solutions to censorship, not because we can tear down the Great Firewall by spending more.

- If we want more people using circumvention tools, we need to find ways to make them fiscally sustainable. Sustainable circumvention is becoming an attractive business for some companies – it needs to be part of a comprehensive internet freedom strategy, and we need to develop models that are sustainable and provide low- or zero-cost access to users in closed societies.

- As we continue to fund circumvention, we need to address usage of these tools to send spam, commit fraud and steal personal data. We might do this by relying less on IP addresses as an extensive, fundamental means of regulating bad behavior… but we’ve got to find a solution that protects networks against abuse while maintaining the possibility of anonymity, a difficult balancing act.

- We need to shift our thinking from helping users in closed societies access blocked content to helping publishers reach all audiences. In doing so, we may gain those publishers as a valuable new set of allies as well as opening a new class of technical solutions.

- If our goal is to allow people in closed societies to access an online public sphere, or to use online tools to organize protests, we need to bring the administrators of these tools into the dialog. Secretary Clinton suggests that we make free speech part of the American brand identity – let’s find ways to challenge companies to build blocking resistance into their platforms and to consider internet freedom to be a central part of their business mission. We need to address the fact that making their platforms unblockable has a cost for content hosts and that their business models currently don’t reward them for providing service to these users.

- The US government should treat internet filtering – and more aggressive hacking and DDoS attacks – as a barrier to trade. The US should strongly pressure governments in open societies like Australia and France to resist the temptation to restrict internet access, as their behavior helps China and Iran make the case that their censorship is in line with international norms. And we need to fix US Treasury regulations that make it difficult and legally ambiguous for companies like Microsoft and projects like SourceForge to operate in closed societies. If we believe in Internet Freedom, a first step needs to be rethinking these policies so they don’t hurt ordinary internet users.

The danger in heeding Secretary Clinton’s call is that we simply march faster in the wrong direction. As we embrace the goal of Internet Freedom, now is the time to ask what we’re hoping to accomplish and to shape our strategy accordingly.

Thanks to Hal Roberts, Janet Haven and Rebecca MacKinnon for help editing and improving this post. They’re responsible for the good parts – you can blame the rest on me.


02/18/2010 (7:10 pm)

links for 2010-02-18

Filed under: del.icio.us links ::

02/17/2010 (3:07 pm)

Asashoryu resigns. Will sumo ever globalize?

Filed under: Sumo, xenophilia ::

Here’s a story I missed while I was recovering from eye surgery: Asashoryu, one of the greatest sumo wrestlers in history, has retired. And needless to say, for anyone who follows sumo, it’s not quite as simple as that. Indeed, this retirement might lead to an international falling-out between Mongolia and Japan. And it provides an opportunity for reflection on the challenges the sumo world – and, perhaps, Japan as a whole – faces in an era of globalization.

Since 2003, Asashoryu (born Dolgorsurengiin Dagvadorj in Ulaanbaatar, Mongolia) has competed in Japanese sumo as a yokozuna, the highest achievable rank in the sport. He’s won 25 tournaments, the third-highest total in the history of the sport, and in 2005 he won all six tournaments, an unprecedented feat. Given his success, you might think he’d be celebrated as a pillar of the sumo world. You’d be wrong.

Asashoryu’s got some strikes against him with potential Japanese fans. His rap sheet is almost as long as his list of tournament wins. His “crimes” range from violations of sumo’s strict rules of decorum to real transgressions. Here’s how I explained his complex image in sumo the last time he transgressed, leading to an unprecedented two-tournament suspension:

Let’s imagine for a moment that you’re Asashoryu, the sole yokozuna in sumo for three and a half years, a near-unbeatable champion of a sport that demands not just physical prowess, but ritual stoicism and dignity. You report an injury from the most recent tournament in Nagoya, where you won your 21st Emperor’s cup, and return to your native Mongolia to recuperate from your injuries. Then you appear in a charity soccer game in Mongolia, apparently well enough to run around on the field. Obviously, you’re a faker, a fraud, a charlatan, who deserves punishment, either by losing your rank (which would mean retirement from the sport) or by being suspended from tournaments.

Okay, now let’s pretend that you’re a 26-year-old Mongolian named Dolgorsuren Dagvadorj. You live and work in Japan, where people loathe you. You’re constantly accused of participating in match fixing, which seems a bit odd as you win almost all your matches – shouldn’t they be accusing your opponents of throwing matches and complaining about their lack of honor? You’re criticized for transgressions real and imagined – being “too aggressive” and “staring too hard” at opponents in a sport that demands that you throw them to the ground or out of the ring, but also for pulling hair and for scraps with fellow wrestlers outside the ring. Your appearance at bars is the subject of constant tabloid headlines. And you’ve got a temper, which complicates matters.

On the other hand, you’re a national hero in your native Mongolia, and – unsurprisingly – you do your best to spend as much time there as possible. Though you’re recuperating from a back injury, friends ask you to take the field with Japanese soccer star Hidetoshi Nakata at an event designed to promote soccer in Mongolia. When this causes a shitstorm in Japan, the Mongolian embassy formally apologizes on your behalf…

Unfortunately, Asa’s most recent (alleged) transgression was more serious than an ill-advised football match. Japanese tabloids report that Asashoryu got quite drunk in a nightclub during the January basho and beat up someone who’s been variously identified as a fellow patron, a nightclub employee, the bartender, the bar owner… Asashoryu hasn’t commented on the incident, except to say that the reports were “quite different” from what actually occurred. Faced with a likely ban from the sport, he resigned and will be allowed a formal retirement ceremony… and will receive a retirement allowance of over $1m USD.

I was pissed off at the Japan Sumo Association when they suspended Asa for playing football in Mongolia. I’m more sympathetic to their decision here… but I’m deeply saddened. I’m sad not just that I won’t get to see Asa shatter the record for tournament wins (the conspiracy theory in the Mongolian community says that JSA had to find a pretext to eject Asa before he surpassed records held by Japanese yokozuna). I’m sad that sumo and Asa couldn’t find a way to work together to allow the most talented man in the sport to continue a record-setting career.

I don’t pretend to understand all the nuances of sumo decorum, but it always seemed to me that some aspect of Asa’s uneasy status in sumo circles had to do with his strong Mongolian identity. Non-Japanese have been a part of sumo for decades, and some have been embraced by Japanese fans… though generally to the same extent that they embraced Japanese culture. Hakuho, Asashoryu’s primary rival the past few years and fellow yokozuna, is also from Mongolia, but has been far more widely accepted in Japanese sumo circles, perhaps because he’s more soft-spoken and modest, perhaps because he married a Japanese girlfriend (a decision which angered some of his Mongolian fans.)

Geoff Dean has a thoughtful essay that tries to predict the future for Asashoryu. He notes that most retired rikishi look for work in the wider world of sumo: “He can become a stable master, open a sumo restaurant, become a sumo commentator, or in some way, stay connected to the sumo world.” That’s probably not an option for Asa. Instead, he might follow Akebono, a Hawaiian-born yokozuna, into the mixed martial arts and into less-dignified corners of Japanese pop culture. Underlying Dean’s essay is the point that former non-Japanese sumo wrestlers often have a better opportunity to maintain their status and fame by staying in Japan after their sumo careers have ended. It’s hard for me to imagine Asa doing this – I think it’s more likely that he’ll find a way to stay in combat sports while being based in his homeland.

Dean observes that the most recent golden age of sumo occurred when a Japanese yokozuna – Takanohana – faced off against foreign yokozuna Akebono. This could happen again if Kotomitsuki – one of two Japanese ozeki – makes a run for promotion to join Hakuho as yokozuna. (The other Japanese ozeki – Kaio – is older than I am and will retire soon.) But the real story of sumo this past decade has been the rise of foreign rikishi into the highest ranks – Hakuho (Mongolian, yokozuna), Harumafuji (Mongolian, ozeki), Kotooshu (Bulgarian, ozeki), Baruto (Estonian, sekiwake). There are some Japanese sumo fans who aren’t excited about the idea of a Mongolian/Bulgarian rivalry at the top of the sport. I attended the April basho in Tokyo a few years back and was stunned to see fans handing out colorful photos emblazoned with the image of Japanese ozeki Chiyotaikai… but no one handing out anything featuring the higher-ranked yokozuna, Asashoryu.

Writing in Forbes, Tim Kelly sees sumo’s resistance to accepting Asashoryu and other foreign competitors as a symptom of larger problems associated with a closed society: “Japan, like sumo, is closed, preferring to persevere through depopulation and economic stagnation rather than open its borders to the stimulus offered by opportunity-hungry foreigners. What they choose to ignore is that Japan is running out of money, people and ideas.” He makes the case that Japan needs to increase immigration to spur the Japanese economy and cultivate creativity, and suggests a good first step would be to figure out how to get used to controversial outsiders like Asa, rather than expelling him.

I’m not able to make sweeping generalizations about the Japanese economy or offer as strong a prescription as Kelly does for Japanese society. I will say that I’ve been very proud as a Red Sox fan of the way my team and its fanbase have embraced our two Japanese stars, Daisuke Matsuzaka and Hideki Okajima. Shortly after the Sox paid an unbelievable sum of money to negotiate with Matsuzaka, local sportswriters started referring to the new star as “Dice-K”, a nickname designed to help Boston fans correctly pronounce the unfamiliar Japanese name. (I’d love to figure out whether the team started this practice, or whether a clever sportswriter came up with it.) The Red Sox played regular season games in Japan in 2008, and there’s now a third Japanese pitcher – Junichi Tazawa – on the Sox roster. It’s routine to see Sox fans in Fenway sporting Matsuzaka shirts with the pitcher’s name written in Hiragana.

Things could have gotten very ugly for Matsuzaka in Boston this past year. He had a lousy season, in part because he showed up for spring training nursing injuries from the World Baseball Classic, where he’d represented Japan and won the MVP trophy (and beat the US in the semifinal round.) Boston was pretty sympathetic, actually – I heard more commentary about the danger of the Baseball Classic for all MLB players than I did specific criticism of Dice-K.

I don’t mean to offer a facile comparison between Boston (which has its own complex history of racism and xenophobia to live down) and Japan and suggest that one’s open and the other closed. What I’ll say instead is that baseball’s become a global sport by embracing players from around the world at its highest level, the MLB. (And not just players – Ecuadorian radio personality Jaime Jarrin is a genuine celebrity in LA as the Spanish-language radio voice of the LA Dodgers.) Sumo could become a global sport by similarly embracing and celebrating this new wave of Asian and European talent. Instead, they’ve banned a pair of Russian wrestlers for alleged drug use and hounded the most talented man in a generation out of the sport. Not a great moment for sumo cosmopolitanism.

02/16/2010 (11:33 pm)

Jure Leskovec on Memetracker, quantitative media analysis

Filed under: Berkman, Media ::

Jure Leskovec, professor of computer science at Stanford, is interested in what sorts of novel questions we can ask about media now that (much of our) news is on the web. His work on tracking memes by following short phrases led to MemeTracker.org, a tool celebrated for allowing a new way of examining media through watching how quotes spread through professional and citizen media. His talk at Berkman today starts with his Meme Tracker work and expands to two exciting computer science questions:

- Can we infer network structure based on examining who mentions certain pieces of information?
- Can we identify the most powerful influencers in networks, the sites most responsible for breaking news?

Jure tells us that he’s interested in the intersection of news media, technology and the political process. Specifically, he’s fascinated by the tension between global effects of mass media and local effects carried by social structure. “How does information transmitted by the media interact with the personal influence networks that arise from people’s social networks?”

These relationships are changing in an era of participatory media. “The dichotomy between global and local influence is evaporating – blogs can have influence both in personal and global media networks.” At the same time, media reporting and discussion are getting much faster. We sometimes refer to this as the 24-hour news cycle – basically, we’re starting to see a rapid progression of stories with no pauses.

Is there still a “news cycle”, where stories break at regular daily intervals? Jure’s work on Meme Tracker started by asking “What are the basic units of the news cycle?” One option is to trace the emergence of stories by looking for “cascading hyperlinks to articles”. Based on his research in this field, Jure feels this is too “fine-grained”, and it suffers from the problem that news media don’t link very often. He rejected two other approaches as too “coarse-grained”: looking for named entities (Obama is mentioned in the news every day, so he’s not a very useful story marker) and defining topics as “probabilistic term mixtures”. He also rejected looking for common sequences of words – the appearance and decay of phrases – as too noisy.

To find markers of stories that could correspond to aggregates of articles, that vary on the order of days and can be handled at terabyte scale, Jure looked for quoted phrases. Quotes are an integral part of journalistic practice, and they tend to follow iterations of the story as it evolves. They’re attributed to individuals with a specific time and location, which means they’re very useful in figuring out the starting point for a specific story.

Using data from Spinn3r collected for three months leading up to the 2008 US presidential elections, Jure collected 1 million news articles and blog posts from the 20,000 sites that are part of Google News and 1.6 million blogs that are not. The system picked up roughly 100 million documents, from which he extracted 112 million quoted phrases. (As Jure’s slide puts it, he was looking for “.*”)
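Jure’s “.*” joke describes the extraction step: pull out whatever sits between quotation marks. A minimal sketch in Python, assuming plain-text article bodies; the four-word minimum and the lowercasing are my assumptions, not details from the talk. It deliberately avoids a literal greedy “.*”, which would swallow everything between the first and last quote mark in a paragraph.

```python
import re

# Text between straight ("...") or curly (“...”) double quotes. The negated
# character class stops each match at the next quote mark, so two quotes in one
# sentence come out separately instead of as one long span.
QUOTE_RE = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def extract_quotes(text, min_words=4):
    """Return quoted phrases of at least `min_words` words (an assumed cutoff),
    lowercased with whitespace collapsed so that variants compare cleanly."""
    phrases = []
    for match in QUOTE_RE.finditer(text):
        words = match.group(1).lower().split()
        if len(words) >= min_words:
            phrases.append(" ".join(words))
    return phrases
```

The short-phrase cutoff also screens out scare quotes like “yes”; a stoplist for movie and CD titles, which Jure mentions below, would slot in at the same filtering step.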

Once the system identifies quoted phrases, it’s challenging to figure out whether one quote is a degenerate version of another. Phrases change and mutate quite a bit. Jure shows a slide of Sarah Palin’s quote, “Our opponent though, is someone who sees America it seems as being so imperfect that he’s palling around with terrorists who would target their own country?” and dozens of partial versions he found in the wild. Jure’s algorithms create a directed graph of subphrases, adding directional edges that lead from shorter quotes to longer ones, and then weight and remove edges so that each node has a unique parent. This simplification gives single parentage to phrases, and can partition a graph into different subgraphs for different parent phrases. (Spam, he tells us in response to a question, is not a major concern. He did, however, have to stoplist movie and CD titles, which often appeared as quoted phrases.)
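The variant-clustering step can be sketched roughly as follows. This is a simplification under stated assumptions: a phrase counts as a variant of a longer one only if its words appear contiguously inside it (the real system tolerates more mutation than that), and the edge weight, which prefers the most frequently seen containing phrase, is a hypothetical choice. Because every edge points from a shorter phrase to a strictly longer one, the graph can’t contain cycles, and keeping one parent per phrase splits it into trees, one per root phrase.

```python
from collections import defaultdict


def contains(longer, shorter):
    """True if word tuple `shorter` appears as a contiguous run inside `longer`."""
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))


def cluster_quotes(counts):
    """Group quote variants. `counts` maps phrase -> number of documents it was seen in.

    Each phrase points to at most one strictly longer phrase that contains it (its
    single parent); following parents up to a root gives the cluster.
    Returns {root phrase: sorted list of phrases in that cluster}.
    """
    words = {p: tuple(p.split()) for p in counts}
    parent = {}
    for p in counts:
        candidates = [q for q in counts
                      if len(words[q]) > len(words[p]) and contains(words[q], words[p])]
        if candidates:
            # Hypothetical edge weight: keep only the most frequently seen containing
            # phrase (ties broken by length, then alphabetically, for determinism).
            parent[p] = max(candidates, key=lambda q: (counts[q], len(words[q]), q))

    def root(p):
        while p in parent:  # parents are strictly longer, so this always terminates
            p = parent[p]
        return p

    clusters = defaultdict(list)
    for p in counts:
        clusters[root(p)].append(p)
    return {r: sorted(ps) for r, ps in clusters.items()}
```

The pairwise comparison is quadratic in the number of phrases, so at the scale of 112 million quotes a real system would need an index; this sketch only shows the shape of the idea.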

A graph of the appearance of new quotes gives the opportunity to consider whether there’s a periodic structure to media. The count of phrases per hour oscillates on a weekly basis – there’s less news on the weekend – but beyond that he hasn’t been able to find any global periodic pattern in the data. Quote flow is more or less constant.
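A weekly oscillation like the one Jure describes shows up as a high autocorrelation at a lag of 168 hours. A minimal sketch, run here on synthetic hourly counts (flat on weekdays, lower on weekends) rather than on any real MemeTracker data; the counts and the choice of days 5 and 6 as the weekend are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` steps (here, hours)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return covariance / variance


def synthetic_hourly_counts(weeks=12, weekday=100, weekend=60):
    """Hypothetical hourly phrase counts: constant on weekdays, lower on days 5 and 6."""
    return [weekend if (hour // 24) % 7 >= 5 else weekday for hour in range(weeks * 168)]


counts = synthetic_hourly_counts()
weekly = autocorrelation(counts, 168)    # lag of one week
half_week = autocorrelation(counts, 84)  # lag of three and a half days
```

On this synthetic series the one-week autocorrelation is 11/12 (the series repeats exactly, and twelve weeks leave eleven overlapping weeks), while the half-week lag comes out negative because it lines weekends up against weekdays. A real check would run the same function over the actual hourly phrase counts.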

[Image: the fifty largest quote clusters over the three months of the study]

A widely published image from his study – shown above – is a graph of the fifty largest clusters (in volume terms) over the three months of the study. Some of the phrases – Obama’s statement that “You can put lipstick on a pig” – have a huge volume in comparison to the average topic size. Based on typical clusters, Jure is looking for models to represent the shape of story attention. “The peak behaves like a delta function with infinity at t=0,” which is to say, phrases are really short-lived and exponential functions aren’t fast enough to model the peak.

Jure tags media sources as either “news” or “blogs” depending on whether or not they’re reproduced by Google News. “News” – i.e., the 20,000 sites reproduced by Google News – accounts for 44% of the stories. Based on this partition, Jure compares the peak in attention between news sites and blogs. “Peak blog intensity comes about 2.5 hours after news peak.” There are blogs that are exceptions to this rule – hotair.com, talkingpointsmemo.com, politicalticker.blogs.cnn.com, huffingtonpost.com, digg.com are well ahead of others. These blogs are run by professional bloggers, people who have resources to follow and report stories. He posits a model – professional bloggers sometimes break stories, followed by mainstream media, then followed a couple hours later by casual bloggers. How often do bloggers lead the media? He looked for a signature that this was happening – stories that appeared in the blogosphere before they did in news media – and discovered that, in total, 3.5% of phrases migrated from blogs to media.
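The news-versus-blogs comparison boils down to splitting each phrase’s mentions by source type, then comparing peaks and first appearances. A minimal sketch, assuming mentions arrive as (phrase, site, hour) tuples and that “news” means membership in a given set of sites (in the talk, the sites carried by Google News). Restricting the blog-first fraction to phrases seen by both kinds of source is my assumption; the talk didn’t specify the denominator behind the 3.5% figure.

```python
import statistics
from collections import Counter, defaultdict


def peak_hour(hours):
    """Hour with the most mentions; the earliest such hour wins ties."""
    counts = Counter(hours)
    return min(counts, key=lambda h: (-counts[h], h))


def news_vs_blogs(mentions, news_sites):
    """mentions: iterable of (phrase, site, hour); news_sites: set of sites treated as news.

    Looks only at phrases picked up by both kinds of source. Returns
    (median hours by which the blog peak trails the news peak,
     fraction of those phrases that appeared on a blog first).
    """
    seen = defaultdict(lambda: {"news": [], "blog": []})
    for phrase, site, hour in mentions:
        seen[phrase]["news" if site in news_sites else "blog"].append(hour)

    lags, blog_first = [], 0
    for hours in seen.values():
        if not hours["news"] or not hours["blog"]:
            continue  # phrase never crossed between the two kinds of source
        lags.append(peak_hour(hours["blog"]) - peak_hour(hours["news"]))
        if min(hours["blog"]) < min(hours["news"]):
            blog_first += 1
    if not lags:
        return None, 0.0
    return statistics.median(lags), blog_first / len(lags)
```

A positive median lag corresponds to the “blogs peak after news” pattern; the professional blogs Jure singles out would show up as sites whose own mentions tend to precede the news peak.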

While the meme data is useful for understanding some of the dynamics of the relationship between blogs and mainstream media, there are massive open questions about how information really spreads. Jure suggests thinking of the mediasphere as a massive hidden diffusion network. We can see when a node in this network gets “infected” with a new story, but we don’t see the edges between nodes in this graph. Could we trace the actual propagation from one source to another?

Basically, this becomes a tough graph theory problem. We can study a “cascade” of a story by watching how a quote appears over time. We can then make guesses as to whether one source infected another by building probability trees. If timestamps are close together – e.g., source i posts a story at 3:44 and source j at 4:12 – the probability that i infected j is higher than if the time gap is a long one. Jure tells us that we need to consider a huge set of possible graphs, each representing the probability of a particular path of infection (idea spread). This problem is difficult to solve, but the matrix tree theorem can produce a solution in cubic time (O(n^3)). Jure tells us his implementation can find a near-optimal solution. The inferred network looks like a map of the mediasphere clustered around topical interest – a strong cluster around US politics, one around gossip and celebrity news, and another strong technology cluster.
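The matrix tree theorem is what keeps “a huge set of possible graphs” tractable: instead of enumerating every possible infection tree, a single determinant sums their likelihoods. A minimal sketch, assuming an exponential transmission model in which the chance that one site infected another decays with the time gap (the tau = 2 hours default is hypothetical), that posting times are strictly increasing, and that the first site to post is the root. It is not Jure’s implementation, which searches over candidate networks rather than scoring a single cascade.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting, O(n^3)."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def total_tree_likelihood(times, tau=2.0):
    """Sum, over every possible infection tree, of the product of its edge weights.

    times: posting times in hours, strictly increasing; index 0 is the root.
    Edge i -> j (i posted earlier) has hypothetical weight exp(-(t_j - t_i) / tau),
    so sources that post close together are more likely to be linked.
    """
    n = len(times)
    w = [[math.exp(-(times[j] - times[i]) / tau) if times[i] < times[j] else 0.0
          for j in range(n)] for i in range(n)]
    # In-degree Laplacian: total incoming weight on the diagonal, -w[i][j] elsewhere.
    laplacian = [[sum(w[k][j] for k in range(n)) if i == j else -w[i][j]
                  for j in range(n)] for i in range(n)]
    # Directed matrix tree theorem: deleting the root's row and column leaves a matrix
    # whose determinant sums the weights of all spanning trees directed away from the root.
    reduced = [row[1:] for row in laplacian[1:]]
    return determinant(reduced)
```

For a strictly time-ordered cascade like this one, the reduced Laplacian happens to be triangular, so the determinant equals the product of each later site’s total incoming weight, which makes the result easy to check by hand; the determinant form is what carries over to graphs without that ordering.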

If we can infer structure, can we decide what to read if we want to be most up to date? We could imagine one blog that covers lots of topics, but is usually late to the party. Another might have lots of breaking news, but not be especially comprehensive. Given a budget – I only want to read five blogs, for instance – how do I choose the five that maximize comprehensiveness and timeliness? Again, this is a hard problem, specifically an NP-complete problem. But there’s an algorithm to find an approximate solution, and it’s far better than just choosing blogs at random, or selecting them by their inlinks, outlinks or volume.


[Image from blogcascades.org, which offers information on the algorithm]
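For the reading-list problem, a standard way to get an approximate answer is greedy selection: keep adding the blog with the largest marginal gain until the budget runs out. When the objective is monotone and submodular (adding a blog never hurts, and helps less the more you already read), greedy is guaranteed to come within a factor of 1 - 1/e of the optimum, and a lazy variant known as CELF skips most re-evaluations. A minimal sketch; the story data layout and the 1/(1 + delay) timeliness reward are hypothetical choices, not necessarily what the talk used.

```python
import heapq


def score(selected, stories):
    """stories: {story: {blog: hours after the story first broke}}.

    Each story earns the best timeliness reward among the selected blogs that
    covered it; the reward 1 / (1 + delay) is a hypothetical choice.
    """
    total = 0.0
    for coverage in stories.values():
        total += max((1.0 / (1.0 + delay) for blog, delay in coverage.items()
                      if blog in selected), default=0.0)
    return total


def choose_blogs(stories, budget):
    """Lazy greedy (CELF-style) selection of at most `budget` blogs."""
    blogs = sorted({blog for coverage in stories.values() for blog in coverage})
    selected, current = set(), 0.0
    # Heap of (-marginal gain, blog, number of blogs selected when the gain was computed).
    heap = [(-score({blog}, stories), blog, 0) for blog in blogs]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        neg_gain, blog, computed_at = heapq.heappop(heap)
        if computed_at == len(selected):
            # Fresh gain at the top of the heap: by submodularity, the stale gains
            # below it can only shrink, so no other blog can beat it.
            selected.add(blog)
            current -= neg_gain
        else:
            gain = score(selected | {blog}, stories) - current
            heapq.heappush(heap, (-gain, blog, len(selected)))
    return selected
```

The lazy check is where the savings come from: most blogs’ stale gains never rise to the top of the heap, so their scores are never recomputed.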

The goal behind experiments like this one isn’t to assign us blog attention budgets – it’s to build a framework for tracking memes and news as they spread across the web. (Regular readers will know that this is an obsession of mine, and the idea behind MediaCloud, which helps explain why I was so excited Jure was coming to Berkman.) There are still lots of open questions which leave Jure with more work to do:

- Which elements of the news cycle are missed by analyzing quotes, rather than other structures?
- How do we identify and analyze polarization in populations who are propagating information?
- How do memes actually spread through groups of people?

Lots of questions from the Berkman crowd:

Q: Could you make the system work with full sentences?
A: Possibly, but there’s lots of noise in the system, and it gets noisier the longer the time period we try to consider. We’re currently trying to make it work with tweets, which are short and cleanly timestamped.

Q: Did you come up with a rigorous definition or description of the news cycle?
A: We found some signatures for clusters. One common pattern is a meme that starts small on day one, is sharply amplified on day two, then decays. Another reverses days one and two. And a third just spikes quickly and decays over a long time.

Memetracker is now working with Pew’s Project for Excellence in Journalism, looking at coverage of the economic crisis, and that’s helping the study of how media cycles operate.

Q: Are aggregators comprehensive in providing coverage of the mediasphere? Could we just read Google News?
A: Probably not. Our algorithm looks for news sources with little overlap – Google News has lots of overlapping news sources.

Q: So, what was the solution to reading three blogs? What three should we read?
A: Well, based on the data from 2006 – Instapundit, donsurber.blogspot.com, Science and Politics (now dead, with a note poking fun at Jure’s research), Watcher of Weasels.

Q: Will influentials in networks continue to have influence over time?
A: Algorithms select influencers based on past behavior. If you choose naively, it won’t work well. We might need to limit these sorts of algorithms only to larger blogs.

Q: You’re inferring an infection tree by optimizing a graph. Is there any empirical data to test whether this is what really happens?
A: We can check by using data sets where we’ve got quotes and links. If we can reconstruct the link structure just based on quotes – which we can – that’s probably a pretty good empirical test of the algorithm.
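The empirical check Jure describes amounts to scoring the inferred edges against known hyperlinks. A minimal sketch, assuming both are available as (source, target) pairs; precision and recall are my choice of metric, not necessarily the one used in the talk.

```python
def link_recovery(inferred_edges, true_links):
    """Precision and recall of inferred (source, target) edges against known hyperlinks."""
    inferred, truth = set(inferred_edges), set(true_links)
    hits = len(inferred & truth)
    # Precision: share of inferred edges that are real links.
    precision = hits / len(inferred) if inferred else 0.0
    # Recall: share of real links the quote-based inference recovered.
    recall = hits / len(truth) if truth else 0.0
    return precision, recall
```

High values on both, on a data set where the links were withheld from the inference, would support the claim that quotes alone recover the link structure.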

02/16/2010 (7:04 pm)

links for 2010-02-16

Filed under: del.icio.us links ::

02/13/2010 (12:30 am)

Watching Nigeria (even when my eyes hurt)

Filed under: Africa ::

I’m mostly offline this week, as I’m still healing from surgery and finding that reading and emailing are very painful… but one of the stories I haven’t been able to tear myself away from is the tale of Nigeria’s missing president.

Since late November, Nigerian president Umaru Yar’Adua has been in Saudi Arabia, receiving medical treatment. Fair enough. When he “won” a rigged election in 2007, rumors swirled that he was in ill health – during the campaign, he challenged political opponents to a game of squash to demonstrate that he was a vital, healthy guy. But Yar’Adua hasn’t just been out of the country – he’s been incommunicado, out of touch with his cabinet and closest aides. As a result, Africa’s most populous nation has been functionally leaderless since November, a situation that’s pretty hard for me to comprehend.

One of the newspapers I follow most closely in Nigeria is 234Next, which has a track record of breaking interesting stories and a terrific online presence. On January 10, the paper reported that Yar’Adua had suffered serious brain damage and no longer recognized his close aides. They reported that the first lady was closely controlling access to the President and preventing journalists from understanding the depth of his illness. Two days later, Yar’Adua gave an interview via telephone to the BBC, announcing that he was receiving treatment in Saudi Arabia and recovering. 234Next questioned the authenticity of the voice on the tape and demanded to be allowed to send a journalist to visit the president.

Now the newspaper is claiming victory, as Nigeria’s National Assembly voted on Tuesday to make Vice President Goodluck Jonathan the acting president. It’s about time – ethnic violence has been flaring in Jos, militants continue to attack oil infrastructure, and Nigerians around the world are coping with the increased scrutiny that comes with Umar Farouk Abdulmutallab’s failed attempt at downing a US-bound airplane.

Dialog in the Nigerian blogosphere helps explain the balancing act that is Nigerian politics. Northerners – majority Muslim – and Southerners – majority Christian – have historically shared power by alternating who controlled the presidency. With a Northern, Muslim president (Yar’Adua), Nigeria has a Southern, Christian VP, Goodluck Jonathan. (It’s worth noting that Yar’Adua and Goodluck have both been characterized as marginal, obscure figures in Nigerian politics until their recent elevation to power. Yar’Adua was essentially appointed by former president Obasanjo, and Goodluck’s father was a respected politician, but his son was far less well known.) Some northerners are upset that power – which traditionally oscillates between the north and south – is now in the hands of a southerner, out of turn. Some are questioning the legality of the move that put Goodluck in power. And there’s a frantic scramble to see who’ll be Goodluck’s deputy, because it’s believed that the new acting VP will be a strong candidate for the presidency in 2011.

Two things have struck me about the situation in Nigeria. First, I shouldn’t be surprised, but I am, that there’s been so little international media attention to the Nigerian crisis. It’s impossible to overstate the importance of Nigeria to African stability. It’s a huge nation, incredibly powerful in terms of natural resources, struggling through the religious and ethnic tensions that face many African nations. I would imagine that if Silvio Berlusconi suddenly disappeared from the world stage, there would be daily updates in global media. (Remember what happened when South Carolina governor Mark Sanford disappeared for less than a week?)

Second, I’m amazed by the resilience that Nigerian governmental institutions have shown in the face of this crisis. It’s easy to imagine military leaders taking advantage of the power vacuum to seize authority, as has happened so often in Nigerian history. In a country – both fairly and unfairly – associated with corruption, crime and misgovernment, it’s possible to imagine dismal scenarios emerging from this situation. Instead, elected representatives acted to ensure stability, put in place a stable transition and deal with circumstances that would challenge any nation. As blogger Solomon Sydelle put it on Nigerian Curiosity, “February 9th [the day Goodluck took power] could possibly go down in history as a day when democratic political measures where used to take Nigeria one step further down the path to becoming a true democratic nation.”

Here’s hoping a Goodluck presidency shows that Nigeria is a nation ruled by laws, not by the whim of those who can seize power… and that the rest of the world sees that Nigeria is a country coping with a tough situation and handling it with grace and stability.

02/12/2010 (7:04 pm)

links for 2010-02-12

Filed under: del.icio.us links ::

02/11/2010 (7:04 pm)

links for 2010-02-11

Filed under: del.icio.us links ::

02/08/2010 (7:03 pm)

links for 2010-02-08

Filed under: del.icio.us links ::