Pixels & Pilcrows: linguistics

Monday, March 14, 2011

The Need for a New Social Science

Memetics, Marketing, and Sociology can only get us so far.

I'm going to be very, very careful in this post; I don't know how to present this in a way that doesn't make me sound a little eccentric. I've been thinking a lot lately about memetics, the study of the transmission and propagation of ideas. These ideas have been compared alternately to genes and to biological viruses, spreading about hither and yon. There's been a bit of work done in the area of what ideas stick and what ideas don't (consensus: various values are important, and truth is only one of them), but there are just some huge, fat, gaping holes in the current state of the study of the propagation of ideas.

Gaping Hole #1: Knowing What We Can Know

First, and this is the scariest, we just have a very limited ability to view memes. We can see the spread of image macros on the internet, and identify that Ceiling Cat is probably sticking around for a while, whereas Rebecca Black's "Friday" is probably not. But we don't really know what to do with this information; the shape of the meme continues to elude us.

We will almost certainly never be able to predict what data will spread, and we will almost certainly never fully get a bead on what makes data "spreadable". We can get a general idea, perhaps, but it won't be an equation. Psychohistory is an idea in Asimov books, not a thing that will actually exist. (Oh, yeah, link. Sorry.)

This gaping hole is that memetic endeavors have largely been misguided because they've been strapped to a biological framework. Memes have no chromosomes, they cannot be put under a microscope, and they cannot be "sequenced".

Gaping Hole #2: Ideas and Behaviors

Second, there's not much in the literature by way of determining what, if any, is the difference between ideas that spread and the behaviors they leave behind. A piece of information that spreads might just annoy me by leaving "Never Gonna Give You Up" stuck in my head, or it might convince me to vote for Ralph Nader. There is a big difference between a meme and its behavioral payload.

Gaping Hole #3: The Myth of Measurement

Third, memetics tends toward the very abstract, and generally fails to measure anything at all, instead engaging in lengthy thought experiments and exegeses on which theoretical construct is better for the task, in essence applying none of them empirically. Here's an article example, and it's ridiculously long.

The Solution

The solution is to actually conduct experiments on actual cultural transmission. Richard Dawkins, the founder of memetics, once noted that memetics had not yet found its Crick and Watson; it hadn't even found its Mendel. I think he was wrong. Plenty of people have come before, studying memetics before it was called memetics. Most important, to my mind, is the sociolinguist William Labov.

Labov tackled issues of how language change spreads, what factors make someone likely to adapt their dialect to another, and so forth. One of his earliest and most famous studies looked at employees in three different department stores in New York City. He tracked one meme: the tendency to "correct" the New York City habit of "dropping" the r-sound in certain words (his test phrase was "fourth floor"). He found that the meme was more likely to spread along socioeconomic lines: employees in stores with higher-priced goods tended to pronounce the r-sounds in "fourth floor".

I know this isn't really much, as far as studies go, but it was a start. Labov discovered a number of patterns of linguistic change, and scores of sociolinguists after him have followed suit. Sociolinguistics may be a little tame compared to full-on memetics, as language traits are hardly as world-changing as religious and political beliefs, but it is a sufficient, if terribly overlooked, start. Not much different from Mendel's plants, if you think about it.

Monday, October 25, 2010

The Chinese Language is the Deep Web

Reading Nicholas Kristof's post "Liu Xiaobo and Chinese Democracy", about Mr. Liu's recent Nobel Peace Prize, I saw a passage that stood out, not only for its content, but also for the offhand way in which it was presented:

Today, Liu presumably doesn’t know that he has won the prize, and the Chinese government is trying to censor the news. But China is changing and censorship no longer works so effectively. It can ban mobile phone users from texting the characters for his name, but young Chinese are smart enough to use substitute characters.

Assuming this actually is the case, it means that within the Chinese languages (and it's clear that they are separate languages, not dialects of one overarching, crazily heterogeneous Chinese language) lies a hidden world of possible ideogram-meaning combinations, connected by sound. Here's how that would work:

Every Chinese character represents a word. (Linguists: I know there are exceptions. Thanks.) For example, the word for "work" is 工, pronounced "gong" with a high, steady tone. The word for "attack" is 攻, also pronounced "gong" with a high, steady tone. The word for "supply" as in "power supply" is 供, also "gong" with a high, steady tone. So on with the words for "official business", "palace", and "bow" as in "bow and arrow".

Right now, the censors at Great Firewall HQ, actually called the Propaganda Department (I kid you not), are poring over blogposts and texts and other electronic content, finding subversive messages and stamping them out like bugs. Now, I imagine that a bit of this is done automatically, by keyword, and a great deal more is done by a large government department, full of the average office assortment of flunkies, middle managers, angry bosses, and the ennui that comes along with this setup.

Now imagine an undercurrent of blogs that don't seem to make sense at first glance. They bring up no poisonous keyword hits. They carry no familiar subversive slogans. But for those who would read them aloud, they transfer hopeful messages of democracy, commentary on the Chinese political situation, and perhaps even plans for meetups and other events.
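To make the trick concrete, here's a minimal sketch in Python of a keyword filter being sidestepped by a same-sounding character. Everything in it is an assumption for illustration: the pronunciation table holds only the three "gong" characters mentioned above, the blocklist is made up, and the function names are my own. Real censorship software surely looks nothing like this.

```python
# Toy pronunciation table: only the three characters named in this post.
# All three are read "gong" with a high, steady tone (written "gong1" here).
PRONUNCIATION = {"工": "gong1", "攻": "gong1", "供": "gong1"}

# Hypothetical blocklist, invented for this example.
BLOCKED = {"攻"}


def keyword_filter(message, blocked=BLOCKED):
    """Return True if the message contains any blocked character."""
    return any(ch in blocked for ch in message)


def substitute(message, swaps):
    """Replace each character with a chosen same-sounding stand-in."""
    return "".join(swaps.get(ch, ch) for ch in message)


def read_aloud(message):
    """Map each character to its sound; unknown characters pass through."""
    return [PRONUNCIATION.get(ch, ch) for ch in message]


original = "攻"                                # "attack", on the blocklist
disguised = substitute(original, {"攻": "工"})  # swap in "work", same sound

print(keyword_filter(original))    # the filter catches the original
print(keyword_filter(disguised))   # the filter misses the stand-in
print(read_aloud(original) == read_aloud(disguised))  # same sound either way
```

The filter only compares written characters, so the stand-in slips through, while anyone reading the message aloud hears exactly the same syllable.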

This sound-meaning correspondence is much like what serious internet people call the "deep web". The deep web consists of all the data on the Internet that's not directly accessible to the average end user of a search engine. Deep web data is significantly more voluminous than surface web data. From the Wikipedia page:

Deep Web search reports cannot display URLs like traditional search reports. End users expect their search tools to not only find what they are looking for quickly, but to be intuitive and user-friendly. In order to be meaningful, the search reports have to offer some depth to the nature of content that underlie the sources or else the end-user will be lost in the sea of URLs that do not indicate what content lies underneath them.

By moving context outside of the scope of these messages of Chinese democracy, writers would easily circumvent any mechanical attempts at censorship. Certainly, it's not perfect, but even in a worst-case scenario, this practice could burden the Propaganda Department with the need for more human censors.