Sunday, 25 February 2018

Book review: Everybody Lies, Seth Stephens-Davidowitz (2017)

This is a happily optimistic book about big data by a single, middle-class white male who graduated from Harvard and worked at Google. It tries to enumerate the gains to be made in a future where more data can be more quickly quantified and analysed by more people for more purposes.

It suggests a world where difficult questions can be solved using the vast amounts of digital information that are always being collected about us, most often through web searches. What people tell researchers who use surveys might not match reality in all cases, so conclusions that rely on such information have to be taken with a grain of salt.

But some things are impossible using big data. You can’t pick winners on the stock market, for example. There is already so much information available about publicly traded companies, and it is being studied by so many people, that trying to find causation or correlation using other things as indicators of stock price performance is unrealistic. (The author was asked by Lawrence Summers, treasury secretary, 1999–2001, to find out if it was possible to do this. Stephens-Davidowitz had by that time published some articles that had garnered attention in some quarters in the US.)

There are also some things that governments should not do with big data, such as trying to predict if someone will commit a crime based on their web searches.

But in other areas, such as health care, the use of valid comparisons – Stephens-Davidowitz calls them doppelgangers – between different people can lead to good outcomes. Companies like Amazon use them already when they recommend books for you to buy, for example.

The author paces his material well in some cases but he still goes too fast in many others. He tends to get carried away by the novelty of the instance and forgets that the reader is coming from a position of complete ignorance, while he remains an expert statistician. I found this a bit annoying at times, and wished he had slowed down so that he could background his material better, to ensure better comprehension. And the technical concepts to do with the art of statistics that he introduces get lost completely in the wash of information because he doesn’t explain them well enough. The end result is that I sit here in front of my keyboard trying to think about some of the individual case studies he describes, so that I can talk about them for readers, but my mind comes up blank.

I tell a lie: there was one experiment which stuck in my head. It involved conversations between men and women that were recorded during a first date. The content was then converted into text files and analysed. The study found that if the conversation involved the woman using the word “I” a lot, there was likely to be a second date. If there were a lot of questions asked, however, there was likely to be no second date. This was an interesting study and was the kind that might reveal truths that regular surveys would miss, because here you were dealing with people who acted all along as though they were not being observed.

Problems of pacing are common when you have an expert writing a book for the trade market. The art of writing for this market is different from the art of writing for a peer-review community, whichever discipline is involved. You have to make sure you treat your reader like a little old man with a walking stick, and you are helping him to cross the road. Each step has multiple phases with many moving parts to think about in order to complete it reliably. A crossing that a young man manages almost instantaneously and without any effort at all might take an old man minutes to complete, and only with intense concentration. Stephens-Davidowitz says in the book that he worked hard on each chapter to make sure his word selection was always right, but I think his editor could have done more to make the book more accessible for the layperson.

While overall the book was sometimes of more than passing interest, I also found it a tad parochial. You can only ask questions that are as good as the categories you rely on, for a start. It comes down to a question of defining what you are measuring and how to identify the best representatives of the target population.

I think that while it might be possible to do something ephemeral like picking good batters for a baseball team by relying on statistical data, it might be a little more difficult to do important things, such as reducing income inequality in the US, unless you ask the right questions of the right people. When it comes to raising living standards, if you think that unbridled capitalism is the answer to every question asked, as Republicans do, then you are never going to reach the right conclusions. The same goes for health. I dare say that, like many people employed by technology companies in the US, Stephens-Davidowitz is likely a Democrat, but that doesn’t say much. A right-leaning Democrat is more like a true Australian conservative than anything else. We have few politicians here to compare with the types of extreme ideologues you routinely find in the Republican Party.

And categories are important for studies of this kind. In the US, for example, they airily mention “going to school” when they really mean you attend a tertiary education institution. So your categories and your way of perceiving the world – which will determine the questions you ask – are already determined to a certain degree by the language that you use when you talk about something. In other countries, they usually refer to “universities”, not “schools”. So how do you go about doing a search for data about tertiary study when the referents you use are different in different places?

The author, who is regrettably a big baseball fan, depends heavily on a view of the world made through American eyes, and so his choices when it comes to asking questions that might help to actually improve regular people’s lives are always going to be constrained by focalisation.

There are some things that he will never see, and I suspect no amount of big data analysis by people like him who belong to the elites is going to help solve the big problems that daily confront regular people living in the US, such as the ludicrous minimum wage, the ridiculous gerrymandering of electoral boundaries that keeps the Congress Republican, and the fact that health care costs twice as much in the US as the OECD average while the average lifespan is about three years shorter. And forget about gun control. Wicked problems abound in America and everyone already knows what they are, except for Americans themselves. Will big data help solve them? I frankly doubt it.

On the upside, the book, subtitled ‘What the Internet Can Tell Us About Who We Really Are’, posits that big data might allow social scientists to finally earn the level of respect that society routinely bestows on physical scientists. A much bigger problem, however, is that Americans never see anything beyond their own borders, and so ignore gains that communities in other places have made over the years by intelligently adapting their politics, labour relations and economies to fit their unique cultures.

Learning from others’ successes is something that everyone, especially American social scientists, should do.
