Friday, October 30, 2009

Book review: The Illusions of Entrepreneurship, by Scott Shane

My rating: 2.0/5.0

The central goal of this book is to debunk overly optimistic myths about entrepreneurship in the US (and partly abroad). In doing so, however, the author effectively replaces such myths with an equally harmful proposition, namely that starting a business is statistically a bad idea (most startups go under in a few years, most business owners would be better off working for others, etc.).

Although I believe academics might find his extensive bibliographic research useful (hence the two stars, and not one), I question whether investors or prospective entrepreneurs would learn anything useful from this book (contrary to what is suggested by its marketing). Most of the data and discussion on survivability and profitability refer to all startups taken together–with the exception of a literally sketchy and brief Chapter 7 (see below)–and virtually no effort is made to quantify the likelihood of success for different types of business, product, service, or entrepreneur. As such, it's like looking at the face-value statistics of a marathon (most runners either don't finish or finish far later than the winner) without conditioning on crucial factors such as the competitor's level of preparation, their diet, or how many previous marathons they have run. In other words, such data is of little use to the wanna-be runner who is seeking a competitive advantage in preparation for the big day.

The only part of the book where such issues are addressed is in the aforementioned Chapter 7, where the author identifies industries (e.g. high-tech) that yield a higher rate of success for startups. But then again, in the author's own words, "[...] the smart money already knows this. Just look at where venture capitalists put their money". The remainder of the chapter is rather cursory and oversimplified, at one point summarizing the findings of what seems to be hundreds of references in about one page (p. 117).

Now, even in the realm of face-value analysis, I have trouble following one of the author's central conclusions, namely that business owners are worse off than workers except for the lucky ones who make it to the top decile (Chapter 6). Although the family income distribution in Fig. 6.3 seemingly corroborates his point, the family wealth distribution in Fig. 6.4 manifestly contradicts it. Indeed, if you look at the actual numbers in the reference provided by the author, the data shows that families whose head is a business owner typically accumulate ~4 times the wealth of worker families across *all* deciles.

There are other minor points that I found distracting. First is the frequent use of "he/him/his" when referring to an entrepreneur. That is not just politically incorrect; it's also offensive to the female reader who is trying to overcome the gender bias that the author talks about in Chapter 8. Also, sometimes it's hard to know what the author is trying to convey: on page 105 he says that "People who have their own business are more likely than people who work for others to report that their work makes them unhappy or depressed", and three pages later he says that "Entrepreneurship provides a very important non-financial benefit: it makes people happier". (?!)

Perhaps this book is useful for policy makers or academics who need to quote accurate (but not necessarily constructive) statistics. But for the rest of us, the wisdom of seasoned investors and/or entrepreneurs is far more useful (see e.g. the articles by Paul Graham, from Y Combinator, or those of Greg Gianforte, author of Bootstrapping your business).

Saturday, September 26, 2009

Interdisciplinarity in patent law


One of the criteria for an idea to be patentable is that it must be non-obvious to "a person having ordinary skill in the art".

But what about persons having ordinary skill in a different art?

Take for example the case of Google's PageRank, the algorithm partly responsible for their initial success as a web search engine. The basic idea is very simple: First, one represents every document as a node, and every link between documents as a directed edge (see above). Then one imagines a web surfer who, starting from an arbitrary document, randomly follows a link to another document, ad infinitum. The PageRank of a document is then defined as the probability that the random surfer will be found at that particular document.

(For the technically oriented: Yes, we need an additional operation to ensure ergodicity, i.e. to avoid "getting stuck", but that's the basic idea; see for example Michael Nielsen's lecture).
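For the technically oriented, the random-surfer picture can be sketched in a few lines of Python. This is only a minimal illustration, not Google's implementation: the four-document toy web, the function name `pagerank`, and the damping value of 0.85 (a commonly used choice) are assumptions made for the example. The damping step, in which the surfer occasionally jumps to a random document, plays the role of the additional operation that avoids "getting stuck".

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Long-run probability of finding the random surfer at each document.

    `links` maps every document to the list of documents it links to;
    every link target must itself appear as a key.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform guess
    for _ in range(max_iter):
        # With probability (1 - damping) the surfer jumps to a random document.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links[v]
            if targets:
                # Otherwise the surfer follows one of the outgoing links at random.
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new[w] += share
            else:
                # A document with no links: the surfer jumps anywhere.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank


# Hypothetical toy web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for doc in sorted(ranks, key=ranks.get, reverse=True):
    print(doc, round(ranks[doc], 4))
```

In this toy web, document C collects links from all the others and ends up with the highest rank, while D, which nobody links to, is visited only through random jumps.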

I have mixed feelings about the fact that this algorithm is patented. On one hand, it is undoubtedly one of the most important additions to modern web search technology. On the other, it is the obvious solution for anyone with a background in chemical kinetics (or, more generally, Markov chains). Indeed, a typical depiction of a reaction network is precisely in the form of the above diagram, where the nodes represent species (or conformations), and the arrows represent rates. In this context, more important conformations correspond to nodes with higher equilibrium probabilities–i.e. higher PageRank.

As both academia and companies become more and more interdisciplinary, an important question arises: Should well-established ideas in one field be patentable simply because they have been straightforwardly applied in another field?

I'd be curious to see how courts would address these (emerging?) questions.

Friday, September 11, 2009

VIX silliness

VIX is an infamous financial index that attempts to measure implied short-term volatility in the S&P500. In other words, it is a "crystal ball" of sorts that tries to forecast the amplitude of overall stock price fluctuations in the next 30 days. This is achieved by means of a complicated formula based on current option prices.

To me, it seems to be one more example of unnecessary and distracting mathematical sophistication in economics.

Indeed, the simplest measure of volatility–the standard deviation of S&P500 one-day returns over the past month–is an equally good predictor of market volatility for the following month. The plots above speak for themselves; note in particular the nearly identical correlation coefficients r. The data spans the period of 1990-present (data source: Yahoo! Finance).
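For readers who want to try this at home, the comparison can be sketched as follows. This is a rough outline under stated assumptions rather than the exact script behind the plots: the function names are made up for the example, a "month" is taken to be 21 trading days, and the months are non-overlapping windows. The input is simply a list of daily closing prices, oldest first.

```python
import math
from statistics import stdev


def daily_returns(closes):
    """One-day simple returns from a list of daily closing prices."""
    return [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]


def window_volatilities(returns, window=21):
    """Standard deviation of returns in consecutive, non-overlapping windows."""
    last_start = len(returns) - window
    return [stdev(returns[i:i + window]) for i in range(0, last_start + 1, window)]


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def past_vs_next_month(closes, window=21):
    """Correlation between each month's volatility and the following month's."""
    vols = window_volatilities(daily_returns(closes), window)
    return pearson_r(vols[:-1], vols[1:])
```

Calling `past_vs_next_month` on a long series of S&P500 closes gives the r for the simple predictor; the analogous r for VIX would come from pairing the VIX level at the start of each month with that month's realized volatility.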

To econophysicists, this is not surprising: although returns are hardly correlated in time, it is well known that volatility can remain correlated for days. I wonder if economists have taken note of this straightforward observation as well?

Saturday, September 05, 2009

World's most generous donors

Forbes recently ran an article called Billion-Dollar Donors, which was summarized in this WSJ blog post. The articles list the top donors in the world, ranked by how much they have donated.

I thought a more useful–but obviously not absolute–measure of generosity would be to express their total donation as a percentage of their net worth. Although this would in principle require revisiting the entire list of donors, an interesting result is also obtained by simply re-ranking the above top 14 donors according to this new measure:
  1. Gordon Moore ($2.6b, $6.8b): 72%
  2. Klaus Tschira ($1.5b, $1.1b): 42%
  3. Bill Gates ($40b, $28b): 41%
  4. George Soros ($11b, $7.2b): 40%
  5. Ted Turner ($1.9b, $1b): 34%
  6. Stephan Schmidheiny ($2.5b, $1b): 29%
  7. Eli Broad ($5.2b, $2b): 28%
  8. Warren Buffett ($37b, $6.7b): 15%
  9. Michael Bloomberg ($16b, $1.5b): 9%
  10. Michael Dell ($12.3b, $1.2b): 9%
  11. Li Ka-shing ($16.2b, $1.37b): 8%
N.B.: The net worth data of the participants is taken from Forbes. The numbers in parentheses are the current net worth (N) and amount donated (D), respectively. The percentages are obtained from D/(N+D). Some names are not shown because their net worth data is not available. This ranking only reflects current net worth; it is possible that when an individual made a donation his/her net worth was considerably different.
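The re-ranking is easy to reproduce. The sketch below uses only the figures listed above (in billions of dollars) and the D/(N+D) formula; the variable and function names are just for illustration.

```python
# (name, current net worth N, amount donated D), in billions of USD, as listed above
donors = [
    ("Gordon Moore", 2.6, 6.8),
    ("Klaus Tschira", 1.5, 1.1),
    ("Bill Gates", 40.0, 28.0),
    ("George Soros", 11.0, 7.2),
    ("Ted Turner", 1.9, 1.0),
    ("Stephan Schmidheiny", 2.5, 1.0),
    ("Eli Broad", 5.2, 2.0),
    ("Warren Buffett", 37.0, 6.7),
    ("Michael Bloomberg", 16.0, 1.5),
    ("Michael Dell", 12.3, 1.2),
    ("Li Ka-shing", 16.2, 1.37),
]


def generosity(net_worth, donated):
    """Share of pre-donation wealth given away: D / (N + D)."""
    return donated / (net_worth + donated)


ranking = sorted(donors, key=lambda d: generosity(d[1], d[2]), reverse=True)
for position, (name, n, d) in enumerate(ranking, start=1):
    print(f"{position:2d}. {name}: {100 * generosity(n, d):.0f}%")
```

Note that Bloomberg and Dell both round to 9%, so with input figures this coarse their relative order should be read as a tie.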

Friday, September 04, 2009

Google Economic Indices

Google Finance has recently launched a new feature called Google Domestic Trends. Basically, it is a collection of Google Trends queries that gauge several aspects of our economy, such as Auto Buyers and Unemployment. Each query is translated into a time series, which can be regarded as an overall index for those aspects of the economy, much like the Dow or S&P500 for the stock market.

Not surprisingly, as with Google Flu Trends, such indices are often ahead of traditional performance measures adopted by other agencies.

Friday, August 28, 2009

Dangerous prejudice in correlation analysis

Greg Mankiw first nails it, and then hammers his own thumb in a recent post on SAT scores vs. family income.

His initial remark is a reminder from Stats 101 that correlation does not imply causation: just because test scores are positively correlated with income does not mean that the latter determines the former.

Then he suggests that the hidden variable that does uniquely determine test scores is family IQ, which–he hypothesizes–is also correlated with income, thereby explaining the observed score-income correlation.

I find his comment dangerous and, literally, one-dimensional (déjà vu?). It is dangerous because remarks of this type create a no-win situation: if they are not backed up by solid data, they are merely prejudice based on genetic elitism; if they are unequivocally backed up by statistics, they can easily induce painful separatism through genetic screening and the like.

Finally, it is one-dimensional because it ignores several other important variables. First and most obvious is that higher income families can afford to send their kids to expensive private schools, which compete precisely on such SAT scores. Also, kids from high-income families can afford to dedicate most of their energy to school, since they don't have to work to support themselves. And family culture changes substantially with income, with higher income families having higher expectations of–and hence putting more pressure on–their kids than lower income ones. The list can go on.

Since such socio-genetic matters are extremely difficult to prove and their outcomes can do more harm than good, I believe we are better off leaving them alone.

Tuesday, August 18, 2009

Inefficient debates on market efficiency

I find it amusing that so much attention is being given to the so-called efficient market hypothesis. This rather old debate is concerned with whether the hypothesis is true or false.

In my opinion, the reason there is so much disagreement surrounding this hypothesis is that it can never be proved or disproved, since it is ill-defined and therefore cannot be unambiguously tested.

Indeed, what does "correct price" mean, given that there is no unambiguous formula for this, and fundamental analysts frequently disagree on the value of a company?

The more pragmatic definition based on being able to "beat the market" is too ambiguous as well: Did Warren Buffett prove the market inefficient, or was he simply a lucky survivor among many fallen investors?
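A toy calculation shows why the "lucky survivor" reading is so hard to rule out. Assume, purely for illustration, that each investor independently has a 50% chance of beating the market in any given year, with no skill involved; the function names and the numbers of investors and years below are hypothetical, not data.

```python
def expected_lucky_survivors(n_investors, n_years, p_win=0.5):
    """Expected number of investors who beat the market every single year by chance."""
    return n_investors * p_win ** n_years


def prob_at_least_one(n_investors, n_years, p_win=0.5):
    """Probability that at least one investor has an unbroken winning streak."""
    p_streak = p_win ** n_years
    return 1.0 - (1.0 - p_streak) ** n_investors


# Hypothetical example: 10,000 skill-free investors followed for 10 years.
print(expected_lucky_survivors(10_000, 10))
print(prob_at_least_one(10_000, 10))
```

Under these assumptions, roughly ten investors out of ten thousand would show a perfect ten-year record by luck alone, so a single impressive track record cannot by itself settle the question.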

Fama has formulated market efficiency in terms of being able to profit from different types of information available, namely past prices (weak efficiency), public information (semi-strong), and both public and private information (strong efficiency), but I still think it’d be a nightmare–if not impossible–to test these hypotheses. Would you try all possible strategies in the world that make use of such information, and see if there was at least one that consistently outperformed the market? Did you really use all available information? Will that winning strategy continue to outperform the market, or was it only good for a given time period? Clearly all sorts of practical and conceptual issues emerge with such tests, and you can never convincingly prove or disprove the hypotheses.

In my opinion, this debate goes to show how precariously economics stands as a quantitative science.

Thursday, August 06, 2009

Stocks, markets, and their elusive purpose

There has been a heated debate in the media about further regulation of stock markets, fueled by the disproportionate profits of some Wall St companies that make use of high-speed trading. The first major consequence of this debate was seen today, when Nasdaq announced it will no longer offer the virtually instantaneous order-peeking feature behind this type of trading.

Some critics claim that this type of trading does no good other than making such traders rich. I think that this debate provides a good opportunity to reflect on the very purpose of the stock market as a whole, and not just on some technical aspects of it.

The simple argument that is frequently invoked to justify stock markets is that they "allocate capital to its most productive uses, for example by helping companies with good ideas raise money". Well, I find this widely accepted view funny, as the bulk of market activity happens precisely after the initial public offering (IPO) of a company (which occupies only a brief moment in the history of the market):
The picture above summarizes my view of the main stages of this process. The pie represents all the shares of the company; the white slices the shares outstanding (i.e. offered to the public); the dark slice the shares kept by the company treasury.

Note that I have deliberately left "dividends" as a question since nearly half of the publicly traded companies (including Apple, Google, Cisco, etc.) don't pay dividends at all, leaving little or no connection between companies and markets after the IPO.

Which leads me to the question: Why should companies care about what happens to the price of their shares after the IPO, if the capital has already been raised? Similarly, why do traders react to or care about the performance of a company, if the company will not, or might not, share its profits with them? The stock market at this stage seems to take on a life of its own, one whose purpose is still elusive, at least as far as I am concerned.

(N.B.: Economists like to give sophisticated answers along the lines of "trading makes share prices more realistic by reflecting the public view on the value of the company, and consequently can serve as a useful economic indicator". But I am not convinced; markets of such magnitude cannot exist simply to yield another economic indicator. Also, I am aware of stock buybacks and additional stock offers that companies can resort to; but my understanding is that this link between companies and the public is brief and rare, and thus negligible when compared to the humongous volume of trades that takes place among the public itself.)

Friday, June 26, 2009

Theoretical limits on population density

This is from the blog of Carson Chow, one of my fellow tenure-track colleagues at NIH. It could be considered a back-of-the-envelope theoretical limit on population density. From the perspective of geographic occupation alone, we seem to be far from exhausting the earth's capacity; however, when a rough estimate is made of the agrarian landmass required per individual, we seem to be running at nearly half the earth's capacity (!). And that's without considering any of the socio-economic aspects of "capacity". According to the worst case scenario projected by the UN, Carson's theoretical limit might be reached as soon as 2150; or never, if the population stabilizes at about 8-9 billion in 2050, as predicted by their "medium" fertility scenario. Here's a copy of Carson's post:
Ever since Malthus, there has been a concern about overpopulation. I thought it would be an interesting exercise to see how much space the human population actually takes up. For example, how many oil tankers would it take to carry around the volume of humanity if converted to liquid. Let’s say there are 6 billion people on the planet and the average mass per person is 100 kg (this is an overestimate). Hence, the upper bound on the mass of humanity is 10^{12} kg, or a billion metric tons. Given that we are mostly water, we can assume that this is about 10^{12} litres. Taking the cube root gives 10^4 * .1 metres or a kilometre. Thus, if we liquefied the mass of all humans, it would fit in a cube whose sides are a kilometre long. The largest oil tankers can carry about five hundred thousand metric tons, so two thousand oil tankers could cart around all of humanity. To put that into perspective, according to Wikipedia, the current fleet of oil tankers moves around 2 billion metric tons a year, so half the world’s fleet could carry around the world’s population.

Now, how much area would we take up if we were to stand side by side. Let’s say 6 people can fit into a square metre of space, then we would all be able to fit into a billion square metres or 1000 square kilometres, about the size of Hong Kong (according to Wolfram Alpha), or we could all fit 4 to a square metre onto the island of Oahu in Hawaii. If we each wanted about 100 square metres of space, then we would take up about a million square kilometres or about twice the area of France. Wolfram Alpha also tells me that there is about 1.5 × 10^7 square kilometres of arable land in the world. If we assume that a square kilometre can feed 1000 people (10 people per hectare), then that puts the capacity of the earth at 15 billion people.
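For anyone who wants to play with the assumptions, the arithmetic in Carson's post can be laid out as a short script. All inputs are the round numbers quoted in the post; the variable names are mine, and the density of water (1000 kg per cubic metre) is the only figure added.

```python
population = 6e9                 # people
mass_per_person_kg = 100.0       # deliberate overestimate

total_mass_kg = population * mass_per_person_kg   # 6e11 kg
upper_bound_kg = 1e12            # rounded up, as in the post

# Liquefied humanity, treated as water (1000 kg per cubic metre).
volume_m3 = upper_bound_kg / 1000.0
cube_side_m = volume_m3 ** (1.0 / 3.0)            # about 1000 m, i.e. one kilometre

# Oil tankers at 500,000 metric tons each.
tankers = (upper_bound_kg / 1000.0) / 5e5

# Standing room: 6 people per square metre, 1e6 square metres per square kilometre.
standing_area_km2 = population / 6 / 1e6

# Living room: 100 square metres each gives 6e5 km^2,
# which the post rounds to "about a million".
living_area_km2 = population * 100 / 1e6

# Food: 1.5e7 km^2 of arable land, each km^2 feeding 1000 people.
capacity = 1.5e7 * 1000
fraction_used = population / capacity

print(round(cube_side_m), tankers, standing_area_km2, living_area_km2)
print(capacity, fraction_used)
```

With these inputs, 6 billion people correspond to 40% of the 15 billion capacity, which is where the "nearly half" above comes from.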

Saturday, June 06, 2009

Wolfram's response to my question

Last Thursday Stephen Wolfram answered questions about Wolfram|Alpha in a live broadcast. I think there's a lot of potential for W|A, and I'm excited to see where it's going, especially with regard to analyzing all the data they have gathered. The full question I submitted was:
Do you have any plans to allow some type of cross-database analysis? Since you have such a diverse repertoire of quantitative databases at your disposal, it seems to me that the natural next step is to allow users to look at correlations among different variables, which could or could not belong to different databases.

Examples include correlation plots of two given variables (say, housing prices versus a stock market index), finding the best linear combination of a set of variables that correlate with another variable (multiple regression), etc.
His answer to my question is found around minute 44:00 of the above broadcast. Essentially, he says they have the technology to do so, but the main difficulty is parsing such queries.

I am currently helping them as a tester for the preview version of W|A, and I look forward to testing their attempts to address such types of queries.