Compulsory vs. Optional Census Data

There are two types of error in any measurement: random and biased. Random errors tend on average to give the correct answer; bias errors push every answer the same way, so they give the wrong answer no matter how many measurements are taken, and are much more dangerous.

In a census, one random error is due to finite sample size. The "standard error" of an ideal sample of a million is the square root of a million: one thousand, or 0.1%. Normally, it is the "19 times out of 20" error that is quoted, about twice the standard error, or 0.2%.
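The arithmetic in the paragraph above can be checked with a minimal sketch. It assumes an ideal Gaussian error and simple square-root counting statistics; the function names are illustrative, not anything StatsCan uses. It prints the relative error for one, two and three standard errors, together with the fraction of cases each multiple covers.

```python
import math

def relative_standard_error(n: int) -> float:
    """Relative standard error of an ideal count of n: sqrt(n) / n."""
    return math.sqrt(n) / n

def gaussian_coverage(k: float) -> float:
    """Fraction of an ideal Gaussian lying within k standard errors of the mean."""
    return math.erf(k / math.sqrt(2.0))

n = 1_000_000
rel = relative_standard_error(n)  # 0.001, i.e. 0.1%

for k in (1, 2, 3):
    # Error quoted at k standard errors, and how often the truth falls inside it
    print(f"{k} x standard error = {k * rel:.1%}, coverage {gaussian_coverage(k):.1%}")
```

Two standard errors cover about 95% of cases, which is the "19 times out of 20" figure; three cover about 99.7%.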

All too often in statistical circles, that's the only error that is quoted. If you believe it, you're living in fantasyland! You'd never last five minutes in a standards lab. In fact, you'd have problems getting a paper accepted by any physics journal I've ever published in.

First, the calculations above assume an ideal mathematical form for the errors: Gaussian. And, we people definitely aren't ideal. But worst of all, they assume no consistent bias in the census answers. If people are compelled to answer a question, that isn't true either.

Let me give you two examples, both from personal experience.

When I was married, the Province of Ontario required both partners to declare a religion. But, not just any religion, only one from a selected list, or "atheist". There's no way I was, or ever will be, atheist. But, nothing approximating my religious beliefs was on the list. So, given the choice of not being allowed to marry or lying, of course I lied.

Anyone competent studying the resulting statistics would have been suspicious that the answers added to 100%, and, on finding that the question had been compulsory, would have realized that the reported sect memberships were too high and taken steps to estimate how much too high by checking church attendance figures and the like. That source of bias was evident from the data itself.

But, there was at least one other major bias error in them as well. Only two sects performed extensive missionary work in our north: Roman Catholic and Anglican. So, all our northern residents, notably first peoples with their spirituality that predates ours, would have chosen one of these two in preference to all the others put together. The statistical results were not just larger than reality, they were point blank wrong. And, knowledge outside the data was required in order to detect it.

That's why bias errors are the most dangerous of all. And, it's why people who rely on accurate measurements spend 90% of their time searching for them. It's "how am I wrong", not "if".

I've never been selected for the long form Canadian census, but if I had been, I'd have had a problem for many years with the "number of bedrooms" question. According to city bylaws, my house had three bedrooms. But, with five children and me, it actually had six. The dining room had been converted, also two rooms in the basement. The latter two in particular were probably in violation of untold city ordinances, but how else could I get six bedroom accommodation on a single parent income?

So, ordered to answer, I would have answered "three" to avoid any chance of unwinnable problems with city hall. Single fathers in Canada have quite enough hassles as it is, thank you. And, many other people with similar family situations, such as extended families, would do the same, because very little family housing in Canada has more than three official bedrooms. So, Statistics Canada data on the number of bedrooms in a dwelling unit, obtained as they are by compulsion, are almost certainly wrong: they underestimate the number of dwellings in the more-than-three-bedroom categories.

If answering the question had been optional, I would have left it blank. Then, analysts would know how many households were irregular with respect to that question. And, since they would have answers to most of the other questions, they'd be able to identify the groups that most often left it blank, then pin down what the likely bias error was using other specially focussed surveys and correct for it.

And that, in a nutshell, is why a question-by-question voluntary census can be more trustworthy than a compulsory one, not less. The random sample size error is increased, but the hidden bias errors can be greatly decreased. Most selection clusters are evident in the data and can be dealt with. When real people in all our variety are involved, the bias errors undoubtedly exceed the random errors, probably by a large margin.
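To put rough numbers on that trade-off, here is a minimal sketch that combines the random error and the bias error of an estimated proportion in quadrature (root-mean-square error). Every figure in it is hypothetical, chosen only to illustrate the argument: a compulsory sample of a million with a two-point bias, against a voluntary one with 30% fewer answers but a half-point residual bias.

```python
import math

def rms_error(p: float, n: int, bias: float) -> float:
    """Total error of an estimated proportion p from n answers:
    random sampling error and bias error combined in quadrature."""
    random_se = math.sqrt(p * (1.0 - p) / n)
    return math.sqrt(random_se ** 2 + bias ** 2)

# All numbers below are hypothetical, for illustration only.
compulsory = rms_error(p=0.10, n=1_000_000, bias=0.02)   # everyone answers, some lie
voluntary = rms_error(p=0.10, n=700_000, bias=0.005)     # fewer answers, fewer lies

print(f"compulsory: total error {compulsory:.2%}")
print(f"voluntary:  total error {voluntary:.2%}")
```

With samples this large the random part is a few hundredths of a percent either way, so under these assumptions the total error is set almost entirely by the bias.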

So, if we want to have the most trustworthy Canadian census data possible, the first page of the census should make two points, in Grade 4 English/French so that those for whom a Canadian language is a recent adult acquisition can understand:

Please answer as much as you can. This data is important to your future, that of your community and your country.
If you are uncomfortable answering a question for any reason, please leave it blank and answer the others as accurately as you can.

As long as StatsCan knows to whom they sent the forms, they can correct for selection effects by linking to data from the compulsory part of the census. In addition, they know the size of each of hundreds of identifiable groups, again from the compulsory census, so they can normalize separately for each. Most of the selection errors can be corrected for in these ways. Both of these are routine techniques used all the time by polling firms.
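Normalizing separately for each group can be sketched as follows. This is a minimal illustration of the general idea (often called post-stratification), not StatsCan's actual procedure; the group names, counts and answers are invented. Each group's average answer is weighted by the group's known share of the population rather than by how many of its members happened to respond.

```python
def poststratified_mean(population_counts: dict[str, int],
                        responses: dict[str, list[float]]) -> float:
    """Average the answers within each group, then weight each group
    by its known population share (from the compulsory count)."""
    total = sum(population_counts.values())
    estimate = 0.0
    for group, count in population_counts.items():
        answers = responses[group]
        group_mean = sum(answers) / len(answers)
        estimate += (count / total) * group_mean
    return estimate

# Hypothetical data: bedrooms per dwelling, with large households
# answering the optional question less often than small ones.
population_counts = {"small_household": 600, "large_household": 400}
responses = {
    "small_household": [3, 3, 3, 3, 3, 3, 3, 3],
    "large_household": [5, 7],
}

all_answers = [a for answers in responses.values() for a in answers]
naive = sum(all_answers) / len(all_answers)
corrected = poststratified_mean(population_counts, responses)

print(f"naive mean:     {naive:.2f}")
print(f"corrected mean: {corrected:.2f}")
```

The naive average is pulled toward the group that answered most often; the weighted one restores each group to its known size. It corrects for who answered, not for whether the answers themselves are truthful.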

All of these techniques were available to StatsCan and were implemented, and partially completed forms were accepted. The results of the 2011 survey should be at least as trustworthy as those of the smaller compulsory version were. For sensitive questions, they are undoubtedly more trustworthy.

John Sankey 2011

Postscript November 2015: Regrettably, political vengeance has resulted in the long form census being made mandatory again. Once again, many Canadians will be forced to lie to avoid the heavy hand of their governments.