Compulsory vs. Optional Census Data

There are two types of error in any measurement: random and biased. Random errors tend on average to give the correct answer; bias errors give the wrong answer, and are much more dangerous.

In a census, one random error is due to finite sample size. The "standard error" of an ideal sample of a million is the square root of a million: one thousand, or 0.1%. Normally, it is the "19 times out of 20" error that is quoted, about two times the standard error, or 0.2%.
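The arithmetic can be checked in a few lines. This is only a sketch, assuming the simple counting rule used above (the standard error of a count of N is the square root of N) and ideal Gaussian errors; the function name `relative_standard_error` is made up for illustration. Under those assumptions, "19 times out of 20" (95% coverage) corresponds to a multiplier of about 1.96, while a multiplier of three would correspond to roughly 99.7% coverage.

```python
import math
from statistics import NormalDist


def relative_standard_error(n: int) -> float:
    """Relative standard error of a count of n, using the sqrt(n) rule."""
    return math.sqrt(n) / n


n = 1_000_000
rel_se = relative_standard_error(n)

# Two-sided Gaussian multiplier for "19 times out of 20" (95% coverage).
z95 = NormalDist().inv_cdf(0.975)
margin_95 = z95 * rel_se

print(f"standard error: {math.sqrt(n):.0f} ({rel_se:.1%})")
print(f"95% multiplier: {z95:.2f}, margin: {margin_95:.2%}")
```

Note that this covers only the random sampling error; nothing in it knows about bias.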

All too often in statistical circles, that's the only error that is quoted. If you believe it, you're living in fantasyland! You'd never last five minutes in a standards lab. In fact, you'd have problems getting a paper accepted by any physics journal I've ever published in.

First, the calculations above assume an ideal mathematical form for the errors: Gaussian. And, we people definitely aren't ideal. But worst of all, they assume no consistent bias in the census answers. If people are compelled to answer a question, that isn't true either.

Let me give you two examples, both from personal experience.

When I was married, the Province of Ontario required both partners to declare a religion. But, not just any religion, only one from a selected list, or "atheist". There's no way I was, or ever will be, atheist. But, nothing even approximating my religious beliefs was on the list. So, given the choice of not being allowed to marry in Ontario or lying, of course I lied.

Anyone competent studying the resulting statistics would have been suspicious that the answers added to 100% and, finding that the question had been compulsory, would have realised that the reported sect memberships were too high, then taken steps to estimate how much too high by checking church attendance figures and the like. That source of bias was evident from the data itself.

But, there was at least one other major bias error in them as well. Only two sects performed extensive missionary work in our north: Roman Catholic and Anglican. So, all our northern residents, notably First Peoples with their spirituality that predates ours, would have chosen one of these two in preference to all the others put together. The statistical results were not just larger than reality, they were point-blank wrong. And, knowledge outside the data was required in order to detect it.

That's why bias errors are the most dangerous of all. And, it's why people who rely on accurate measurements spend 90% of their time searching for them. It's "how am I wrong", not "if".

I've never been selected for the long form Canadian census, but if I had been, I'd have had a problem for many years with the "number of bedrooms" question. According to city bylaws, my house had three bedrooms. But, with five children and me, it actually had six. The dining room had been converted, as had two rooms in the basement. The latter two in particular were probably in violation of untold city ordinances, but how else could I get six-bedroom accommodation on a single-parent income?

So, ordered to answer, I would have answered "three" to avoid any chance of unwinnable problems with city hall. Single fathers have quite enough hassles as it is, thank you. And, many other people with similar family situations, such as extended families, would do the same, because very little family housing in Canada has more than three official bedrooms. So, Statistics Canada data on the number of bedrooms in a dwelling unit, obtained as they are by compulsion, are almost certainly wrong: they underestimate the number of bedrooms in the more-than-three categories.

If answering the question had been optional, I would have left it blank. Then, analysts would know how many households were irregular with respect to that question. And, since they would have answers to most of the other questions, they'd be able to identify the groups that predominantly left it blank, then pin down what the likely bias error was using other specially focussed surveys and correct for it.

And that, in a nutshell, is why a question-by-question voluntary census can be more trustworthy than a compulsory one, not less. The random sample size error is increased, but the hidden bias errors can be greatly decreased. Most selection clusters are evident in the data and can be dealt with. When real people in all our variety are involved, the bias errors undoubtedly exceed the random errors, probably by a large margin.
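A toy simulation of the bedroom example makes the comparison concrete. It is a sketch only: the population weights, the 70% share of larger homes with unofficial bedrooms, and the 5% random skip rate are all invented for illustration, not Statistics Canada figures.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

N = 100_000
# Hypothetical population: true bedroom counts with made-up weights.
true_counts = random.choices(
    [1, 2, 3, 4, 5, 6], weights=[10, 30, 40, 10, 6, 4], k=N
)
# Assumption: 70% of homes with more than three bedrooms have unofficial ones.
unofficial = [b > 3 and random.random() < 0.7 for b in true_counts]

# Compulsory question: households with unofficial bedrooms report "3".
compulsory = [3 if u else b for b, u in zip(true_counts, unofficial)]

# Voluntary question: those households leave it blank (None);
# assume a further 5% of everyone else skips it at random.
voluntary = [
    None if (u or random.random() < 0.05) else b
    for b, u in zip(true_counts, unofficial)
]
answered = [b for b in voluntary if b is not None]

true_mean = statistics.fmean(true_counts)
comp_mean = statistics.fmean(compulsory)
comp_se = statistics.stdev(compulsory) / len(compulsory) ** 0.5
blank_rate = 1 - len(answered) / N

print(f"true mean bedrooms:       {true_mean:.3f}")
print(f"compulsory reported mean: {comp_mean:.3f} (standard error {comp_se:.4f})")
print(f"compulsory bias:          {comp_mean - true_mean:+.3f}")
print(f"voluntary blank rate:     {blank_rate:.1%}")
```

Under these assumptions the compulsory answers carry a bias many times larger than the quoted standard error, and nothing in the data flags it. The voluntary answers are not correct by themselves either, but the blank rate shows how many households need a follow-up survey.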

So, if we want to have the most trustworthy Canadian census data possible, the first page of the census should make two points, in Grade 4 English/French so that those for whom a Canadian language is a recent adult acquisition can understand:

  1. Please answer as much as you can. This data is important to your future, that of your community and your country.
  2. If you are uncomfortable answering a question for any reason, please leave it blank and answer the others as accurately as you can.

Note that the above comments do not apply to a situation where people can simply pitch the census in a wastebasket, with no link made between that fact and other data about them, such as the short form census. In such a case, the selection errors are not evident from the data and cannot be identified with groups, so remedial surveys cannot be used to estimate the selection bias errors. That is why Canada's Chief Statistician felt it necessary to resign when the government of Canada claimed that such an untraceable, totally optional survey could be an acceptable replacement for the compulsory long form census. It is not.

Statistics Canada isn't the only institution that assumes that people will answer all questions truthfully under duress. So does the survey firm Angus Reid.

For example, in their financial surveys Angus Reid first asks, "where do you keep your savings?" Then, they ask, "which of these institutions is your principal bank?" Everyone who keeps their savings with an investment institution but does all their day-to-day banking with a bank can't answer truthfully! If you don't answer, they won't allow you to complete the survey. And, if you don't complete the survey, you are dropped from their list, along with the "survey dollars" you have earned. So, just as with StatsCan's compulsory census, most people will simply give Angus Reid the handiest untrue answer, then get on with their lives.

So, if you are relying on StatsCan or Angus Reid data for anything, be careful. It's less accurate than they claim.

It doesn't have to be that way.

John Sankey