Third parties submit amicus briefs (“friend-of-the-court” briefs) in almost every U.S. Supreme Court case. In the term spanning 2014–2015, 781 were submitted—an average of 12 per case—and 147 in the marriage equality case alone.1,2

Canada’s closest analogue to the U.S.’s amici is the “intervener”.3 From 2000–2008, interveners participated in only about 50% of cases at the Supreme Court of Canada, averaging 2.4 interveners per case.4 This was despite the court granting 94% of requests to intervene.

How has this changed since then? I wrote a Racket script to get the numbers from 2010–2016. Interveners participated in appeals more frequently than they did from 2000–2008, but still at nowhere near the rate of amici at the U.S. Supreme Court.


To give a sense of who these interveners are, here are the parties that intervened at least five times over this seven-year period.5


Attorneys general dominate the top of the list, as they always have. Together, they accounted for 25% of all interventions from 2010–2016. This is a noticeable decrease compared to the numbers from 10 and 20 years ago. In the period from 2000–2008, attorneys general accounted for 37% of all interventions. In the period from 1997–1999, they accounted for 42%.6 Non-government interveners like public interest groups, trade associations, and individuals are participating more now than ever before.

In the midst of this general increase in interventions, one category of cases continues to lag behind. Most of the caseload at the Supreme Court of Canada consists of discretionary appeals: the appellant asks the court to hear the appeal, and the court decides whether it will. But about 20–25% of the caseload consists of appeals that are heard “as of right”—automatically. As-of-right appeals generally involve indictable criminal offences where one of the judges at the lower court dissented from that court’s decision.7 In 2016, only two of the fourteen appeals in that category attracted any interveners.


I can think of two reasons why these cases are getting less help from interveners.

For a case to make it onto the court’s discretionary docket, it must pass through a filter: the court selects cases that involve “a question of public importance or […] an important issue of law”. As-of-right appeals sidestep this filter, so these cases may simply be less interesting to outside parties.

Another explanation could be that the appellants in these cases (often, criminal defendants) don’t have the same ability to wrangle outside help for their position.


The Supreme Court of Canada is getting more input from third-party interveners than ever before. This is potentially a good thing, but that depends on which theory of the role of interveners is true.8

The “Amicus Machine” in the U.S. grew up over a period of about 15–20 years,1 largely undirected, and not necessarily best designed to fill its ostensible role.9

As interventions become more frequent at the Supreme Court of Canada, we (or rather, the justices) have the opportunity to direct how this practice grows. In particular, we should be on the lookout for potential disparities in access to justice that come from the parties’ differing access to interveners.

1. Larsen, Allison Orr and Devins, Neal, The Amicus Machine (November 15, 2016). Virginia Law Review, Vol. 102. Available at SSRN.

2. Franze, Anthony J. and Anderson, R. Reeves, Record Breaking Term for Amicus Curiae in Supreme Court Reflects New Norm (August 15, 2015). National Law Review.

3. Interveners are Canada’s analogue of the U.S.’s amicus curiae. Canada also has a role called an amicus curiae, but this is a person appointed by the court, not simply an interested third party.

4. Alarie, Benjamin and Green, Andrew James, Interventions at the Supreme Court of Canada: Accuracy, Affiliation, and Acceptance (October 8, 2010). Osgoode Hall Law Journal, Vol. 48, No. 3, pp. 381–410, 2010. Available at SSRN.

5. The full list is here. I didn’t clean up the data, so the “Attorney General of Saskatchewan” is treated as different from the “Attorney General for Saskatchewan”. I also didn’t separate the interveners when a group of them filed together.

6. Burgess, Amanda Jane, “Intervenors before the Supreme Court of Canada, 1997–1999: A content analysis” (2000). Electronic Theses and Dissertations, 2490.

7. Criminal Code, RSC 1985, ss. 691–693.

8. There are various theories regarding the nature of amici in the U.S.: interest-group lobbyists, genuine friends of the court or of the parties, or a group of Supreme Court experts who have learned what information the justices crave and who are part of a managed strategy by the parties to win their cases.1 Interveners in Canada have been described as: genuine friends of the court who are trying to help the court make more accurate decisions, interest-groups presenting the best partisan arguments with which the justices can align, or interested third parties that the court listens to in order to increase the legitimacy of its decisions.4

9. This idea was expressed in a special episode of the First Mondays podcast, “Amici #7: One Big Superbrief”.

A trigger list for technical debt in machine learning

Technical debt doesn’t appear in neat, labelled units. It builds up slowly, insidiously—especially in machine learning systems. It starts as a feeling. Features become hard to implement because of complicated interactions and dependencies. Re-training and deploying your classifiers becomes increasingly tedious. And when you do become aware of technical debt, you’re not always in the right state of mind to start writing a list or to plan how to fix the thing.

So, to notice technical debt early, I look for it as part of a periodic review, and I use a trigger list for this.1 It’s based on this paper (Hidden Technical Debt in Machine Learning Systems) and my personal experience. I’m not going to describe what all of these triggers mean. They don’t even have to mean the same thing to you as they do to me. But this list helps me to recognize and itemize the technical debt in my machine learning systems.

  • The first thing you wrote
  • Documentation
  • Dead code
  • Unit tests
  • Software dependencies
  • API mess
  • Data that breaks abstraction boundaries
  • Live monitoring of performance
  • Correction cascades
  • Who is using each module’s output
  • Data dependencies
  • Legacy features2
  • Bundled features2
  • Epsilon features2
  • Correlated features2
  • Glue code
  • Pipeline jungles
  • Dead flags
  • Plain old data structures
  • Multiple languages
  • Left-over prototypes
  • Configuration debt
  • Versioning
  • Data for testing
  • Reproducibility

1. Every good productivity or planning system includes something like a weekly review. These reviews let me be sure of my direction as I navigate each week. They also clear my head so that I am free to just focus, relax, or do nothing, without any worry that there is anything else I should be doing. I’ve found this trigger list to be useful during these weekly reviews.

2. “Features” here means features, or variables, or covariates, in a model.

All the President’s Men

I just finished reading All the President’s Men: Carl Bernstein and Bob Woodward’s presentation of their Watergate journalism.

I’d barely known what Watergate was about before reading this book. While I found the who-did-what narrative to be interesting, even more interesting were the methods the reporters used to investigate it, the editorial scrutiny applied by the Post, and the reaction by the White House to the evidence closing in around them.

This was a very dense book. Skim a page and you’re lost. That density is a consequence of how scattered the information the reporters were tracking down was. The cover-up went straight to the top—to President Nixon—but for over a year, Bernstein and Woodward were following leads so disparate that they couldn’t even tell whether the puzzle pieces belonged to the same puzzle.

With only one exception, Bernstein and Woodward made sure that whatever they published was rock-solid. Behind every published story was a pile of suspicions, leads, and questions. The Washington Post’s readers, just like Bernstein and Woodward, were for a long time seeing only a small slice of the operation.

The tactics of the Executive Branch seemed familiar. Press Secretary Ziegler constantly questioned the accuracy and legitimacy of the Washington Post, accusing the paper of lying. The President would express “full confidence” in an aide before having to cut him loose; this happened with Stans, Chapin, and Dean, all of whom were involved in the cover-up. It was almost a tell. The White House became obsessed with the leaks to the media rather than with the substance of the charges.

The book starts on June 17, 1972, the night of the attempted Watergate burglary. The acknowledgement is dated February 1974. Articles of impeachment wouldn’t be reported to the House of Representatives until July, and Nixon didn’t resign until August 9, 1974, more than two years after the burglary. When this book was published, Bernstein and Woodward had no idea their journalism would lead to impeachment proceedings, let alone the President’s resignation. The final sentence in the book quotes Nixon: “And I want you to know that I have no intention whatever of ever walking away from the job that the people elected me to do for the people of the United States.”

I read this book over a period of a few weeks. Despite the very compressed timeline, the buildup still felt more like a ramp than a roller-coaster. Knowing the ending helped pull me through this story. I’m going to watch the movie next, and am curious about how it manages to preserve that pace and the feeling that the incremental revelations aren’t actually getting anywhere.

Classic style

I’ve been studying a writing style called classic style. It’s different from the plain style and practical style (my default styles) presented by Strunk and White and by Joseph Williams.

I came across a paragraph that I thought would be interesting to rewrite in this style.

The original:

Recent advances in deep learning have made it possible to extract high-level features from raw sensory data, leading to breakthroughs in computer vision [11, 22, 16] and speech recognition [6, 7]. These methods utilise a range of neural network architectures, including convolutional networks, multilayer perceptrons, restricted Boltzmann machines and recurrent neural networks, and have exploited both supervised and unsupervised learning. It seems natural to ask whether similar techniques could also be beneficial for RL with sensory data.


Convolutional networks, multilayer perceptrons, restricted Boltzmann machines, and recurrent neural networks are now all being used to extract high-level features from raw sensory data and have led to state-of-the-art performance in computer vision and speech recognition [6, 7, 11, 16, 22]. Can we also use these methods for reinforcement learning with sensory data?

Why do I think this is more like classic style than the original?

  • I direct the reader’s attention towards a truth that I want to present. The original talks about advances, which led to breakthroughs, then methods (are those the advances or the breakthroughs?), then architectures, and techniques. I’m not sure what the object of attention is supposed to be. I focus the reader on a list of methods and what they’ve been used for. I then ask whether we can use those methods for a new task.
  • I rely on nuance, so that the subordinate message does not obscure the main object of attention. I don’t want to focus the reader on history, or how recently researchers have discovered how to use these methods. I have chosen to wrap up the concept of recency in a single word: “now”. Classic style takes the stance that the reader is intelligent and interested. The reader can infer that these methods weren’t all being used some time before “now” and can look to the references if they want more precise dates.
  • To keep the focus on the list of methods, I use the passive voice in the opening sentence.
  • I’ve tried to make the nouns concrete things rather than abstract concepts. Instead of advances and breakthroughs: convolutional networks and features. This is more a feature of plain style, though.
  • It hides the effort. I thought about this for over a day, and had several false starts, but I don’t think any of that comes through in the final product.

There are surely other improvements and alternatives. If you identify a different truth, you’ll write a different paragraph. I haven’t thought about how this paragraph links up with its neighbors. There are also entirely other styles that you could use. How would you rewrite this?

More info:

Learning TensorFlow

Over the past two weeks, I’ve been teaching myself TensorFlow, Google’s open-source library for deep neural networks (really, for graph computation in general).

It was so easy to get started with TensorFlow that I was fooled into thinking I’d be writing a character-based recurrent-neural-network language model in a couple of days.

The TensorFlow website gives a few learning paths: Get Started, Programmer’s Guide, and Tutorials. These were written for various versions of the API and they don’t use consistent idioms or up-to-date functions. Regardless, I found these useful to go through to give me a sense of what it would be like to use TensorFlow.

After going through a few tutorials, I made the following learning plan and now feel comfortable defining and training non-distributed models in TensorFlow:

  • Create a simple single-layer network to try to learn a function from random data. This teaches how graph definition is separate from running the computation in a session, and how to feed data into placeholder input variables.
  • Output some summary data and use TensorBoard to visualize that the loss doesn’t decrease.
  • Create some synthetic data for a simple function. I used y = x[0] < x[1]. This just lets you confirm that the loss actually decreases during training. You can also visualize the weights as they change during training.
  • Replace the synthetic data with data that is loaded from file using an input queue. Input queues were the most confusing part of TensorFlow so far. Here is a minimal example of an input queue that reads records from a file and creates shuffled batches. One thing that made this more confusing than necessary was that I was using an input queue to feed a model that was being evaluated ridiculously fast. TensorBoard was telling me that the shuffle_batch queue was never getting filled up. But, this was only because my simple model was being evaluated too quickly during the optimization step. Once I increased the complexity of the model by adding a few more fully-connected layers, the optimization step took long enough for the queue to actually be helpful.

The MonitoredTrainingSession is very helpful. It initializes variables, watches for stopping criteria, saves checkpoints and summaries, and restarts from checkpoint files if training gets interrupted.

My first real TensorFlow model was a char-rnn (used to model text by predicting the next character based on the previous sequence of characters). The part of the TensorFlow API that deals with recurrent neural networks has changed a lot over the past year, so various examples you’ll find online present different ways of doing things.

  • TensorFlow’s own tutorial does not use tf.nn.dynamic_rnn to create the recurrent neural network from a prototype cell. Instead, it explicitly codes the loop over timesteps and explicitly handles the recurrent state between calls to the prototype cell.
  • This blog post by Denny Britz is a good explanation of how to use dynamic_rnn to avoid having to do all of that by hand. It mentions a helpful function: sequence_loss_by_example, but that appears to have been superseded by sequence_loss.
  • This blog post by Danijar Hafner is a second example showing how to use dynamic_rnn. It also shows how to flatten the outputs from the recurrent cell across timesteps so that you can easily apply the weights used for the output projection. However, this example doesn’t take advantage of the sequence_loss function and instead computes the sequence labelling cost by doing a partial summation and then averaging.
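Whichever API vintage an example follows, the data side of a char-rnn stays the same: inputs are sequences of character ids, and the target at each step is the next character. A minimal sketch in plain Python (the corpus string and window length below are made up for illustration):

```python
# Turn a text corpus into (input, target) index sequences for a char-rnn.
# The target sequence is the input sequence shifted forward by one
# character: at each timestep the model predicts the next character.
def char_rnn_pairs(text, seq_len):
    vocab = sorted(set(text))
    to_id = {ch: i for i, ch in enumerate(vocab)}
    ids = [to_id[ch] for ch in text]
    inputs, targets = [], []
    # Non-overlapping windows over the corpus.
    for start in range(0, len(ids) - seq_len, seq_len):
        inputs.append(ids[start:start + seq_len])
        targets.append(ids[start + 1:start + seq_len + 1])
    return inputs, targets, vocab

inputs, targets, vocab = char_rnn_pairs("hello world, hello tensorflow", 5)
```

These index sequences are what would get batched and fed to the embedding lookup, regardless of whether the recurrence is built with dynamic_rnn or an explicit loop.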

My main point is: don’t assume you’ve misunderstood something when you can’t reconcile two different examples that claim to demonstrate the same thing. It’s likely just an API change.

My own example is here. It’s not perfect either. I’m not passing state from the end of one batch to the beginning of the next batch, so this isn’t standard truncated back-propagation through time. But the dataset I’m learning on doesn’t appear to have dependencies that are longer than the length I chose for input sequences. R2RT discusses the distinctions between a couple of different styles of back-propagation through time. The approach I ended up implementing is almost what R2RT calls “TensorFlow style”.

Further, I wasn’t thinking ahead to how I would load the trained weights for sampling when I wrote the training script. Instead, I redefined parts of the model structure in my sampling script. This is not good. A better approach is to define the graph structure in a class (like in this example). This lets you use the exact same model during evaluation/sampling as during training, which is important for matching the saved weights to their variables based on their keys (names).
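The pattern can be sketched without depending on any particular TensorFlow version (the class and parameter names below are my own invention, standing in for graph variables): both the training script and the sampling script construct the same model object, so the parameter names line up with what the checkpoint saved.

```python
# Sketch of the "define the model in a class" pattern. In TensorFlow the
# constructor would build the graph; here plain dictionaries stand in
# for named variables. Because train.py and sample.py both instantiate
# this one class, the variable names in a saved checkpoint match the
# variables in the graph that restores it.
class CharRNNModel:
    def __init__(self, vocab_size, hidden_size):
        self.params = {
            "embedding": [[0.0] * hidden_size for _ in range(vocab_size)],
            "output_w": [[0.0] * vocab_size for _ in range(hidden_size)],
            "output_b": [0.0] * vocab_size,
        }

    def variable_names(self):
        return sorted(self.params)

# Both scripts build the model the same way, so names line up exactly.
train_model = CharRNNModel(vocab_size=65, hidden_size=128)
sample_model = CharRNNModel(vocab_size=65, hidden_size=128)
assert train_model.variable_names() == sample_model.variable_names()
```

Redefining the structure by hand in a second script, as I did, invites exactly the kind of name mismatch that makes restoring weights fail.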

If you’ve already been using TensorFlow for some time, I’d appreciate any feedback you have for me on my early TensorFlow code that I’ve posted on GitHub. Are there TensorFlow design patterns I’m missing, or helper functions I don’t know about? Let me know!

Meanwhile, back in Canada

During the 2015 Federal Election campaign, Mr. Trudeau promised to end first-past-the-post elections in Canada. We voted with the understanding that a Liberal victory would mean the end of first-past-the-post elections. The Liberals won.

Mr. Trudeau and the Liberals made this promise knowing that Canadians are not united against first-past-the-post, nor united around a particular alternative.


In 2005, the single transferable vote (STV) was put before British Columbians in a referendum, and 43% voted for the status quo. STV did not reach the required 60% support and was not adopted.

In 2008, BC held another referendum in which 60% voted for the status quo; 40% for STV.

A survey conducted by the Broadbent Institute just after the 2015 election found that “44% of Canadians prefer one of the proportional voting systems while 43% prefer the status quo, the single member plurality system.” This is consistent with previous surveys and historical data.

Despite the lack of clear consensus for a concrete alternative, the Liberals promised to do the hard work of selecting an alternative, educating the people, and passing the legislation needed to change our electoral system.

Why? Because proportional representation would produce better public policy.

And, they won. Sure, only 39.5% of Canadians voted for the Liberal Party in this past election, but that gave them 54% of the seats in Parliament, and a majority government.

Then, on February 1, 2017, Mr. Trudeau published this mandate letter. It said:

A clear preference for a new electoral system, let alone a consensus, has not emerged. Furthermore, without a clear preference or a clear question, a referendum would not be in Canada’s interest. Changing the electoral system will not be in your mandate.

Mr. Trudeau did not make his promise dependent on a consensus “emerging”. This consensus did not emerge in the past 25 years. It was never going to emerge in 12 months. In his promise, he committed to doing the hard work and expending the political capital to educate Canadians and develop whatever consensus is possible. A plan to passively wait for consensus to “emerge” is no plan at all.

If lack of consensus is all it takes to stymie the Liberal agenda, I don’t understand how they are proceeding with any of their promises (60.5% of Canadian voters didn’t vote for them), or how they are selecting which promises to work on and which to walk away from.

Washington v. Trump

Today, the 9th Circuit denied President Trump’s appeal that would have reinstated his Executive Order on immigration. This opinion addressed only a preliminary question of whether the Executive Order will remain in effect while its constitutionality is fully argued in a lower court, but it reveals the trouble that the Government will have defending it.

One of the Government’s arguments was that “the President’s decisions about immigration policy, particularly when motivated by national security concerns, are unreviewable, even if those actions potentially contravene constitutional rights and protections.”

The court rejected that argument.

There is no precedent to support this claimed unreviewability, which runs contrary to the fundamental structure of our constitutional democracy.

The Government also argued that “if the four corners of the Executive Order offer a facially legitimate and bona fide reason for it […] the court can’t look behind that.” In this argument, they were trying to prevent the court from considering statements, tweets, and interviews by President Trump and his advisors that could reveal that the Executive Order was, in part, religiously-motivated.

The court rejected that argument.

The States argue that the Executive Order violates the Establishment and Equal Protection Clauses because it was intended to disfavor Muslims. In support of this argument, the States have offered evidence of numerous statements by the President about his intent to implement a “Muslim ban” as well as evidence they claim suggests that the Executive Order was intended to be that ban, including sections 5(b) and 5(e) of the Order. It is well established that evidence of purpose beyond the face of the challenged law may be considered in evaluating Establishment and Equal Protection Clause claims.

The Government also tried to rely on “authoritative guidance” from White House counsel that the Executive Order does not affect legal permanent residents. The Government argued that the court should understand the Executive Order based on the most recent interpretation by the White House counsel. The court was concerned about the Government’s “shifting interpretation”, and rejected that argument.

Nor has the Government established that the White House counsel’s interpretation of the Executive Order is binding on all executive branch officials responsible for enforcing the Executive Order. The White House counsel is not the President, and he is not known to be in the chain of command for any of the Executive Departments. Moreover, in light of the Government’s shifting interpretations of the Executive Order, we cannot say that the current interpretation by White House counsel, even if authoritative and binding, will persist past the immediate stage of these proceedings. On this record, therefore, we cannot conclude that the Government has shown that it is “absolutely clear that the allegedly wrongful behavior could not reasonably be expected to recur.”

The most interesting part of my past week was listening, along with a hundred thousand other people, to this case’s oral argument. It was a display of the kind of work the judiciary does every day: checking whether the case should even be before the courts, probing the limits of the arguments presented by each side, and at the core, just trying to understand the case and arguments before them so they can correctly apply the law.

There is nothing better than an adversarial dispute to crystallize the meaning of a statute, the limits of Government power, or the extent of our rights. I’ve spent as much time reading appellate opinions as any other material over the past few years. It’s not because I miss first year philosophy or want to be a lawyer; it’s because they contain tough questions that reveal how the various parts of our society fit together. And, much of it is decent writing. They are written as much for us as they are for lawyers. Good journalism answers “so what”, but nothing can sub in for the opinion itself.

Here are some of the law people I’m following on Twitter who give context to significant cases and insight based on their personal experiences with the courts (and also, some entertainment).

And here are a couple of sites that present primary sources: oral argument audio, transcripts, briefs, opinions.

I haven’t found anything close to the same for Canada. But, you can search our Supreme Court’s judgements by date, topic, party, etc. here. (Try to find the one where a farmer harvested, saved, and planted Monsanto seed across 95% of his farm and then claimed he wasn’t using it.)