A trigger list for technical debt in machine learning

Technical debt doesn’t appear in neat, labelled units. It builds up slowly, insidiously—especially in machine learning systems. It starts as a feeling. Features become hard to implement because of complicated interactions and dependencies. Re-training and deploying your classifiers becomes increasingly tedious. And when you do become aware of technical debt, you’re not always in the right state of mind to start writing a list or to plan how to fix the thing.

So, to notice technical debt early, I look for it as part of a periodic review, and I use a trigger list for this.1 It’s based on the paper Hidden Technical Debt in Machine Learning Systems and my personal experience. I’m not going to describe what all of these triggers mean. They don’t even have to mean the same thing to you as they do to me. But this list helps me to recognize and itemize the technical debt in my machine learning systems.

  • The first thing you wrote
  • Documentation
  • Dead code
  • Unit tests
  • Software dependencies
  • API mess
  • Data that breaks abstraction boundaries
  • Live monitoring of performance
  • Correction cascades
  • Who is using each module’s output
  • Data dependencies
  • Legacy features2
  • Bundled features2
  • Epsilon features2
  • Correlated features2
  • Glue code
  • Pipeline jungles
  • Dead flags
  • Plain old data structures
  • Multiple languages
  • Left-over prototypes
  • Configuration debt
  • Versioning
  • Data for testing
  • Reproducibility

1. Every good productivity or planning system includes something like a weekly review. These reviews let me be sure of my direction as I navigate each week. They also clear my head so that I am free to just focus, relax, or do nothing, without any worry that there is anything else I should be doing. I’ve found this trigger list to be useful during these weekly reviews.

2. “Features” here means features, or variables, or covariates, in a model.

All the President’s Men

I just finished reading All the President’s Men: Carl Bernstein and Bob Woodward’s presentation of their Watergate journalism.

I’d barely known what Watergate was about before reading this book. While I found the who-did-what narrative to be interesting, even more interesting were the methods the reporters used to investigate it, the editorial scrutiny applied by the Post, and the reaction by the White House to the evidence closing in around them.

This was a very dense book. Skim a page and you’re lost. That was a consequence of the scattered nature of the information the reporters were tracking down. The cover-up went straight to the top—to President Nixon—but for over a year, Bernstein and Woodward were following leads so disparate that they couldn’t tell whether the puzzle pieces even belonged to the same puzzle.

With only one exception, Bernstein and Woodward made sure that whatever they published was solidly confirmed. Behind every published story, there was a pile of suspicions, leads, and questions. The Washington Post’s readers, just like Bernstein and Woodward themselves, were for a long time seeing only a small slice of the operation.

The tactics of the Executive Branch seemed familiar. Press Secretary Ziegler constantly questioned the accuracy and legitimacy of the Washington Post’s reporting, accusing the paper of lying. The President would express “full confidence” in an aide before having to cut him loose; this happened with Stans, Chapin, and Dean, all of whom were involved in the cover-up. It was almost a tell. The White House became obsessed with the leaks to the media rather than with the substance of the charges.

The book starts on June 17, 1972, the night of the attempted Watergate burglary. The acknowledgement is dated February 1974. Articles of impeachment wouldn’t be reported to the House of Representatives until July, and Nixon didn’t resign until August 9, 1974, more than two years after the burglary. When this book was published, Bernstein and Woodward had no idea their journalism would lead to impeachment proceedings, let alone the President’s resignation. The final sentence in the book quotes Nixon: “And I want you to know that I have no intention whatever of ever walking away from the job that the people elected me to do for the people of the United States.”

I read this book over a period of a few weeks. Despite the very compressed timeline, the buildup still felt more like a ramp than a roller-coaster. Knowing the ending helped pull me through this story. I’m going to watch the movie next, and am curious about how it manages to preserve that pace and the feeling that the incremental revelations aren’t actually getting anywhere.

Classic style

I’ve been studying a writing style called classic style. It’s different from the plain style and the practical style (my default styles) presented by Strunk and White and Joseph Williams.

I came across a paragraph that I thought would be interesting to rewrite in this style.

The original:

Recent advances in deep learning have made it possible to extract high-level features from raw sensory data, leading to breakthroughs in computer vision [11, 22, 16] and speech recognition [6, 7]. These methods utilise a range of neural network architectures, including convolutional networks, multilayer perceptrons, restricted Boltzmann machines and recurrent neural networks, and have exploited both supervised and unsupervised learning. It seems natural to ask whether similar techniques could also be beneficial for RL with sensory data.

Rewritten:

Convolutional networks, multilayer perceptrons, restricted Boltzmann machines, and recurrent neural networks are now all being used to extract high-level features from raw sensory data and have led to state-of-the-art performance in computer vision and speech recognition. [6, 7, 11, 16, 22] Can we also use these methods for reinforcement learning with sensory data?

Why do I think this is more like classic style than the original?

  • I direct the reader’s attention towards a truth that I want to present. The original talks about advances, which lead to breakthroughs, then methods (are those the advances or the breakthroughs?), then architectures, and techniques. I’m not sure what the object of attention is supposed to be. I focus the reader on a list of methods and what they’ve been used for. I then ask whether we can use those methods for a new task.
  • I rely on nuance, so that the subordinate message does not obscure the main object of attention. I don’t want to focus the reader on history, or how recently researchers have discovered how to use these methods. I have chosen to wrap up the concept of recency in a single word: “now”. Classic style takes the stance that the reader is intelligent and interested. The reader can infer that these methods weren’t all being used some time before “now” and can look to the references if they want more precise dates.
  • To keep the focus on the list of methods, I use the passive voice in the opening sentence.
  • I’ve tried to make the nouns concrete things rather than abstract concepts. Instead of advances and breakthroughs: convolutional networks and features. This is more a feature of plain style, though.
  • It hides the effort. I thought about this for over a day, and had several false starts, but I don’t think any of that comes through in the final product.

There are surely other improvements and alternatives. If you identify a different truth, you’ll write a different paragraph. I haven’t thought about how this paragraph links up with its neighbors. There are also entirely different styles that you could use. How would you rewrite this?


Learning TensorFlow

Over the past two weeks, I’ve been teaching myself TensorFlow, Google’s open source library for deep neural networks (actually, for graph computation in general).

It was so easy to get started with TensorFlow that I was fooled into thinking I’d be writing a character-based recurrent-neural-network language model in a couple of days.

The TensorFlow website gives a few learning paths: Get Started, Programmer’s Guide, and Tutorials. These were written for various versions of the API and they don’t use consistent idioms or up-to-date functions. Regardless, I found these useful to go through to give me a sense of what it would be like to use TensorFlow.

After going through a few tutorials, I made the following learning plan and now feel comfortable defining and training non-distributed models in TensorFlow:

  • Create a simple single-layer network to try to learn a function from random data. This teaches how graph definition is separate from running the computation in a session, and how to feed data into placeholder input variables. (A minimal sketch of these first steps follows this list.)
  • Output some summary data and use TensorBoard to visualize that the loss doesn’t decrease.
  • Create some synthetic data for a simple function. I used y = x[0] < x[1]. This just lets you confirm that the loss actually decreases during training. You can also visualize the weights as they change during training.
  • Replace the synthetic data with data loaded from a file using an input queue. Input queues were the most confusing part of TensorFlow so far. Here is a minimal example of an input queue that reads records from a file and creates shuffled batches. One thing made this more confusing than necessary: I was using the queue to feed a model that was being evaluated ridiculously fast, so TensorBoard was telling me that the shuffle_batch queue was never getting filled up. Once I increased the complexity of the model by adding a few more fully-connected layers, the optimization step took long enough for the queue to actually be helpful.
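
To make the first three steps concrete, here is a minimal sketch in the TensorFlow 1.x style used throughout this post. The layer size, learning rate, and log directory are arbitrary choices for illustration:

```python
import numpy as np
import tensorflow as tf

# --- Graph definition: nothing is computed yet. ---
x = tf.placeholder(tf.float32, shape=[None, 2], name='x')
y = tf.placeholder(tf.float32, shape=[None, 1], name='y')

logits = tf.layers.dense(x, 1)  # a single fully-connected layer
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
loss_summary = tf.summary.scalar('loss', loss)  # for TensorBoard

# --- Running the computation in a session. ---
with tf.Session() as sess:
    writer = tf.summary.FileWriter('/tmp/toy_model', sess.graph)
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        # Synthetic data for the function y = x[0] < x[1].
        batch_x = np.random.rand(64, 2)
        batch_y = (batch_x[:, 0] < batch_x[:, 1]).astype(np.float32)[:, None]
        _, summ = sess.run([train_op, loss_summary],
                           feed_dict={x: batch_x, y: batch_y})
        writer.add_summary(summ, step)
```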

The MonitoredTrainingSession is very helpful. It initializes variables, watches for stopping criteria, saves checkpoints and summaries, and restarts from checkpoint files if training gets interrupted.
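
Here is a sketch of how an input queue and MonitoredTrainingSession might fit together. The CSV layout, file name, and checkpoint directory are made up for illustration; conveniently, MonitoredTrainingSession also starts the queue runners for you:

```python
import tensorflow as tf

# Input queue: read lines from a (hypothetical) CSV file and batch them
# with shuffling.
filename_queue = tf.train.string_input_producer(['data.csv'])
reader = tf.TextLineReader()
_, line = reader.read(filename_queue)
x0, x1, label = tf.decode_csv(line, record_defaults=[[0.0], [0.0], [0.0]])
features, labels = tf.train.shuffle_batch(
    [tf.stack([x0, x1]), label], batch_size=64,
    capacity=10000, min_after_dequeue=1000)

logits = tf.layers.dense(features, 1)
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    labels=labels[:, None], logits=logits))
global_step = tf.train.get_or_create_global_step()
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss, global_step=global_step)

# MonitoredTrainingSession initializes variables, starts the queue runners,
# saves checkpoints and summaries, and resumes from the latest checkpoint
# if training gets interrupted.
with tf.train.MonitoredTrainingSession(
        checkpoint_dir='/tmp/train_dir',
        hooks=[tf.train.StopAtStepHook(last_step=10000)]) as sess:
    while not sess.should_stop():
        sess.run(train_op)
```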

My first real TensorFlow model was a char-rnn (used to model text by predicting the next character based on the previous sequence of characters). The part of the TensorFlow API that deals with recurrent neural networks has changed a lot over the past year, so the various examples you’ll find online present different ways of doing things.

  • TensorFlow’s own tutorial does not use tf.nn.dynamic_rnn to create the recurrent neural network based on a prototype cell. Instead, it shows an example that explicitly codes the loop over timesteps and explicitly handles the recurrent state between calls to the prototype cell.
  • This blog post by Denny Britz is a good explanation of how to use dynamic_rnn to avoid having to do all of that by hand. It mentions a helpful function: sequence_loss_by_example, but that appears to have been superseded by sequence_loss.
  • This blog post by Danijar Hafner is a second example showing how to use dynamic_rnn. It also shows how to flatten the outputs from the recurrent cell across timesteps so that you can easily apply the weights used for the output projection. However, this example doesn’t take advantage of the sequence_loss function and instead computes the sequence-labelling cost by doing a partial summation and then averaging. (A sketch using dynamic_rnn follows this list.)
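
To show the overall shape, here is a sketch of a dynamic_rnn-based char-rnn graph. It isn’t my actual code; the sizes are arbitrary, sequence_loss lives in tf.contrib.seq2seq in the version I was using, and the cell classes themselves have moved between tf.contrib.rnn and tf.nn.rnn_cell across releases:

```python
import tensorflow as tf

vocab_size, embed_dim, num_units = 128, 64, 256  # arbitrary sizes

inputs = tf.placeholder(tf.int32, [None, None])   # [batch, time] char ids
targets = tf.placeholder(tf.int32, [None, None])  # inputs shifted by one

embeddings = tf.get_variable('embeddings', [vocab_size, embed_dim])
embedded = tf.nn.embedding_lookup(embeddings, inputs)

# dynamic_rnn unrolls the cell over timesteps and threads the recurrent
# state through for us; no hand-written loop.
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
outputs, final_state = tf.nn.dynamic_rnn(cell, embedded, dtype=tf.float32)

# Flatten [batch, time, units] to [batch * time, units] so one dense layer
# can apply the output projection, then restore the time dimension.
flat = tf.reshape(outputs, [-1, num_units])
logits = tf.reshape(tf.layers.dense(flat, vocab_size),
                    [tf.shape(inputs)[0], -1, vocab_size])

# sequence_loss averages the per-timestep cross-entropy.
loss = tf.contrib.seq2seq.sequence_loss(
    logits, targets, weights=tf.ones_like(targets, dtype=tf.float32))
```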

My main point is: don’t assume you’ve misunderstood something when you can’t reconcile two different examples that claim to demonstrate the same thing. It’s likely just an API change.

My own example is here. It’s not perfect either. I’m not passing state from the end of one batch to the beginning of the next batch, so this isn’t standard truncated back-propagation through time. But the dataset I’m learning on doesn’t appear to have dependencies longer than the length I chose for input sequences. R2RT discusses the distinctions between a couple of different styles of back-propagation through time. The approach I ended up implementing is almost what R2RT calls “TensorFlow style”.
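
For contrast, here is a sketch of the state passing I skipped. A GRU cell keeps its state in a single tensor, which makes the feeding simple (an LSTM cell’s tuple state would need a placeholder per component); character_batches is a stand-in for an iterator over contiguous batches of character ids:

```python
import numpy as np
import tensorflow as tf

batch_size, vocab_size, embed_dim, num_units = 32, 128, 64, 256

inputs = tf.placeholder(tf.int32, [batch_size, None])
embeddings = tf.get_variable('embeddings', [vocab_size, embed_dim])
embedded = tf.nn.embedding_lookup(embeddings, inputs)

# Feed the previous batch's final state back in as the initial state.
cell = tf.nn.rnn_cell.GRUCell(num_units)
initial_state = tf.placeholder(tf.float32, [batch_size, num_units])
outputs, final_state = tf.nn.dynamic_rnn(cell, embedded,
                                         initial_state=initial_state)

state = np.zeros((batch_size, num_units), dtype=np.float32)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for batch in character_batches:  # hypothetical data iterator
        state = sess.run(final_state,
                         feed_dict={inputs: batch, initial_state: state})
```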

Further, I wasn’t thinking ahead to how I would load the trained weights for sampling when I wrote the training script. Instead, I redefined parts of the model structure in my sampling script. This is not good. A better approach is to define the graph structure in a class (like in this example). This lets you use the exact same model during evaluation/sampling as was used during training, which is important for matching the saved weights to their variables based on their keys (names).
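
A sketch of that class-based pattern; the class name and checkpoint path here are hypothetical:

```python
import tensorflow as tf

class CharRNNModel(object):
    """Builds the graph once, so train and sample scripts share it."""

    def __init__(self, vocab_size, embed_dim=64, num_units=256):
        self.inputs = tf.placeholder(tf.int32, [None, None], name='inputs')
        embeddings = tf.get_variable('embeddings', [vocab_size, embed_dim])
        embedded = tf.nn.embedding_lookup(embeddings, self.inputs)
        cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
        outputs, self.final_state = tf.nn.dynamic_rnn(cell, embedded,
                                                      dtype=tf.float32)
        flat = tf.reshape(outputs, [-1, num_units])
        self.logits = tf.layers.dense(flat, vocab_size, name='projection')

# Both train.py and sample.py build the identical graph, so variable names
# line up and the Saver can match saved weights to variables by key.
model = CharRNNModel(vocab_size=128)
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('/tmp/char_rnn'))
    # ... run model.logits to sample, or a train op to keep training ...
```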

If you’ve already been using TensorFlow for some time, I’d appreciate any feedback you have for me on my early TensorFlow code that I’ve posted on GitHub. Are there TensorFlow design patterns I’m missing, or helper functions I don’t know about? Let me know!

Meanwhile, back in Canada

During the 2015 Federal Election campaign, Mr. Trudeau promised to end first-past-the-post elections in Canada. We voted with the understanding that a Liberal victory would mean the end of first-past-the-post elections. The Liberals won.

Mr. Trudeau and the Liberals made this promise knowing that Canadians are not united against first-past-the-post, nor united around a particular alternative.


In 2005, the single transferable vote (STV) was put before British Columbians in a referendum, and 43% voted for the status quo. STV did not reach the required 60% support and was not adopted.

In 2008, BC held another referendum in which 60% voted for the status quo; 40% for STV.

A survey conducted by the Broadbent Institute just after the 2015 election found that “44% of Canadians prefer one of the proportional voting systems while 43% prefer the status quo, the single member plurality system.” This is consistent with previous surveys and historical data.

Despite the lack of clear consensus for a concrete alternative, the Liberals promised to do the hard work of selecting an alternative, educating the people, and passing the legislation needed to change our electoral system.

Why? Because proportional representation would produce better public policy.

And, they won. Sure, only 39.5% of Canadians voted for the Liberal Party in this past election, but that gave them 54% of the seats in Parliament, and a majority government.

Then, on February 1, 2017, Mr. Trudeau published this mandate letter. It said:

A clear preference for a new electoral system, let alone a consensus, has not emerged. Furthermore, without a clear preference or a clear question, a referendum would not be in Canada’s interest. Changing the electoral system will not be in your mandate.

Mr. Trudeau did not make his promise dependent on a consensus “emerging”. This consensus did not emerge in the past 25 years. It was never going to emerge in 12 months. In his promise, he committed to doing the hard work and expending the political capital to educate Canadians and develop whatever consensus is possible. A plan to passively wait for consensus to “emerge” is no plan at all.

If lack of consensus is all it takes to stymie the Liberal agenda, I don’t understand how they are proceeding with any of their promises (60.5% of Canadian voters didn’t vote for them), or how they are selecting which promises to work on and which to walk away from.

Washington v. Trump

Today, the 9th Circuit denied President Trump’s appeal that would have reinstated his Executive Order on immigration. This opinion addressed only a preliminary question of whether the Executive Order will remain in effect while its constitutionality is fully argued in a lower court, but it reveals the trouble that the Government will have defending it.

One of the Government’s arguments was that “the President’s decisions about immigration policy, particularly when motivated by national security concerns, are unreviewable, even if those actions potentially contravene constitutional rights and protections.”

The court rejected that argument.

There is no precedent to support this claimed unreviewability, which runs contrary to the fundamental structure of our constitutional democracy.

The Government also argued that “if the four corners of the Executive Order offer a facially legitimate and bona fide reason for it […] the court can’t look behind that.” In this argument, they were trying to prevent the court from considering statements, tweets, and interviews by President Trump and his advisors that could reveal that the Executive Order was, in part, religiously-motivated.

The court rejected that argument.

The States argue that the Executive Order violates the Establishment and Equal Protection Clauses because it was intended to disfavor Muslims. In support of this argument, the States have offered evidence of numerous statements by the President about his intent to implement a “Muslim ban” as well as evidence they claim suggests that the Executive Order was intended to be that ban, including sections 5(b) and 5(e) of the Order. It is well established that evidence of purpose beyond the face of the challenged law may be considered in evaluating Establishment and Equal Protection Clause claims.

The Government also tried to rely on “authoritative guidance” from White House counsel that the Executive Order does not affect legal permanent residents. The Government argued that the court should understand the Executive Order based on the most recent interpretation by the White House counsel. The court was concerned about the Government’s “shifting interpretation”, and rejected that argument.

Nor has the Government established that the White House counsel’s interpretation of the Executive Order is binding on all executive branch officials responsible for enforcing the Executive Order. The White House counsel is not the President, and he is not known to be in the chain of command for any of the Executive Departments. Moreover, in light of the Government’s shifting interpretations of the Executive Order, we cannot say that the current interpretation by White House counsel, even if authoritative and binding, will persist past the immediate stage of these proceedings. On this record, therefore, we cannot conclude that the Government has shown that it is “absolutely clear that the allegedly wrongful behavior could not reasonably be expected to recur.”

The most interesting part of my past week was listening, along with a hundred thousand other people, to this case’s oral argument. It was a display of the kind of work the judiciary does every day: checking whether the case should even be before the courts, probing the limits of the arguments presented by each side, and at the core, just trying to understand the case and arguments before them so they can correctly apply the law.

There is nothing better than an adversarial dispute to crystallize the meaning of a statute, the limits of Government power, or the extent of our rights. I’ve spent as much time reading appellate opinions as any other material over the past few years. It’s not because I miss first year philosophy or want to be a lawyer; it’s because they contain tough questions that reveal how the various parts of our society fit together. And, much of it is decent writing. They are written as much for us as they are for lawyers. Good journalism answers “so what”, but nothing can sub in for the opinion itself.

Here are some of the law people I’m following on Twitter who give context to significant cases and insight based on their personal experiences with the courts (and also, some entertainment).

And here are a couple of sites that present primary sources: oral argument audio, transcripts, briefs, opinions.

I haven’t found anything close to the same for Canada. But, you can search our Supreme Court’s judgements by date, topic, party, etc. here. (Try to find the one where a farmer harvested, saved, and planted Monsanto seed across 95% of his farm and then claimed he wasn’t using it.)

Washington, D.C.

January 16, Martin Luther King Jr. Day

With thousands of others who waited hours for free tickets, I got to see Gladys Knight and the Let Freedom Ring Choir perform at the Kennedy Center. You can watch the event here. It was a joy-filled celebration of a man, a movement, and imperfect, incomplete success.

In this temple, as in the hearts of the people, for whom he saved the union, the memory of Abraham Lincoln is enshrined forever.

After the concert, I walked to the Lincoln Memorial. I read the words above his head with a comma: “the people, for whom he saved the union, …” Without the comma, it reads, “in the hearts of the people for whom he saved the union”, connoting that he saved the union only for some people. He saved the union for all people, at least a more expansive concept of people than at the outset of the union.


January 17, Visit to the Capitol and the Library of Congress

My visit to D.C. let me see the people and institutions that might check executive power. The Senate’s ten-minute pro-forma session was an example of that. It was a reminder that people trust in the power of their institutions. The Senate would not have held many of its pro-forma sessions except that they prevent the President from making recess appointments.

Next was the Library of Congress.


I cannot live without books, but fewer will suffice…

Before I spent some time in the Reading Room, I visited Thomas Jefferson’s library. Jefferson offered Congress his entire collection after the Library of Congress was largely destroyed in the 1814 burning of Washington. He numbered his books and arranged them by subject. The library was filled with history, fiction, science, politics, religion, law, literary criticism, math… He had a Koran. Anticipating that Congress might think this collection too diverse, he wrote to them: “there is in fact no subject to which a member of Congress may not have occasion to refer.”

January 18, United States Supreme Court


I chose to attend the oral argument of Lee v. Tam. It was a First Amendment case, it had a sympathetic plaintiff (Mr. Tam and the Slants), and its outcome will almost certainly determine the fate of the “Redskins” trademark. A couple of people told me that I should arrive at 4am or even 3am to be guaranteed a spot in the audience. I arrived at 3:08am and was 8th in line, behind mostly line-holders who looked like they had been there overnight. By 4am, the line was past capacity.

At the front of the line, I was surrounded by people very familiar with the case: family of the attorney who would argue the case for Mr. Tam, an author of an NSFW amicus brief from the Cato Institute, somebody close to the legal team for Pro-Football, Inc…

The First Amendment protection of speech is an important check on the government. Expressive speech comes in many forms: journalism, literature, comedy, music. Today, much of this expression takes place in the commercial sphere. Sometimes, a speaker is emboldened to choose a particular message because they can trust in the protection of trademark law to secure exclusive use of that message as an indicator of their goods or services. The government has decided to withhold from a certain category of speech (speech that disparages) the special protections that trademark registration brings.

This case asks: is this kind of restriction a burden on speech, is that burden viewpoint-based, and if so, is it nonetheless justified by the purpose of the government’s trademark registration program?

The session started promptly at 10am with four minutes spent admitting attorneys from around the country to the Supreme Court Bar. Then, from Chief Justice Roberts: “Justice Sotomayor has our opinion this morning…” — I had already seen on SCOTUSBlog’s calendar that there would be an opinion announced today, so this wasn’t a complete surprise, but these don’t happen every session, and you never know what opinion will be presented — “…in case No. 14-1055, Lightfoot versus Cendant Mortgage Corporation.” I was unfamiliar with this case. She read her prepared opinion summary. Turns out that federal courts don’t have automatic jurisdiction over cases that happen to involve Fannie Mae. Who knew?

Chief Justice Roberts then introduced the first case, “We’ll hear argument first this morning in case No. 15-1293, Lee versus Tam.” You can listen to the argument here. Counsel for Mr. Tam and the Slants did not have a good day. Here are some excerpts:

A third of the audience departed after Lee v. Tam, so the room was much less full for the second case, Ziglar v. Abbasi. This case relates to how hundreds of Middle Eastern men were detained and treated in the weeks and months after 9/11. The main question before the court was: can the men who were detained sue then-Attorney General John Ashcroft (and others responsible for the detention and its conditions) in his personal capacity?

Only six justices heard the case, the minimum allowed. Justices Kagan and Sotomayor had each recused themselves from the already shorthanded court. This case has been around so long that they each probably worked on it in some capacity before their appointments to the Supreme Court.

It was very well argued by Ms. Meeropol (grandchild of Ethel and Julius Rosenberg). She took every opportunity to turn the argument back to points she wanted to emphasize or clarify. She was prepared for every question and handled hypotheticals with consistency. The oral argument audio is here.

This was also Mr. Gershengorn’s final argument as Acting Solicitor General for President Obama.


January 19, Museum Day

NPR HQ! They were not hosting tours that day, but I still got to see their history exhibit and gift shop, and now I have a new mug.

The Smithsonian National Museum of African American History and Culture (that needs a shorter name) was out of passes, so I spent the afternoon in the Natural History Museum and then wandered the National Mall amongst the people who had already arrived for the inauguration the next day. The day-before-the-inauguration vibe was mostly one of spectacle and people watching people.