I saw Grady’s tweet today, and I read the article it referenced; and it got me to thinking. Is the training of a neural net a kind of programming? It is the kind of activity that an agile team could plan, estimate, and execute in iterations? Or is it something entirely different?
Before I try to answer that question, let’s set our definitions. What is programming?
Alan Turing defined computer programming in his 1936 paper. Since that time we have used his definition without significant alteration. According to Turing a computer program is a sequence of simple statements of the following form:
Given that the system is in state
S1
,
When eventE
occurs,
Then set the system to stateS2
and invoke behaviorB
.
This statement is very obviously a transition of a finite state machine (FSM). And, indeed, every program ever written can be reduced to a linear sequence of such statements. We can, in fact, claim that the act of computer programming is simply the enumeration of all the transitions within the finite state machine that expresses the behavior desired by the programmer.
Of course we don’t often use FSM language to write our code. We use, instead, the language of Dijkstra: Sequence, Selection, and Iteration. However, it is not very difficult to prove that these two forms of expression are equivalent.
As a non-mathematical proof consider that the format of the transition above uses the same Given/When/Then triplet that has become so popular for the specification of behavior in Behavior Driven Development (BDD). One could, therefore, make the claim that BDD is simply the act of specifying the transitions of the state machine that the programmers will implement using Sequence, Selection, and Iteration.
As further proof we could look at the discipline of Test Driven Development (TDD) in which no production code can be written unless it is to satisfy a failing unit test.
Those unit tests always follow the pattern known as Arrange, Act, and Assert (AAA). First we arrange the system into the state appropriate for the test. Then we Act upon the system. Then we Assert that the results of that action were correct.
If you think carefully about this you will realize that AAA is simply an alternative expression for the Given/When/Then of BDD.
Given that we have arranged the system into the testable state,
When we act upon that system,
Then we expect our assertions to pass.
And, of course, that means that TDD is just a way to fully enumerate all the transitions of the FSM that expresses the desired behavior of the program we are writing.
So it should be no surprise that the act of programming a computer is the act of defining a finite state machine. What programmers do is to define the necessary states, events, and behaviors. Every computer program, no matter how simple or complex, is nothing more than a network of states, transitions, and behaviors.
And this is true, even if the program is a neural net.
Our Expectations are high.
The latest buzz words are things like Machine Learning, and Deep Learning. These terms represent the massive problem of designing and training a neural network. But if a neural network is really just an FSM, then why do we have to train it? Why don’t we simply specify all the states and transitions like any programmer would?
The answer is that we don’t know all the states and transitions. We have no way to enumerate them all. And so we expect the training of the neural network to cause it to infer those states and transitions.
Actually, that’s not quite right. The neural network may not know what state it is in. It may also not know what events it is receiving. It may only know the states and events in a statistical sense. And we expect it to infer the correct behavior from those statistics.
I can explain this by using a neural network that you are intimately familiar with – your brain.
Imagine that you are driving a car down a highway at a time when there’s very little traffic. What you see ahead of you is empty road. Your speed is stable. The weather is nice. The Sun is shining above. You know your state. You see the road curve ahead. You realize that, given your current state, you should adjust the angle of the wheel ever so slightly. This puts you in a new state; and you know what state that is. Your behavior is almost entirely deterministic. You could write the FSM.
Now imagine that that you are driving on a busy city street in the rain. There are cars jockeying for position around you. Traffic lights, one-way streets, construction, pedestrians, scooters etc. The amount of data you must process on a second by second basis is daunting. You must keep looking left and right, checking the car ahead, the car behind, the traffic lights, and signs. You never really know what state you are in. You have a statistical feel for your state. You can assign a probability to the states you might be in. The car in front of you lurches for some unexpected reason. You quickly increase pressure on your brake. “No,” you decide, “it wasn’t stopping.. It just hit a pothole.”
Somehow, in the chaos of all that conflicting data, you are expected to make good driving decisions, and execute concrete behaviors. It is your job, as the neural net in charge of that car, to extract and express order from chaos. No programmer could write the FSM to do this.
Training
So how do you train a neural net? The simple answer to that question is that you present the neural net with a sequence of events, and then “reward” or “punish” its behavior. In this case, the words “reward” and “punish” amount to incrementing or decrementing an “appropriateness” value. The neural net learns to maximize “appropriateness” based on current and previous events. In effect, the neural net builds up within it’s memory a fuzzy finite state machine. I say, in effect, because you could not find that FSM in the memory of the neural net. It is hidden within the variables and coefficients, and interconnections of the program. It’s there; but it’s encoded in a way that is very hard to decode.
So, to address Grady’s question. Is training a neural net programming? Is training a neural net a story, or a set of stories, for an agile team to prioritize and execute? Can you estimate how long it will take to train a neural net?
The answer to that, at least at the moment, is clearly “No”. Training neural nets is something of a black art. There may be a science behind it; but that science is not particularly rigorous yet.
Training a neural net is nothing like programming. It is not the enumeration of transitions into a finite state machine. Rather, it is an attempt to find enough events to present to the neural network, and a corresponding attempt to measure how appropriately the neural net behaves. And it is that last measurement that is the most fraught with uncertainty and danger.
Expectations too high?
Are we going to have self driving cars pretty soon? Are we going to see true machine intelligence in the near future? Will autonomous taxis be driving us from arbitrary point to arbitrary point within the decade?
Well, let’s do a little math.
The human brain has about 1E14 neurons. Each neuron connects to around 1E3 other neurons. If we consider each connection to roughly correspond to a bit of memory; then our brains have on the order of 1E17 bits of memory. That corresponds roughly to 10 petabytes of RAM. Don’t confuse this with disk or SSD. We’re talking about instant access memory. RAM.
On top of that each of those 1E17 connections can switch every 10ms or so. That means that the human brain can process around 1E19 bits per second. My laptop, with 4 cores running at 2.8ghz, and a 64 bit buss can process only 1E15 bits per second. So the human brain can outperform my laptop by four orders of magnitude in sheer processing power.
To mimic the processing capability of the human brain, I would need 10,000 apple powerbook laptops that were intimately connected (no, wifi wouldn’t do) to each other, and 10 petabytes of RAM. That is not a hardware technology that we currently possess.
Indeed, the structure of our computers is so vastly different from the structure of the brain, that making the comparison is laughable. The human brain is not a computer in any sense that we would recognize the word. It does not have addresses, it does not have instructions, it does not have stacks, integers, floating point numbers, busses, master clocks, or anything else we would recognize as part of a computer. In fact, the human brain is not even digital. It is an analog data processor composed of 100 billion components that are massively interconnected.
So, no. We are not going to see Hal 9000 any time soon. Johnny #5 will not soon be alive. Dialing MycroftXXX on your phone will not connect you with Mike, the leader of the revolution. Collossus and Guardian are not going to co-conspire to take over the world. Don’t expect to meet Neo in the Matrix. Don’t expect Kit to speak out of the radio of your car. Our computers are simply not going to wake up any time soon.
Autonomous Taxis
But can our computers drive cars? Can we make neural nets powerful enough to safely drive human passengers from any arbitrary point, to any other arbitrary point?
Let’s do a little more math. My brain weighs about 1 pound. Since it is 10,000 times more powerful than my laptop, I infer that if my laptop were made out of neurons, it would weight 0.0001 pounds, or about a twentieth of a gram. This is roughly the mass of the brain of a honeybee.
Actually, the million neurons in the honeybee’s brain are packed ten times more densely than ours, and operate much faster than ours. Which means the honeybee’s brain is significantly more powerful than my laptop.
So, do you believe that you could train a honeybee’s brain to drive your car? Safely? Legally? From any arbitrary point, to any other arbitrary point?
Perhaps that’s not a fair question. So let’s multiply that brain size by a factor of 600. A cat’s brain weighs 30 grams. It would require 600 of my laptops, intimately connected to each other and to terabytes of RAM in order to mimic a cat’s brain. Do you believe you could train your cat to drive you from place to place? Safely?
Or is it possible, that in order to properly train a neural net to safely drive a car, a neural net with the capabilities of the human brain will be required?