The purpose of this post is to take a good-natured jab, or two, at some friends over at the Centre For Computational and Animal Learning Research in the context of presenting connectionist-network technology. I recently received an email from them advertising a free simulator for a connectionist model developed by Nestor Schmajuk (Schmajuk, Lam, & Gray, 1996). Being the nerd that I can be, I was very excited. I know Eduardo to be a very intelligent and creative individual. (Caveat: Unbeknownst to me at the time of posting, Eduardo was *not* the author of the SLG simulation available at their site. Their site serves as a repository for the work of any author, not just their own.)
I have always had an interest in connectionist modeling. Though I have not published in that area, I did speak on the topic at Yale in 1999. These models are mathematical descriptions of the interconnections of inputs to outputs. These connections serve to map a pattern of inputs onto a pattern of outputs. From the perspective of a learning theorist, the inputs would be environmental events such as lights and tones (putative CSs), and the outputs would be other environmental events that co-occur with the inputs in some way (e.g., other lights or tones, or shocks, or food pellets, etc.). By mapping these inputs onto the outputs, the system is able to take an input and predict an output.
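To make that mapping idea concrete, here is a minimal sketch of input-to-output learning using a delta-rule (Rescorla-Wagner-style) update. The stimulus names, learning rate, and training schedule are all illustrative, not taken from the SLG model or any other published one.

```python
# Minimal sketch: learn an associative weight per CS so that the summed
# weights of the stimuli present on a trial predict the outcome (US).

def train(trials, alpha=0.1, n_epochs=100):
    """trials: list of (inputs, outcome), where inputs is a set of CS names
    and outcome is 0 or 1 (US absent/present)."""
    w = {}  # one associative weight per CS
    for _ in range(n_epochs):
        for inputs, outcome in trials:
            prediction = sum(w.get(cs, 0.0) for cs in inputs)
            error = outcome - prediction          # prediction error
            for cs in inputs:                     # update each present CS
                w[cs] = w.get(cs, 0.0) + alpha * error
    return w

# Pair a light with the outcome; present a tone alone:
weights = train([({"light"}, 1), ({"tone"}, 0)])
# weights["light"] approaches 1; weights["tone"] stays at 0
```

After training, the light's weight approaches 1 (it reliably predicts the outcome) while the tone's weight stays at 0, which is the sense in which such a system takes a pattern of inputs and predicts an output.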
The interest a learning theorist has in these models should be, initially, how well the model’s predictions match data from experiments with humans and animals. Then, that interest should turn to how well the organization and structure of the model can map onto theoretical constructs and serve not only as a descriptive framework, but also as an explanatory framework, for understanding learning.
Anyhow, I am beginning to digress: Back to the presentation of connectionist-network technology and the ideas embedded within it, and taking a jab at my friends.
Learning theorists often dabble in computer programming, as it is a necessary evil. Knowledge of programming is generally required to control laboratory equipment, to create methods of study (e.g., The Learning Game), and to do things like examine the function and accuracy of published models, such as the one referenced above. Typically such programs are bare-bones works that function in a DOS environment (or a Windows console, as it is called today). Much functionality is sacrificed, presumably because we want to get to the conclusion or result as quickly as possible. I suspect, though, that much of it is sacrificed simply because Windows programming appears to be just too much work. The presentation of connectionist networks has always suffered for it, at least within the confines of associative learning.
Thus, I was very excited to see a simulation program coming from the Centre For Computational and Animal Learning Research, as Eduardo Alonso has an extensive background in computers. I eagerly downloaded the software, ran the program and … I found…
I was expected to enter commands and view text output, which could be copied, cut, and pasted into other applications. OK, granted: programming in Windows could be a pain, twenty years ago. But nowadays, programming in Windows has been made so simple that it really is a technology for the masses. A program of this type only does the number crunching for the user. Getting those numbers and then presenting them in a way that begins to make sense of them is pretty much left as an exercise for the user. The elegance of the model is entirely hidden. Given the Rescorla-Wagner simulator found on the same site, with its useful front end and graphing capability, I expected something more here.
Sometime in the early 1990s I began working on my Preliminary Examination materials for Ph.D. candidacy. For that project, I undertook the writing of a paper (never published) devoted to describing the role of connectionist modeling in learning theory at the time. High-minded work for a graduate student, but I believe it came out well. I passed.
I’ll not pretend. I am not math savvy. Thus, rather than look at a function or function set and understand what was going on, I had to see it in action. To achieve that goal I had to program these models myself to “see” them in operation. I had to get creative about displaying what the numbers represented.
At the moment, I cannot get those old DOS programs to run under Windows 7, or I would put up a screenshot of them in all their VESA glory. I can bypass that problem, but would need to re-compile the original code, and I cannot find a suitable old Turbo C++ compiler. I can’t remember the version I used, but neither Turbo C++ 3 nor 4.5 will do it accurately.
Success! The very knowledgeable and kind people over at http://vogons.zetafleet.com/ were able to provide me with what I needed to get the program going under DOSBox. Here’s what it looked like in 1993-1994. I believe it to be even more attractive than the modern version.
When I received the invitation to speak at Yale that I mentioned earlier, I thought that the programs would be good tools through which to corroborate my conclusions. Thus, I re-wrote the programs in 1998 to be a single program. Along the way I upgraded them to function through Windows (using Borland C++ Builder V1) and ported all the graphics routines to OpenGL.
A screenshot of the program is below.
The menu bar is shown here.
Model: Selected between backpropagation, Gluck & Myers’ (1993) autoencoder, or Schmajuk & DiCarlo’s (1991) version of the hippocampus.
Hidden Units: Selected the number of hidden units desired for the selected model.
Learning Rates: Set the free parameters of the selected model.
Training: Selected a text file with the training input sequences.
Displays: Toggled the three displays (described later).
Context: Set how contextual stimuli should be used.
Bias: Toggled whether hidden units have bias input.
Run: Ran the simulation.
Output Start: Determined how the output was initially set.
Link Displays: Linked the three displays so that the output of one display was limited by the output of another.
Show ITI: Showed responses in the absence of input.
End Points: Determined whether all trials were shown in the 3d display (described later), or only the current trial.
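The training files mentioned in the menu above held sequences of trial specifications. As a rough illustration, here is a hypothetical parser for trial strings of the form "A:AB-" (stimuli A and B in context A, no outcome) or "B:C+" (stimulus C in context B, outcome present). The actual file format of the program is not documented here, so this format is an assumption.

```python
# Hypothetical trial-string format: context letter, colon, stimulus
# letters, then "+" or "-" for outcome present/absent.

def parse_trial(spec):
    context, rest = spec.split(":")
    outcome = rest[-1] == "+"   # trailing "+" marks the outcome
    stimuli = set(rest[:-1])    # one stimulus per letter
    return context, stimuli, outcome

# parse_trial("A:AB-") → ("A", {"A", "B"}, False)
```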
The top panel showed the output of the model for each trial and trial type. Trial types were coded and corresponded to a legend presented at the left of the window. The legend isn’t really very visible here.
The next display showed the network itself.
The spheres, from bottom to top, represented inputs, hidden units, and the output. The colored connections represented the connection weights. Red connections were positive, blue ones were negative, and their brightness was a function of their absolute magnitude. This display could be animated in real time so that you could watch the model learn. The brightness of the spheres also corresponded to their degree of activation. The window displayed the trial number (e.g., 432) and what was presented on that trial (e.g., A:AB-, meaning stimuli A and B in context A, presented without an outcome). On the right the user could enter a trial number and skip to that trial.
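The weight-to-color rule described above can be sketched in a few lines: positive weights map to red, negative weights to blue, with brightness proportional to absolute magnitude. The normalization constant here is an assumption; the original program’s scaling is not shown.

```python
# Map a connection weight to an (r, g, b) triple in [0, 1]:
# red channel for excitatory links, blue channel for inhibitory links,
# brightness clipped at a nominal maximum weight.

def weight_to_rgb(w, w_max=1.0):
    brightness = min(abs(w) / w_max, 1.0)
    if w >= 0:
        return (brightness, 0.0, 0.0)   # positive weight → red
    return (0.0, 0.0, brightness)       # negative weight → blue
```

A weight of 0 thus renders black (invisible), so only connections that carry appreciable weight stand out while you watch the model learn.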
The final display was what I think was my best invention, and perhaps the most informative for understanding how the various models work.
It was a 3-D representation that could be fully rotated about any axis. Each hidden unit could take on an activation between 0 and 1. So, in the cases where a model had up to three hidden units, I let the activation of unit 1 define the X axis, unit 2 the Y axis, and unit 3 the Z axis, and simply plotted, on each trial, the point (the end of the vector) represented by the hidden units. Each point was color coded to match the legend presented in the earlier display so that trial types could be identified. With this display, you could immediately see how inputs were represented within the model across the hidden units. There was, and is, much that can be learned from that display.
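The computation behind that display is simple: treat the activations of up to three hidden units as (x, y, z) coordinates and collect one point per trial. The sigmoid hidden layer below is an assumption (any activation bounded in [0, 1] would do), and the weight values are arbitrary illustrative numbers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_point(inputs, weights):
    """inputs: list of input activations; weights: one weight vector per
    hidden unit. Returns the trial's coordinates in hidden-unit space."""
    return tuple(
        sigmoid(sum(w * x for w, x in zip(unit_w, inputs)))
        for unit_w in weights
    )

# Two input units, three hidden units → one 3-D point for this trial:
point = hidden_point([1.0, 0.0], [[2.0, -1.0], [-2.0, 1.0], [0.0, 0.0]])
# Each coordinate lies in (0, 1), so every trial falls inside the unit cube.
```

Plotting these points across trials, color coded by trial type, shows how the model’s internal representation of each input pattern drifts and separates as learning proceeds.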
But the point of this post was not to discuss what I have learned about the models; it was to make a point about the state of the technology through which these models are presented (or at least where I think it should be). Based on the command-line, text-based technology I just downloaded, I’d say it’s at a level below where it should be.
The software I’m showing here was written 12 years ago, and is based on nearly equivalent software I wrote before obtaining my Ph.D. approximately 17 years ago.
If the Centre For Computational and Animal Learning Research wants to make simulation tools available in a way that has a meaningful impact, then they might consider stepping up the functionality of their software a few notches (I should rather have said that those posting their simulations at the CCALR should step it up a notch).
Minimally, let’s move out of the DOS age so that we can at least browse an input directory and specify a save-to file.
Those of you who know me know I’m more stubborn than clever. I’m not a programmer by training, and I wrote this program while carrying a four-course load per academic semester and about 100 undergraduate advisees. I mention those facts to underscore that the program was not a full-time project. If a “hack” like me, with just a bit of study into the relevant APIs, can produce a program like the one presented above, someone more versed and capable than I should produce true wonders to behold.
I guess opinions may vary, but when I see what are supposed to be state-of-the-art ideas presented in a command-line DOS box, well, I just feel I can’t be bothered (insert smug look and a wink here).
If you have read this far through my arrogant tirade, well… thank you. If you have any interest in what I’ve posted here, or in what I’ve learned about networks that I have not posted, feel free to contact me.