The module is life
The module is love
It gets the work finished
When push comes to shove
The module is the base unit of computation in our simulation system. You can think of it as an individual block in an application like LabVIEW. The module serves as a black box that encapsulates a certain functionality. The functionality could be as simple as adding two numbers together, or as complex as running a machine learning algorithm to find clusters among thousands of data points. It all depends on what function you stick inside the module.
This is what a module looks like:
- Any number of inputs (0 - ∞)
- Any number of outputs (0 - ∞)
- A “Lambda” function that operates on inputs and spits out outputs
Modules can be hooked up to each other via their input and output Terminals. Only connections from an output terminal to an input terminal are valid. A single output terminal can be connected to multiple other input terminals via wires, but an input terminal can have only one wire connected to it, otherwise values at that terminal are undefined. Think of modules as virtual circuit board components connected by wires.
Connecting two terminals sets up a “wire” between them. When the output terminal’s value is updated, that value is pushed to the input terminal.
Mechanically, an input terminal’s value can ONLY be updated by an upstream output terminal (the value update method is private). This makes sense: directly changing the value at an input terminal that is also connected to an upstream output terminal would be nonsensical, because the upstream output terminal and the input terminal would then hold different values.
So when two terminals are connected, the input terminal sets up an event listener listening for change on the output terminal, and updating its own value when it sees change upstream. When terminals are disconnected, that event listener is killed.
Terminals also have Default Values. The default value is much more relevant for unconnected input terminals. An unconnected input terminal acts much like a constant value in the system, because its value is never changed by an upstream terminal. The default value is the value used by the unconnected input terminal. If the input terminal is subsequently connected to an upstream output terminal, the input terminal will take the output terminal’s value and update itself accordingly. But if the terminals are disconnected again, the input terminal will resume having its default value.
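The terminal behavior described above — private updates, connect/disconnect, and default values — can be sketched in Python. The real system is C#; `Terminal`, `connect`, and `push` here are illustrative names, not the actual API:

```python
class Terminal:
    """Sketch of a terminal with a default value (names are illustrative)."""
    def __init__(self, default=0):
        self.default = default
        self._value = default
        self._listeners = []          # downstream input terminals

    @property
    def value(self):
        return self._value

    def push(self, value):
        """Called on an output terminal; propagates downstream."""
        self._value = value
        for inp in self._listeners:
            inp._receive(value)       # the "private" update path

    def _receive(self, value):
        self._value = value


def connect(output, inp):
    """Wire an output terminal to an input terminal."""
    output._listeners.append(inp)
    inp._receive(output.value)        # take the upstream value on connect


def disconnect(output, inp):
    output._listeners.remove(inp)
    inp._receive(inp.default)         # resume the default value
```

Note that `_receive` is the only way an input terminal's value changes, which mirrors the private update method described above.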
You can easily create a bunch of modules, stick different lambdas in them, connect up their terminals, and then run the whole system and see what happens. Unfortunately, this structure is not easily persisted. Module and lambda instances are not intrinsically serializable, nor does it make sense to try to store them directly. What we really want to be able to do is store a representation of the connection graph of modules, so that we can load it up at a later time and rebuild the exact same module structure.
We call this data structure a Blueprint. And now is the perfect time to talk about the composite nature of modules, as it is directly tied to the structure of blueprints.
In building the modules system, we needed a way to abstract away “layers” of functionality. For instance, we may want to have a module that takes 3 inputs, adds up the first 2, then multiplies by the last one ((a + b) * c). If we have two basic lambda types, an add and a multiply, what we can do is create a new “Lambda” that operates the way we want it to:
What we have done here is construct a single module from 2 inner modules, and 5 wires. We can then use the outer module as a (a + b) * c module without even thinking about the mechanics of what goes on inside.
You can imagine that this idea of compositing modules, putting modules inside of other modules, can be extended to extremely complex circuits that have layers within layers within layers of modules. This design gives great flexibility to designers because they can abstract away parts of their simulation system, and distribute the work of designing each part of the simulation.
The structure that you see in the above module is represented and stored as a Blueprint. The blueprint above could be represented as follows:
- ADD:Output 1 → MULT:Input 1
- SELF:Input 1 → ADD:Input 1
- SELF:Input 2 → ADD:Input 2
- SELF:Input 3 → MULT:Input 2
- MULT:Output 1 → SELF:Output 1
You can see how the above data representation fully describes the inside of the above module, and can be used to recreate the structure above.
Wire 1 describes a wire from the output of ADD to the first input of MULT. Notice how we use “SELF” to denote the containing (outer) module. Wire 2 describes a wire from the first input of SELF to the first input of ADD. Here’s where things get a little sticky. Doesn’t this imply a connection FROM an Input, TO another Input? Isn’t that impossible given the definition of how terminals can be connected? Yes. In fact, Wires 2-5 above are contradictions of the terminal connection rules we gave.
Semantically, though, in the case of composited modules, we want to connect inputs from the outer module to inputs of inner modules. That makes sense. I will describe in a later section how this problem is solved. For now assume it works.
Mechanically, a blueprint represents either a Lambda function, or a connection graph of other blueprints. The Lambda Blueprint can be thought of as the leaf of the recursive tree of blueprints, and the composite Blueprint, which holds blueprints, is the node in the tree.
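A minimal sketch of the two blueprint kinds, with the (a + b) * c blueprint from above encoded as data. The class names and the string scheme for terminals are assumptions, not the real serialization format:

```python
class LambdaBlueprint:
    """Leaf of the blueprint tree: names a lambda type to instantiate."""
    def __init__(self, lambda_type):
        self.lambda_type = lambda_type


class CompositeBlueprint:
    """Node of the tree: inner blueprints plus the wires between terminals."""
    def __init__(self, children, wires):
        self.children = children   # name -> blueprint
        self.wires = wires         # list of (from_terminal, to_terminal)


# The (a + b) * c module from above, with "SELF" naming the outer module:
add_mult = CompositeBlueprint(
    children={"ADD": LambdaBlueprint("Add"), "MULT": LambdaBlueprint("Mult")},
    wires=[
        ("ADD:Output 1",  "MULT:Input 1"),
        ("SELF:Input 1",  "ADD:Input 1"),
        ("SELF:Input 2",  "ADD:Input 2"),
        ("SELF:Input 3",  "MULT:Input 2"),
        ("MULT:Output 1", "SELF:Output 1"),
    ],
)
```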
The Lambda represents the functional unit in the modules system. Abstractly, a lambda is simply a function that takes a set of named input values, and returns a set of named output values. In our system, a Lambda is any class that extends from the base abstract ALambda class. The simplest of lambda functions could technically take no inputs and just output values to a single output, or do simple mathematical operations on inputs. The most complex lambda functions could actually hold module instances and run them, which effectively makes them Composite Lambdas.
Because a lambda is anything that extends from ALambda, lambdas can be implemented however the designer wishes. Lambdas can store state, and can really do anything any C# class can do. It is nice, however, to keep Lambdas as stateless functional units.
When a module is instantiated with a lambda, the module holds an instance of that lambda. Whenever the module is told to Run, it loads up the values from its input terminals, packages them up to send to the lambda function, gets the output values from the lambda function, and pushes those outputs onto its output terminals.
Composite Lambdas are instantiated from Composite Blueprints. Upon instantiation, the lambda builds up all of the modules dictated by the blueprint, and sets up all of the wire connections dictated by the blueprint. When the lambda is told to run given a set of inputs, it pushes the inputs into the system of modules that it holds, runs all of the modules, then pulls the appropriate outputs out of the modules and returns them.
I mentioned earlier the contradiction that an input to a composite module can be hooked up to the input of an inner module. This problem is solved using an Inverted Self Module inside the composite module.
The Inverted Self module is invisible to the user, but solves the problem of connecting inputs to inputs and outputs to outputs.
When the outer module is Run, the values from its outer inputs are pushed onto the Inverted Self’s outputs, which are then propagated properly through the system. When all the inner modules are finished running, the values that we wish to use as output are at the inputs of the Inverted Self. So we pull the values from the inputs of the Inverted self and push them on to the output terminals of the outer module.
The Inverted Self module itself is a No-Op since it has no real functionality. It is only really being used to host the inverted input and output terminals.
Plans dictate how modules are initialized. Any good simulation needs to start with initial values that correspond to the starting point of the simulation. These starting values are described in a Plan, which basically holds a bunch of initial Configs (packages of starting values) for each module in the system.
Plans correspond to certain Blueprints, and there is a One-to-Many relationship between Blueprints and Plans. A module system that has been built up from a given Blueprint can be initialized with any one of the set of plans that correspond to that blueprint.
When a top-level module is initialized with a Plan, it passes that plan down into all of its submodules and sub-lambdas to initialize accordingly.
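A Plan can be pictured as a map from module paths to Configs; the path scheme below is a guess for illustration, not the real format:

```python
# Hypothetical shape: a Plan maps module paths to starting Configs
# (packages of initial values), one entry per module in the system.
plan = {
    "SELF/ADD":  {"Input 1": 0, "Input 2": 0},
    "SELF/MULT": {"Input 2": 1},
}

def initialize(module_path, plan):
    """Each (sub)module picks out its own config, if the plan has one."""
    return plan.get(module_path, {})
```

Because a Plan only holds values keyed to a Blueprint's module structure, many different Plans can initialize systems built from the same Blueprint, which is the one-to-many relationship described above.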
One of the most important features of the modules system is timing and synchronization. In our warmup project we built an entirely asynchronous modules system where values were pushed from terminal to terminal as soon as they were available. This leads to many problems when you build a system that has feedback loops, as many economic models do. In the asynchronous system, feedback loops would cause chaotic propagation of values, and due to the race conditions of an asynchronous system, results were unpredictable. We needed to build a modules system that would run predictably and function much like a real synchronized circuit board.
We need to synchronize the running of modules to a universal clock. For each clock tick, each module should take the values at its inputs, run its calculation, and push its outputs to its output terminals. However, for each clock tick, we must ensure that each module uses the input values that were made available in the previous clock tick. If we don’t do this, we again end up with race conditions. Consider this circuit:
A different value lives at wire 1, 2 and 3. If the clock ticks and A runs first, it will take the value at 1 and push a new output to 2. If in the same tick B runs next, B will take the value at 2, run its calculation and push a new value to 3. If we’re not careful, B will take the value that A pushed to 2 in the same tick, and use it to calculate the new output.
If instead B runs first in a clock tick, it will take the current value at 2, push a value to 3, and then A will take a value from 1 and push a value to 2. You can see how in these two different run orders, we will find different output values at 3. We still have a race condition based on which module runs first.
In order to synchronize the modules properly and ensure that each clock tick effectively latches values across modules in a predictable manner, regardless of run order and parallelization, we must have a two-step process for each clock tick:
- Each module Loads values from its input terminals
- Each module Runs its calculations based on values loaded in step 1, and pushes outputs to its output terminals.
If we ensure that all modules complete step 1 before we move on to step 2, we guarantee that modules will use values from the previous tick to compute values for the next tick. The LoadInputs() method on modules accomplishes step 1, and the Run() method on modules accomplishes step 2.
The CompositeLambda class implements this synchronization in its Run() method. Because LoadInputs and Run both block until completion, we guarantee synchronization and predictable results throughout the system.
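The two-phase tick can be sketched as follows. `Mod`, `load_inputs`, and `run` mirror the LoadInputs()/Run() split described above, but the code itself is an illustrative Python reduction, not the C# CompositeLambda:

```python
class Mod:
    """Sketch of a module participating in the two-phase clock tick."""
    def __init__(self, fn, source=None):
        self.fn = fn
        self.source = source      # upstream module feeding this one, if any
        self.loaded = 0
        self.output = 0

    def load_inputs(self):
        # Phase 1: latch the value produced in the PREVIOUS tick.
        self.loaded = self.source.output if self.source else 1

    def run(self):
        # Phase 2: compute from the latched value only.
        self.output = self.fn(self.loaded)


def tick(modules):
    for m in modules:
        m.load_inputs()   # all loads complete before any module runs
    for m in modules:
        m.run()


a = Mod(lambda x: x + 10)           # A reads the constant 1 on wire 1
b = Mod(lambda x: x * 2, source=a)  # B reads wire 2, which A feeds

tick([a, b])   # B latches A's previous output (0), not this tick's 11
# a.output == 11, b.output == 0
tick([b, a])   # run order no longer matters: B latched 11 before anyone ran
# b.output == 22
```

Because every module latches before any module runs, the A-before-B and B-before-A orderings from the race-condition example above now produce identical results.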
Timing and synchronization becomes more interesting in Composite Modules. From the outside, a composite module should behave like any other module. So when we run a composite module, we expect it to run all of its innards and push the appropriate values to its output terminals, all before returning. The composite execution happens inside CompositeLambda.
Let’s consider this composite again:
When the outer module is asked to run, it pushes its input values onto the Inverted module’s outputs. Those outputs propagate to the appropriate inputs on Add and Mult. Let us consider the case where our inputs are (1, 2, 3). We should expect to see an output value of (1 + 2) * 3 = 9. We’ll run through the execution:
- (1, 2, 3) pushed to inverted outputs
- (1, 2) propagate to the inputs of Add, and 3 propagates to the second input of Mult.
- Call LoadInputs() on Add and Mult
- Add loads up (1, 2)
- Mult loads up (?, 3). Let’s assume this is the first run, and the value on the wire to the first input of Mult is 0. In that case Mult loads up (0, 3)
- Call Run() on Add and Mult
- Add pushes 1 + 2 = 3 to its output, which propagates to input 1 of Mult
- Mult pushes 0 * 3 = 0 to its output, which propagates to input 1 of Inverted Self
- Composite finishes executing. Pulls value of 0 from inverted self input and pushes 0 to outer module’s output
This is obviously not the correct answer. When you think about it, we really needed to run through all these modules twice to allow the output from Add the first time around to propagate through the Mult module.
What we can do is specify a NumTicks attribute on Composite modules. What this does is actually run through the entire circuit inside of a composite module multiple times in order to allow input values to fully propagate through the system the way the designer wishes. On the above module, we would want to set NumTicks to 2, because the longest path through the module involves going through 2 modules.
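Replaying the walkthrough above with NumTicks set to 2 shows the value fully propagating. This is an illustrative Python reduction of the circuit, not the real CompositeLambda:

```python
# Running the (a + b) * c composite for NumTicks = 2 inner ticks,
# using the two-phase load/run discipline. Wire values start at 0.
def inner_tick(a, b, c, add_out):
    # Phase 1: both modules latch their inputs from the previous tick.
    add_in, mult_in = (a, b), (add_out, c)
    # Phase 2: both run on the latched values.
    return add_in[0] + add_in[1], mult_in[0] * mult_in[1]


a, b, c = 1, 2, 3
wire_add_out = 0            # wire from ADD's output to MULT's first input
result = 0
for _ in range(2):          # NumTicks = 2: the longest path is 2 modules deep
    wire_add_out, result = inner_tick(a, b, c, wire_add_out)
# After tick 1: wire_add_out = 3, result = 0 * 3 = 0
# After tick 2: result = 3 * 3 = 9
```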
The coolest way to think about all this is like the movie Inception. As you go down deeper into the composite layers of a module system, each composite runs for a certain number of times. Even though I’m running the outer module once, the inner modules are being run twice (or more times, depending on the design). Now imagine I have composites within composites. Those inner composites are run a number of times per outer clock tick. So if I have a composite that runs 2 times on the top level, within which is a composite that runs 5 times, within which I have a composite that runs 3 times, the innermost modules are actually run 2 * 5 * 3 = 30 times per major outer clock tick.
Time is an important concept in simulation. Whether time is represented as integer clock ticks, or as literal Dates, it needs to be passed around the system. We chose to pass around DateTime objects as our timing standard.
When you run a module or a lambda, you must give it the current time. Passing around time is important because we need to be able to support time-dependent operations, otherwise known as Events. An event is really any change that is based on time, and any lambda could technically be driven by events if it uses time to calculate its value. Time is effectively an additional input to every module and lambda.
In composite lambdas using time dilation (an increased number of inner clock ticks), the lambda effectively divides the passed time by the number of inner ticks and passes that along to its inner modules. So for example, if on the outside we tick from April 1, 2012 to May 1, 2012, a composite that has NumTicks of 3 will divide the time delta (1 month) into 3 equal time spans (10 days), and pass the following dates as intermediate clock tick to its inner modules:
- April 11, 2012
- April 21, 2012
- May 1, 2012
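The date arithmetic above can be checked with a short sketch (illustrative Python; the real system passes C# DateTime objects):

```python
from datetime import datetime

def inner_ticks(start, end, num_ticks):
    """Divide the outer time span into num_ticks equal inner tick dates."""
    step = (end - start) / num_ticks
    return [start + step * (i + 1) for i in range(num_ticks)]

ticks = inner_ticks(datetime(2012, 4, 1), datetime(2012, 5, 1), 3)
# -> April 11, April 21, May 1
```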
That time dilation propagates down into the system and keeps getting divided up, just like in Inception (it works really well as a metaphor).
Events are very important. They are implemented entirely as custom lambdas and should, in theory (although some hacks near the end of the project changed this), not exist in core modules code.
The way we implemented events was using a decorator pattern that wraps an existing lambda. An event lambda performs some Before operation on the input values before passing the altered inputs to the decorated lambda, then performs some After operation on the returned output values before returning them. It acts like a filter on input and output values.
Of course, because time is being passed everywhere in the system, events could technically be implemented however you see fit.
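The decorator shape described above might look like this sketch; `EventLambda` and its `before`/`after` hooks are hypothetical names, and the launch-date filter is an invented example:

```python
from datetime import datetime

class EventLambda:
    """Decorator sketch: filters inputs before, and outputs after, the
    wrapped lambda runs."""
    def __init__(self, inner, before, after):
        self.inner = inner
        self.before = before
        self.after = after

    def __call__(self, inputs, now):
        inputs = self.before(inputs, now)    # Before operation on inputs
        outputs = self.inner(inputs, now)    # decorated lambda
        return self.after(outputs, now)      # After operation on outputs


# Example: zero out the input before a hypothetical launch date.
launch = datetime(2012, 4, 15)
gate = EventLambda(
    inner=lambda ins, now: {"out": ins["x"] * 2},
    before=lambda ins, now: ins if now >= launch else {"x": 0},
    after=lambda outs, now: outs,
)
# gate({"x": 5}, datetime(2012, 4, 1))  -> {"out": 0}
# gate({"x": 5}, datetime(2012, 5, 1))  -> {"out": 10}
```

Because the decorated lambda has the same call shape as any other lambda, event wrappers can be stacked or swapped without the core modules code knowing they exist.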
One of the huge requirements for our system was that it should be able to be distributed across multiple machines. A top-level module should be able to hold a composite module that lives on a different machine. When that composite module is run, it should pass its inputs to the other machine, run the inner modules as usual, then pass the outputs back to be returned.
This design makes it very obvious how modules are really black boxes. The lambda that exists inside of a remotely hosted composite module just takes the input values, passes them across the network to the actual composite lambda instance on another machine, runs the remote lambda, passes the outputs back to the thin networked lambda on the original machine, and returns those.
Cross-VM instantiation happens within the construction of Composite Lambdas. When construction comes across a composite blueprint to build, it asks a Network Adapter to build the composite somewhere and to return the necessary IDs (machine instance id, lambda id) for communicating with the remote instance. So the majority of this implementation is network based.
We need to be able to log results through any given module system for any given run so that we can graph any value you wish across time once the run is complete.
Logging, however, is a nontrivial problem when you distribute modules across networked machines.
Effectively, each machine has an instance of a Logger that knows the current Run ID. Every module that lives on that machine has a reference to that Logger and logs values whenever it wants to. The Logger just batches together a bunch of logs and hands them over to a Result Adapter that handles storing them.
There’s one caveat. Because of the “Inception” of nested modules, we don’t really want to store values at every single inner tick. We really only want to store values at every major tick. The problem is, inner modules have no real way of knowing when a major tick has happened.
We had to send an attribute called TicksPerMajorTick down into the system to every module. When composites are built up, the value of TicksPerMajorTick for inner modules is multiplied by the composite’s NumTicks. This ensures that each module becomes aware of how many ticks it will experience per major outer tick. And then, the module logs its values only after it experiences enough ticks to constitute a major tick.
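That multiplication and the log-on-major-tick check can be sketched as follows (class and attribute names are illustrative):

```python
# Sketch: each composite multiplies TicksPerMajorTick by its NumTicks
# before passing it down, so inner modules know how many inner ticks
# make up one major outer tick.
class LoggingModule:
    def __init__(self, ticks_per_major_tick):
        self.ticks_per_major_tick = ticks_per_major_tick
        self.ticks_seen = 0
        self.logged = []

    def tick(self, value):
        self.ticks_seen += 1
        if self.ticks_seen % self.ticks_per_major_tick == 0:
            self.logged.append(value)   # log only on major-tick boundaries


# Nesting of NumTicks 2, then 5, then 3: the innermost module
# experiences 2 * 5 * 3 = 30 inner ticks per major tick.
m = LoggingModule(ticks_per_major_tick=2 * 5 * 3)
for i in range(60):                     # two major ticks' worth of inner ticks
    m.tick(i)
# m.logged == [29, 59]: one logged value per major tick
```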