Rambling on about refactoring 2018-06

This document is mostly obsolete.

Remove pyaml

Code execution

The pyaml implementation is a mess, and the amount of nasty code in simudo/solution/fragment.py is appalling.

Object paths

A useful feature in the Pyaml approach is the ability to override methods in different classes, as a way of wiring things together. We need a convenient replacement for that.

/VB: ValenceBand class instance

Restructuring

/$band/$quantity example: /VB/j

/u unit_registry

/optical/field/wavelength=23.3um,phi=45deg/@optical_field

Memoization

Currently, computing a value stores it in a dictionary solution.cdict that is specific to the solution object.

However, value lifetime may not match solution object lifetime. If value lifetime is shorter, then it needs to be manually reset. If it is longer, then it must be manually set in solution.dict.

There is a better way.

Each object that is the result of a computation records its inputs.

Problem: symbolic expression versus value

We don’t need to rebuild symbolic expressions (e.g., dolfin, sympy) every time, even if a terminal changes.

However, we do need to recompute evaluations (e.g., dolfin projections) if one of the terminals in the expression changes.

Normally the relationship between a symbolic expression and its evaluation is:

evaluated = evaluate(symbolic_expression, value_for_each_terminal)

Dolfin blurs the distinction by having the terminals themselves have values.

Problem: Combinatorial explosion in checking cached value staleness

To check whether a value is stale, we need to check whether its dependencies are stale. This needs to be re-checked as soon any mutation occurs. This is expensive. There are two solutions.

  1. Mutation of a value invalidates cached values that depend on it. This requires setting up a complex system to invalidate only what is necessary.
  2. Live with it. Mutation happens infrequently, and we can just invalidate the entire cache at once. If invalidated, each value decides whether it is actually stale (and if it isn’t, re-uses the current value).

Note that either way, invalidation does not imply that the value is recomputed.

Problem: verbosity

To implement a cached computation, one would need to write something like:

class MyValue(CachedComputation):
  def cache_key():
    pass
  def compute_value():
    pass

cache.register(MyValue)
cache[MyValue]
/expr:
  _v["/CB/j"]

v = _._new(cache_policy="dolfin_value")

a = v[CB/"j"]

CB('j', cache="dolfin_value")
CB('expr', cache="dolfin_expr")

Kinds of things

Plumbing values:

/VB/j: _.some_j + _.other_j

Plumbing functions:

/VB/w_to_density: !e lambda w: dolfin.exp(w * _.beta)

Projections and evaluations:

/VB/density_proj: dolfin.project(VB.w_to_density(_.w), space.CG1)

Problem: plumbing functions, as defined above, fail to establish dependency. And even more importantly, they do not cache their results. These can be seen as separate issues.

Fix example:

/CB/w_to_density: |
  _ = no_track(_)
  @caching_track_v1()
  def w_to_density(cache, w):
    yield (w, _.beta)
    yield dolfin.exp(w * _.beta)
  return w_to_density

Solution

Each memoized function call is stored in a table. The table is indexed by the function identifier and the cache key. A table entry is called a MemoizedEvaluation.

Consider definitions:

/dir:
  A: create_variable()
  B: _.A*2
  C: evaluate(_.B)

The evaluation of B must look up the value of A in _; this translates to a call to retrieve_path(solution_dictionary, "/dir/A"). The function retrieve_path is also a memoized function, and its cache key is (id(solution_dictionary), solution_dictionary["/mtime"], path). That is, it does not track any further changes. To indicate that mutation has occurred in the dictionary, all cache entries for retrieve_path are removed.

Cache clearing

For most objects, regenerating the value is not a big deal (a waste of CPU at most). However, certain objects must have their identity preserved. In particular, dolfin.Function and dolfin.Mesh instances. If these are regenerated, the UFL expressions depending on them won’t make sense anymore, and will also need to be regenerated.

Easiest hack for now is to clear based on last access time, and to re-access (“touch”) the memoized values that should stay alive right before “garbage collecting” the cache. A simple way to mark these values (in a solution dictionary for example) is:

/path/:
  func: dolfin.Function(...)
  func/@keep: True

In this case, the solution dictionary searches for properties ending in /@keep. If the value is True, then the part before the “/” is touched to prevent clearing. If the value is an iterable (which must then contain MemoizedEvaluation instances), the iterable is traversed and the instances are touched.

Request servicing process

  1. User code calls problemdata["/some/path"].
  2. A Request object is produced with the wanted path.
  3. The path is matched against a regex made up of all responders’ responder_get_path_regex(). The relevant responders are filtered using that regex.
  4. responder.responder_rejects_request(request) is called for each responder. Those that answered False are kept.
  5. The list of relevant responders is stored in request.responders, and request.responder_index is set to zero.
  6. The

Lookup algorithm

  1. Break path into subpaths, e.g. “/a/b/c” becomes ["/", "/a/", "/a/b/", "/a/b/c/"].
  2. For each subpath subpath:
    1. If $subpath/@mount exists (call it mount), then call mount(subpath).

Problem with mount system: what if the following all exist?

1. /a 1. /a/@mount 1. /a/

Use dependency (topological) sort to establish resolution order. Hooray for reinventing C3 linearization.

project(ufl_expr, space) -> dolfin.Function

tunneling_recombination: |
E = poisson.E … return dolfin.Function

Things to consider

  • unit system
  • function subspace registry
  • band
  • band transport form
  • poisson
  • poisson form
  • material
    • material parameter interdependency
  • optical fields
  • recombination/generation models
  • tunnelling
  • interpolated data loaded from disk
  • refinement by independent mesh generation
  • degenerate bands require a more complicated relationship between qfl and carrier concentration
    • in particular, it uses an approximation for both ways qfl->density and density->qfl
    • the inverse is NOT exactly the identity
    • need to avoid going qfl->density->qfl
      • this currently does not happen in the code, so just watch that it doesn’t happen as a result of the refactor
  • “In the current code, it seems quite awkward to set up a series of runs with different values of a material parameter. so maybe moving to python code will make it easier to vary parameters programmatically.”

pyaml rehaul implement: - symlink - textual code gen - …