core.model

PolyConf core model.

Notes:
  • Although much of the dataclass magic is disabled or overridden, it’s still used here for its excellent repr().

  • Tricky stuff to be aware of:
    • I’m not quite 100% settled on the type for “children”.
      • It’s currently a set, which has the benefit of order-agnostic comparison and feels somewhat natural.

      • I’ve been tempted to switch to a dict for the sake of explicit index keys.

    • Similarly, I’m not quite 100% settled on the type for “sources”.
      • The first bullet for “children” also holds true here.

      • But the primary reasoning is that insertion order can implicitly record when each value was seen.
        • In this case, duplicates are actually desired, but it’s not clear this is a _realistic_ scenario.

    • Serialization converts sets to lists.
      • “children” members get pre-sorted, which aids in comparison during testing.
        • When using a fake value generator (like faker), where the actual values aren’t known in advance, be careful about ordering: the serialized output is sorted, but the input might not be.
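
The set-based “children” and sorted serialization described above can be sketched roughly as follows. This is a hypothetical, minimal stand-in rather than PolyConf’s actual classes: the “Datum” dataclass is frozen here only so instances are hashable (set members must be), and the shape of the “serialize()” output is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Datum:
    """Hypothetical, minimal stand-in for a PolyConf datum (illustration only)."""

    name: str | int | None = None
    value: str | int | bool | None = None
    # frozen=True (with the default eq=True) makes instances hashable, so they can live in a frozenset
    children: frozenset[Datum] = field(default_factory=frozenset)


def serialize(datum: Datum) -> dict:
    """Convert a datum to native Python objects; the children set becomes a list sorted by name."""
    out: dict = {"name": datum.name}
    if datum.children:
        ordered = sorted(datum.children, key=lambda child: child.name)
        out["children"] = [serialize(child) for child in ordered]
    else:
        out["value"] = datum.value
    return out


# Same children, built in different orders
a = Datum(name="root", children=frozenset([Datum(name="b", value=2), Datum(name="a", value=1)]))
b = Datum(name="root", children=frozenset([Datum(name="a", value=1), Datum(name="b", value=2)]))

assert a == b  # set comparison ignores insertion order
print(serialize(a)["children"])  # [{'name': 'a', 'value': 1}, {'name': 'b', 'value': 2}]
```

Because the output is sorted by “name” regardless of how the input was built, tests driven by faker-style random values should compare against a sorted expectation rather than the input order.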

Definitions:
  • scalar – I’m slightly overloading this term to mean specifically “primitive, non-collection, including null”.
    • Put another way, it’s shorthand for “str | int | bool | None”.

    • A motivation is ease of serialization (including JSON).

  • datum – The blessed “augmented scalar”.
    • Although it has an attribute that is a collection, a datum itself (class level) is not a collection.

    • Note the use of “scalar” is more restrictive and does not imply “datum”.

  • collection – Specifically constrained to the native collection types “set | list | dict”.
    • In general, the members/values are “scalars” (as defined above) and/or “datums”.

    • When representing collections as datums, the keys are explicit, as in a dict, and the key type implies the collection type:
      • int key implies a list

      • str key implies a dict

    • Note there currently isn’t a case for collections of collections (perhaps called “2nd order collection”?).
      • This is because, when composing data with “datums”, collections are always represented by a datum and its “children” attribute.

  • serialization – In general (outside of PolyConf), this term usually implies targeting something outside the application, whether disk or the network, but PolyConf’s usage is closer to “marshalling” (should I refactor?). The target is portability within the application, so the “de/serialize()” methods produce native Python objects rather than a (usually JSON) string.
    • A motivation (perhaps the primary one) is to ease deep merging between “datums”, which is a complex topic/task.

    • I vendored a library (“deepmerge”) that has exhaustive support for deep merging dictionaries. Therefore, the “merge” process serializes each side to a dictionary, deep merges them, then deserializes the result.

    • Currently, the use cases are constrained to PolyConf, but it’s easily conceivable that the application using PolyConf could use it, too (thus it’s publicly exposed).
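
The key-type rule and the serialize, merge, deserialize round trip might look roughly like the sketch below. The names “to_native()”, “from_native()”, “deep_merge()”, and “merge()” are hypothetical; “deep_merge()” is a tiny stand-in for the vendored deepmerge library, and its strategies (recursive dict merge, list concatenation, right side wins otherwise) are assumptions, not PolyConf’s actual deepmerge configuration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Datum:
    """Hypothetical, minimal stand-in for a PolyConf datum (illustration only)."""

    name: str | int | None = None
    value: str | int | bool | None = None
    children: frozenset[Datum] = field(default_factory=frozenset)


def to_native(datum: Datum) -> Any:
    """Collapse a datum tree into native scalars, lists, and dicts."""
    if not datum.children:
        return datum.value  # leaf (an empty collection also lands here in this sketch)
    ordered = sorted(datum.children, key=lambda child: child.name)
    if all(isinstance(child.name, int) for child in ordered):
        return [to_native(child) for child in ordered]  # int names imply a list
    return {child.name: to_native(child) for child in ordered}  # str names imply a dict


def from_native(obj: Any, name: str | int | None = None) -> Datum:
    """Build a datum tree from native objects; dict keys and list indexes become child names."""
    if isinstance(obj, dict):
        return Datum(name=name, children=frozenset(from_native(v, k) for k, v in obj.items()))
    if isinstance(obj, list):
        return Datum(name=name, children=frozenset(from_native(v, i) for i, v in enumerate(obj)))
    return Datum(name=name, value=obj)


def deep_merge(base: Any, other: Any) -> Any:
    """Stand-in for deepmerge: merge dicts recursively, concatenate lists, otherwise other wins."""
    if isinstance(base, dict) and isinstance(other, dict):
        merged = dict(base)
        for key, val in other.items():
            merged[key] = deep_merge(merged[key], val) if key in merged else val
        return merged
    if isinstance(base, list) and isinstance(other, list):
        return base + other
    return other


def merge(left: Datum, right: Datum) -> Datum:
    """Serialize both sides, deep merge the native objects, then deserialize the result."""
    return from_native(deep_merge(to_native(left), to_native(right)), name=left.name)


left = from_native({"db": {"host": "localhost", "port": 5432}, "tags": ["a"]})
right = from_native({"db": {"port": 6543}, "tags": ["b"]})
print(to_native(merge(left, right)))
# {'db': {'host': 'localhost', 'port': 6543}, 'tags': ['a', 'b']}
```

List children are re-indexed by “from_native()” after the merge, which illustrates why merging native objects is simpler than merging datums directly. Empty collections collapse to a null leaf in this sketch.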

Maxims:
  • “value” and “children” are mutually exclusive.

  • “value” indicates a leaf node.

  • “value” is always a scalar object.

  • “children” is a collection.

  • “children” members are always Datums.

  • child datums:
    • The “name” attribute is essentially the child’s index (or key) within the collection.

    • If name is an int, the “children” collection represents a list.

    • If name is a str, the “children” collection represents a dict.
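
The maxims lend themselves to a recursive check. The sketch below assumes a hypothetical frozen “Datum” dataclass in which an absent “value” is represented as None; “check_maxims()” is an illustrative name, not part of PolyConf. It returns the collection type implied by the children’s names (list, dict, or None for a leaf).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# "scalar" per the Definitions: str | int | bool | None (bool is an int subclass; listed for clarity)
SCALAR_TYPES = (str, int, bool, type(None))


@dataclass(frozen=True)
class Datum:
    """Hypothetical, minimal stand-in for a PolyConf datum (illustration only)."""

    name: str | int | None = None
    value: object = None  # None doubles as "no value" in this sketch
    children: frozenset[Datum] = field(default_factory=frozenset)


def check_maxims(datum: Datum) -> type | None:
    """Raise ValueError if the tree breaks a maxim; return list, dict, or None (leaf)."""
    if datum.children and datum.value is not None:
        raise ValueError(f"{datum.name!r}: 'value' and 'children' are mutually exclusive")
    if not isinstance(datum.value, SCALAR_TYPES):
        raise ValueError(f"{datum.name!r}: 'value' must be a scalar")

    kinds = set()
    for child in datum.children:
        if not isinstance(child, Datum):
            raise ValueError(f"{datum.name!r}: 'children' members must be Datums")
        if isinstance(child.name, bool) or not isinstance(child.name, (int, str)):
            raise ValueError(f"{datum.name!r}: child names must be int or str")
        kinds.add(list if isinstance(child.name, int) else dict)
        check_maxims(child)  # recurse into the subtree

    if len(kinds) > 1:
        raise ValueError(f"{datum.name!r}: children mix int and str names")
    return kinds.pop() if kinds else None  # no children means a leaf node


tags = Datum(name="tags", children=frozenset({Datum(name=0, value="a"), Datum(name=1, value="b")}))
print(check_maxims(tags))  # <class 'list'>
```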

Todo:
  • General naming consistency.
    • Terms like “put”, “assimilate”, “from_dict”, etc. are unclear – reconsider naming.

  • Review test coverage.

  • Clean up early churn – unused methods, properties, etc.

  • Consider logging usage.

  • Fill out types.

  • Clearly document public API and intended usage patterns.

  • Fill out docstrings.

Submodules