The trivial problem of Dimensional Analysis in Clojure

This is a repost from the LeadLab blog. They work on applying technology to healthcare. Do check out their cool research!

Functional Programming

Functional Programming (or fp as it is often called) is a paradigm of programming which aims at describing what we want to do rather than how we want to do it. The entire universe of fp revolves around the concept of pure functions.

A pure function is a function where the return value is only determined by its input values, without observable side effects.

Imagine if you will, a conveyor belt which takes in raw material as input. There are a lot of tools which process the input as it goes along, eventually producing the desired result. You can consider the concept of functional programming to be similar to the above situation in a number of ways. The conveyor belt can be thought of as the program with each of the tools being a pure function(does not produce any external side effects, depends only on input). The only difference is that on the belt, the original resource is modified in place, whereas in the program, the data is worked upon and a new instance is passed on to the next function.

Although fp has been around for a while, due to the advent of multi-core systems, big data and distributed computing it is seeing a resurgence in terms of interest in the programming community. It is backed by very strong literature and there is a lot of ongoing research on the same.

Some of the principles that fp operates by are as follows:

  1. Immutable Data structures: This considerably reduces the resources we spend on memory allocation and storage. So, how do we program with such an impairment? Rather than passing data around, we can operate on the data and create a new instance. This is still more efficient than modifying it in place. It allows the compiler to make certain assumptions that are unsafe in an imperative language.

  2. Higher Order functions: We can consider them as functions which take in other functions as arguments or return them as results. The idea is derived from lambda calculus. One of the best examples of a HOF is the function map which takes in a function as an argument and maps it to the list of elements provided as the second argument. Since each function application is independent of another(assuming pure functions) there is a lot of scope for improving the approach by running the same program on a multi-core system. Each of the results can then be aggregated using a function such as reduce.

  3. Seperation of Functions and Values: To an OO person, the concept of encapsulation is very pivotal while programming. This allows the class to have an internal state which can be modified by calling class methods which operate on these values. In fp, both of these entities are clearly seperated.

  4. Statelessness: Since all the functions are pure they have no idea of what has happened or what is going to happen to the input it receives. It only cares about operating on the data and producing the output. This fact combined with immutability is extremely useful for unit testing since each function has a specified set of duties it has to perform and in doing so, it will not affect any external state/variables. The state of the program is kep local and methods(or functions) are completely pure in the fact that they do not hide any abstraction.

Clojure

Clojure is a dynamic, general purpose programming language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming.

Rich Hickey is a genius. Clojure sits on top of the JVM, leverages Java’s type system and supports calling Java code from within. Clojure has a set of internal data structures such as maps, sets, vectors and lists. It also supports lazy sequences, immutability and persistent data structures. It also boasts of a typical Lisp syntax(which can be a little intimidating for the novice programmer).

Why am I doing this?

I was inspired by the solution to the problem as given by Peter Henderson in his book on Functional Programming and its applications. It is described in a very general manner and I thought it would be interesting to implement it in Clojure using a few of my own variations.

Why Clojure?

It’s awesome. It’s functional and thinking functionally gives you a way to come up with interesting solutions to common problems encountered during programming.

Cutting to the point

Dimensional Analysis is a very simple method to check if a physical equation is valid or not.

The dimensions of a quantity refer to the type of units that must be used in order to obtain the measure of that quantity. The type of units used are the 7 fundamental units which cannot be described with the help of anything else. We will be looking mainly at equations which involve quantities made up of Mass(M), Length(L) and Time(T).

We will assume that the equation is linear in all the physical variables and operators used are that of multiplication, division, addition and subtraction. Although this makes things a lot simpler than they already are, it will help us set up a basic framework for solving such a problem.

Understanding the problem

We will be trying to solve the following problem:

Assume that we have a set of variable quantities v1, v2, etc. each with a unique single alphabet symbol, and a given set of dimensions a1, a2, etc.

We will be trying to determine two things:

  1. If the equation is dimensionally correct.
  2. The dimension of the resulting quantity.

Let us assume a very straightforward model of an equation.

  1. We will consider that the equation is made up of Expression(s).
  2. Each Expression has a Left Hand Side(LHS) and a Right Hand Side(RHS) with an Operator acting upon them.
  3. The most fundamental quantity in the equation will be an Atom. An Atom can be a constant or a variable.
  4. Each quantity has a Dimension where we mention the exponents to the base quantities.

We will be using schema for data descriptions. schema provides us to functions validate the data and find out if it conforms to the given schema descriptions.

;; schema is required as 's'.

;; Dimension will contain the exponents to M(Mass), L(Length) and T(Time).
;; With s/Int, we specify that the values have to be integral.
(def Dimension
  {:M s/Int
   :L s/Int
   :T s/Int
   })

;; An Atom will contain a type specifying if it is a variable or a constant.
;; Let us also have the symbol of the variable, in the data definition.
;; Most importantly, the dimension of the quantity has to be mentioned as well.
;; As you will observe, the dimension is described in terms of the earler schema.
;; This will help in Atom validation.
(def Atom
  {:Type      (s/enum :variable :constant)
   :Symbol    s/Str
   :Dimension Dimension})

;; Let us also define the dimension for a constant quantity.
;; Since it is a constant it will not have exponents in any of the quantities.
(def Constant
  {:M 0
   :L 0
   :T 0})

;; Since we are assuming a simple case, let us say that the operations can be addition,
;; subtraction, division and multiplication.
;; The model is so designed that it can be easily extended to other operations.
(def Operator
  (s/enum :add :subtract :divide :multiply))

;; We will now describe the definition for Expression.
;; An expression can be described in a recursive manner.
;; The LHS and the RHS both can be Expressions or Atoms. schema provides a nice way to do this.
(def Expression
  {:LHS (s/conditional #(= nil (:Type %)) (s/recursive #'Expression) :else Atom)
   :RHS (s/conditional #(= nil (:Type %)) (s/recursive #'Expression) :else Atom)
   :Operator Operator})

;; We first check if the data in the LHS quantity has a Type field.
;; If it does, we know that it is an Atom, and we direct the checker to validate it for us.
;; Instead, if it does not, we can check for an Expression instead.
;; This can definitely be optimized but I thought this was one way to do it.

Coming up with functions for dimension calculations

Now, that we have the basic definitions sorted out, we can look forward to implementing some functions for checking dimensional correctness.

Assume a function which takes in an Expression or an Atom. We need to find out if the given expression is dimensionally correct. Let us divide the possible scenarios into cases such as:

  1. Input is an Atom. We can then return the dimension of the quantity.
  2. Input is a constant. We can then return Constant which is basically a dimensionless quantity.
  3. Input is an Expression with Operator :add. In this case, we need to check if the HS and the RHS have the same dimensions. If they do, we can return dimension of either. If they do not, we then return an undefined value (say UNDEF).
  4. Similar is the case for subtraction as well.
  5. Input is an Expression with Operator :multiply. We don’t need to check anything for this, but we return a sum of the dimensions of the two quantities involved.
  6. Input is an Expression with Operator :divide. Again, we return the difference between the dimensions of the numerator(LHS) and the denominator(RHS).
;; A generic undefined value. We can use this to convey that the equation is dimensionally incorrect.
(def UNDEF
  "INCORRECT")

;; Used to calculate the dimensions of the reulting quantity after the product of the two inputs.
;; If either of the quantities is UNDEF, this function returns UNDEF.
(defn product
  [dim1 dim2]
  (cond
    (= UNDEF dim1) UNDEF
    (= UNDEF dim2) UNDEF
    :else {:M (+ (:M dim1) (:M dim2))
           :L (+ (:L dim1) (:L dim2))
           :T (+ (:T dim1) (:T dim2))}))

; Used to calculate the dimensions of the reulting quantity after the divison of the two inputs.
;; If either of the quantities is UNDEF, this function returns UNDEF.
(;
(defn ratio
  [dim1 dim2]
  (cond
    (= UNDEF dim1) UNDEF
    (= UNDEF dim2) UNDEF
    :else {:M (- (:M dim1) (:M dim2))
           :L (- (:L dim1) (:L dim2))
           :T (- (:T dim1) (:T dim2))}))

;; Main logic function. Takes in an expression as an argument.
;; This function gives as output the dimension of the resulting quantity,
;; based on the Expression map provided to it.
;; If the equation is dimensionally incorrect, it will return UNDEF.
(defn dim
  [expr]
  (let [lhs           (:LHS expr)
        rhs           (:RHS expr)
        operator      (:Operator expr)
        expr-type     (:Type expr)]
    (cond
      (= expr-type :variable)  (:Dimension expr)
      (= expr-type :constant)  Constant
      (= operator :add)        (if (= (dim lhs) (dim rhs))
                                 (dim lhs)
                                 UNDEF)
      (= operator :subtract)   (if (= (dim lhs) (dim rhs))
                                 (dim lhs)
                                 UNDEF)
      (= operator :multiply)   (product (dim lhs) (dim rhs))
      (= operator :divide)     (ratio   (dim lhs) (dim rhs)))))

Setting up grammer for parsing the equation

Now that we have the general setup ready, we can go ahead and worry about actually taking user input and parsing it into the given Expression format. For this, we will be using the instaparse library. It allows you to specify your own grammer and builds a tree around the same.

Let us consider the simplest linear equation we will be looking to solve:

v = u + at

Newton’s first law of motion. Very simple to understand. But parsing it might be a pain. To make things slightly more tougher, let’s say that the quantities in the equation can have integral co-efficients as well. So it becomes:

v = Xu + Yat where X,Y are integers

We basically can use a parse tree which is used for arithmetic and make a few changes to it to include the physics-terms or as I would refer to them as pterms.

;; Introducing pterm as a new grammer rule.
;; pterm is basically a coefficient and multiple symbols multiplied together.
;; The output format specified here is :enlive which will help us in formatting the parsed data later.
(def physics
  (insta/parser
    "expr       = add-sub
    <add-sub>   = mul-div | add | sub
    add         = add-sub <'+'> mul-div
    sub         = add-sub <'-'> mul-div
    <mul-div>   = pterm | term | mul | div
    mul         = mul-div <'*'> term
    div         = mul-div <'/'> term
    term        = number | <'('> add-sub <')'>
    number      = #'[0-9]+'

Parsing the tree!

Alright! Let us dive directly into the code:

;; instaparse provides us with an awesome function which is known as transform.
;; transform allows us to traverse the tree from bottom-up and
;; act upon each node data and replace it by operating upon it.
;; Since :expr will the root node, it will eventually contain the entire expression
;; map, hence we provide it the identity function.
(def parse-tree->schema-def
  {:sym         replace-sym
   :coefficient replace-coefficient
   :add         replace-add
   :sub         replace-sub
   :div         replace-div
   :term        replace-term
   :pterm       replace-pterm
   :expr        identity})

;; Each of the replace-"operator" function can be defined as follows:
;; (Considering replace-add for example)
(defn replace-add
  [term1 term2]
  {:Operator :add
  :LHS term1
  :RHS term2})

;; This allows us to simplify the function logic to a great extent.
;; The parser will give us with the the :add node with two children,
;; each representing a side of the operation.
;; As mentioned before, this is true for all functions.

Now, we have the basic setup for conversion ready, but again, we are missing a few important functions:

  1. replace-coefficient: Since a co-efficient is going to be a constant, we can just create an Atom map for the constant and return it. The Symbol for it can be the value of the co-efficient.

  2. replace-term: As we can see from the parsing rules above, term will always contain integral value and hence we can just return the Constant defined earlier.

Now, two most important functions left are replace-pterm and replace-sym. Let us see how we can tackle those:

;; The pterm is basically a multiplication of the co-efficient term and the symbol terms.
;; Assuming we have both of them defined correctly:
(defn replace-pterm
  [coeff-term symbol-term]
  {:Operator :multiply
   :LHS coeff-term
   :RHS symbol-term})

;; Now, this is a bit complicated.
;; Now, we can have multiple symbols being multiplied to each other.
;; Our expression model is pretty simple in that we have an LHS and an RHS both of which
;; can either be an Atom or an Expression.
;; Suppose we have the term "w*h*a*t", we can consider it as
;; w * (h * (a * t))
;; This exact same algorithm is implemented below.
;; It takes in the first element of the symbol list and recurses on the rest.
;; If we have a single element, we call find-sym which we will be describing later,
;; but if we have 2 elements, then we know that we have reached the end of the list and
;; can create an multiplication Expression.
(defn replace-sym
  [& sym]
  (case (count sym)
  1 (find-sym (first sym))
  2 {:Operator :multiply
      :LHS      (find-sym (first sym))
      :RHS      (find-sym (last sym))
     }
  ;; default case
  {:Operator :multiply
   :LHS      (first sym)
   :RHS      (replace-sym (rest sym))}))

All well and good. The only remaining piece in the puzzle, is find-sym. This is basically a lookup function into an Assoc-list. Henderson defines a map/list as an Assoc-list if we can look up the symbol dimensions into it. Easy.

;; Defines the variable 'v' as an Atom.
(def Assoc-List
  {:v {:Type :variable
       :Symbol "v"
       :Dimension {:M 0
                   :L 1
                   :T -1}})

;; Simple lookup into the Assoc-List map.
;; Returns UNDEF if not found.
;; We can have the user populate this map as an earlier exercise.
(defn find-sym
  [sym]
  (let [sym-key   (keyword sym)
        sym-value (sym-key Assoc-List)]
    (if (not-empty sym-value)
      sym-value
      "UNDEF")))

Yay!

Lo and behold! We now have a working program which can dimensionally analyze equations for us. We can then define a function to aggregate all of the above into a single API exposing the required functionality. Maybe for a later time ;). Let us see it running:

user=> (def equation (physics "2u+3at"))
#'user/equation
user=> (insta/transform parse-tree->schema-def equation)
{:Operator :add, :LHS {:Operator :multiply, :LHS {:Type :constant, :Symbol 2, :Dimension {:M 0, :L 0, :T 0}}, :RHS {:Type :variable, :Symbol "u", :Dimension {:M 0, :L 1, :T -1}}}, :RHS {:Operator :multiply, :LHS {:Type :constant, :Symbol 3, :Dimension {:M 0, :L 0, :T 0}}, :RHS {:Operator :multiply, :LHS {:Type :variable, :Symbol "a", :Dimension {:M 0, :L 1, :T -2}}, :RHS {:Type :variable, :Symbol "t", :Dimension {:M 0, :L 0, :T 1}}}}}
user=> (def parsed-result (insta/transform parse-tree->schema-def equation))
#'user/parsed-result
user=> (s/validate Expression parsed-result)
# Returns successful!
user=> (dim parsed-result)
{:M 0, :L 1, :T -1}

P.S: I am a novice Clojure programmer. Do suggest improvements in my coding style and the above code itself. It is hosted on my Github.

References: Peter Henderson’s book on Functional Programming Application and Implementation