Communications of the ACM

Research highlights

Technical Perspective: An Elegant Model for Deriving Equations

Most of us first encountered units and elementary dimensional analysis in high school science classes. For instance, the mass of an object is expressed in kilograms (kg), length in meters (m), and time in seconds (s). Other physical quantities have derived dimensions: acceleration has dimensions m s⁻² (from its definition), whereas force has dimensions kg m s⁻². The latter follows from Newton's second law, which states that force (F) equals mass (m) times acceleration (a). Thus, the dimensions of quantities reflect important relationships between them.
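As a minimal sketch of this bookkeeping, a dimension can be represented as a vector of exponents over the base units kg, m, and s, so that multiplying quantities adds exponents. The `Dim` class and its `times` method are hypothetical names introduced here for illustration only.

```python
from typing import NamedTuple


class Dim(NamedTuple):
    """Exponents of the base dimensions (kg, m, s)."""
    kg: int = 0
    m: int = 0
    s: int = 0

    def times(self, other: "Dim") -> "Dim":
        # Multiplying two quantities adds their dimension exponents.
        return Dim(self.kg + other.kg, self.m + other.m, self.s + other.s)


MASS = Dim(kg=1)
ACCELERATION = Dim(m=1, s=-2)
FORCE = Dim(kg=1, m=1, s=-2)

# F = m a is dimensionally consistent: both sides carry the same exponents.
print(MASS.times(ACCELERATION) == FORCE)
```

The comparison prints `True` because kg × m s⁻² has exactly the exponents of force.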

Suppose we are on board an aircraft with an array of sensors that independently measure, among other things, force, mass, and acceleration. We could use the equation F = ma to cross-check the readings and detect, for instance, that a single sensor has failed. However, in many cases, deriving such laws from "first principles" may be quite cumbersome, if not outright impossible.
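Such a cross-check might look like the sketch below. The function name, the sample readings, and the 5% relative tolerance are all assumptions made for illustration; a real monitor would choose its tolerance from the sensors' error characteristics.

```python
import math


def consistent_with_newton(force: float, mass: float, acceleration: float,
                           rel_tol: float = 0.05) -> bool:
    """Return True if the force reading agrees with mass * acceleration
    to within the (assumed) relative tolerance."""
    return math.isclose(force, mass * acceleration, rel_tol=rel_tol)


# Hypothetical readings in SI units (N, kg, m s^-2).
print(consistent_with_newton(force=19.9, mass=2.0, acceleration=9.8))
print(consistent_with_newton(force=35.0, mass=2.0, acceleration=9.8))
```

The first set of readings passes the check and the second fails it, which flags a disagreement among the three sensors without saying which one is at fault.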

Imagine a system running by a patient's bedside in the intensive care unit of a hospital with a continuous stream of data that includes the patient's blood pressure BP (kg m⁻¹ s⁻²), lung volume V (m³), pulse P (s⁻¹), and body weight W (kg). In this situation, it is unclear whether there are "precise" equations derivable from first principles, or even "approximate" empirical equations that may hold under some situations. Be they exact or approximate, these relationships are useful in numerous applications such as the run-time monitoring of safety-critical systems.

Discovering possible relationships between various quantities given observational data suffers from the classic "needle in the haystack" problem. The number of possible hypotheses is astronomically large whereas, in practice, very few of these hypotheses will survive empirical tests. The following paper addresses the key problem of discovering, from data, relationships that hold between physical quantities, using dimensional analysis to drastically narrow the space of hypotheses.

Machine learning provides many powerful approaches for regression using neural network models to detect relationships between quantities. However, many of the existing approaches do not consider the dimensions of the quantities being modeled. The authors propose a simple, yet elegant approach based on the idea of dimensional analysis in physics: a powerful technique that can postulate possible physical relationships by examining the dimensions of the quantities being related. The "Buckingham π" theorem, which formalizes earlier methods going back to the 19th century, provides an elegant recipe for generating such relationships by finding dimensionless parameters, that is, products of powers of the measured quantities in which all units cancel. Given the dimensions of the quantities measured, we may set up a system of linear equations whose unknowns are the exponents, and solve it to discover such products. For instance, F × m⁻¹ × a⁻¹ is seen to be dimensionless using this approach, from the dimensions of F, m, and a. Similarly, for the ICU bedside monitor described earlier, the quantity BP × V^(1/3) × P⁻² × W⁻¹ is dimensionless. However, unlike Newton's second law, the relationship between blood pressure and pulse is much more complex and variable. Thus, suitable statistical tests on the data are used to further classify the relationships obtained from the inference approach presented in the paper.
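To make the linear-algebra step concrete, the sketch below finds dimensionless products as the null space of a dimension matrix, with one row per base unit (kg, m, s) and one column per measured quantity. It illustrates the general Buckingham π recipe, not the paper's implementation; the `nullspace` helper is a hypothetical name, and it uses exact rational arithmetic so that exponents such as 1/3 come out exactly.

```python
from fractions import Fraction


def nullspace(rows):
    """Return a basis of exponent vectors x with A x = 0 (A given as rows)."""
    a = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(a[0])
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(n_cols):
        if r == len(a):
            break
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        pivot = a[r][c]
        a[r] = [v / pivot for v in a[r]]
        # Clear column c in every other row (reduced row echelon form).
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                factor = a[i][c]
                a[i] = [vi - factor * vr for vi, vr in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        x = [Fraction(0)] * n_cols
        x[free] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            x[pc] = -a[row_idx][free]
        # Scale so the first nonzero exponent is 1 (a sign/scale convention).
        lead = next(v for v in x if v != 0)
        basis.append([v / lead for v in x])
    return basis


# Rows are kg, m, s; columns are the quantities F, m, a.
newton = [[1, 1, 0], [1, 0, 1], [-2, 0, -2]]
# Rows are kg, m, s; columns are BP, V, P, W from the ICU example.
icu = [[1, 0, 0, 1], [-1, 3, 0, 0], [-2, 0, -1, 0]]
print(nullspace(newton))
print(nullspace(icu))
```

Each example has a one-dimensional null space. The exponent vectors are (1, -1, -1) for F, m, a and (1, 1/3, -2, -1) for BP, V, P, W, matching the two dimensionless products named above. Any nonzero multiple of such a vector is equally dimensionless, so the normalization is only a convention.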

The authors demonstrate that their approach effectively derives physical relationships from observational data for systems such as an unpowered glider and a pendulum. It empirically discovers Newton's equations, which are then used to accurately predict the altitude of the glider, and recovers the familiar relationship between the length of a pendulum and its period of oscillation. A more sophisticated and general approach uses the derived dimensionless parameters as input features to train machine learning models on the observed data. This approach compares quite favorably to other off-the-shelf approaches.
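For the pendulum, dimensional analysis on the period T (s), the length L (m), and gravitational acceleration g (m s⁻²) suggests that T × (g / L)^(1/2) is dimensionless, so a natural hypothesis is that it is a constant to be estimated from data. The sketch below tests that hypothesis on synthetic measurements generated for illustration (they are not the paper's data), and the simple averaging estimator is an assumption rather than the authors' procedure.

```python
import math
import random

G = 9.81  # gravitational acceleration in m s^-2 (assumed known)

random.seed(0)
lengths = [0.25, 0.5, 1.0, 1.5, 2.0]  # pendulum lengths in m (made up)
# Synthetic "measurements": ideal small-angle periods with 1% Gaussian noise.
periods = [2 * math.pi * math.sqrt(L / G) * random.gauss(1.0, 0.01)
           for L in lengths]

# Evaluate the dimensionless group on each observation.
pi_values = [T * math.sqrt(G / L) for T, L in zip(periods, lengths)]

# If the group is constant, its sample mean estimates that constant.
c_hat = sum(pi_values) / len(pi_values)


def predict_period(length: float) -> float:
    """Predict the period (s) of a pendulum of the given length (m)."""
    return c_hat * math.sqrt(length / G)


print(c_hat, 2 * math.pi)
print(predict_period(1.0))
```

On this synthetic data the estimated constant lands close to 2π, the value in the textbook small-angle formula. With real measurements, the spread of `pi_values` would itself be evidence for or against the hypothesized relationship.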

Thus, the authors present an elegant approach that uses dimensional analysis to infer models from data while incorporating some of the known relationships between the quantities being modeled. Elsewhere, dimensional analysis has been shown to be quite effective in detecting defects in robotic software using dimensions as type annotations that can be derived using program analysis techniques.² Furthermore, dimensions provide a type system for physical quantities. Such type systems are quite useful in machine learning models wherein we often seek to avoid overfitting by imposing constraints such as monotonicity on the models.³ I see the proposed dimensional consistency approach as a precursor to strongly typed machine learning models that can leverage the power of dependent type systems to specify more sophisticated properties including monotonicity.¹


1. Clancy, K. and Miller, H. Monotonicity types for distributed dataflow. In Proceedings of the Programming Models and Languages for Distributed Computing. ACM, 2017.

2. Ore, J-P.W. Dimensional Analysis of Robot Software without Developer Annotations. Ph.D. thesis, Univ. of Nebraska, Lincoln, 2019.

3. Sill, J. Monotonic networks. Advances in Neural Information Processing Systems 10. M. Jordan, M. Kearns, and S. Solla, Eds. MIT Press, Cambridge, MA, 1998, 661–667.


Sriram Sankaranarayanan is an associate professor of computer science at the University of Colorado, Boulder, CO, USA.


To view the accompanying paper, visit

Copyright held by author.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2021 ACM, Inc.

