The math library embedded systems always deserved.
numx brings scientific-grade numerical computing to bare-metal hardware. No heap. No operating system. No external dependencies. Thirteen production-ready modules spanning linear algebra, signal processing, automatic differentiation, ODE solvers, and more, all designed to run directly on ESP32, ARM Cortex-M, RISC-V, and everything else your project targets.
#include "numx/numx.h"

/* my_func: user-supplied integrand, defined elsewhere */

int main(void) {
    /* float32 default; -DNUMX_USE_DOUBLE=1 for float64 */
    /* every function returns numx_status_t; check it */
    numx_real_t result;
    numx_status_t s = numx_integrate_simpson(
        my_func, 0.0f, 1.0f, 100, &result
    );
    if (s != NUMX_OK) {
        /* NUMX_ERR_INVALID_ARG, NUMX_ERR_NO_CONVERGE … */
        return 1;
    }
    return 0;
}
Engineering Principles
Not features. Constraints.
Every decision in numx was made in the context of the hardware it runs on. These are the four constraints that shaped the entire architecture.
Zero Dynamic Allocation
Not a single call to malloc, calloc, or realloc exists anywhere in the numx codebase. On embedded systems, heap fragmentation is not a theoretical concern; it is a production failure mode. Every buffer, every intermediate result, and every output lives either on the stack or in caller-provided memory. The library never surprises you with a memory footprint larger than what you gave it.
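A minimal sketch of the caller-provided-memory pattern, assuming a hypothetical median routine. The names demo_median, demo_status_t, and the DEMO_* codes are illustrative and are not taken from the numx API. The point is only that the scratch buffer comes from the caller, so the function's memory footprint is fixed by its arguments.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch (not the numx enum). */
typedef enum { DEMO_OK = 0, DEMO_ERR_INVALID_ARG = 1 } demo_status_t;

/*
 * Median of x[0..n-1] without touching the heap.
 * The caller supplies `scratch` (at least n floats); x is left unmodified.
 */
demo_status_t demo_median(const float *x, size_t n, float *scratch, float *out)
{
    if (x == NULL || scratch == NULL || out == NULL || n == 0) {
        return DEMO_ERR_INVALID_ARG;
    }

    /* Insertion sort into the caller's buffer: O(n^2), fine for small windows. */
    for (size_t i = 0; i < n; ++i) {
        const float v = x[i];
        size_t j = i;
        while (j > 0 && scratch[j - 1] > v) {
            scratch[j] = scratch[j - 1];
            --j;
        }
        scratch[j] = v;
    }

    /* Middle element for odd n, mean of the two middle elements for even n. */
    *out = (n % 2 == 1) ? scratch[n / 2]
                        : 0.5f * (scratch[n / 2 - 1] + scratch[n / 2]);
    return DEMO_OK;
}
```

A caller would typically declare the scratch array on the stack or inside a statically sized context struct, which keeps the worst-case memory use visible at the call site.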
Fully Reentrant
numx holds no global state, no static mutable buffers, no hidden dependencies between calls. This matters in real embedded systems where the same library functions may be called from interrupt handlers, from RTOS tasks running concurrently, or from multiple execution contexts on multicore processors. Reentrancy is not a feature. It is a requirement for production use.
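One common way to get this property is to keep all filter or solver state in a struct the caller owns. The sketch below assumes a hypothetical exponential moving average; demo_ema_t and its functions are illustrative names, not the numx API.

```c
#include <stdbool.h>

/* Hypothetical filter state, owned by the caller (one per signal or task). */
typedef struct {
    float alpha;   /* smoothing factor in (0, 1] */
    float y;       /* last output */
    bool  primed;  /* false until the first sample arrives */
} demo_ema_t;

void demo_ema_init(demo_ema_t *f, float alpha)
{
    f->alpha  = alpha;
    f->y      = 0.0f;
    f->primed = false;
}

/* No statics, no globals: everything the update needs is in *f. */
float demo_ema_update(demo_ema_t *f, float sample)
{
    if (!f->primed) {
        /* Seed the filter with the first sample to avoid a start-up ramp. */
        f->y = sample;
        f->primed = true;
    } else {
        f->y += f->alpha * (sample - f->y);
    }
    return f->y;
}
```

Two tasks that each hold their own demo_ema_t can call demo_ema_update concurrently without interfering. A single instance shared between an interrupt handler and a task would still need protection on the caller's side.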
Typed Status Codes
Every function returns numx_status_t. If a solver does not converge, if a matrix is singular, or if an input falls outside the valid range for an algorithm, the caller knows exactly what happened and why. There is no silent failure in numx. This is the only honest way to build software for systems where silent failure can mean a sensor reading the wrong value or a control loop running on corrupted data.
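To make the pattern concrete, here is a sketch of a composite Simpson's rule that reports every failure through its return value and writes the result through an output pointer. The quad_* names and status codes are assumptions made for this example; numx's own enum values and function signatures may differ.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical status type for this sketch. */
typedef enum {
    QUAD_OK = 0,
    QUAD_ERR_INVALID_ARG,  /* NULL pointer, or n is zero or odd */
    QUAD_ERR_NOT_FINITE    /* integrand produced NaN or infinity */
} quad_status_t;

typedef float (*quad_fn_t)(float x);

/* Composite Simpson's rule on [a, b] with n subintervals (n must be even). */
quad_status_t quad_simpson(quad_fn_t f, float a, float b, size_t n, float *out)
{
    if (f == NULL || out == NULL || n == 0 || (n % 2) != 0) {
        return QUAD_ERR_INVALID_ARG;
    }

    const float h = (b - a) / (float)n;
    float sum = f(a) + f(b);

    for (size_t i = 1; i < n; ++i) {
        /* Odd interior nodes carry weight 4, even ones weight 2. */
        const float w = (i % 2 == 1) ? 4.0f : 2.0f;
        sum += w * f(a + (float)i * h);
    }

    if (!isfinite(sum)) {
        return QUAD_ERR_NOT_FINITE;  /* *out is left untouched on failure */
    }

    *out = sum * h / 3.0f;
    return QUAD_OK;
}

/* Example integrand: f(x) = x^2, whose integral over [0, 1] is 1/3. */
float quad_demo_square(float x)
{
    return x * x;
}
```

Because the numeric result never shares a channel with the error, a caller cannot mistake a failed computation for a valid value.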
Single Precision Flag
The entire library switches between float32 and float64 with a single compile-time flag: -DNUMX_USE_DOUBLE=1. On targets with a double-precision floating-point unit, float64 runs in hardware. On targets where memory and speed are tighter, or where the FPU is single-precision only, float32 keeps the footprint minimal. One library. One codebase. The right precision for every target.
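A compile-time precision switch of this kind is usually built from a typedef plus a literal-suffix macro. The sketch below shows that general technique under assumed names (DEMO_USE_DOUBLE, demo_real_t, DEMO_C). The source confirms only the flag -DNUMX_USE_DOUBLE=1 and the type numx_real_t, not how numx wires them together internally.

```c
#include <stddef.h>

/* Hypothetical switch for this sketch; build with -DDEMO_USE_DOUBLE=1 for double. */
#if defined(DEMO_USE_DOUBLE) && DEMO_USE_DOUBLE
typedef double demo_real_t;
#define DEMO_C(x) (x)       /* literal stays double */
#else
typedef float demo_real_t;
#define DEMO_C(x) (x##f)    /* 0.5 becomes 0.5f: no silent promotion to double */
#endif

/* Arithmetic mean written once, compiled for either precision. */
demo_real_t demo_mean(const demo_real_t *v, size_t n)
{
    if (v == NULL || n == 0) {
        return DEMO_C(0.0);
    }

    demo_real_t acc = DEMO_C(0.0);
    for (size_t i = 0; i < n; ++i) {
        acc += v[i];
    }
    return acc / (demo_real_t)n;
}
```

The literal macro matters on single-precision hardware: an unsuffixed constant such as 0.5 is a double in C, and mixing it into float arithmetic can pull in software double-precision routines.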
13 Modules
Every algorithm embedded engineers actually need.
Each module is self-contained, allocation-free, and integrates cleanly into any embedded project regardless of platform or toolchain.
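As one illustration of what "self-contained and allocation-free" can look like, here is a sketch of a single classical fourth-order Runge-Kutta step for a scalar ODE. The ode_* names and the callback type are hypothetical and not taken from numx; the actual module may use a different interface and handle vector-valued systems.

```c
/* Right-hand side of dy/dt = f(t, y); hypothetical callback type. */
typedef float (*ode_rhs_t)(float t, float y);

/* One classical RK4 step of size h; all temporaries live on the stack. */
float ode_rk4_step(ode_rhs_t f, float t, float y, float h)
{
    const float k1 = f(t, y);
    const float k2 = f(t + 0.5f * h, y + 0.5f * h * k1);
    const float k3 = f(t + 0.5f * h, y + 0.5f * h * k2);
    const float k4 = f(t + h, y + h * k3);
    return y + (h / 6.0f) * (k1 + 2.0f * k2 + 2.0f * k3 + k4);
}

/* Example right-hand side: exponential decay, dy/dt = -y. */
float ode_demo_decay(float t, float y)
{
    (void)t;  /* autonomous system: t is unused */
    return -y;
}
```

With a fixed step the cost per call is four right-hand-side evaluations and nothing else, which is what makes this form attractive inside a control loop with a hard deadline.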
Vectors and matrices. Dot product, norms, cross product, matrix multiply, transpose, determinant, and LU decomposition.
dot, norm, cross, mat_mul, transpose, det, lu
Descriptive statistics on fixed-size buffers. Mean, variance, standard deviation, median, and percentile without a runtime.
mean, variance, std_dev, median, percentile
Equation root-finding. Bisection for guaranteed convergence, Newton-Raphson for speed, and Brent's method for both.
bisect, newton, brent
Numerical integration. Trapezoidal rule, Simpson's rule, and Gaussian quadrature for varying accuracy requirements.
trap, simpson, gauss
Finite difference derivatives. Forward difference, central difference, and Richardson extrapolation for improved accuracy.
forward, central, richardson
Curve fitting and reconstruction. Linear interpolation, cubic splines, and Chebyshev polynomial approximation.
linear, cubic_spline, chebyshev
Polynomial arithmetic. Horner's method for efficient evaluation and Newton with deflation for root-finding.
eval, roots
ODE solvers for physical simulation. Fixed-step RK4 for deterministic loops and adaptive RK45 with error control.
rk4, rk45
Signal processing primitives. FIR filters, IIR filters, convolution, correlation, windowing, EMA, and peak detection.
fir, iir, convolve, correlate, window, ema, peaks
Fast Fourier Transform. Cooley-Tukey radix-2 in float32 and Q15 fixed-point, IFFT, and magnitude spectrum.
fft_f32, fft_q15, ifft, magnitude
Automatic differentiation without a runtime. Forward mode via dual numbers, reverse mode via a static compile-time tape.
forward, reverse, grad
Sparse signal recovery. Orthogonal Matching Pursuit and Iterative Shrinkage-Thresholding for compressed sensing.
omp, ista
Randomized matrix methods. Randomized SVD using the Halko-Martinsson-Tropp algorithm for large matrix approximation.
rsvd
(Planned) Number Theoretic Transform for post-quantum cryptography. Constant-time implementation targeting Kyber and Dilithium parameters.
ntt, intt
Target Hardware
Runs where it matters.
GCC. Clang. IAR. MSVC. If your toolchain compiles C99, numx compiles on your toolchain. No platform-specific code. No architecture-specific intrinsics. No dependencies on any header outside the C99 standard library.
Quick Start
Up and running in minutes.
# Option A: FetchContent (no cloning needed)
include(FetchContent)
FetchContent_Declare(numx
    GIT_REPOSITORY https://github.com/NIKX-Tech/numx.git
    GIT_TAG        v0.1.0
)
FetchContent_MakeAvailable(numx)
target_link_libraries(my_target PRIVATE numx::numx)

# Option B: clone and build locally
git clone https://github.com/NIKX-Tech/numx.git
cd numx
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
ctest --test-dir build --output-on-failure
Production Validated
Not a prototype.
Through 2024 and 2025, NIKX Technologies developed TERRA, a production IoT platform running on ESP32 microcontrollers. Its signal processing pipelines, numerical solvers, and on-device mathematical operations ran at 240 MHz, allocation-free, within real hardware memory constraints.
TERRA validated every architectural decision that had gone into numx since 2020. Zero dynamic allocation was not a design preference. It was the difference between a library that holds up in production and one that fails unpredictably under load.
Free. Open. Ready for production.
MIT licensed. No royalties, no attribution requirements beyond the license notice, no restrictions on commercial or academic use.
An academic paper documenting the algorithms and performance characteristics across embedded platforms is currently in preparation.