I am currently building a machine learning library and I often need to check whether a given input is a matrix of numbers, strings, booleans, and so on. I have written an extension method to deal with this within my project:
/**
 * Checks whether the matrix only contains values of a certain data type (e.g. number).
 * The function also runs isMatrix against the passed-in dataset.
 * @param matrix
 * @param {string} _type
 */
const isMatrixOf = (matrix, _type = 'number') => {
  if (!isMatrix(matrix)) {
    throw Error(`Cannot perform isMatrixOf ${_type} unless the data is a matrix`);
  }
  // For each row, check whether any element is NOT of the requested type
  // Returns an array with one result per row
  const vectorChecks = matrix.map(arr =>
    arr.some(x => {
      // Checking type of each element
      if (_type === 'number') {
        return !_.isNumber(x);
      } else {
        throw Error('Cannot check matrix of an unknown type');
      }
    })
  );
  // All should be false
  return vectorChecks.indexOf(true) === -1;
};
/**
 * Checks whether the data consists of multiple rows (a non-empty array of arrays)
 * @param matrix
 * @returns {boolean}
 */
const isMatrix = matrix => {
  // Guard against non-array input, which has no map()
  if (!_.isArray(matrix) || _.size(matrix) === 0) {
    return false;
  }
  const isAllArray = matrix.map(arr => _.isArray(arr));
  return isAllArray.indexOf(false) === -1;
};
Notes:
1) _ is Lodash.
2) Above code example only deals with number[][] ... =) ...
My question is, does this functionality already exist in Math.js?
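To illustrate how the check could cover the other types mentioned above, here is a minimal sketch without Lodash. The name isMatrixOfType and the list of supported types are assumptions made for this example only. Note that typeof differs slightly from _.isNumber (for instance, a Number object is not reported as 'number').

```javascript
// Sketch only: a type-generic variant of isMatrixOf.
// isMatrixOfType and SUPPORTED_TYPES are hypothetical names for illustration.
const SUPPORTED_TYPES = ['number', 'string', 'boolean']

const isMatrixOfType = (matrix, type = 'number') => {
  if (!SUPPORTED_TYPES.includes(type)) {
    throw new Error(`Cannot check matrix of an unknown type: ${type}`)
  }
  // A matrix here is a non-empty array whose items are all arrays
  const isMatrix = Array.isArray(matrix) &&
    matrix.length > 0 &&
    matrix.every(row => Array.isArray(row))
  if (!isMatrix) {
    throw new Error(`Cannot perform isMatrixOfType ${type} unless the data is a matrix`)
  }
  // every() stops at the first element that has a different type
  return matrix.every(row => row.every(x => typeof x === type))
}

console.log(isMatrixOfType([[1, 2], [3, 4]])) // true
console.log(isMatrixOfType([['a', 'b'], ['c', 1]], 'string')) // false
```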
There is no such function, though it's possible to pass a datatype when creating a new matrix, like:
const m = math.matrix([[1, 2], [3, 4]], 'dense', 'number')
The matrix algorithms reckon with this datatype and when provided, specific functions (like add for numbers only) are used instead of generic functions (like add for any type). This gives a huge performance improvement.
A function isMatrixOf sounds like a good addition to facilitate these kinds of optimizations. Maybe we could even call this when creating a matrix to automatically have these optimizations when no datatype is provided.
@JasonShin would you be interested in turning your code into a full PR (including testing, docs)?
"would you be interested in turning your code into a full PR (including testing, docs)?"
@josdejong sure, I will do it after work =)
Great!
I was thinking, maybe instead of isMatrixOf we could create a function getMatrixDataType which would return for example 'number' if the matrix only contains numbers, and 'mixed' if there are multiple types used? Then you can simply do getMatrixDataType(myMatrix) === 'number' to get the behavior you have with isMatrixOf.
A function getMatrixDataType should work for DenseMatrix and SparseMatrix, and Array. I think it would be neat to add two methods getDataType on the two matrix classes DenseMatrix and SparseMatrix for that and call these from the getMatrixDataType function.
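As a rough illustration of the proposal for plain Arrays, here is a minimal recursive sketch. It is not the code that ended up in mathjs: the use of typeof to name the types and the choice to return undefined for an empty array are assumptions made for this example.

```javascript
// Sketch only: returns the common typeof of all values in a (nested) array,
// 'mixed' when more than one type occurs, and undefined for an empty array.
function getMatrixDataType (array) {
  let type // stays undefined until the first value is seen
  for (let i = 0; i < array.length; i++) {
    const item = array[i]
    // Recurse into nested arrays, otherwise take the type of the value itself
    const itemType = Array.isArray(item) ? getMatrixDataType(item) : typeof item
    if (itemType === undefined) continue // empty nested array, nothing to compare
    if (itemType === 'mixed') return 'mixed'
    if (type === undefined) {
      type = itemType
    } else if (type !== itemType) {
      return 'mixed'
    }
  }
  return type
}

console.log(getMatrixDataType([[1, 2], [3, 4]]) === 'number') // true
console.log(getMatrixDataType([[1, 'a'], [3, 4]])) // mixed
```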
What do you think @JasonShin ?
@josdejong
"What do you think" -> Sounds nice =) I will use the name getMatrixDataType and make sure to factor in DenseMatrix, SparseMatrix, and Array. When using the method on an Array, would you implement another method to check that the array is a matrix (e.g. [ [1, 2, 3], [4, 5, 6] ])?
Progress update -> I am working on this at https://github.com/JasonShin/mathjs/tree/develop
That's a great start! Some feedback points:
1) getMatrixDataType.js can be simplified a bit: the input argument can simply be array (instead of a generic object), and I think isArrayMatrix can be replaced with just Array.isArray(array) on the input. Checks to ensure that you have a valid matrix with rows/cols with the same size are implemented in the DenseMatrix and SparseMatrix constructors already.
2) The getMatrixDataType implementation for arrays should be recursive. See other functions in src/utils/array.js for inspiration.
As far as I can see these are the steps we need:
1) A util function getMatrixDataType which (recursively) gets the type of all of the values.
2) A method getDataType on DenseMatrix which internally calls the getMatrixDataType Array function on the _data property (a one liner).
3) A method getDataType on SparseMatrix which internally calls the getMatrixDataType Array function on the _values property (a one liner).
4) A function getMatrixDataType in src/function/matrix which is exposed on math, supporting all three different matrix types.
Oh, and I forgot a BONUS step, but maybe we should do that in a separate PR in order to keep this PR focused:
1) By using getMatrixDataType in matrix operations/functions we can make them like a factor 5 faster out of the box (!).
2) Replace usages of matrix._datatype of DenseMatrix and SparseMatrix with calling matrix.getDataType(), and smartly cache the result in the internal property _datatype to prevent unnecessary recomputation if the matrix didn't change.
Thanks for your replies @josdejong !
Giving you a quick update on the comment
_"maybe you want to use plain old for loops for best performance? (Worth a small benchmark on a large matrix before changing things for real)"_
I found that using plain old for loops performs much faster than using recursion. Here is the benchmarking result:
Code URL: https://github.com/JasonShin/mathjs/blob/develop-benchmark/src/utils/testing.js
Code snippet:
getMatrixDataType -> recursion
getMatrixDataType2 -> plain old for loops
const { getMatrixDataType, getMatrixDataType2 } = require('./matrix')
// Simple matrix construction
const numRows = 10000
const numCols = 10000
let newRow = []
for (let col = 0; col < numCols; col++) {
  newRow.push(col)
}
let rows = [];
for (let row = 0; row < numRows; row++) {
  rows.push(newRow.slice())
}
console.time('test recursion')
console.log(getMatrixDataType(rows))
console.timeEnd('test recursion')
console.time('test plain loop')
console.log(getMatrixDataType2(rows))
console.timeEnd('test plain loop')
Result:
$ node testing.js
test recursion: 3100.918ms
test plain loop: 733.547ms
I will choose the plain old for loop over recursion for this PR ;)
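For readers without access to the benchmark branch, a plain-loop variant for a two-dimensional array might look roughly like the sketch below. This is not the getMatrixDataType2 from the linked file; the name getDataTypeWithLoops and the typeof-based type names are assumptions for illustration.

```javascript
// Sketch only: plain nested for loops over a 2D array, no recursion.
// getDataTypeWithLoops is a hypothetical name, not the benchmarked function.
function getDataTypeWithLoops (rows) {
  let type // stays undefined until the first value is seen
  for (let r = 0; r < rows.length; r++) {
    const row = rows[r]
    for (let c = 0; c < row.length; c++) {
      const valueType = typeof row[c]
      if (type === undefined) {
        type = valueType
      } else if (valueType !== type) {
        return 'mixed' // stop early at the first mismatch
      }
    }
  }
  return type
}

// Small usage example (much smaller than the 10000 x 10000 benchmark matrix)
const small = [[0, 1, 2], [0, 1, 2]]
console.log(getDataTypeWithLoops(small)) // number
```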
"I will choose the plain old for loop over recursion for this PR ;)"
:+1: that's definitely worth it
@josdejong
Hi, I'm having trouble with generating docs for the getMatrixDataType function that I wrote. When I run npm run docs it keeps complaining:
[20:16:19] Warning: function "getMatrixDataType": description missing
[20:16:19] Warning: function "getMatrixDataType": syntax missing
[20:16:19] Warning: function "getMatrixDataType": examples missing
[20:16:19] Warning: function "getMatrixDataType": parameters missing
[20:16:19] Warning: function "getMatrixDataType": description missing of returns
[20:16:19] Warning: function "getMatrixDataType": seeAlso missing
But I am pretty sure I have description, syntax, examples...
Could you please have a look at the code and let me know what's wrong?
https://github.com/JasonShin/mathjs/blob/develop/src/function/matrix/getMatrixDataType.js
Thanks thanks =) ...
@JasonShin I think the doc generator (a small custom script) assumes that each file contains 1 function, and it only parses the first comment block it sees. I think if you move the isMultiDimensional with its comment block to the bottom of the file the documentation will be generated correctly.
Looking at the code itself: I think it will become easier if you create the file as a typed-function. That saves code and automatically generates informative error messages when calling the function with wrong inputs. Something like:
return typed('getMatrixDataType', {
  'Matrix': function (x) { // will match both DenseMatrix and SparseMatrix
    return x.getDataType()
  },
  'Array': function (x) {
    return getMatrixDataType(x) // the low level Array implementation
  }
})
You can have a look at other functions to see how the factory thing and dependency injection is structured exactly, basically:
function factory (type, config, load, typed) {
  return ...
}
exports.name = 'getMatrixDataType'
exports.factory = factory
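To make the dependency injection idea a bit more concrete, here is a self-contained sketch of how the two snippets above could fit together. It does not use the real typed-function library: standInTyped is a hypothetical, heavily simplified replacement that only tells Arrays apart from objects with a getDataType method, and arrayDataType is an assumed low level Array helper. The real wiring in mathjs may differ.

```javascript
// Sketch only. arrayDataType is an assumed low level Array implementation.
function arrayDataType (array) {
  const types = new Set(array.flat(Infinity).map(x => typeof x))
  if (types.size === 0) return undefined
  return types.size === 1 ? [...types][0] : 'mixed'
}

// Hypothetical stand-in for typed-function, NOT the real library:
// it dispatches on the runtime type of the single argument.
function standInTyped (name, signatures) {
  return function (x) {
    if (Array.isArray(x)) return signatures['Array'](x)
    if (x && typeof x.getDataType === 'function') return signatures['Matrix'](x)
    throw new TypeError(`Unexpected type of argument in function ${name}`)
  }
}

// Same shape as the factory above: typed is injected instead of imported
function factory (type, config, load, typed) {
  return typed('getMatrixDataType', {
    'Matrix': function (x) {
      return x.getDataType()
    },
    'Array': function (x) {
      return arrayDataType(x)
    }
  })
}

// The caller (in mathjs, the loader) supplies the dependencies
const getMatrixDataType = factory({}, {}, null, standInTyped)
console.log(getMatrixDataType([[1, 2], [3, 4]])) // number
```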
Thanks! Sorry, it's my first time working in this code base so I was confused ;)
No worries, it's also not the most common/easy dependency injection. If you have more questions/doubts don't hesitate to ask!
PR for this open at https://github.com/josdejong/mathjs/pull/1172
This feature is merged, closing the issue