Guava: IntMath.sum(int[]), and likewise

Created on 31 Oct 2014  路  42Comments  路  Source: google/guava

_Original issue created by hein.meling on 2010-01-05 at 11:15 PM_


Suggest to provide:

sum()
average()

for Ints, Longs, Floats etc.

P3 package=primitives status=research type=addition

Most helpful comment

Many people use guava as a backfill of Java 8 functionality for Java 7-and-earlier projects (like Android).

All 42 comments

_Original comment posted by [email protected] on 2010-01-06 at 01:00 AM_


Agreed.

Varargs?

The best way to implement Longs.average() is a somewhat interesting puzzle.


Status: Accepted

_Original comment posted by gpampara on 2010-01-06 at 05:21 AM_


Could you please elaborate at what the complexities are?

I think I'm missing something.

_Original comment posted by hein.meling on 2010-01-06 at 08:12 AM_


About varargs; currently I have this:

int sum(Collection<Integer> c)

in my project. Not sure if such a collection can be transformed into an array without copying; if so varargs are
fine, otherwise I would like to have both.

Another issue that come to mind when I started to think about test cases for this was: what to return if the sum
exceeds the limit of its primitive type. (maybe this was what you were referring to as a puzzle with
Longs.average()?)

We have to avoid any silent overflows for this. Maybe the return type should be Number to allow
BigInteger/Decimal to be returned for those cases.

_Original comment posted by ogregoire on 2010-01-06 at 03:03 PM_


for the Longs.average() wouldn't it be interesting to use BigInteger as internal
mechanics?

Or use an integer to keep track of the number of times the overflow (or underflow) is
reached.

_Original comment posted by [email protected] on 2010-01-06 at 04:54 PM_


Yup, BigInteger is the simple solution, but I'm quite curious to benchmark simpler
alternatives.

As for what to do in the case of overflow, I think ArithmeticException seems
appropriate.

_Original comment posted by [email protected] on 2011-01-26 at 10:16 PM_


_(No comment entered for this change.)_


Labels: Type-Enhancement

_Original comment posted by [email protected] on 2011-01-27 at 09:07 AM_


I'd like to have each method to accept iterable, too.

While looking at my custom math utilities: What about rate and percentage calculation? Like this:

  public static double calculateRate(double count, double total) {
    // Compare against 0.0 to prevent infinity as result.
    return total == 0.0 ? 0.0 : count / total;
  }

This might be useful to implement for multiple primitives as one might be dealing with doubles or floats etc. depending on the application.

_Original comment posted by wasserman.louis on 2011-03-15 at 05:24 PM_


I'm up for writing this, including some attempts I've thought of for "simpler alternatives."

_Original comment posted by raymondofrish on 2011-03-16 at 02:01 AM_


How about something like the following? Not thorough benchmarked or optimized, but surely better than BigInteger, no? I hear there's a nice microbenchmarking tool available for Java...

public static long average(Iterable<Long> longs) {
    long count = Iterables.size(longs);

    long integerSum = 0;
    long remainderSum = 0;

    for (long num : longs) {
        integerSum += num / count;
        remainderSum += num % count;
    }

    return integerSum + remainderSum / count;
}

_Original comment posted by [email protected] on 2011-07-13 at 06:18 PM_


_(No comment entered for this change.)_


Status: Triaged

_Original comment posted by [email protected] on 2011-07-13 at 08:40 PM_


_(No comment entered for this change.)_

_Original comment posted by blank101 on 2011-07-19 at 07:06 PM_


Recollecting back to my numerical methods class, there are also some interesting issues for the floating point primitives when calculating sum/average values beyond (over|under)flow.

_Original comment posted by wasserman.louis on 2011-10-18 at 11:26 PM_


Revisiting...

I'm thinking we might just provide sum(), which has a less ambiguous return type.

In particular:

long IntMath.sum(int... ints)
BigInteger LongMath.sum(long... longs)

since we can safely assume that the sum of an array of ints will fit into a long. (The result will fit in 96 bits, so we can do intermediate accumulation with smaller types, too...)

_Original comment posted by [email protected] on 2011-10-19 at 07:53 AM_


So iterables are not supported as arguments? I've got a lot of them floating around when creating reports.

_Original comment posted by dancerjohn on 2011-10-19 at 10:35 AM_


You can use Longs.toArray(Lists.newArrayList(myIterable)) to convert to an array and pass to a varargs.

_Original comment posted by [email protected] on 2011-10-19 at 12:31 PM_


That does two copies. Seems a little unnecessary, considering that the "for each" loop introduced in Java 5 can iterate both arrays and iterables.

As implementing the loop twice (for each number type) would be somewhat unattractive, one might convert the other way round, i.e. from array to iterable. Iterators.forArray exists, and it might be the time to add it to Iterables, too. Then, that (or Arrays.asList(array).iterator(), as mentioned in a code comment) might more efficient.

For the record, a very common use case in my projects is retrieving a total field's value from some iterable of objects using transform and then summing up those numbers.

_Original comment posted by ogregoire on 2011-10-19 at 12:43 PM_


Can't you have Collections instead of Iterables ? You speak about the transform case, but to be honest, I never had the case where I wanted to perform only one action on my transformed list, so it was always "persisted" in a collection in order to be used more than once without having to re-transform it. Don't forget that Collection2 contains a transform(Collection,Function) method as well...

_Original comment posted by [email protected] on 2011-10-19 at 07:23 PM_


Well, as long as I can get away with passing just iterables around, I try to do it. Maybe I'm overestimating the memory efficiency here (compared to, say, generators in Python).

OTOH, I had the impression that one tried to avoid arrays in favor of collections, especially in Guava. Also, what is a common use case that supports the idea of accepting _varargs_? I'd expect to either have very few numbers which I then could just easily sum up using the classic + operator, or a whole bunch of numbers that are in some kind of container (usually a collection, not an array [see my expectation above], but I don't know how popular number crunching in Java with Guava is) anyway.

_Original comment posted by [email protected] on 2011-10-20 at 06:17 AM_


"avoid arrays in favor of collections" -- definitely, for _Object_ arrays, but primitive arrays do make some sense.

_Original comment posted by [email protected] on 2011-12-10 at 03:42 PM_


_(No comment entered for this change.)_


Labels: Package-Primitives

_Original comment posted by [email protected] on 2011-12-15 at 09:32 AM_


I revisited the use of my custom sum methods and it turns out that they are called with an Iterable in most cases because Iterables.transform is often used beforehand (on values that mostly already are in Iterables, too) to extract an attribute. The few cases that pass a Collection are those that call Map.values().

_Original comment posted by [email protected] on 2012-02-16 at 07:17 PM_


_(No comment entered for this change.)_


Status: Acknowledged

_Original comment posted by [email protected] on 2012-05-25 at 06:47 PM_


I still think the simplest most obvious win here is just to add IntMath.sum(int[]) and friends.

I'd like to save the topic of the mean/average for when we look at statistics calculation in general.

_Original comment posted by wasserman.louis on 2012-05-25 at 08:30 PM_


We need more detail, though:

  1. What is the return type of sum(int[])? It always fits in a long, but...
  2. What is the return type of sum(long[])? BigInteger, or long?
  3. Do we deal with floats and doubles? Do we take steps to reduce rounding error? (There are steps that can be taken.)

_Original comment posted by [email protected] on 2012-05-30 at 07:43 PM_


_(No comment entered for this change.)_


Labels: -Type-Enhancement, Type-Addition

_Original comment posted by [email protected] on 2012-06-22 at 06:16 PM_


_(No comment entered for this change.)_


Status: Research

_Original comment posted by [email protected] on 2012-06-26 at 01:13 PM_


private final <T> long sum(long bias, Iterable<T> ts, Function<T, Integer> transformer) {
        long sum = bias;
        for(T t : ts) {
            sum += transformer.apply(t);
        }
        return sum;
    }

_Original comment posted by wasserman.louis on 2012-06-26 at 01:26 PM_


Hrrrrm. I'd like this method to do very limited things, because if you need it to do _smarter_ things, you can just write the implementation yourself, given that it's only a few lines at the most.

At the moment, I'm leaning towards a very dumb, straightforward implementation. IntMath.sum(int...) returns an int. If you want something more sophisticated, whip it up yourself; this is just a convenience method for the most common case.

_Original comment posted by orionllmain on 2013-05-14 at 01:36 PM_


int IntMath.sum(int[] array)
int IntMath.average(int[] array)
long LongMath.sum(int[] array)
long LongMath.sum(long[] array)
long LongMath.average(long[] array)
double DoubleMath.sum(int[] array)
double DoubleMath.sum(long[] array)
double DoubleMath.sum(float[] array)
double DoubleMath.sum(double[] array)
double DoubleMath.average(int[] array)
double DoubleMath.average(long[] array)
double DoubleMath.average(double[] array)
BigInteger BigIntegerMath.sum(int[] array)
BigInteger BigIntegerMath.sum(long[] array)
BigInteger BigIntegerMath.sum(BigInteger[] array)
BigInteger BigIntegerMath.average(int[] array)
BigInteger BigIntegerMath.average(long[] array)
BigInteger BigIntegerMath.average(BigInteger[] array)

Any progress on this issue?

Should this never be done in guava as Java 8 has exactly the duplicated API?

https://docs.oracle.com/javase/8/docs/api/java/util/stream/IntStream.html

Many people use guava as a backfill of Java 8 functionality for Java 7-and-earlier projects (like Android).

@lowasser shall we bring this to API review?

Even with Java 8, there seem to be some methods missing:
...
long LongMath.sum(int[])
BigInteger BigIntegerMath.sum(long[])
...
I'm not sure about choosing the classes according to the result type, but there are already precedents in Guava (e.g., factorial(int)). Both methods are rather simple, but getting them wrong isn't too hard either.

Using IntStream::sum sounds like a bad idea in case it might overflow one day and you might spend the next one chasing the bug.

Varargs or int/long streams arguments may be better.

@Maaartinus With regards to LongMath.sum(int[]), I believe one can emulate it for now with both streams and Guava's help, like so. (I've not run this code example through a compiler, so it may need some tweaking, but I hope the idea comes across well.)

int[] ints = ...;
long sum = Arrays.stream(ints).mapToLong(i -> i).reduce(0, LongMath::checkedAdd);

Likewise, for BigIntegerMath.sum(long[]), something like this should work:

long[] longs = ...;
BigInteger sum = Arrays.stream(longs).mapToObj(BigInteger::valueOf).reduce(BigInteger.ZERO, (a, b) -> a.add(b));

@jbduncan Agreed. I guess, your sum(int[]) is as fast as it can get (note that you need no checkedAdd), but your sum(long[]) is probably much slower than slicing the long into two ints, accumulating each into a long and using BigInteger for the final result only.

@Maaartinus

Agreed. I guess, your sum(int[]) is as fast as it can get (note that you need no checkedAdd)...

Am I right to think that it's because a single long is large enough to hold the sum of an int[Integer.MAX_VALUE] where all elements are equal to either Integer.MAX_VALUE or Integer.MIN_VALUE?

Regardless, I think that LongMath::checkedAdd would almost certainly be needed when dealing with Streams with more than Integer.MAX_VALUE elements (which aren't common at all, I imagine). But then again, in such a situation, I imagine mapping and reducing to a BigInteger would be more sensible than reducing to a long anyway.

...but your sum(long[]) is probably much slower than slicing the long into two ints, accumulating each into a long and using BigInteger for the final result only.

I'm a bit lost but very curious by what you mean here. Can you give me a code example?

Am I right to think

Yes, the minimum value is > Integer.MAX_VALUE * Integer.MIN_VALUE > -2**62, which fits, the maximum value is slightly smaller in magnitude.


a code example

long MASK = 0xFFFFFFFFL;
long high = 0, low = 0;
for (long x : longs) {
     low += x & MASK;
     high += x >> 32;
}
return BigInteger.valueOf(high).shiftLeft(32).add(BigInteger.valueOf(low));

There may be tons of errors.

Is this covered by Stats.of(1, 2, 3, ...).sum(), etc.?

Only technically, but I would not expect Stats to be a good choice in cases where the Stats instance is immediately thrown away. It does too much extra work.

Stats is indeed very slow. I've written a simple benchmark which compares direct array summation and Stats.sum. Stats is 7-10x time slower:

Benchmark             (size)  Mode  Cnt    Score     Error  Units
Statistics.sum            10  avgt    4    0,010 ?   0,001  us/op
Statistics.sum          1000  avgt    4    0,861 ?   0,011  us/op
Statistics.sum        100000  avgt    4   86,060 ?   1,232  us/op
Statistics.sumStats       10  avgt    4    0,077 ?   0,017  us/op
Statistics.sumStats     1000  avgt    4    8,608 ?   0,133  us/op
Statistics.sumStats   100000  avgt    4  863,270 ?  31,961  us/op
Was this page helpful?
0 / 5 - 0 ratings