I think a useful utility function would be one that chooses a random element from an array. Python has this as random.choice. Here is what a Julia implementation might look like:
function choice(a::Array)
    n = length(a)
    idx = mod(rand(UInt), n) + 1  # UInt was spelled Uint in early Julia versions
    return a[idx]
end
An option would be to have an additional argument for drawing some number of samples. This would be sampling with replacement from an array with uniform probability.
Another useful function could be to have sampling without replacement.
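As a rough illustration of the extra-argument idea, here is a minimal sketch of drawing k elements uniformly with replacement. The two-argument method and the parameter name k are hypothetical, not an existing API.

```julia
# Hypothetical two-argument method: draw k elements uniformly, with replacement.
# Each draw is independent, so the same element may appear more than once.
function choice(a::Array, k::Integer)
    return [a[rand(1:length(a))] for _ in 1:k]
end

draws = choice([10, 20, 30], 5)
```

Because every draw indexes the array independently, k may be larger than length(a).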
Using idx = rand(1:n) would be a better solution, as using mod can't guarantee randomness, IIRC.
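The concern with mod is usually called modulo bias: when the number of possible random values is not a multiple of n, some residues occur more often than others. A small sketch that makes this visible, assuming an 8-bit source (256 values) and n = 100 purely for illustration; with a 64-bit UInt and a small n the bias is far smaller, but still not exactly zero:

```julia
# Reduce every possible 8-bit value modulo n and count how often each residue occurs.
n = 100
counts = zeros(Int, n)
for x in 0:255                      # stands in for all values of an 8-bit random source
    counts[mod(x, n) + 1] += 1
end

# 256 = 2*100 + 56, so residues 0..55 occur three times and residues 56..99 only twice.
overrepresented = count(==(3), counts)
underrepresented = count(==(2), counts)
```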
Sampling with replacement already exists in the Stats package as the randsample function. Adding sampling without replacement is an ongoing issue.
Yes I think a good way to do this is just a[rand(1:end)].
I think you mean a[1:rand(1:end)]
I've seen a handful of @JeffBezanson coding slip ups. This is not one of them ;-)
a[1:rand(1:end)] will produce a random-length prefix of a, rather than a random sample from a.
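To make the difference between the two expressions concrete, a short sketch (the array contents are arbitrary):

```julia
a = ['a', 'b', 'c', 'd', 'e']

# Inside the brackets, `end` means the last index of a, so rand(1:end) is a random index.
x = a[rand(1:end)]      # a single element chosen uniformly from a

# Here the random number is the upper bound of a range, so the result is a prefix.
p = a[1:rand(1:end)]    # the first k elements of a, with k random in 1:length(a)
```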
Since this issue is very old, I just wanted to update @johnmyleswhite's comment for 2014: sampling is in StatsBase.jl and the sample function does some very clever sampling.
How does StatsBase.sample work with a DataFrame? I know I can convert it to a multi-dimensional array first, but I'd like to keep my original structure/headers.
Just ask for a subset of rows generated by sampling from 1:size(df, 1).
-- John
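A minimal sketch of that suggestion, assuming the DataFrames and StatsBase packages are installed; the column names and the sample size of 3 are made up for illustration:

```julia
using DataFrames, StatsBase

df = DataFrame(id = 1:6, name = ["a", "b", "c", "d", "e", "f"])

# Sample row indices rather than the data itself; replace = false gives distinct rows.
rows = sample(1:size(df, 1), 3; replace = false)

# Indexing with the sampled rows and `:` for all columns returns a DataFrame,
# so the column names (headers) are preserved.
picked = df[rows, :]
```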
@SamChill This does not work for non-indexable collections, like a Set. What's an efficient way to get a random element out of a Set?
rand(a::Array) works after https://github.com/JuliaLang/julia/pull/9049.
An efficient implementation for Set (and Dict?) seems non-trivial, and a PR might be accepted. In the meantime you can use rand(collect(set)).
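For reference, the workaround looks like this in practice. Note that collect copies the whole Set into a Vector on each call, so this is a convenience rather than the efficient implementation asked about (the example values are arbitrary):

```julia
s = Set(["apple", "banana", "cherry"])

# collect(s) builds a Vector from the Set (O(length(s)) work per call);
# rand then picks one element of that Vector uniformly.
x = rand(collect(s))
```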
choose(xs, n) = xs[randperm(end)][1:n]
@undwad That seems inefficient, particularly when n is small compared to length(xs).
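One possible way to avoid permuting the whole array is a partial Fisher-Yates shuffle that stops after n swaps. This is only a sketch: the name choose_partial is hypothetical, it assumes a 1-based vector, and it still copies xs once, so the saving is in the shuffling work and the extra index permutation rather than in memory.

```julia
# Hypothetical helper: draw n distinct elements from xs without replacement
# by shuffling only the first n positions of a copy.
function choose_partial(xs::AbstractVector, n::Integer)
    0 <= n <= length(xs) || throw(ArgumentError("n must be between 0 and length(xs)"))
    pool = collect(xs)                  # work on a copy so xs is left untouched
    for i in 1:n
        j = rand(i:length(pool))        # pick uniformly from the not-yet-chosen tail
        pool[i], pool[j] = pool[j], pool[i]
    end
    return pool[1:n]
end

picked = choose_partial(collect(1:10), 3)
```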
Please don't necropost on old, resolved issues.
Yes I think a good way to do this is just a[rand(1:end)].
@JeffBezanson Thanks, it worked.