Turicreate: Are there any plans to add a cross validation module?

Created on 18 Dec 2017 · 7 Comments · Source: apple/turicreate

The previous version of turicreate (graphlab-create-2.1) had a cross validation module that included cross_validation and KFold.
I wasn't able to find them anywhere in the current documentation or code.
It would be great to have cross validation and KFold as part of Turi.

enhancement toolkits workaround

All 7 comments

We don't have it now. This is a great feature request!

For one of my own projects I have implemented cross-validation and k-fold helpers that work with turicreate.

@Kagandi, Thank you - we will definitely have a look.
Feel free to submit this as a Pull Request.

Will you add cross_validation.KFold to turicreate or not?
@igiloh @Kagandi @znation @srikris @hoytak @afranklin

Not sure why this was closed. It's a reasonable feature request. Reopening.

We still don't have cross validation support. However, we did just add a shuffle method for SFrame, which should make it simpler to do cross validation yourself.

To do k-fold cross validation: call shuffle on your SFrame then divide it into k equal segments.
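
As a rough illustration of the "divide it into k segments" step, independent of turicreate, the fold boundaries can be worked out with plain integer arithmetic. The helper name `fold_bounds` is hypothetical, and giving any leftover rows to the last fold is just one possible choice.

```python
def fold_bounds(n_rows, k):
    """Return a list of (start, end) row ranges, one per fold."""
    if k < 2 or k > n_rows:
        raise ValueError("k must be between 2 and n_rows")
    fold_size = n_rows // k
    # The first k-1 folds all have exactly fold_size rows.
    bounds = [(i * fold_size, (i + 1) * fold_size) for i in range(k - 1)]
    # Assumption: rows left over after integer division go to the last fold.
    bounds.append(((k - 1) * fold_size, n_rows))
    return bounds


print(fold_bounds(11, 5))  # [(0, 2), (2, 4), (4, 6), (6, 8), (8, 11)]
```

Each (start, end) pair marks the rows held out as the test set for that fold; the remaining rows form the training set.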

Here is a function I wrote to do cross validation:

def get_cross_validation_generator(sf, k):
    '''
    Yield k (train, test) splits of an SFrame for k-fold cross validation.

    Parameters
    ----------
    sf : SFrame
        The SFrame on which to do cross validation

    k : int
        The number of folds

    Returns
    -------
    out : generator
        The generator yields a tuple with two members. The first
        member of the tuple is the train set SFrame. The second member
        is the test set.
    '''
    sf = sf.shuffle()
    fold_size = len(sf) // k

    for i in range(k-1):
        test_set_start = i * fold_size
        test_set_end = (i+1) * fold_size

        cur_test = sf[test_set_start:test_set_end]
        # Join the rows before and after the test fold into the train set
        cur_train = sf[:test_set_start].append(sf[test_set_end:])

        yield cur_train, cur_test

    # Add any left over portion to the final test set
    final_divide = (k-1) * fold_size
    yield sf[:final_divide], sf[final_divide:]

Here is an example of using it:

# Test get_cross_validation_generator
import turicreate as tc
sf = tc.SFrame({'a': range(11)})
for train, test in get_cross_validation_generator(sf, 5):
    print(train)
    print(test)
    print("\n\n")
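
In practice the folds are usually consumed by training on each train set and averaging a metric over the test sets. Here is a minimal sketch of that loop. The name `cross_validate` and the `fit` and `score` callables are hypothetical, not part of turicreate, and the demo uses plain lists so it runs without any dependencies.

```python
def cross_validate(folds, fit, score):
    """Average score(model, test) over an iterable of (train, test) pairs.

    fit and score are caller-supplied callables (hypothetical names).
    Returns (mean_score, per_fold_scores).
    """
    scores = []
    for train, test in folds:
        model = fit(train)                # train on this fold's train set
        scores.append(score(model, test)) # evaluate on the held-out part
    if not scores:
        raise ValueError("folds must yield at least one (train, test) pair")
    return sum(scores) / len(scores), scores


# Toy demo: the "model" is the mean of the training values and the
# score is the mean absolute error on the held-out values.
folds = [([1, 2, 3], [4]), ([2, 3, 4], [1])]
fit = lambda train: sum(train) / len(train)
score = lambda model, test: sum(abs(x - model) for x in test) / len(test)

mean_score, per_fold = cross_validate(folds, fit, score)
print(mean_score, per_fold)  # 2.0 [2.0, 2.0]
```

With turicreate, the same loop could take the generator above as `folds`, with `fit` wrapping something like `tc.linear_regression.create(train, target='y', verbose=False)` and `score` reading `model.evaluate(test)['rmse']`. The column name 'y' is an assumption for illustration.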