Instead of giving separate `train:` and `val:` paths in the .yaml file, I added a single `dataset:` attribute containing the whole dataset. I then wrote a few lines of code that split the dataset with sklearn, but this failed: an error was raised in train.py when `create_dataloader` is called. My question is: is it possible to split the dataset using ready-made functions?
@CPor99 sure. You can split a dataset automatically using autosplit() in utils/datasets, then you simply point your data.yaml to the new autosplit_*.txt files.
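For reference, the resulting data.yaml could then look roughly like this (the dataset root and file locations below are examples, not prescribed paths; substitute your own):

```yaml
# example data.yaml pointing at the generated split files (paths are assumptions)
train: ../coco128/autosplit_train.txt  # list of training image paths
val: ../coco128/autosplit_val.txt      # list of validation image paths
```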
https://github.com/ultralytics/yolov5/blob/2c99560a98f9bba96ccf5ec3c774cc2a95c7cc64/utils/datasets.py#L918-L933
@glenn-jocher Nice. Has this method been added recently? I don't have it in my version.
Update your code, changes are pushed daily.
@glenn-jocher Thanks! One last question, where are the autosplit_*.txt files saved after using the autosplit()?
Nevermind found them, thanks for your help @glenn-jocher much appreciated!
As stated in the function's docstring:
```python
def autosplit(path='../coco128', weights=(0.9, 0.1, 0.0)):  # from utils.datasets import *; autosplit('../coco128')
    """Autosplit a dataset into train/val/test splits and save path/autosplit_*.txt files"""
```
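To illustrate the idea behind weighted train/val/test assignment, here is a minimal standalone sketch. This is not the Ultralytics implementation (see the link above for that); the function name `split_files` and its behavior are my own illustration:

```python
import random


def split_files(files, weights=(0.9, 0.1, 0.0), seed=0):
    """Assign each file to a train/val/test bucket with the given probabilities.

    Illustrative sketch only -- the real autosplit() in utils/datasets
    additionally writes the lists to autosplit_*.txt files next to the dataset.
    """
    random.seed(seed)  # fixed seed so the split is reproducible
    names = ('autosplit_train.txt', 'autosplit_val.txt', 'autosplit_test.txt')
    splits = {n: [] for n in names}
    for f in files:
        # pick bucket 0/1/2 with probability proportional to weights
        i = random.choices((0, 1, 2), weights=weights)[0]
        splits[names[i]].append(str(f))
    return splits


splits = split_files([f'img_{i}.jpg' for i in range(100)], weights=(0.8, 0.2, 0.0))
print(sum(len(v) for v in splits.values()))  # 100 -- every image lands in exactly one split
```

With a zero weight on the test bucket, `autosplit_test.txt` simply stays empty, which matches the default behavior of only producing train and val lists.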