How-To: Generate classification data

In this article, you will learn how to generate classification data. Generating classification data is simple and takes four steps:

1. Choose the classes.

First, decide which classes you want to use for your classification. Try to be specific: it is easier to merge classes later on than to separate them.

For this example, let’s say we have three classes:

  • agglomerate
  • single particle
  • other (e.g. debris or image artifacts)

The first two classes cover the objects that we are actually interested in. The third class is a catch-all for anything that does not fit into the first two classes.
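The advice to prefer specific classes can be illustrated with a minimal Python sketch. The class names are the ones from this example; the coarser “particle” label and the `MERGE_MAP` dictionary are hypothetical and only show that merging classes later is a simple relabelling, whereas splitting a class would mean inspecting every image again.

```python
# Class names from the example above.
CLASSES = ["agglomerate", "single particle", "other"]

# Hypothetical coarser labelling: both object classes merged into "particle".
MERGE_MAP = {
    "agglomerate": "particle",
    "single particle": "particle",
    "other": "other",
}


def merge_labels(labels, merge_map):
    """Relabel fine-grained class names using merge_map.

    Merging is a pure lookup, so no image has to be looked at again.
    Going the other way (splitting "particle" into its two sub-classes)
    is impossible from the labels alone.
    """
    unknown = set(labels) - set(merge_map)
    if unknown:
        raise ValueError(f"No mapping for classes: {sorted(unknown)}")
    return [merge_map[label] for label in labels]


if __name__ == "__main__":
    labels = ["agglomerate", "other", "single particle", "agglomerate"]
    print(merge_labels(labels, MERGE_MAP))
    # ['particle', 'other', 'particle', 'particle']
```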

2. Produce images of each class.

Ideally, produce an equal number of images for each class. The “other” class is the exception and does not need to match the others.
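To check how balanced your classes are, you could count the images per class and flag the ones that fall behind. The sketch below is one possible approach, not a fixed rule: the function name `underrepresented` and the 80 % `tolerance` threshold are assumptions, and the “other” class is excluded from the comparison as described above.

```python
def underrepresented(counts, exclude=("other",), tolerance=0.8):
    """Return {class: missing images} for classes that fall behind the largest one.

    counts: dict mapping class name -> number of images.
    exclude: classes that do not need to be balanced (here the "other" class).
    tolerance: assumed threshold; 0.8 flags classes with fewer than 80 % of the
    images of the largest class. "Missing" means images needed to match it.
    """
    balanced = {name: n for name, n in counts.items() if name not in exclude}
    if not balanced:
        return {}
    target = max(balanced.values())
    return {
        name: target - n
        for name, n in balanced.items()
        if n < tolerance * target
    }


if __name__ == "__main__":
    counts = {"agglomerate": 120, "single particle": 80, "other": 15}
    print(underrepresented(counts))  # {'single particle': 40}
```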

3. Prepare the data.

Make sure that each of your images shows only a single object. If an image shows multiple objects, cut it into parts (e.g. using ImageJ) until each sub-image shows only a single object. If an image shows touching objects of different classes, either discard the image or use a mask-based approach instead of a simple classification approach. Keep in mind that a mask-based approach is far more time-consuming.
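ImageJ is one way to cut images; the same logic can also be scripted. The following pure-Python sketch models an image as a list of pixel rows so that it runs without extra libraries (with Pillow you would use `Image.crop(box)`, which takes the same `(left, upper, right, lower)` box). The per-object boxes and the `split_image` helper are hypothetical, and testing whether bounding boxes touch is only a rough proxy: two boxes can touch even if the objects inside them do not.

```python
from itertools import combinations

# A box is (left, top, right, bottom) in pixels. right and bottom are
# exclusive, the same convention as Pillow's Image.crop().


def crop(image, box):
    """Cut one sub-image out of an image given as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def boxes_touch(a, b):
    """Return True if two boxes overlap or are directly adjacent (incl. diagonally)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def split_image(image, objects):
    """Return one (class_name, sub_image) pair per annotated object.

    objects: list of (class_name, box) pairs (hypothetical annotation format).
    Returns an empty list if objects of different classes touch, because such
    an image should be discarded or handled with a mask-based approach instead.
    """
    for (cls_a, box_a), (cls_b, box_b) in combinations(objects, 2):
        if cls_a != cls_b and boxes_touch(box_a, box_b):
            return []
    return [(cls, crop(image, box)) for cls, box in objects]


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 2, 2],
        [0, 1, 1, 0, 2, 2],
        [0, 0, 0, 0, 0, 0],
    ]
    objects = [("single particle", (1, 1, 3, 3)), ("agglomerate", (4, 1, 6, 3))]
    for cls, sub_image in split_image(image, objects):
        print(cls, sub_image)
```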

4. Sort the data.

Now we are going to actually generate the classification data. Create a folder called “data”. Inside this folder, create one sub-folder for each class. In our example, this gives the following folder structure:
data/
    agglomerate/
    single particle/
    other/
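The folder structure described above can also be created with a short Python script. This is a sketch using only the standard library's `pathlib`; the function name `create_class_folders` is hypothetical, and the class names are taken from the example.

```python
from pathlib import Path

CLASSES = ["agglomerate", "single particle", "other"]


def create_class_folders(root, classes):
    """Create root/ with one sub-folder per class; existing folders are kept."""
    root = Path(root)
    for name in classes:
        # parents=True also creates root itself; exist_ok=True makes reruns safe.
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    print(create_class_folders("data", CLASSES))
    # ['agglomerate', 'other', 'single particle']
```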

Finally, sort each image into its corresponding class folder, and you are done.
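Sorting can be done by hand in a file browser. If you already have a list of labels, a script like the following sketch can copy each image into its class folder and read the result back as (path, class) pairs. The `labels` dictionary format and both function names are assumptions. The resulting `data/<class>/` layout is the same one expected by loaders such as torchvision's `ImageFolder` or Keras' `image_dataset_from_directory`, so those should be able to use the folder directly.

```python
import shutil
import tempfile
from pathlib import Path


def sort_images(labels, root="data", move=False):
    """Copy (or move) each image into root/<class name>/.

    labels: dict mapping image path -> class name (hypothetical input format).
    """
    root = Path(root)
    for image_path, class_name in labels.items():
        target_dir = root / class_name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / Path(image_path).name
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        if move:
            shutil.move(str(image_path), str(target))
        else:
            shutil.copy2(image_path, target)  # keeps the original in place


def load_dataset(root="data", suffixes=(".png", ".jpg", ".jpeg", ".tif", ".tiff")):
    """Read the folder structure back as (image path, class name) pairs."""
    root = Path(root)
    samples = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in suffixes:
                samples.append((image_path, class_dir.name))
    return samples


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # Two empty dummy files stand in for real images.
        for name in ("img_001.png", "img_002.png"):
            (tmp / name).write_bytes(b"")
        sort_images({tmp / "img_001.png": "agglomerate",
                     tmp / "img_002.png": "other"}, root=tmp / "data")
        for path, label in load_dataset(tmp / "data"):
            print(path.name, label)
```

Copying (the default) rather than moving keeps your original images untouched in case a label turns out to be wrong.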
