Why should I share my data?

Even today, the most reliable way to analyze particles in images is to mark them by hand. However, this method is tedious and highly repetitive, which can drastically impair the quality of the results: the operator may mark the particles carelessly, or the number of evaluated particles may simply be too low. After all, you need to mark at least 600, or better 1000, particles to achieve good statistical quality [bibcite key=Allen.2003], which can take several hours for a single sample.

1000 particles
Transmission electron microscope image showing approximately 1000 particles
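
To get a feeling for why the particle count matters, the following minimal sketch computes the relative standard error of a mean particle diameter for different sample sizes. It is only an illustration, not the criterion used in the cited reference: it assumes independent measurements, and the coefficient of variation `ASSUMED_CV` is a hypothetical value chosen for the example.

```python
import math


def relative_standard_error(cv, n):
    """Relative standard error of the mean of n independent measurements.

    cv is the coefficient of variation (standard deviation / mean)
    of the individual particle diameters.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of particles")
    return cv / math.sqrt(n)


# Hypothetical spread of the particle diameters (25 % of the mean).
ASSUMED_CV = 0.25

for n in (50, 100, 600, 1000):
    rse = relative_standard_error(ASSUMED_CV, n)
    print(f"{n:5d} particles: relative standard error of the mean = {100 * rse:.2f} %")
```

Because the error only shrinks with the square root of the particle count, halving it requires marking four times as many particles, which is what makes the manual approach so time-consuming.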

There are several semi-automatic methods for analyzing particles in images (e.g. the watershed or Hough transform). Unfortunately, all of the established methods require the user to adjust one or more parameters to achieve a good result for a single image or a set of very similar images, if they manage to do so at all. As soon as the imaging conditions change, the user has to readjust the parameters manually. There is still no fully automatic solution that takes your images as input and outputs high-quality results. As we have already demonstrated, machine learning can change that [bibcite key=Frei.2018].
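
The parameter dependence described above can be illustrated with a deliberately simple stand-in: global thresholding followed by counting connected regions. This is neither the watershed nor the Hough transform, and the tiny `IMAGE` below is a made-up example constructed to contain three particles (two bright ones that touch and one dim one); it is only meant as a sketch of how a single parameter changes the result.

```python
def count_particles(image, threshold):
    """Count 4-connected regions of pixels brighter than the threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            # Found a new region: flood-fill it so it is counted only once.
            count += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


# Hypothetical image: two bright particles (9) joined by a dimmer
# contact zone (5), plus one dim particle (4) on the right.
IMAGE = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 5, 9, 9, 0, 0, 0],
    [0, 9, 9, 5, 9, 9, 0, 4, 4],
    [0, 0, 0, 0, 0, 0, 0, 4, 4],
]

for threshold in (3, 4, 6):
    print(f"threshold {threshold}: {count_particles(IMAGE, threshold)} particle(s)")
```

In this toy case, a low threshold merges the two touching particles, while a high threshold separates them but loses the dim one, so no single threshold recovers all three. Real methods are more sophisticated, but they face the same kind of trade-off whenever the imaging conditions change.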

The biggest challenge for machine learning methods is their hunger for data. To achieve results that match the quality of manual analysis, they need hundreds or even thousands of already labeled images to learn from. The more data, the better the results.

More data = Better results
Modified version of Andrew Ng’s (Stanford, Google, Baidu) famous slide explaining the success of deep learning (source)

However, it is not only the amount of data that matters but also its variety. More heterogeneous data allows for broader training of the machine learning algorithms, which in turn makes them more robust to changes in the imaging conditions. If we want to develop fully automated methods to analyze images of particles, we need to pool our data.

So, to answer the original question: Why should I share my data?

To help yourself.

By sharing your data now, you can benefit from more reliable, robust and precise fully automated analysis methods in the future. You can also get credit for making your data available if you supply citable reference information along with it. Others will use your data to develop and validate new particle analysis methods and cite you in their publications.

To help others.

By contributing to the development of new particle analysis methods, you obviously help the developers of those methods. Other users, just like yourself, also benefit: the new methods let them analyze their data in a far less tedious and even more precise manner.