As part of our publication on epithelium segmentation using deep learning and immunohistochemistry we published our dataset on Zenodo: the PESO dataset. The dataset is free to use under the BY-NC-SA Creative Commons license.

The reference standard of the PESO set was made using immunohistochemistry, resulting in a precise and accurate ground truth.

Download dataset

Contents of the dataset

The PESO dataset consists 102 whole-slide images, split in to in a training and test part. The total dataset is around 140GB. For the training part, the reference standard is included. This set consists of:

  • 62 whole-slide images, exported at a pixel resolution of 0.48mu/pixels.
  • 62 Raw color deconvolution masks containing the P63&CK8/18 channel of the color deconvolution operation. These masks mark all regions that are stained by either P63 or CK8/18 in the IHC version of the slides.
  • 25 color deconvolution masks (N=25) on which manual annotations have been made. Within these regions, stain and other artifacts have been removed.
  • 62 training masks (N=62) that have been used to train the main network of our paper. These masks are generated by a trained U-Net on the corresponding IHC slides.

The test set consists of:

  • 40 whole-slide images, exported at a pixel resolution of 0.48mu/pixels.
  • 40 xml files containing a total of 160 annotations of regions that are used in the original evaluation.
  • 160 png files of 2500x2500 pixels, exported at a pixel resolution of 0.48mu/pixels. Each png file corresponds to one test region.
  • 160 padded png files of 3500x3500 pixels of the same test regions.
  • A mapping file (csv) describing whether a test region contains cancer or only benign tissue.

Example from the PESO dataset. The dataset consists of full whole-slide images.

Example of the ground truth segmentation of the epithelium tissue.

Using the data

All slides are saved in TIFF format and can be opened with a viewer such as ASAP. To use the images in Python we can use the python binding of ASAP. To do so, make sure that the <ASAP install directory>/bin is in your PYTHONPATH. Then use the following snippet to extract a patch:

# Import the ASAP module
import multiresolutionimageinterface as mir

# Create a new reader to open files
reader = mir.MultiResolutionImageReader()
image ='path to image file')

# Get a patch at x=1000, y=12000 (coordinates relative to level 0) at level 2
level = 1
patch = image.getUCharPatch(10000, 12000, 1000, 1000, level)

More info / questions?

Background information on the dataset can be found in the paper. For questions please use the contact form.