As part of our publication on epithelium segmentation using deep learning and immunohistochemistry, we published our dataset, the PESO dataset, on Zenodo. It is free to use under the Creative Commons BY-NC-SA license.
Contents of the dataset
The PESO dataset consists of 102 whole-slide images, split into a training part and a test part. The total dataset is around 140 GB. The reference standard is included for the training part, which consists of:
- 62 whole-slide images, exported at a pixel resolution of 0.48 µm/pixel.
- 62 raw color deconvolution masks containing the P63&CK8/18 channel of the color deconvolution operation. These masks mark all regions that are stained by either P63 or CK8/18 in the IHC version of the slides.
- 25 color deconvolution masks on which manual annotations have been made. Within these annotated regions, stain and other artifacts have been removed.
- 62 training masks that were used to train the main network of our paper. These masks were generated by applying a trained U-Net to the corresponding IHC slides.
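The images are exported at 0.48 µm per level-0 pixel, so it is often handy to switch between pixel and physical sizes when choosing patch sizes. The following is a minimal sketch, not part of the dataset release. The constant `SPACING_UM` and both helper names are illustrative assumptions, and the spacing value is taken from the list above.

```python
# Pixel spacing stated for the PESO exports (micrometres per level-0 pixel).
SPACING_UM = 0.48


def pixels_to_um(pixels: float, spacing_um: float = SPACING_UM) -> float:
    """Convert a length in level-0 pixels to micrometres."""
    return pixels * spacing_um


def um_to_pixels(micrometres: float, spacing_um: float = SPACING_UM) -> int:
    """Convert a physical length to the nearest whole number of level-0 pixels."""
    return round(micrometres / spacing_um)


if __name__ == "__main__":
    # A 1000-pixel-wide patch spans roughly 480 µm of tissue.
    print(pixels_to_um(1000))
    # A 1 mm field of view needs roughly 2083 level-0 pixels at this spacing.
    print(um_to_pixels(1000))
```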
The test set consists of:
- 40 whole-slide images, exported at a pixel resolution of 0.48 µm/pixel.
- 40 XML files containing a total of 160 annotations of the regions that were used in the original evaluation.
- 160 PNG files of 2500x2500 pixels, exported at a pixel resolution of 0.48 µm/pixel. Each PNG file corresponds to one test region.
- 160 padded PNG files of 3500x3500 pixels of the same test regions.
- A mapping file (CSV) describing whether each test region contains cancer or only benign tissue.
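The test files can be worked with using only the Python standard library. The following is a minimal sketch, not part of the dataset release, and it rests on two assumptions.

- The XML files follow ASAP's annotation format, with `Annotation` elements that contain ordered `Coordinate` elements holding `X`/`Y` attributes in level-0 pixels.
- Each padded 3500x3500 PNG holds its 2500x2500 test region exactly in the middle, which gives a 500-pixel margin on every side.

The function names and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_asap_polygons(xml_text):
    """Parse ASAP-style annotation XML into a list of (name, [(x, y), ...]).

    Vertices are sorted by their Order attribute and kept as floats
    (assumed to be level-0 pixel coordinates).
    """
    root = ET.fromstring(xml_text)
    polygons = []
    for annotation in root.iter("Annotation"):
        coords = sorted(annotation.iter("Coordinate"), key=lambda c: int(c.get("Order")))
        points = [(float(c.get("X")), float(c.get("Y"))) for c in coords]
        polygons.append((annotation.get("Name"), points))
    return polygons


def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) of a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def center_crop_box(padded_size=3500, region_size=2500):
    """Return a (left, upper, right, lower) box for the central region of a padded PNG.

    Assumes symmetric padding, i.e. the test region sits exactly in the middle.
    """
    extra = padded_size - region_size
    if extra < 0 or extra % 2:
        raise ValueError("padding must be non-negative and symmetric")
    margin = extra // 2  # 500 px on each side for 3500 vs 2500
    return (margin, margin, margin + region_size, margin + region_size)


if __name__ == "__main__":
    # Hypothetical annotation file content with one rectangular region.
    sample = """<ASAP_Annotations><Annotations>
      <Annotation Name="region_1" Type="Polygon">
        <Coordinates>
          <Coordinate Order="1" X="3500" Y="2000"/>
          <Coordinate Order="0" X="1000" Y="2000"/>
          <Coordinate Order="2" X="3500" Y="4500"/>
          <Coordinate Order="3" X="1000" Y="4500"/>
        </Coordinates>
      </Annotation>
    </Annotations></ASAP_Annotations>"""
    for name, points in read_asap_polygons(sample):
        print(name, bounding_box(points))
    print(center_crop_box())
```

For a real file, `ET.parse(path).getroot()` can replace `ET.fromstring`. The box from `center_crop_box()` has the 4-tuple form that Pillow's `Image.crop` accepts, so `Image.open(path).crop(center_crop_box())` should recover the unpadded region if the symmetric-padding assumption holds.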
Using the data
All slides are saved in TIFF format and can be opened with a viewer such as ASAP. To use the images in Python, you can use ASAP's Python bindings. To do so, make sure that <ASAP install directory>/bin is in your PYTHONPATH. Then use the following snippet to extract a patch:
# Import the ASAP Python bindings
import multiresolutionimageinterface as mir

# Create a reader and open the whole-slide image
reader = mir.MultiResolutionImageReader()
image = reader.open('path to image file')
if image is None:
    raise IOError('Could not open the image file')

# Get a 1000x1000 patch at level 1 whose top-left corner is at
# x=10000, y=12000 (coordinates relative to level 0)
level = 1
patch = image.getUCharPatch(10000, 12000, 1000, 1000, level)
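To extract many patches from a slide, you need a grid of patch origins. In the snippet above, the x/y arguments of `getUCharPatch` are given relative to level 0, so positions on a lower-resolution level have to be scaled by that level's downsample factor. Our understanding is that the width and height arguments are in pixels of the requested level. The following is a minimal, pure-Python sketch under that assumption. The function name `tile_origins` is hypothetical, and the function keeps only tiles that fit completely inside the level.

```python
def tile_origins(level_width, level_height, tile_size, downsample):
    """Return level-0 (x, y) origins of non-overlapping square tiles on one level.

    Only tiles that fit completely inside the level are returned; any partial
    strip along the right and bottom edges is skipped.
    """
    origins = []
    for ty in range(0, level_height - tile_size + 1, tile_size):
        for tx in range(0, level_width - tile_size + 1, tile_size):
            # Scale level coordinates back to level 0 for getUCharPatch.
            origins.append((round(tx * downsample), round(ty * downsample)))
    return origins


if __name__ == "__main__":
    # Hypothetical level: 1000 x 600 pixels, downsampled 2x relative to level 0.
    for x, y in tile_origins(1000, 600, 256, 2.0):
        print(x, y)
```

With ASAP, the level size and downsample factor should be available from `image.getLevelDimensions(level)` and `image.getLevelDownsample(level)`. Each origin can then be passed as `image.getUCharPatch(x, y, tile_size, tile_size, level)`. Rounding the scaled coordinates guards against downsample factors that are not exact integers.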