Skin lesions classification: SparkR in DSX with IBM Watson visual recognition (9th WBME)

Written by fsmunoz on 14 April 2017. Categories: datascience

I recently had the pleasure of participating in the 9th Workshop on Biomedical Engineering, held at the Faculty of Sciences of the University of Lisbon. It is always refreshing to attend such events, since one is exposed to a lot of extremely interesting research that is easy to miss when focusing on day-to-day tasks.

My participation was twofold: a presentation on the use of IBM Watson in the biomedical and health-care domains, and a practical workshop on how to use a concrete Watson API in a specific use case, meant to show what is involved in using Watson and thus demystify the perceived complexity of cognitive APIs. I ended up using IBM Watson Visual Recognition and Data Science Experience to build a step-by-step interactive tutorial on how to obtain and clean an existing curated skin lesions database and use it to build a model that tries to identify malignant lesions.

Read on for the details.

The workshop was great and the participants were interested and active. After some research around the topic I settled on melanoma detection as the use case, using a publicly available database from the ADDI project, the PH² database (which, coincidentally, was also created by Portuguese institutions; a curiosity more than anything, since science hardly has borders).

R comes naturally to me for most Data Science tasks: its unabashedly imperative nature makes prototyping very simple, and on top of that it offers all sorts of goodies that I could then use for statistical analysis. I initially created an Rmd file (using knitr in Emacs with ESS-mode, of course), but even with RStudio making things easier for newcomers, a workshop of this kind has specific challenges, and the fewer requirements in terms of software stack the better.

With that in mind I decided to give Data Science Experience (DSX) a try: a web-based collaborative environment that includes Jupyter notebooks and SparkR.

After some changes to the initial code (mostly to remove one or two dependencies that used native libs) it worked fine and fitted the purpose perfectly: everyone in the workshop could copy the notebook and work in their own workspace, or simply follow along.

I have made the notebook available through DSX, under the title "Using Watson for skin lesions identification", and also on GitHub using the built-in integration, so feel free to experiment. It goes step-by-step through the entire process:

  1. Obtaining the database
  2. Cleaning up the data and importing the observations
  3. Creating the training and test sets
  4. Performing additional transformations on the images
  5. Using Watson Visual Recognition to train a model based on the training set
  6. Interacting with the Visual Recognition API using R
  7. Classifying the test set
  8. Analysing the results
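
As a rough illustration of steps 3 and 8, here is a minimal base R sketch of a stratified train/test split and a sensitivity/specificity summary. It is not the code from the notebook: the `lesions` data frame, its column names, the 80/20 split and the choice of metrics are all assumptions made for the example.

```r
# Hypothetical metadata: one row per image with a diagnosis label.
# Column names and values are illustrative, not taken from the notebook.
set.seed(42)  # make the random split reproducible
lesions <- data.frame(
  image_id  = sprintf("img%03d", 1:20),
  diagnosis = rep(c("benign", "melanoma"), times = c(15, 5)),
  stringsAsFactors = FALSE
)

# Stratified split: sample the same fraction from each class so that the
# rarer class is represented in both the training and the test set.
stratified_split <- function(df, label_col, train_frac = 0.8) {
  idx_by_class <- split(seq_len(nrow(df)), df[[label_col]])
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_train <- floor(length(idx) * train_frac)
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  test_idx <- setdiff(seq_len(nrow(df)), train_idx)
  list(train = df[train_idx, ], test = df[test_idx, ])
}

# Sensitivity and specificity for a binary outcome, given the true labels
# and the labels predicted by the classifier.
evaluate <- function(actual, predicted, positive = "melanoma") {
  tp <- sum(actual == positive & predicted == positive)
  fn <- sum(actual == positive & predicted != positive)
  tn <- sum(actual != positive & predicted != positive)
  fp <- sum(actual != positive & predicted == positive)
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

sets <- stratified_split(lesions, "diagnosis")
table(sets$train$diagnosis)  # class counts in the training set
```

For a screening-style problem like this one, sensitivity (the share of malignant lesions that are caught) is usually the number to look at first, which is why the sketch reports it separately instead of a single accuracy figure.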

Using R through SparkR is straightforward, and wrapping the Watson API takes very little code.
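
The snippet below is a sketch of what such a wrapper might look like using the `httr` package; it is not the code from the notebook. The base URL, the `version` date, and the `api_key`, `images_file` and `parameters` names are assumptions recalled from the Visual Recognition v3 API of that period and should be checked against the service documentation. The function names `vr_classify_request` and `vr_classify` are made up for the example.

```r
# Assumed base URL of the Visual Recognition v3 API; verify before use.
vr_base_url <- "https://gateway-a.watsonplatform.net/visual-recognition/api/v3"

# Build the URL and query string for a classify call (no network access).
vr_classify_request <- function(api_key, version = "2016-05-20") {
  list(
    url   = paste0(vr_base_url, "/classify"),
    query = list(api_key = api_key, version = version)
  )
}

# Send one image to a custom classifier and return the parsed JSON reply
# as a nested R list. Requires the httr package to be installed.
vr_classify <- function(image_path, api_key, classifier_id) {
  req <- vr_classify_request(api_key)

  # The classifier to use is passed as a small JSON document, uploaded
  # alongside the image (assumed request layout).
  params_file <- tempfile(fileext = ".json")
  on.exit(unlink(params_file))
  writeLines(sprintf('{"classifier_ids": ["%s"]}', classifier_id), params_file)

  resp <- httr::POST(
    req$url,
    query  = req$query,
    body   = list(
      images_file = httr::upload_file(image_path),
      parameters  = httr::upload_file(params_file, type = "application/json")
    ),
    encode = "multipart"
  )
  httr::stop_for_status(resp)  # turn HTTP errors into R errors
  httr::content(resp, as = "parsed", type = "application/json")
}
```

Keeping the request construction in its own function makes it possible to inspect the URL and query parameters without touching the network; classifying the whole test set is then a matter of calling `vr_classify()` over the image paths with `lapply()`.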

All in all it was a great experience and time well spent. If nothing else, I think it provides a “point-and-click” tutorial on how to obtain and prepare the data and use Watson’s deep learning neural network to train a classifier.
