Oral Presentation at the European Congress of Radiology, Vienna, 2019
Development and validation of Deep Learning (DL) algorithms for medical imaging require access to large, organised datasets of images and their corresponding reports. Currently, most medical imaging data in the world is unorganised, and images and text reports have to be linked manually. An approach is presented for linking patients' medical images and reports when no unique identifier for linking them exists.
Methods and Materials
Simple, partial, token-set and token-sort ratios matched 4.56%, 46.45%, 57.37% and 7.97% of reports, respectively, at 95% match confidence. The token-set ratio, which had the highest match percentage, matched 170,336 reports to their corresponding studies.
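As an illustration only, the four ratios compared above can be sketched in Python with the standard-library `difflib` module. The abstract does not name the library, the fields compared or the preprocessing used, so everything here is an assumption: the helper names (`tokens`, `simple_ratio`, `partial_ratio`, `token_sort_ratio`, `token_set_ratio`, `best_match`), the example strings, the simplified sliding-window partial ratio, and the reading of "95% match confidence" as a score threshold of 95 on a 0-100 scale.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _ratio(a, b):
    """Similarity of two strings on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


def simple_ratio(a, b):
    """Compare the normalised strings as they are, word order included."""
    return _ratio(" ".join(tokens(a)), " ".join(tokens(b)))


def partial_ratio(a, b):
    """Best score of the shorter string against any equally long
    window of the longer string (a simplified sliding-window variant)."""
    a, b = " ".join(tokens(a)), " ".join(tokens(b))
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0.0
    width = len(shorter)
    return max(
        _ratio(shorter, longer[i:i + width])
        for i in range(len(longer) - width + 1)
    )


def token_sort_ratio(a, b):
    """Sort the tokens alphabetically before comparing, so word order is ignored."""
    return _ratio(" ".join(sorted(tokens(a))), " ".join(sorted(tokens(b))))


def token_set_ratio(a, b):
    """Compare the shared tokens against each side's full token set."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    common = " ".join(sorted(set_a & set_b))
    full_a = (common + " " + " ".join(sorted(set_a - set_b))).strip()
    full_b = (common + " " + " ".join(sorted(set_b - set_a))).strip()
    return max(_ratio(common, full_a), _ratio(common, full_b), _ratio(full_a, full_b))


def best_match(report_key, study_keys, scorer=token_set_ratio, threshold=95.0):
    """Return (study_key, score) for the best candidate at or above
    the threshold, or None if no candidate qualifies."""
    scored = [(scorer(report_key, key), key) for key in study_keys]
    if not scored:
        return None
    score, key = max(scored)
    return (key, score) if score >= threshold else None


# Hypothetical identifying strings; not taken from the study data.
report = "CHEST XRAY PA VIEW, JOHN DOE"
studies = ["john doe chest xray pa view", "jane roe ct abdomen"]
print(best_match(report, studies))
```

In this sketch the token-set scorer gives a full score whenever one string's tokens are a subset of the other's, which makes it the most permissive of the four. That is a property of the sketch, not an explanation given in the abstract for the reported match rates.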
Fuzzy matching is a promising technique for merging independent datasets without unique identifiers. It saves thousands of man-hours of manual linking, which is critical for the development and validation of DL algorithms.