This talk was given today at ICCR 2016, London, UK
A DICOM anonymization server for data collection in clinical trials
Marcel van Herk†, Gareth Price, Alan McWilliam and Rhidian Bramley
Institute of Cancer Sciences, University of Manchester and Christie NHS Trust, Manchester, UK
Sharing anonymized RT planning and image data in clinical trials allows image biomarkers and detailed dose-response relations to be established. Even though DICOM anonymization software is abundant, anonymizing data for clinical trials is time-consuming and error-prone because of its manual steps, especially when updating existing trial databases. Interpreting received data is often difficult because of inconsistencies in resubmitted data, incorrect subject keys, missing and superfluous data, and lost linkage between RT DICOM objects. Full integration between clinical trial systems and PACS has been proposed previously [1-3], but in practice this may be time-consuming because images must be linked manually to trial events. Our purpose is to develop a simple (bulk) DICOM anonymization server with these specifications: 1) DICOM connectivity, with image upload by DICOM push or automated by script; 2) Ensured UID consistency of repeatedly processed or added data; 3) Fine control of anonymization, potentially linking to other databases; 4) Web preview of anonymized data for QA; 5) Export of the same patient to multiple trials generates different UIDs; 6) Easy data management.
Materials & Methods
The system uses the open source Conquest DICOM server augmented with Lua scripts. The process is implemented in two stages: received data is first staged (anonymized and stored) and processed later for output to one or more clinical trial databases (Fig. 1). The first stage reduces security constraints for the anonymization server and preview. Linkage between hospital ID, trial ID and subject key is provided by external databases (e.g. in CSV form). The system provides query/retrieve, as well as web preview of anonymized data. The following operations are supported by easily configured scripts:
• Bulk DICOM data retrieve for anonymization [trial_bulk_get.lua]
• Anonymize received data for staging [trial_anonymize.lua]
• Reject identifiable images and/or wipe burned-in annotations [trial_filter.lua]
• Collect/update data for submission to clinical trial [trial_collect.lua]
• Cleanup staged data for specified trial [trial_cleanup.lua]
• Restore staged data for specified trial from PACS [trial_restore.lua]
If key and modification tables are kept, all other data can be restored from the PACS. Output is a zip file per subject/study per trial; various upload mechanisms are supported.
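The packaging step (one zip file per subject/study per trial) can be illustrated with a minimal Python sketch. This is not the Conquest Lua script trial_collect.lua; the function name `package_study`, the directory layout and the archive naming scheme are assumptions made only for illustration.

```python
import zipfile
from pathlib import Path


def package_study(staged_dir: Path, out_dir: Path, trial_id: str,
                  subject_key: str, study_label: str) -> Path:
    """Zip all files of one anonymized study for one trial.

    Hypothetical layout: <out_dir>/<trial_id>/<subject_key>_<study_label>.zip
    """
    trial_dir = out_dir / trial_id
    trial_dir.mkdir(parents=True, exist_ok=True)
    archive = trial_dir / f"{subject_key}_{study_label}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Sorted traversal gives the same member order on every run.
        for path in sorted(staged_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the study folder, not absolute paths.
                zf.write(path, arcname=path.relative_to(staged_dir).as_posix())
    return archive
```

Keeping the trial ID in the output path means the same subject can be packaged for several trials without one archive overwriting another.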
The server was developed on Conquest DICOM server version 1.4.19beta. The system uses modification tables and therefore performs pseudonymization, which in the UK is considered equivalent to anonymization if the key tables are kept private to the originating hospital. The function that maps a key or UID was extended to include the trial ID. The existing anonymization script was adapted, and scripts (roughly 20 lines, simple to modify) were written for the operations listed above. Staging, restore and final anonymization speed is ~23 MB/s. RT plans from multiple vendors were processed correctly and visualized in Mirada, validating correct UID assignment. Anonymized data from the same patient for different trials loaded into the same DICOM database without UID conflicts, while repeatedly processed data for the same trial was identical. Scripts were posted on the Conquest forum.
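The effect of including the trial ID in the UID mapping can be sketched as follows. This is a minimal Python illustration of the principle, not the Conquest implementation: the class `ModificationTable` and its in-memory dictionary are hypothetical stand-ins for the persistent modification table, and generating replacement UIDs from random UUIDs under the DICOM `2.25.` root is an assumption made for the example.

```python
import uuid


class ModificationTable:
    """In-memory stand-in for a persistent modification table (hypothetical)."""

    def __init__(self):
        # (trial_id, original_uid) -> replacement UID
        self._rows = {}

    def map_uid(self, trial_id: str, original_uid: str) -> str:
        """Return the replacement UID, creating it on first use only."""
        key = (trial_id, original_uid)
        if key not in self._rows:
            # DICOM allows UIDs of the form 2.25.<UUID as a decimal integer>;
            # the result stays within the 64-character UID limit.
            self._rows[key] = f"2.25.{uuid.uuid4().int}"
        return self._rows[key]


table = ModificationTable()
first = table.map_uid("TRIAL_A", "1.2.840.99999.1")
again = table.map_uid("TRIAL_A", "1.2.840.99999.1")   # same trial: same UID
other = table.map_uid("TRIAL_B", "1.2.840.99999.1")   # other trial: new UID
```

Because every occurrence of an original UID within one trial maps to the same replacement, references between RT objects (for example from a plan to its structure set) remain linked after processing, and the mapping can only be reproduced while the table is retained.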
Figure 1: Logical data flow and use of the anonymization server. Selected PACS data is pushed or pulled into the system. Data received is instantly anonymized (stage 1), with all modifications stored in modification table a. QA of the anonymization is possible through a web-based preview system that runs the final anonymization scripts. A subject key table is provided for each clinical trial, linking the hospital’s patient ID to a trial-specific subject key. When a researcher (PhD) requests the data, the staged data is re-anonymized (stage 2, tables b and c) and shipped to the specific trial storage. The second anonymization process keeps UIDs in each trial database globally unique.
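The subject key lookup described in the caption could look like the following minimal Python sketch. The column names (`trial_id`, `hospital_id`, `subject_key`), the example rows and the single combined table are assumptions for illustration; the actual system may use one table per trial and a different layout.

```python
import csv
import io

# Hypothetical key table; real tables stay private to the originating hospital.
EXAMPLE_CSV = """trial_id,hospital_id,subject_key
TRIAL_A,H0001,A-001
TRIAL_B,H0001,B-017
"""


def load_key_table(csv_text: str) -> dict:
    """Build a (trial_id, hospital_id) -> subject_key lookup from CSV text."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["trial_id"].strip(), row["hospital_id"].strip())
        table[key] = row["subject_key"].strip()
    return table


def subject_key_for(table: dict, trial_id: str, hospital_id: str) -> str:
    """Return the subject key, or raise so unlisted patients are not exported."""
    try:
        return table[(trial_id, hospital_id)]
    except KeyError:
        # The hospital ID is deliberately left out of the error message.
        raise LookupError(f"patient not listed for trial {trial_id}") from None


keys = load_key_table(EXAMPLE_CSV)
```

Raising an error for patients missing from the table, rather than falling back to a default key, is one way to guard against the incorrect subject keys mentioned in the introduction.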
Discussion & Conclusions
Even though DICOM anonymization is considered a standard operation, it is still mostly performed manually and is error-prone. Using the scripting capabilities of an open source DICOM server, an anonymization server is provided for bulk anonymization. The system was written for a particular server, but the principles could be applied to other toolkits; Conquest’s scripting capability and associated debugger, however, ease development. To ensure data consistency, key and modification tables need to be kept indefinitely. All other data in the system can be regenerated, provided the source data is still in the PACS. We conclude that we have developed a robust and practical anonymization suite for DICOM data that allows this task to be automated and centralized, and eases management and QA.
 van Herk M: Integration of a clinical trial database with a PACS. JPCS 489:12099, 2014
 Haak D, et al. J Digit Imaging. 2015 Oct;28(5):558-66.
 Sariyar M, et al, Sharing and Reuse of Sensitive Data and Samples .. Legal Requirements. Biopreserv Biobank. 2015 Aug;13(4):263-70.
Acknowledgements & Disclosures
Lambert Zijp and Lennert Ploeger are acknowledged for development and support.