[DCRM-L] OCLC implementing AI to further scale and accelerate WorldCat de-duplication

James, Kate jamesk at oclc.org
Wed Feb 5 16:30:43 MST 2025


OCLC Metadata Quality teams implement a variety of measures, both manual and automated, to improve the quality and usefulness of WorldCat data. These extensive and ongoing efforts ensure that WorldCat data supports the needs of our membership and our global network of thousands of libraries across a wide range of services. As the technologies and tools that support this important work evolve, we continually explore new methods for enriching, repairing, and de-duplicating WorldCat records, the data that powers the global discovery and sharing of library resources.

At OCLC, we believe artificial intelligence (AI) is at its best when guided by human expertise. Our journey with AI is a partnership in which the insights and values of library professionals shape how AI serves communities. A core component of many AI systems is machine learning, which involves training algorithms on data so that they can make predictions or decisions without being explicitly programmed to do so.

We will soon implement the latest version of our AI record-merging technology as part of our ongoing efforts to resolve duplicate records in WorldCat. On 11 February 2025, we will do a test run of 500,000 record pairs, targeting only print English-language books in WorldCat and merging 500,000 duplicate records. Print English-language books represent the largest category of duplicates in WorldCat and are the format that has been most rigorously tested and improved in our machine learning de-duplication activities to date.

Read the full announcement: <https://www.oclc.org/en/news/announcements/2025/ai-worldcat-deduplication.html>

Cleaning up duplicate records is one of the most impactful ways to improve the quality of WorldCat. Manual efforts by metadata professionals, paired with the latest AI technology, have significantly reduced the number of duplicates.

For additional information, we also invite you to attend the OCLC cataloging community meeting on 12 February 2025 <https://www.oclc.org/en/events/2025/cataloging-community-meeting-february2025.html> to learn more about WorldCat duplicate resolution efforts and machine learning.


Please excuse duplication of this message.

Kate


Kate James (she/her/hers)
OCLC | Program Coordinator, Metadata Engagement, Global Product Management
6565 Kilgour Place, Dublin, Ohio, 43017 United States
AskQC: <https://help.oclc.org/WorldCat/Metadata_Quality/AskQC>
OCLC.org <https://www.oclc.org/en/home.html> | Twitter <http://twitter.com/oclc> | Facebook <http://www.facebook.com/pages/OCLC/20530435726> | YouTube <http://www.youtube.com/OCLCvideo> | LinkedIn <https://www.linkedin.com/company/oclc> | Instagram <https://www.instagram.com/oclc_global/> | Next blog <http://www.oclc.org/blog/main/>






