[DCRM-L] OCLC's Glenn Patton on merging and de-duplication of WorldCat records

Dooley, Jackie dooleyj at oclc.org
Thu May 20 12:11:45 MDT 2010


I brought the recent conversation among Richard Noble and others to the
attention of Glenn Patton, OCLC's long-time expert in quality control
issues (including record de-duplication), and he provided the
information below. In a nutshell: OCLC does not de-dup records for any
pre-1800 imprints, given the complexity of determining what
constitutes a "duplicate." Further conversation on this would be
welcome if those in the rare book cataloging community would find it
useful. Questions about when to input a new record are also pertinent.

Best to all, Jackie

Jackie Dooley
Consulting Archivist
OCLC Research and the RLG Partnership

OCLC's Duplicate Detection and Resolution software (DDR) does not merge
records if either of the imprint dates is pre-1800, nor would OCLC staff
merge records in this situation unless it were absolutely clear that the
records represented the same item. We would, however, be willing to work
with anyone who had gone to the effort of working out which records were
true duplicates and which were not.

While the matching software used to load records prepared in external
systems into WorldCat (Batchload) is very similar to that used in DDR, it
does not include the pre-1800 exclusion. We could consider more complex
exclusions based on the 040 $e coding (e.g., excluding all records with a
'dcrb[x]' code and its predecessor codes) if the rare book community felt
this would be desirable.
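As a rough illustration of the two exclusion rules just described, here is a minimal Python sketch. The Record class, its fields, and all function names are hypothetical simplifications, not OCLC's data model or code, and the sketch models only the exclusion checks, not the field-by-field comparison that DDR and Batchload actually perform. Reading "u" digits in a date as 9 (so "17uu" counts as pre-1800) is also an assumption, and the excluded 040 $e codes are left for the caller to supply because the predecessor codes are not listed here.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Record:
    """Hypothetical, heavily simplified stand-in for a WorldCat record."""
    ocn: str                      # control number (illustrative only)
    date1: str                    # imprint date as four characters, e.g. "1776" or "17uu"
    desc_conventions: List[str] = field(default_factory=list)  # 040 $e values


def is_pre_1800(rec: Record) -> bool:
    """True if the latest year the date could stand for is before 1800.

    Unknown digits ("u") are read as 9, so "17uu" counts as pre-1800
    and "uuuu" does not. This reading is an assumption, not OCLC's rule.
    """
    latest = rec.date1.replace("u", "9")
    return len(latest) == 4 and latest.isdigit() and int(latest) < 1800


def ddr_may_merge(a: Record, b: Record) -> bool:
    """DDR exclusion as described: never merge if either imprint date is pre-1800."""
    return not (is_pre_1800(a) or is_pre_1800(b))


def batchload_may_match(incoming: Record, existing: Record,
                        excluded_codes: Optional[FrozenSet[str]] = None) -> bool:
    """Exclusion checks only; the actual field comparison is not modeled.

    As described, Batchload has no pre-1800 exclusion. The optional
    excluded_codes set models the *proposed* 040 $e exclusion
    (hypothetical); which codes belong in it is left to the caller.
    """
    if not excluded_codes:
        return True
    codes = set(incoming.desc_conventions) | set(existing.desc_conventions)
    return codes.isdisjoint(excluded_codes)
```

With no excluded codes, two 1776 records pass the Batchload check even though DDR would never merge them, which is the gap described above; passing a set such as frozenset({"dcrb"}) shows how a 040 $e exclusion would stop matching for records so coded.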

It's certainly true that a WorldCat record can end up with holdings
attached that represent variations of the item described in the
bibliographic record.  OCLC matching has not always been as restrictive
as it is now, and catalogers certainly may have chosen "close" master
records and then made adaptations in their local systems.

The practice of not recording an edition statement when it is based on a
reference source is a very problematic one. Having an edition statement
(even a bracketed one) would, I believe, prevent mismatches in both DDR
and Batchload; having that information in the "first note" (which I
assume would be a 500, since the 503 is no longer valid) is not the sort
of thing that is "actionable" from a machine matching perspective.
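A minimal sketch of why a 250 edition statement is actionable for matching while a 500 note is not, assuming a hypothetical match key built from normalized title, imprint date, and edition statement. This is not how DDR or Batchload actually compare records; it only shows that a bracketed edition statement changes the key while the same information in a free-text note does not. Treating a missing edition statement as different from a present one is likewise an assumption, and the class and function names are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    """Hypothetical, heavily simplified record for illustrating matching."""
    title: str
    date1: str
    edition: Optional[str] = None      # 250 edition statement, brackets kept, e.g. "[2nd ed.]"
    notes: Tuple[str, ...] = ()        # 500 general notes (free text)


def normalize(text: Optional[str]) -> str:
    """Lowercase, drop brackets and punctuation, collapse whitespace."""
    if not text:
        return ""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_key(rec: Record) -> Tuple[str, str, str]:
    # The edition statement is part of the key; notes deliberately are not,
    # since free-text notes are hard to compare reliably by machine.
    return (normalize(rec.title), rec.date1, normalize(rec.edition))


def may_match(a: Record, b: Record) -> bool:
    """Hypothetical rule: records match only if their keys are identical."""
    return match_key(a) == match_key(b)
```

In this sketch a second-edition record carrying a bracketed 250 stays separate from the first edition, while one that records the edition only in a 500 note produces the same key and would be matched to it.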

It would be useful to carry forward this discussion with the rare book
community.  Nobody wants to play "fast and loose" with record merging,
but, on the other hand, I don't think people really want a situation
where there's no attempt to match at all.

Glenn E. Patton
Director, WorldCat Quality Management
OCLC
6565 Kilgour Place
Dublin, OH 43017-3395
Phone: +1.800.828.5878, ext. 6371 or +1.614.764.6371
Fax: +1.614.718.7187
Email: pattong at oclc.org
