[DCRM-L] OCLC's Glenn Patton on merging and de-duplication of WorldCat records

Deborah J. Leslie DJLeslie at FOLGER.edu
Thu May 20 14:06:15 MDT 2010


I would welcome an exclusion of dcrm and predecessor codes from matching
algorithms. 

 

The exclusion of pre-1801 imprints explains the large numbers of
duplicate results for early printed books. I'm not sure of its value;
removing this exclusion while adding one for the 040 seems like it might
be a better way to go, but I bow to those with longer and deeper
experience of using WorldCat.

 

From: dcrm-l-bounces at lib.byu.edu [mailto:dcrm-l-bounces at lib.byu.edu] On
Behalf Of Dooley,Jackie
Sent: Thursday, 20 May, 2010 14:12
To: DCRM Revision Group List
Cc: Chapman,John; Patton,Glenn
Subject: [DCRM-L] OCLC's Glenn Patton on merging and de-duplication
of WorldCat records

 

I brought the recent conversation among Richard Noble and others to the
attention of Glenn Patton, OCLC's long-time expert in quality control
issues (including record de-duplication), and he provided the
information below.  In a nutshell: OCLC does not de-dup records for any
pre-1800 imprints, given the complexities in determining what
constitutes a "duplicate." Further conversation on this would be
welcomed if those in the rare book cataloging community would find it
useful. Issues relating to when to input a new record are pertinent.

 

Best to all, Jackie

 

Jackie Dooley
Consulting Archivist
OCLC Research and the RLG Partnership

 

 

OCLC's Duplicate Detection and Resolution software (DDR) does not merge
records if one of the imprint dates is pre-1800, nor would OCLC staff
merge records in this situation unless it were absolutely clear that the
records represented the same item (but we would be willing to work with
someone who had gone through the effort of working out which were true
duplicates and which weren't).  

 

While the matching software used to load records prepared in external
systems into WorldCat is very similar to that used in DDR, it does not
include the pre-1800 exclusion.  We could consider some more complex
exclusions that would be based on the 040 $e coding (e.g., exclude all
with a 'dcrb[x]' code and its predecessor codes) if the rare book
community felt this would be desirable.
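
To make the two exclusions concrete, here is a minimal sketch in Python. It is an illustration only, not OCLC's DDR or batchload code: the Record class, the function names, and the set of 040 $e codes are all hypothetical, and the cutoff year is a parameter because the thread mentions both "pre-1800" and "pre-1801".

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoff: an imprint year below this counts as "pre-1800".
CUTOFF_YEAR = 1800

# Illustrative 040 $e values for rare-book descriptive conventions.
# The exact list to exclude is an assumption, not a stated OCLC policy.
RARE_BOOK_CODES = frozenset({"bdrb", "dcrb", "dcrmb"})


@dataclass
class Record:
    """Hypothetical, heavily simplified stand-in for a bibliographic record."""
    imprint_year: Optional[int]
    conventions: frozenset = frozenset()  # values found in 040 $e


def excluded_by_date(a, b, cutoff=CUTOFF_YEAR):
    """True if either record has a known imprint year before the cutoff."""
    return any(
        r.imprint_year is not None and r.imprint_year < cutoff
        for r in (a, b)
    )


def excluded_by_040e(a, b, codes=RARE_BOOK_CODES):
    """True if either record carries one of the listed 040 $e codes."""
    return any(r.conventions & codes for r in (a, b))


def may_auto_merge(a, b, use_date_rule=True, use_040e_rule=False):
    """Decide whether a pair is even eligible for automated merging.

    This only models the exclusions; it says nothing about whether the
    two records actually match.
    """
    if use_date_rule and excluded_by_date(a, b):
        return False
    if use_040e_rule and excluded_by_040e(a, b):
        return False
    return True
```

With both flags off, the sketch corresponds to matching with no exclusion at all; turning on only use_040e_rule corresponds to the 040 $e-based alternative floated above.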

 

It's certainly true that a WorldCat record can end up with holdings
attached that represent variations of the item described in the
bibliographic record.  OCLC matching has not always been as restrictive
as it is now, and catalogers certainly may have chosen "close" master
records and then made adaptations in their local systems.

 

The issue of not recording an edition statement based on a reference
source is a very problematic one.  Having an edition statement (even a
bracketed one) would, I believe, prevent mismatches in both DDR and
Batchload; having that information in the "first note" (which I assume
would be a 500, since the 503 is no longer valid) is not the sort of
thing that is "actionable" from a machine matching perspective.

 

It would be useful to carry forward this discussion with the rare book
community.  Nobody wants to play "fast and loose" with record merging,
but, on the other hand, I don't think people really want a situation
where there's no attempt to match at all.

 

Glenn E. Patton
Director, WorldCat Quality Management
OCLC
6565 Kilgour Place
Dublin  OH  43017-3395
Phone: +1.800.828.5878, ext. 6371 or +1.614.764.6371
Fax: +1.614.718.7187
Email: pattong at oclc.org

 

 

 


