[DCRM-L] OCLC de-duping algorithms and dates of publication

dooleyj dooleyj at oclc.org
Thu Nov 4 10:01:52 MDT 2010


It's already possible for others to have IRs, but there's a fee involved. It
was a significant endeavor for OCLC to incorporate those from RLIN into the
master-record environment, and adding more involves extra effort.

I don't know to what extent non-ex-RLGs have adopted IRs; it does seem to me
that it falls to special collections folks to convince their library
administrations that it's sufficiently important to do so. You probably know
that IRs are accessible to all via Connexion and FirstSearch, but not via
worldcat.org or WorldCat Local (which is based on .org). As I've
communicated to this listserv in the past, however, the WorldCat Local
developers are at work in this fiscal year to enable additional "local" data
to be retained and displayed.
 
Many of you probably are aware that OCLC Research has implemented algorithms
for establishing FRBR work sets in WorldCat. Ideally, this would concatenate
for the public display the many dupes and variants for early imprints. I'm
checking with my colleagues who are on that project to find out to what
extent records for pre-1801 materials get picked up. It's bound to be messy
because of all the inevitable title variations.

In the meantime, perhaps some of you would like to test that a bit with your
favorite messy record sets?
 
-Jackie

-- 
Jackie Dooley
Program Officer
OCLC Research and the RLG Partnership

949.492.5060 (work/home) -- Pacific Time
949.295.1529 (mobile)




From: "Auyong, Dorothy" <dauyong at huntington.org>
Reply-To: DCRM-L <dcrm-l at lib.byu.edu>
Date: Wed, 3 Nov 2010 11:15:35 -0700
To: DCRM-L <dcrm-l at lib.byu.edu>
Subject: Re: [DCRM-L] OCLC de-duping algorithms and dates of publication

Jackie,
 
This seems to be an issue where it might be appropriate for OCLC to revisit
the idea of allowing non-RLG legacy libraries to contribute IRs?  It would
help not "clutter" up a Master record, but still allow DCRM enriched content
to be easily available to the greater WorldCat community. 
 

Dorothy Auyong
Principal Rare Book Cataloger
Huntington Library
dauyong at huntington.org
 

From: dcrm-l-bounces at lib.byu.edu [mailto:dcrm-l-bounces at lib.byu.edu] On
Behalf Of Ann W. Copeland
Sent: Wednesday, November 03, 2010 10:56 AM
To: DCRM Revision Group List
Subject: Re: [DCRM-L] OCLC de-duping algorithms and dates of publication
 
Jackie,

We did discuss this at what used to be called MASC last winter. Here is the
relevant passage from the minutes:

 A). OCLC issues. 
 
 Given the new functionality available in OCLC to improve records, are
catalogers working differently -- for example, routinely adding genre/form
terms to master records?
 
 Some participants said they search the OCLC database for a suitable record
to enhance using DCRM(B) cataloging rules and/or they add genre terms and
notes to AACR2 records; others said they upgrade their records only in
their local database. The concern that other catalogers could delete the
information in enhanced records in OCLC was mentioned, as was the belief that
public services librarians would prefer less elaborate records.
 
 Annie Copeland reported that on behalf of the RBMS Bibliographic Standards
Committee she had written to OCLC to inquire about the possibility of OCLC
allowing duplicate records for the same item, one record cataloged according
to AACR2 and another according to DCRM. OCLC responded that rather than
allowing permissible duplicates, they prefer having the DCRM record, as the
one containing the most information, be the master record. OCLC wondered how
libraries would react to this change.  A show of hands of MASC participants
was called for and a large majority indicated their preference for the DCRM
record being the master record.  Some attendees asked to have an OCLC
representative at a future MASC meeting to discuss master records, duplicate
records and proliferation of records in the database.

Glenn then issued this, in May I believe:
 
OCLC's Duplicate Detection and Resolution software (DDR) does not merge
records if one of the imprint dates is pre-1800, nor would OCLC staff merge
records in this situation unless it were absolutely clear that the records
represented the same item (but we would be willing to work with someone who
had gone through the effort of working out which were true duplicates and
which weren't).
 
While the matching software used to load records prepared in external
systems into WorldCat is very similar to that used in DDR, it does not
include the pre-1800 exclusion. We could consider some more complex
exclusions that would be based on the 040 $e coding (e.g., exclude all with
a 'dcrb[x]' code and its predecessor codes) if the rare book community felt
this would be desirable.
 
It's certainly true that a WorldCat record can end up with holdings attached
that represent variations of the item described in the bibliographic record.
OCLC matching has not always been as restrictive as it is now, and
catalogers certainly may have chosen "close" master records and then made
adaptations in their local systems.
 
The issue of not recording an edition statement based on a reference source
is a very problematic one. Having an edition statement (even a bracketed
one) would, I believe, prevent mismatches in both DDR and Batchload; having
that information in the "first note" (which I assume would be a 500, since
the 503 is no longer valid) is not the sort of thing that is "actionable"
from a machine matching perspective.
 
It would be useful to carry forward this discussion with the rare book
community. Nobody wants to play "fast and loose" with record merging, but,
on the other hand, I don't think people really want a situation where
there's no attempt to match at all.
 
Glenn E. Patton
Director, WorldCat Quality Management

I'm not sure where we want to go with this now.

Thanks, Annie

On 11/3/2010 1:22 PM, Dooley,Jackie wrote:
Big questions about which, IMHO, Bib Standards oughta have discussions.
-Jackie
 

From: dcrm-l-bounces at lib.byu.edu [mailto:dcrm-l-bounces at lib.byu.edu] On
Behalf Of Deborah J. Leslie
Sent: Wednesday, November 03, 2010 7:35 AM
To: DCRM Revision Group List
Subject: Re: [DCRM-L] OCLC de-duping algorithms and dates of publication
 
Thanks for Annie's comment. I have mixed feelings about the no de-duping of
pre-1801 publications. Would OCLC really give preference to dcrm records if
they were to de-dupe? Even over pcc records?
__________________________________________
Deborah J. Leslie, M.A., M.L.S.
RBMS past chair 2010-2011 | Head of Cataloging, Folger Shakespeare Library
201 East Capitol St., S.E., Washington, D.C. 20003 | 202.675-0369 (phone)
202.675-0328 (fax) | djleslie at folger.edu  | www.folger.edu
<http://www.folger.edu>
 
 

From: dcrm-l-bounces at lib.byu.edu [mailto:dcrm-l-bounces at lib.byu.edu] On
Behalf Of ANN W. COPELAND
Sent: Tuesday, 02 November, 2010 22:45
To: Erin Blake
Cc: DCRM Revision Group List
Subject: Re: [DCRM-L] OCLC de-duping algorithms and dates of publication
 

Interestingly, when we asked about permissible duplicates (one DCRM, one
AACR2) OCLC said they did NOT want duplicate records. Instead they wanted to
merge records with the DCRM record surviving as the master record. So, why
exempt pre-1800 books from the de-duping? Why not work the algorithm to
favor DCRM? 

Thanks, Annie



