<HTML>

<HEAD>

<TITLE>Re: [DCRM-L] OCLC de-duping algorithms and dates of publication</TITLE>

</HEAD>

<BODY>

<FONT COLOR="#1E487C"><FONT SIZE="4"><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>It&#8217;s already possible for others to have IRs, but there&#8217;s a fee involved. It was a significant endeavor for OCLC to incorporate those from RLIN into the master-record environment, and adding more involves extra effort.<BR>

<BR>

I don&#8217;t know to what extent non-ex-RLGs have adopted IRs; it does seem to me that it falls to special collections folks to convince their library administrations that it&#8217;s sufficiently important to do so. You probably know that IRs are accessible to all via Connexion and FirstSearch, but not via worldcat.org or WorldCat Local (which is based on .org). As I&#8217;ve communicated to this listserv in the past, however, the WorldCat Local developers are at work this fiscal year to enable additional &#8220;local&#8221; data to be retained and displayed.<BR>

&nbsp;<BR>

Many of you probably are aware that OCLC Research has implemented algorithms for establishing FRBR work sets in WorldCat. Ideally, this would concatenate for the public display the many dupes and variants for early imprints. I&#8217;m checking with my colleagues who are on that project to find out to what extent records for pre-1801 materials get picked up. It&#8217;s bound to be messy because of all the inevitable title variations. <BR>

<BR>

In the meantime, perhaps some of you would like to test that a bit with your favorite messy record sets? <BR>

&nbsp;<BR>

-Jackie<BR>

</SPAN></FONT></FONT></FONT><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:10pt'><BR>

</SPAN><FONT SIZE="4"><SPAN STYLE='font-size:10.5pt'>-- <BR>

Jackie Dooley<BR>

Program Officer<BR>

OCLC Research and the RLG Partnership<BR>

<BR>

949.492.5060 (work/home) -- Pacific Time<BR>

949.295.1529 (mobile)<BR>

<BR>

<BR>

<BR>

</SPAN></FONT><SPAN STYLE='font-size:10pt'><HR ALIGN=CENTER SIZE="3" WIDTH="95%"><B>From: </B>&quot;Auyong, Dorothy&quot; &lt;<a href="dauyong@huntington.org">dauyong@huntington.org</a>&gt;<BR>

<B>Reply-To: </B>DCRM-L &lt;<a href="dcrm-l@lib.byu.edu">dcrm-l@lib.byu.edu</a>&gt;<BR>

<B>Date: </B>Wed, 3 Nov 2010 11:15:35 -0700<BR>

<B>To: </B>DCRM-L &lt;<a href="dcrm-l@lib.byu.edu">dcrm-l@lib.byu.edu</a>&gt;<BR>

<B>Subject: </B>Re: [DCRM-L] OCLC de-duping algorithms and dates of publication<BR>

<BR>

</SPAN><FONT COLOR="#1F497D"><FONT SIZE="4"><SPAN STYLE='font-size:11pt'>Jackie,<BR>

&nbsp;<BR>

This seems to be an issue where it might be appropriate for OCLC to revisit the idea of allowing non-RLG legacy libraries to contribute IRs?  It would help not &#8220;clutter&#8221; up a Master record, but still allow DCRM enriched content to be easily available to the greater WorldCat community.  <BR>

&nbsp;<BR>

</SPAN></FONT></FONT><SPAN STYLE='font-size:10pt'><BR>

</SPAN></FONT><SPAN STYLE='font-size:10pt'><FONT COLOR="#1F497D"><FONT FACE="Arial">Dorothy Auyong<BR>

Principal Rare Book Cataloger<BR>

Huntington Library<BR>

<a href="dauyong@huntington.org">dauyong@huntington.org</a><BR>

</FONT></FONT></SPAN><FONT COLOR="#1F497D"><FONT SIZE="4"><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'> <BR>

</SPAN></FONT></FONT></FONT><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:10pt'><BR>

</SPAN></FONT><SPAN STYLE='font-size:10pt'><FONT FACE="Tahoma, Verdana, Helvetica, Arial"><B>From:</B> <a href="dcrm-l-bounces@lib.byu.edu">dcrm-l-bounces@lib.byu.edu</a> [<a href="mailto:dcrm-l-bounces@lib.byu.edu">mailto:dcrm-l-bounces@lib.byu.edu</a>] <B>On Behalf Of </B>Ann W. Copeland<BR>

<B>Sent:</B> Wednesday, November 03, 2010 10:56 AM<BR>

<B>To:</B> DCRM Revision Group List<BR>

<B>Subject:</B> Re: [DCRM-L] OCLC de-duping algorithms and dates of publication<BR>

</FONT></SPAN><FONT SIZE="4"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:12pt'> <BR>

Jackie,<BR>

<BR>

We did discuss this at what used to be called MASC last winter. Here is the relevant passage from the minutes:<BR>

<BR>

<B> A)</B>. <B><I>OCLC issues.</I></B> <BR>

&nbsp;<BR>

<I> Given the new functionality available in OCLC to improve records, are catalogers working differently -- for example, routinely adding genre/form terms to master records? <BR>

</I> <BR>

&nbsp;Some participants said they search the OCLC database for a suitable record to enhance using DCRM(B) cataloging rules and/or they add genre terms and notes to AACR2 records; others said they upgrade their records only in their local database. The concern that other catalogers could delete the information in enhanced records in OCLC was mentioned, as was the belief that public services librarians would prefer less elaborate records. &nbsp;<BR>

&nbsp;<BR>

&nbsp;Annie Copeland reported that on behalf of the RBMS Bibliographic Standards Committee she had written to OCLC to inquire about the possibility of OCLC allowing duplicate records for the same item, one record cataloged according to AACR2 and another according to DCRM. OCLC responded that rather than allowing permissible duplicates, they prefer having the DCRM record, as the one containing the most information, be the master record. OCLC wondered how libraries would react to this change. &nbsp;A show of hands of MASC participants was called for and a large majority indicated their preference for the DCRM record being the master record. &nbsp;Some attendees asked to have an OCLC representative at a future MASC meeting to discuss master records, duplicate records and proliferation of records in the database.<BR>

<BR>

Glenn then issued this, in May I believe:<BR>

<FONT COLOR="#1F497D"> <BR>

</FONT></SPAN></FONT></FONT><FONT COLOR="#1F497D"><FONT FACE="Arial Unicode MS"><SPAN STYLE='font-size:10pt'>OCLC</SPAN></FONT><SPAN STYLE='font-size:10pt'><FONT FACE="Times New Roman">&#8217;</FONT><FONT FACE="Arial Unicode MS">s Duplicate Detection and Resolution software (DDR) does not merge records if one of the imprint dates is pre-1800, nor would OCLC staff merge records in this situation unless it were absolutely clear that the records represented the same item (but we would be willing to work with someone who had gone through the effort of working out which were true duplicates and which weren</FONT><FONT FACE="Times New Roman">&#8217;</FONT><FONT FACE="Arial Unicode MS">t).</FONT><FONT FACE="Times New Roman"> </FONT><FONT FACE="Arial Unicode MS"> <BR>

&nbsp;<BR>

While the matching software used to load records prepared in external systems into WorldCat is very similar to that used in DDR, it does not include the pre-1800 exclusion.</FONT><FONT FACE="Times New Roman"> </FONT><FONT FACE="Arial Unicode MS">We could consider some more complex exclusions that would be based on the 040 $e coding (e.g., exclude all with a </FONT><FONT FACE="Times New Roman">&#8216;</FONT><FONT FACE="Arial Unicode MS">dcrb[x]</FONT><FONT FACE="Times New Roman">&#8217;</FONT><FONT FACE="Arial Unicode MS"> code and &nbsp;its predecessor codes) if the rare book community felt this would be desirable.<BR>

&nbsp;<BR>

It</FONT><FONT FACE="Times New Roman">&#8217;</FONT><FONT FACE="Arial Unicode MS">s certainly true that a WorldCat record can end up with holdings attached that represent variations of the item described in the bibliographic record.</FONT><FONT FACE="Times New Roman"> </FONT><FONT FACE="Arial Unicode MS"> OCLC matching has not always been as restrictive as it is now, and catalogers certainly may have chosen &#8220;close&#8221; master records and then made adaptations in their local systems.<BR>

&nbsp;<BR>

The issue of not recording an edition statement based on a reference source is a very problematic one.</FONT><FONT FACE="Times New Roman"> </FONT><FONT FACE="Arial Unicode MS">Having an edition statement (even a bracketed one) would, I believe, prevent mismatches in both DDR and Batchload; having that information in the &#8220;first note&#8221; </FONT><FONT FACE="Times New Roman"> </FONT><FONT FACE="Arial Unicode MS">(which I assume would be a 500, since the 503 is no longer valid) is not the sort of thing that is &#8220;actionable&#8221; from a machine matching perspective.<BR>

&nbsp;<BR>

It would be useful to carry forward this discussion with the rare book community.</FONT><FONT FACE="Times New Roman"> </FONT><FONT FACE="Arial Unicode MS"> Nobody wants to play &#8220;fast and loose&#8221; with record merging, but, on the other hand, I don</FONT><FONT FACE="Times New Roman">&#8217;</FONT><FONT FACE="Arial Unicode MS">t think people really want a situation where there</FONT><FONT FACE="Times New Roman">&#8217;</FONT><FONT FACE="Arial Unicode MS">s no attempt to match at all.<BR>

</FONT></SPAN></FONT><SPAN STYLE='font-size:10pt'><FONT FACE="Times New Roman"> <BR>

</FONT></SPAN><FONT FACE="Times New Roman"><FONT COLOR="#1F497D"><FONT SIZE="4"><SPAN STYLE='font-size:12pt'>Glenn E. Patton<BR>

Director, WorldCat Quality Management<BR>

</SPAN></FONT></FONT><FONT SIZE="4"><SPAN STYLE='font-size:12pt'><BR>

I'm not sure where we want to go with this now. <BR>

<BR>

Thanks, Annie<BR>

<BR>

On 11/3/2010 1:22 PM, Dooley,Jackie wrote: <BR>

</SPAN></FONT></FONT><FONT SIZE="4"><FONT COLOR="#1F497D"><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>Big questions about which, IMHO, Bib Standards oughta have discussions. -Jackie<BR>

&nbsp;<BR>

</SPAN></FONT></FONT></FONT><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:10pt'><BR>

</SPAN></FONT><SPAN STYLE='font-size:10pt'><FONT FACE="Tahoma, Verdana, Helvetica, Arial"><B>From:</B> <a href="dcrm-l-bounces@lib.byu.edu">dcrm-l-bounces@lib.byu.edu</a> [<a href="mailto:dcrm-l-bounces@lib.byu.edu">mailto:dcrm-l-bounces@lib.byu.edu</a>] <B>On Behalf Of </B>Deborah J. Leslie<BR>

<B>Sent:</B> Wednesday, November 03, 2010 7:35 AM<BR>

<B>To:</B> DCRM Revision Group List<BR>

<B>Subject:</B> Re: [DCRM-L] OCLC de-duping algorithms and dates of publication<BR>

</FONT></SPAN><FONT SIZE="4"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:12pt'> <BR>

</SPAN></FONT><SPAN STYLE='font-size:12pt'><FONT COLOR="#1F497D"><FONT FACE="Calibri, Verdana, Helvetica, Arial">Thanks for Annie&#8217;s comment. I have mixed feelings about not de-duping pre-1801 publications. Would OCLC really give preference to DCRM records if they were to de-dupe? Even over PCC records? &nbsp;&nbsp;<BR>

</FONT></FONT></SPAN><FONT COLOR="#1F497D"><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>__________________________________________<BR>

</SPAN></FONT></FONT></FONT><FONT COLOR="#1F497D"><FONT FACE="Calibri, Verdana, Helvetica, Arial"><FONT SIZE="2"><SPAN STYLE='font-size:9pt'>Deborah J. Leslie, M.A., M.L.S.<BR>

RBMS past chair 2010-2011 | Head of Cataloging, Folger Shakespeare Library<BR>

201 East Capitol St., S.E., Washington, D.C. 20003 | 202.675-0369 (phone) &nbsp;202.675-0328 (fax) | <a href="djleslie@folger.edu">djleslie@folger.edu</a> &nbsp;| www.folger.edu &lt;<a href="http://www.folger.edu">http://www.folger.edu</a>&gt; <BR>

</SPAN></FONT><FONT SIZE="4"><SPAN STYLE='font-size:12pt'> <BR>

&nbsp;<BR>

</SPAN></FONT></FONT></FONT><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:10pt'><BR>

</SPAN></FONT><SPAN STYLE='font-size:10pt'><FONT FACE="Tahoma, Verdana, Helvetica, Arial"><B>From:</B> <a href="dcrm-l-bounces@lib.byu.edu">dcrm-l-bounces@lib.byu.edu</a> [<a href="mailto:dcrm-l-bounces@lib.byu.edu">mailto:dcrm-l-bounces@lib.byu.edu</a>] <B>On Behalf Of </B>ANN W. COPELAND<BR>

<B>Sent:</B> Tuesday, 02 November, 2010 22:45<BR>

<B>To:</B> Erin Blake<BR>

<B>Cc:</B> DCRM Revision Group List<BR>

<B>Subject:</B> Re: [DCRM-L] OCLC de-duping algorithms and dates of publication<BR>

</FONT></SPAN><FONT SIZE="4"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:12pt'> <BR>

</SPAN></FONT></FONT><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:10pt'><BR>

</SPAN></FONT><FONT SIZE="4"><FONT FACE="Times New Roman"><SPAN STYLE='font-size:12pt'>Interestingly, when we asked about permissible duplicates (one DCRM, one AACR2) OCLC said they did NOT want duplicate records. Instead they wanted to merge records with the DCRM record surviving as the master record. So, why exempt pre-1800 books from the de-duping? Why not work the algorithm to favor DCRM? <BR>

<BR>

Thanks, Annie<BR>

<BR>

<BR>

</SPAN></FONT></FONT><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:10pt'><BR>

</SPAN></FONT>

</BODY>

</HTML>