[DCRM-L] OCLC's duplicate detection & resolution software: two questions for the rare and archival materials communities

Lapka, Francis francis.lapka at yale.edu
Fri Sep 4 07:00:34 MDT 2015


Jackie,

I'm grateful for your message, and pleased to hear that OCLC is considering changes "to expand and strengthen the safeguards we already apply to bibliographic records for unique, rare, and/or archival materials."
At first blush, it would seem that moving the chronological exception for de-duping to an earlier date might *weaken* the safeguards, since it would make the exception apply to a smaller set of records. Could you tell us more about the motivation for this particular change and how it might serve to strengthen the safeguards?

Thanks
Francis


On Fri, Sep 04, 2015 at 4:18 AM, Dooley,Jackie <dooleyj at oclc.org> wrote:

Dear DCRM-L --

On behalf of my colleagues on OCLC's Metadata Quality Team, I'm writing to pose two questions: 1) whether the pre-1801 cutoff for excluding records from de-duplication should be changed to an earlier date, and 2) whether additional cataloging code symbols should be added to the 040 $e exception.

We're considering changes to the automated Duplicate Detection and Resolution (DDR) software and are seeking community opinion before taking action. The contemplated changes are intended to expand and strengthen the safeguards we already apply to bibliographic records for unique, rare, and/or archival materials. As members of the rare and/or archival cataloging community, you are in an excellent position to provide informed advice on these issues.

First, some background. OCLC first developed the capability to merge bibliographic records manually in 1983. During the late 1980s and early 1990s, we developed automated DDR software, which dealt with Books records only. From 2005 through 2009, OCLC developed a completely new version of DDR that worked with all bibliographic formats. From the very beginning of automated DDR back in 1991, records for resources with dates of publication/production earlier than 1801 have been set aside and not processed. More recently, in consultation with the American Library Association (ALA) Map and Geospatial Information Round Table (MAGIRT) Cataloging and Classification Committee (CCC), we have further exempted records for cartographic materials with dates of publication earlier than 1901. In addition, we exempt from DDR processing all records for resources that can be identified as photographs (Material Types "pht" for photograph and/or "pic" for picture).

Following discussions with representatives of the rare materials community several years ago, we also exempted from DDR processing all records coded in field 040 subfield $e with any of the rare materials description convention codes "bdrb", "dcrb", "dcrmb", or "dcrms". Please note that these DDR exemptions are not intended to apply to electronic, microform, or other reproductions, only to the original resources.
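
The exemptions described above can be read as a single yes/no test per record. The following is only a minimal sketch of that reading, not OCLC's DDR code: the `Record` class, its field names, and the function `is_exempt_from_ddr` are hypothetical, and it assumes that the reproduction caveat overrides every exemption and that a record with no usable date is not exempt on date grounds.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# 040 $e codes named in the message as exempt from DDR.
EXEMPT_DESCRIPTION_CODES = {"bdrb", "dcrb", "dcrmb", "dcrms"}
# Material Types named in the message as identifying photographs.
EXEMPT_MATERIAL_TYPES = {"pht", "pic"}


@dataclass
class Record:
    """Hypothetical, heavily simplified stand-in for a bibliographic record."""
    year: Optional[int] = None            # date of publication/production
    is_cartographic: bool = False
    is_reproduction: bool = False         # electronic, microform, or other
    material_types: Set[str] = field(default_factory=set)
    description_codes: Set[str] = field(default_factory=set)  # from 040 $e


def is_exempt_from_ddr(rec: Record) -> bool:
    """Return True if the record would be set aside under the stated rules."""
    # Assumption: reproductions are never exempt, whatever else is coded.
    if rec.is_reproduction:
        return False
    if rec.material_types & EXEMPT_MATERIAL_TYPES:
        return True
    if rec.description_codes & EXEMPT_DESCRIPTION_CODES:
        return True
    if rec.year is not None:
        # Pre-1901 for cartographic materials, pre-1801 for everything else.
        cutoff = 1901 if rec.is_cartographic else 1801
        return rec.year < cutoff
    return False
```

Under this sketch, moving the general cutoff to an earlier date changes only the `1801` constant, and adding codes such as "appm" or "dacs" changes only the `EXEMPT_DESCRIPTION_CODES` set.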

The current DDR software is incredibly complicated and continues to be fine-tuned on a regular basis. Although this is an oversimplification of a complex process, there are now at least two dozen different points of comparison taken into consideration. Many of these comparison points draw data from multiple parts of a bibliographic record and involve manipulation of data in ways designed to distinguish both variations that should be equated and distinctions that must be recognized.
As part of our ongoing efforts to improve DDR's accuracy, we are reaching out again to members of the rare materials and archival resources communities, in particular, for feedback on the following questions:

  1.  Within the context of the materials cataloged by your community, are there dates other than pre-1801 for most resources and pre-1901 for cartographic materials that would make more sense as an exemption cutoff?
  2.  The current list of Description Convention Source Codes, found at http://www.loc.gov/standards/sourcelist/descriptive-conventions.html, has grown much more extensive in recent years. Aside from the four codes already exempted ("bdrb", "dcrb", "dcrmb", "dcrms"), are there others that it would make sense to consider exempting? Note that Description Convention Source Codes "appm", "dacs", "gihc", and "dcrmg" have already been suggested for adding to the exemption list.

     *   Are there other well-accepted rare and/or archival materials descriptive standards that don't currently have their own code, and so are absent from the MARC Code List? If so, would the relevant community be willing to request codes from LC?
     *   How faithfully do members of the relevant community actually code such records in field 040 subfield $e?

Please reply either to the list or to me directly. We greatly appreciate your input.

Many thanks- Jackie

-

Jackie Dooley

Program Officer, OCLC Research

647 Camino de los Mares, Suite 108-240
San Clemente, CA 92673

office/home 949-492-5060
mobile 949-295-1529
dooleyj at oclc.org

OCLC.org/research