[DCRM-L] OCLC's duplicate detection & resolution software: two questions for the rare and archival materials communities

Kate Moriarty moriarks at slu.edu
Fri Sep 4 11:08:05 MDT 2015


Liz and Lenore,

You're right. I was remembering a discussion about automated RDA changes to
OCLC records. Different issue - my apologies.

-Kate

On Fri, Sep 4, 2015 at 11:34 AM, Rouse, Lenore <rouse at cua.edu> wrote:

> This is probably a dumb question, but even without amremm in a record,
> under what circumstance would OCLC ever merge a record for a *manuscript*,
> which by definition is unique? I've operated under the assumption that I
> would never have to worry about our ms. records being merged.
>
> Re Jackie's question - I now catalog practically everything as DCRM but
> this was not the case in this institution until perhaps 10 years ago or
> whenever I wised up.  I haven't recataloged AACR2 records into dcrm either.
> So there are indeed many post 1801 items that might easily succumb to
> merging. I'd argue for an 1840 or 1850 cutoff date but that might be too
> conservative for some.
> Lenore
>
> --
> Lenore M. Rouse
> Curator, Rare Books and Special Collections
> The Catholic University of America
> Room 214, Mullen Library
> 620 Michigan Avenue N.E.
> Washington, D.C. 20064
>
> PHONE: 202 319-5090
> E-MAIL: rouse at cua.edu
> RBSC BLOG: http://ascendonica.blogspot.com/
>
>
>
> On 9/4/2015 11:37 AM, Kate Moriarty wrote:
>
> Thank you for this, Jackie and John.
>
> As others have stated, I would be in favor of moving the cut-off date to a
> later date, though I'll leave it to those with a larger post-1801
> collection to suggest a specific date.
>
> Jackie, regarding your 2nd question, I believe you mentioned last year
> that OCLC would be adding "amremm" to the list of 040 $e DDR exemptions.
> You said it wouldn't be easy - have you had any success with it?
>
> And in answer to your last question, we regularly code the 040 $e here
> and, at least from the records I see in OCLC, it seems like others do, too.
>
> Thanks,
> Kate
>
> On Fri, Sep 4, 2015 at 9:51 AM, Chapman,John <chapmanj at oclc.org> wrote:
>
>> Richard and Francis,
>>
>> We are asking if the 1801 cutoff (or the 1901 cartographic exception
>> date) needs to be adjusted, but are not suggesting that it should be
>> earlier. We would expect that, if a change is agreed upon, the dates would
>> be later.
>>
>> We are asking the question of the DCRM-L community to see if there is any
>> consensus that can be reached about a change, or if the current scheme is
>> logical and can remain. The context that Richard provided should be helpful
>> in the discussion.
>>
>> --
>> John Chapman
>> OCLC · Product Manager, Metadata Services
>> 6565 Kilgour Place, Dublin, OH 43017 USA
>> T +1-614-761-5272
>>
>>
>> From: <dcrm-l-bounces at lib.byu.edu> on
>> behalf of "Noble, Richard"
>> Reply-To: DCRM Users' Group
>> Date: Friday, September 4, 2015 at 10:23 AM
>> To: DCRM Users' Group
>> Subject: Re: [DCRM-L] OCLC's duplicate detection & resolution software:
>> two questions for the rare and archival materials communities
>>
>> Quick response: the cut-off for books should, if anything, be later, not
>> earlier. The year 1801 is arbitrary, as much established as it is in
>> national bibliographies and the like. It seems to be understood as the end
>> of the "hand-press period", which is historically not the case. For English
>> books that would be no earlier than 1820, and for some continental books
>> even later (I see German books of the 1840s printed direct from type on
>> handmade laid paper, for instance).
>>
>> But the bibliographical significance of "hand-press" has been greatly
>> exaggerated. While printers become more and more adept at covering their
>> tracks as the c19 proceeds, bibliographical analysis and description are
>> very much applicable to post-1801 books and post "hand-press" books, for
>> the most basic of our FRBR purposes: the identification of manifestations,
>> and, at the most learned level, the specification of diagnostic evidence
>> for distinction of manifestations, as well as explicit accounting for
>> evidence of variation within the body of items that constitute a
>> manifestation.
>>
>> That said, I suppose--assuming that the exemption of dcrm records from
>> automatic de-duping continues--the idea is to establish criteria by which
>> to exempt a range of non-dcrm records as well. Earlier versions of dcrm
>> tended to emphasize 1801/"hand-press period" as a cutoff for application of
>> the special rules (and the consequent finer-grained analysis of supporting
>> evidence and variation), so it made sense of a kind to specify that
>> range. As tempting as it is, however, to limit dcrm to hand-press books
>> because it is easier to analyze and describe them, I know from considerable
>> experience that post-1801 books printed from plates, perhaps based on
>> mechanical composition, are equally and more subtly variable.
>>
>> The whole body of pre-1801 works forms, I presume, a relatively small
>> percentage of the material represented in the database, though the mass of
>> duplicate records generated by uploading of incommensurably cataloged
>> material is considerable. The problem is not so much the conflation of
>> different manifestations indifferently described, as it is the loss of
>> information that takes place when merged records are expunged, which
>> precludes conscious and focused comparison--by catalogers well versed in
>> the vagaries of legacy and minimal cataloging--as a check on de-duping
>> errors.
>>
>> I would be dismayed to see an irreversible process applied to an even
>> greater range of materials than before. IRs being a lost cause, this would
>> be mitigated to some extent if records represented in 019 fields could be
>> preserved for inspection (beyond the current brief grace period) in such a
>> way as not to impede the operations of the WorldCat as a whole. But as
>> Francis Lapka pointed out, the regression of the date cutoff does seem to
>> be a retraction, not an expansion, of safeguards.
>>
>> RICHARD NOBLE :: RARE MATERIALS CATALOGUER :: JOHN HAY LIBRARY
>> BROWN UNIVERSITY  ::  PROVIDENCE, R.I. 02912  ::  401-863-1187
>> <Richard_Noble at Brown.edu>
>>
>> On Fri, Sep 4, 2015 at 9:00 AM, Lapka, Francis <francis.lapka at yale.edu>
>> wrote:
>>
>>> Jackie,
>>>
>>> I'm grateful for your message, and pleased to hear that OCLC is
>>> considering changes "to expand and strengthen the safeguards we already
>>> apply to bibliographic records for unique, rare, and/or archival materials."
>>>
>>> At first blush, it would seem that moving the chronological exception
>>> for de-duping to an earlier date might *weaken* the safeguards, since it
>>> would make the exception apply to a smaller set of records. Could you tell
>>> us more about the motivation for this particular change and how it might
>>> serve to strengthen the safeguards?
>>>
>>>
>>>
>>> Thanks
>>>
>>> Francis
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Sep 04, 2015 at 4:18 AM, Dooley,Jackie <dooleyj at oclc.org> wrote:
>>>
>>>
>>>
>>> Dear DCRM-L --
>>>
>>>
>>>
>>> On behalf of my colleagues on OCLC's Metadata Quality Team, I'm writing
>>> to pose two questions: 1) whether the pre-1801 cutoff for excluding records
>>> from de-duplication should be changed to an earlier date, and 2) whether
>>> additional cataloging code symbols should be added to the 040 $e exception.
>>>
>>>
>>>
>>> We're considering changes to the automated Duplicate Detection and
>>> Resolution (DDR) software and are seeking community opinion before taking
>>> action. The contemplated changes are *intended to expand and strengthen
>>> the safeguards we already apply to bibliographic records for unique, rare,
>>> and/or archival materials*. As members of the rare and/or archival
>>> cataloging community, you are in an excellent position to provide informed
>>> advice on these issues.
>>>
>>>
>>>
>>> First, some background. OCLC first developed the capability to merge
>>> bibliographic records manually in 1983. During the late 1980s and early
>>> 1990s, we developed automated DDR software, which dealt with Books records
>>> only. From 2005 through 2009, OCLC developed a completely new version of
>>> DDR that worked with all bibliographic formats. From the very beginning of
>>> automated DDR back in 1991, *records for resources with dates of
>>> publication/production earlier than 1801 have been set aside and not
>>> processed*. More recently, in consultation with the American Library
>>> Association (ALA) Map and Geospatial Information Round Table (MAGIRT)
>>> Cataloging and Classification Committee (CCC), we have further *exempted
>>> records for cartographic materials with dates of publication earlier than
>>> 1901*. In addition, *we exempt from DDR processing all records for
>>> resources that can be identified as photographs (Material Types “pht”
>>> for photograph and/or “pic” for picture)*.
>>>
>>>
>>>
>>> Following discussions with representatives of the rare materials
>>> community several years ago, *we also exempted from DDR processing all
>>> records that are coded in field 040 subfield $e under description
>>> conventions for rare materials codes "bdrb", "dcrb", "dcrmb", or "dcrms"*.
>>> Please note that these DDR exemptions are *not* intended to apply to
>>> electronic, microform, or other reproductions, only to the original
>>> resources.
>>>
>>>
>>>
>>> The current DDR software is incredibly complicated and continues to be
>>> fine-tuned on a regular basis. Although this is an oversimplification of a
>>> complex process, there are now at least two dozen different points of
>>> comparison taken into consideration. Many of these comparison points draw
>>> data from multiple parts of a bibliographic record and involve manipulation
>>> of data in ways designed to distinguish both variations that should be
>>> equated and distinctions that must be recognized.
>>>
>>> As part of our ongoing efforts to improve DDR’s accuracy, we are
>>> reaching out again to members of the rare materials and archival resources
>>> communities, in particular, for feedback on the following questions:
>>>
>>>
>>>    1. Within the context of the materials cataloged by your community,
>>>    are there dates other than pre-1801 for most resources and pre-1901 for
>>>    cartographic materials that would make more sense as an exemption cutoff?
>>>    2. The current list of Description Convention Source Codes, found at
>>>    http://www.loc.gov/standards/sourcelist/descriptive-conventions.html,
>>>    has grown much more extensive in recent years. Aside from the four codes
>>>    already exempted ("bdrb", "dcrb", "dcrmb”, “dcrms”), are there others that
>>>    it would make sense to consider exempting? Note that Description Convention
>>>    Source Codes “appm”, “dacs”, “gihc”, and “dcrmg” have already been
>>>    suggested for adding to the exemption list.
>>>
>>>
>>>       1. Are there other well-accepted rare and/or archival materials
>>>       descriptive standards that don’t currently have their own code, and so are
>>>       absent from the MARC Code List? If so, would the relevant community be
>>>       willing to request codes from LC?
>>>       2. How faithfully do members of the relevant community actually
>>>       code such records in field 040 subfield $e?
>>>
>>>
>>>
>>> Please reply either to the list or to me directly. We greatly appreciate
>>> your input.
>>>
>>>
>>>
>>> Many thanks— Jackie
>>>
>>>
>>>
>>> -
>>>
>>> Jackie Dooley
>>>
>>> Program Officer, OCLC Research
>>>
>>> 647 Camino de los Mares, Suite 108-240
>>>
>>> San Clemente, CA 92673
>>>
>>> office/home 949-492-5060
>>> mobile 949-295-1529
>>> dooleyj at oclc.org
>>>
>>> OCLC.org/research
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> Kate S. Moriarty, MSW, MLS  |  Rare Book Catalog Librarian  |  Associate
> Professor  |  Pius XII Memorial Library  |  Room 320-2
> Saint Louis University  |  3650 Lindell Blvd.  |  St. Louis, MO 63108  |
> (314) 977-3024 (tel)  |  (314) 977-3108 (fax)  |  moriarks at slu.edu  |
> http://libraries.slu.edu/
>
>
>
>


-- 
Kate S. Moriarty, MSW, MLS  |  Rare Book Catalog Librarian  |  Associate
Professor  |  Pius XII Memorial Library  |  Room 320-2
Saint Louis University  |  3650 Lindell Blvd.  |  St. Louis, MO 63108  |
(314) 977-3024 (tel)  |  (314) 977-3108 (fax)  |  moriarks at slu.edu  |
http://libraries.slu.edu/
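
For readers following the thread, the DDR exemptions described in Jackie
Dooley's quoted message (pre-1801 dates, pre-1901 cartographic materials,
material types "pht"/"pic", and the four 040 $e codes) can be sketched
roughly as below. This is only an illustration of the rules as stated in
that message, not OCLC's actual software: the Record class, its field
names, and the is_exempt function are hypothetical, and the sketch assumes
that the "originals only" caveat applies to every exemption. The cutoffs
and the code list are parameters so that the changes under discussion
(a later cutoff, additional codes such as "amremm") can be tried out.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the quoted message; all names here are hypothetical.
EXEMPT_CODES = frozenset({"bdrb", "dcrb", "dcrmb", "dcrms"})
EXEMPT_MATERIAL_TYPES = frozenset({"pht", "pic"})


@dataclass
class Record:
    """Hypothetical, much-simplified stand-in for a bibliographic record."""
    year: Optional[int] = None            # date of publication/production
    cartographic: bool = False
    reproduction: bool = False            # electronic, microform, etc.
    material_types: frozenset = frozenset()
    conventions: frozenset = frozenset()  # codes found in 040 $e


def is_exempt(rec, general_cutoff=1801, cartographic_cutoff=1901,
              codes=EXEMPT_CODES):
    """Return True if the record would be set aside from de-duplication."""
    # Assumption: exemptions cover original resources only.
    if rec.reproduction:
        return False
    if rec.year is not None:
        if rec.year < general_cutoff:
            return True
        if rec.cartographic and rec.year < cartographic_cutoff:
            return True
    if rec.material_types & EXEMPT_MATERIAL_TYPES:
        return True
    # Any one exempted description convention code is enough.
    return bool(rec.conventions & codes)


if __name__ == "__main__":
    book_1835 = Record(year=1835)
    print(is_exempt(book_1835))                       # False
    print(is_exempt(book_1835, general_cutoff=1851))  # True

    ms = Record(year=1920, conventions=frozenset({"amremm"}))
    print(is_exempt(ms))                                   # False
    print(is_exempt(ms, codes=EXEMPT_CODES | {"amremm"}))  # True
```

Under these assumptions, an 1835 book cataloged under AACR2 is open to
merging with the current cutoff but protected with an 1850-style cutoff,
which is the case Lenore and Richard raise.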