[DCRM-L] OCLC's duplicate detection & resolution software: two questions for the rare and archival materials communities

Kate Moriarty moriarks at slu.edu
Fri Sep 4 09:37:57 MDT 2015


Thank you for this, Jackie and John.

As others have stated, I would be in favor of moving the cut-off date to a
later date, though I'll leave it to those with a larger post-1801
collection to suggest a specific date.

Jackie, regarding your 2nd question, I believe you mentioned last year that
OCLC would be adding "amremm" to the list of 040 $e DDR exemptions. You
said it wouldn't be easy - have you had any success with it?

And in answer to your last question, we regularly code the 040 $e here and,
at least from the records I see in OCLC, it seems like others do, too.

Thanks,
Kate

On Fri, Sep 4, 2015 at 9:51 AM, Chapman,John <chapmanj at oclc.org> wrote:

> Richard and Francis,
>
> We are asking if the 1801 cutoff (or the 1901 cartographic exception date)
> needs to be adjusted, but are not suggesting that it should be earlier. We
> would expect that, if a change is agreed upon, the dates would be later.
>
> We are asking the question of the DCRM-L community to see if there is any
> consensus that can be reached about a change, or if the current scheme is
> logical and can remain. The context that Richard provided should be helpful
> in the discussion.
>
> --
> John Chapman
> OCLC · Product Manager, Metadata Services
> 6565 Kilgour Place, Dublin, OH 43017 USA
> T +1-614-761-5272
>
>
> From: <dcrm-l-bounces at lib.byu.edu> on behalf of "Noble, Richard"
> Reply-To: DCRM Users' Group
> Date: Friday, September 4, 2015 at 10:23 AM
> To: DCRM Users' Group
> Subject: Re: [DCRM-L] OCLC's duplicate detection & resolution software:
> two questions for the rare and archival materials communities
>
> Quick response: the cut-off for books should, if anything, be later, not
> earlier. The year 1801 is arbitrary, as much established as it is in
> national bibliographies and the like. It seems to be understood as the end
> of the "hand-press period", which is historically not the case. For English
> books that would be no earlier than 1820, and for some continental books
> even later (I see German books of the 1840s printed direct from type on
> handmade laid paper, for instance).
>
> But the bibliographical significance of "hand-press" has been greatly
> exaggerated. While printers become more and more adept at covering their
> tracks as the c19 proceeds, bibliographical analysis and description are
> very much applicable to post-1801 books and post "hand-press" books, for
> the most basic of our FRBR purposes: the identification of manifestations,
> and, at the most learned level, the specification of diagnostic evidence
> for distinction of manifestations, as well as explicit accounting for
> evidence of variation within the body of items that constitute a
> manifestation.
>
> That said, I suppose--assuming that the exemption of dcrm records from
> automatic de-duping continues--the idea is to establish criteria by which
> to exempt a range of non-dcrm records as well. Earlier versions of dcrm
> tended to emphasize 1801/"hand-press period" as a cutoff for application of
> the special rules (and the consequent finer-grained analysis of supporting
> evidence and variation), so it it made sense of a kind to specify that
> range. As tempting as it is, however, to limit dcrm to hand-press books
> because it is easier to analyze and describe them, I know from considerable
> experience that post-1801 books printed from plates, perhaps based on
> mechanical composition, are equally and more subtly variable.
>
> The whole body of pre-1801 works forms, I presume, a relatively small
> percentage of the material represented in the database, though the mass of
> duplicate records generated by uploading of incommensurably cataloged
> material is considerable. The problem is not so much the conflation of
> different manifestations indifferently described, as it is the loss of
> information that takes place when merged records are expunged, which
> precludes conscious and focused comparison--by catalogers well versed in
> the vagaries of legacy and minimal cataloging--as a check on de-duping
> errors.
>
> I would be dismayed to see an irreversible process applied to an even
> greater range of materials than before. IRs being a lost cause, this would
> be mitigated to some extent if records represented in 019 fields could be
> preserved for inspection (beyond the current brief grace period) in such a
> way as not to impede the operations of the WorldCat as a whole. But as
> Francis Lapka pointed out, the regression of the date cutoff does seem to
> be a retraction, not an expansion, of safeguards.
>
> RICHARD NOBLE :: RARE MATERIALS CATALOGUER :: JOHN HAY LIBRARY
> BROWN UNIVERSITY  ::  PROVIDENCE, R.I. 02912  ::  401-863-1187
> <Richard_Noble at Brown.edu>
>
> On Fri, Sep 4, 2015 at 9:00 AM, Lapka, Francis <francis.lapka at yale.edu>
> wrote:
>
>> Jackie,
>>
>> I'm grateful for your message, and pleased to hear that OCLC is
>> considering changes "to expand and strengthen the safeguards we already
>> apply to bibliographic records for unique, rare, and/or archival materials."
>>
>> At first blush, it would seem that moving the chronological exception for
>> de-duping to an earlier date might *weaken* the safeguards, since it would
>> make the exception apply to a smaller set of records. Could you tell us
>> more about the motivation for this particular change and how it might serve
>> to strengthen the safeguards?
>>
>>
>>
>> Thanks
>>
>> Francis
>>
>>
>>
>>
>>
>> On Fri, Sep 04, 2015 at 4:18 AM, Dooley,Jackie <dooleyj at oclc.org> wrote:
>>
>>
>>
>> Dear DCRM-L --
>>
>>
>>
>> On behalf of my colleagues on OCLC's Metadata Quality Team, I'm writing
>> to pose two questions: 1) whether the pre-1801 cutoff for excluding records
>> from de-duplication should be changed to an earlier date, and 2) whether
>> additional cataloging code symbols should be added to the 040 $e exception.
>>
>>
>>
>> We're considering changes to the automated Duplicate Detection and
>> Resolution (DDR) software and are seeking community opinion before taking
>> action. The contemplated changes are *intended to expand and strengthen
>> the safeguards we already apply to bibliographic records for unique, rare,
>> and/or archival materials*. As members of the rare and/or archival
>> cataloging community, you are in an excellent position to provide informed
>> advice on these issues.
>>
>>
>>
>> First, some background. OCLC first developed the capability to merge
>> bibliographic records manually in 1983. During the late 1980s and early
>> 1990s, we developed automated DDR software, which dealt with Books records
>> only. From 2005 through 2009, OCLC developed a completely new version of
>> DDR that worked with all bibliographic formats. From the very beginning of
>> automated DDR back in 1991, *records for resources with dates of
>> publication/production earlier than 1801 have been set aside and not
>> processed*. More recently, in consultation with the American Library
>> Association (ALA) Map and Geospatial Information Round Table (MAGIRT)
>> Cataloging and Classification Committee (CCC), we have further *exempted
>> records for cartographic materials with dates of publication earlier than
>> 1901*. In addition, we exempt from DDR processing all records for
>> resources that can be identified as *photographs (Material Types “pht”
>> for photograph and/or “pic” for picture)*.
>>
>>
>>
>> Following discussions with representatives of the rare materials
>> community several years ago, *we also exempted from DDR processing all
>> records that are coded in field 040 subfield $e under description
>> conventions for rare materials codes "bdrb", "dcrb", "dcrmb", or "dcrms"*.
>> Please note that these DDR exemptions are *not* intended to apply to
>> electronic, microform, or other reproductions, only to the original
>> resources.
>>
>>
>>
>> The current DDR software is incredibly complicated and continues to be
>> fine-tuned on a regular basis. Although this is an oversimplification of a
>> complex process, there are now at least two dozen different points of
>> comparison taken into consideration. Many of these comparison points draw
>> data from multiple parts of a bibliographic record and involve manipulation
>> of data in ways designed to distinguish both variations that should be
>> equated and distinctions that must be recognized.
>>
>> As part of our ongoing efforts to improve DDR’s accuracy, we are reaching
>> out again to members of the rare materials and archival resources
>> communities, in particular, for feedback on the following questions:
>>
>>
>>    1. Within the context of the materials cataloged by your community,
>>    are there dates other than pre-1801 for most resources and pre-1901 for
>>    cartographic materials that would make more sense as an exemption cutoff?
>>    2. The current list of Description Convention Source Codes, found at
>>    http://www.loc.gov/standards/sourcelist/descriptive-conventions.html,
>>    has grown much more extensive in recent years. Aside from the four codes
>>    already exempted ("bdrb", "dcrb", "dcrmb", "dcrms"), are there others that
>>    it would make sense to consider exempting? Note that Description Convention
>>    Source Codes “appm”, “dacs”, “gihc”, and “dcrmg” have already been
>>    suggested for adding to the exemption list.
>>
>>
>>       1. Are there other well-accepted rare and/or archival materials
>>       descriptive standards that don’t currently have their own code, and so are
>>       absent from the MARC Code List? If so, would the relevant community be
>>       willing to request codes from LC?
>>       2. How faithfully do members of the relevant community actually
>>       code such records in field 040 subfield $e?
>>
>>
>>
>> Please reply either to the list or to me directly. We greatly appreciate
>> your input.
>>
>>
>>
>> Many thanks— Jackie
>>
>>
>>
>> -
>>
>> Jackie Dooley
>>
>> Program Officer, OCLC Research
>>
>> 647 Camino de los Mares, Suite 108-240
>>
>> San Clemente, CA 92673
>>
>> office/home 949-492-5060
>> mobile 949-295-1529
>> dooleyj at oclc.org
>>
>> OCLC.org/research
>>
>>
>>
>>
>>
>>
>


-- 
Kate S. Moriarty, MSW, MLS  |  Rare Book Catalog Librarian  |  Associate
Professor  |  Pius XII Memorial Library  |  Room 320-2
Saint Louis University  |  3650 Lindell Blvd.  |  St. Louis, MO 63108  |
(314) 977-3024 (tel)  |  (314) 977-3108 (fax)  |  moriarks at slu.edu  |
http://libraries.slu.edu/