[DCRM-L] OCLC's duplicate detection & resolution software: two questions for the rare and archival materials communities

Will Evans evans at bostonathenaeum.org
Fri Sep 4 13:10:03 MDT 2015


I’m glad that OCLC is open to amending the cutoff date. I agree with
Richard that 19th century books can and often do contain subtle variations
at the manifestation level, and moreover, I often apply bibliographical
analysis and description to items of that time period. Is it insanity to
ask that the cutoff date be moved to 1901 for books as well as cartographic
material? Truthfully, if I had my druthers, I’d like to see the cutoff date
moved to 1930, which would get us through the bulk of the private press
books, works that are rife with subtle variations.



I code almost all my bib records “dcrmb”, often as a defensive measure,
whether it’s a record for an incunabule or a 21st century artist book.
Additionally, I would like to see “appm”, “dacs”, “gihc”, “dcrms” and
“dcrmg” added to the exemption list.



Best,

Will





*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*

Will Evans

Chief Rare Materials Catalog Librarian

Library of the Boston Athenaeum

10 1/2 Beacon Street

Boston, MA   02108



Tel:  617-227-0270 ext. 224

Fax: 617-227-5266

www.bostonathenaeum.org







*From:* dcrm-l-bounces at lib.byu.edu [mailto:dcrm-l-bounces at lib.byu.edu] *On
Behalf Of *Chapman,John
*Sent:* Friday, September 04, 2015 10:52 AM
*To:* DCRM Users' Group
*Subject:* Re: [DCRM-L] OCLC's duplicate detection & resolution software:
two questions for the rare and archival materials communities



Richard and Francis,



We are asking if the 1801 cutoff (or the 1901 cartographic exception date)
needs to be adjusted, but are not suggesting that it should be earlier. We
would expect that, if a change is agreed upon, the dates would be later.



We are asking the question of the DCRM-L community to see if there is any
consensus that can be reached about a change, or if the current scheme is
logical and can remain. The context that Richard provided should be helpful
in the discussion.



--

John Chapman

OCLC · Product Manager, Metadata Services

6565 Kilgour Place, Dublin, OH 43017 USA

T +1-614-761-5272





*From: *<dcrm-l-bounces at lib.byu.edu> on behalf of "Noble, Richard"
*Reply-To: *DCRM Users' Group
*Date: *Friday, September 4, 2015 at 10:23 AM
*To: *DCRM Users' Group
*Subject: *Re: [DCRM-L] OCLC's duplicate detection & resolution software:
two questions for the rare and archival materials communities



Quick response: the cut-off for books should, if anything, be later, not
earlier. The year 1801 is arbitrary, as much established as it is in
national bibliographies and the like. It seems to be understood as the end
of the "hand-press period", which is historically not the case. For English
books that would be no earlier than 1820, and for some continental books
even later (I see German books of the 1840s printed direct from type on
handmade laid paper, for instance).



But the bibliographical significance of "hand-press" has been greatly
exaggerated. While printers become more and more adept at covering their
tracks as the c19 proceeds, bibliographical analysis and description are
very much applicable to post-1801 books and post "hand-press" books, for
the most basic of our FRBR purposes: the identification of manifestations,
and, at the most learned level, the specification of diagnostic evidence
for distinction of manifestations, as well as explicit accounting for
evidence of variation within the body of items that constitute a
manifestation.



That said, I suppose--assuming that the exemption of dcrm records from
automatic de-duping continues--the idea is to establish criteria by which
to exempt a range of non-dcrm records as well. Earlier versions of dcrm
tended to emphasize 1801/"hand-press period" as a cutoff for application of
the special rules (and the consequent finer-grained analysis of supporting
evidence and variation), so it made sense of a kind to specify that
range. As tempting as it is, however, to limit dcrm to hand-press books
because it is easier to analyze and describe them, I know from considerable
experience that post-1801 books printed from plates, perhaps based on
mechanical composition, are equally and more subtly variable.



The whole body of pre-1801 works forms, I presume, a relatively small
percentage of the material represented in the database, though the mass of
duplicate records generated by uploading of incommensurably cataloged
material is considerable. The problem is not so much the conflation of
different manifestations indifferently described, as it is the loss of
information that takes place when merged records are expunged, which
precludes conscious and focused comparison--by catalogers well versed in
the vagaries of legacy and minimal cataloging--as a check on de-duping
errors.



I would be dismayed to see an irreversible process applied to an even
greater range of materials than before. IRs being a lost cause, this would
be mitigated to some extent if records represented in 019 fields could be
preserved for inspection (beyond the current brief grace period) in such a
way as not to impede the operations of the WorldCat as a whole. But as
Francis Lapka pointed out, the regression of the date cutoff does seem to
be a retraction, not an expansion, of safeguards.


RICHARD NOBLE :: RARE MATERIALS CATALOGUER :: JOHN HAY LIBRARY

BROWN UNIVERSITY  ::  PROVIDENCE, R.I. 02912  ::  401-863-1187

<Richard_Noble at Brown.edu>



On Fri, Sep 4, 2015 at 9:00 AM, Lapka, Francis <francis.lapka at yale.edu>
wrote:

Jackie,

I'm grateful for your message, and pleased to hear that OCLC is considering
changes "to expand and strengthen the safeguards we already apply to
bibliographic records for unique, rare, and/or archival materials."

At first blush, it would seem that moving the chronological exception for
de-duping to an earlier date might *weaken* the safeguards, since it would
make the exception apply to a smaller set of records. Could you tell us
more about the motivation for this particular change and how it might serve
to strengthen the safeguards?



Thanks

Francis





On Fri, Sep 04, 2015 at 4:18 AM, Dooley,Jackie <dooleyj at oclc.org> wrote:



Dear DCRM-L --



On behalf of my colleagues on OCLC's Metadata Quality Team, I'm writing to
pose two questions: 1) whether the pre-1801 cutoff for excluding records
from de-duplication should be changed to an earlier date, and 2) whether
additional cataloging code symbols should be added to the 040 $e exception.



We're considering changes to the automated Duplicate Detection and
Resolution (DDR) software and are seeking community opinion before taking
action. The contemplated changes are *intended to expand and strengthen the
safeguards we already apply to bibliographic records for unique, rare,
and/or archival materials*. As members of the rare and/or archival
cataloging community, you are in an excellent position to provide informed
advice on these issues.



First, some background. OCLC first developed the capability to merge
bibliographic records manually in 1983. During the late 1980s and early
1990s, we developed automated DDR software, which dealt with Books records
only. From 2005 through 2009, OCLC developed a completely new version of
DDR that worked with all bibliographic formats. From the very beginning of
automated DDR back in 1991, *records for resources with dates of
publication/production earlier than 1801 have been set aside and not
processed*. More recently, in consultation with the American Library
Association (ALA) Map and Geospatial Information Round Table (MAGIRT)
Cataloging and Classification Committee (CCC), we have further *exempted
records for cartographic materials with dates of publication earlier than
1901*. *In addition, *we exempt from DDR processing all records for
resources that can be identified as* photographs (Material Types “pht” for
photograph and/or “pic” for picture)*.



Following discussions with representatives of the rare materials community
several years ago, *we also exempted from DDR processing all records that
are coded in field 040 subfield $e under description conventions for rare
materials codes "bdrb", "dcrb", "dcrmb", or "dcrms"*. Please note that
these DDR exemptions are *not* intended to apply to electronic, microform,
or other reproductions, only to the original resources.
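
The exemptions described above can be read as a single yes/no test per record. The following is only a minimal sketch under stated assumptions, not OCLC's DDR code: the BibRecord class and its field names are hypothetical, and it assumes that the reproduction caveat overrides every exemption rather than only the 040 $e one.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Values taken from the message above.
EXEMPT_DESCRIPTION_CODES = {"bdrb", "dcrb", "dcrmb", "dcrms"}  # 040 $e codes
EXEMPT_MATERIAL_TYPES = {"pht", "pic"}  # photograph, picture
GENERAL_CUTOFF_YEAR = 1801       # dates earlier than 1801 are set aside
CARTOGRAPHIC_CUTOFF_YEAR = 1901  # cartographic dates earlier than 1901


@dataclass
class BibRecord:
    """Hypothetical, simplified stand-in for a bibliographic record."""
    year: Optional[int] = None          # date of publication/production
    is_cartographic: bool = False
    is_reproduction: bool = False       # electronic, microform, or other
    material_types: Set[str] = field(default_factory=set)
    description_codes: Set[str] = field(default_factory=set)  # 040 $e values


def is_exempt_from_ddr(rec: BibRecord) -> bool:
    """Return True if the record would be set aside under the stated rules."""
    # Assumption: reproductions never qualify, whatever else is coded.
    if rec.is_reproduction:
        return False
    # Photographs and pictures are exempt regardless of date.
    if rec.material_types & EXEMPT_MATERIAL_TYPES:
        return True
    # Records coded with a rare materials description convention are exempt.
    if rec.description_codes & EXEMPT_DESCRIPTION_CODES:
        return True
    # Otherwise the date cutoff decides; a record with no date is not exempt.
    if rec.year is None:
        return False
    if rec.is_cartographic:
        return rec.year < CARTOGRAPHIC_CUTOFF_YEAR
    return rec.year < GENERAL_CUTOFF_YEAR
```

Written this way, the two questions posed to the list correspond to changing the two cutoff constants and adding codes to EXEMPT_DESCRIPTION_CODES.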



The current DDR software is incredibly complicated and continues to be
fine-tuned on a regular basis. Although this is an oversimplification of a
complex process, there are now at least two dozen different points of
comparison taken into consideration. Many of these comparison points draw
data from multiple parts of a bibliographic record and involve manipulation
of data in ways designed to distinguish both variations that should be
equated and distinctions that must be recognized.

As part of our ongoing efforts to improve DDR’s accuracy, we are reaching
out again to members of the rare materials and archival resources
communities, in particular, for feedback on the following questions:


   1. Within the context of the materials cataloged by your community, are
   there dates other than pre-1801 for most resources and pre-1901 for
   cartographic materials that would make more sense as an exemption cutoff?
   2. The current list of Description Convention Source Codes, found at
   http://www.loc.gov/standards/sourcelist/descriptive-conventions.html,
   has grown much more extensive in recent years. Aside from the four codes
   already exempted ("bdrb", "dcrb", "dcrmb", "dcrms"), are there others that
   it would make sense to consider exempting? Note that Description Convention
   Source Codes “appm”, “dacs”, “gihc”, and “dcrmg” have already been
   suggested for adding to the exemption list.


      1. Are there other well-accepted rare and/or archival materials
      descriptive standards that don’t currently have their own code, and so
      are absent from the MARC Code List? If so, would the relevant community
      be willing to request codes from LC?
      2. How faithfully do members of the relevant community actually code
      such records in field 040 subfield $e?



Please reply either to the list or to me directly. We greatly appreciate
your input.



Many thanks— Jackie



-

Jackie Dooley

Program Officer, OCLC Research

647 Camino de los Mares, Suite 108-240

San Clemente, CA 92673

office/home 949-492-5060
mobile 949-295-1529
dooleyj at oclc.org

OCLC.org/research