[DCRM-L] FW: OCLC incorrectly merging map records

Fell, Todd todd.fell at yale.edu
Fri Oct 30 07:10:57 MDT 2015


Apologies in advance for cluttering your inboxes even more, but I thought folks on the dcrm-l would find the following discussions taking place on the maps-l listserv interesting and very relevant to our concerns and past problems with OCLC merging records.

Todd

From: Maps-L: Map Librarians, etc. [mailto:MAPS-L at LISTSERV.UGA.EDU] On Behalf Of Weitz,Jay
Sent: Thursday, October 29, 2015 1:34 PM
To: MAPS-L at LISTSERV.UGA.EDU
Subject: Re: OCLC incorrectly merging map records

Dear Angie and Colleen,

Please believe me when I tell you that OCLC takes the accuracy of the automated Duplicate Detection and Resolution (DDR) process extremely seriously.  The DDR team meets multiple times most weeks, in part to discuss such issues as this one.  As soon as Angie reported the incorrect merge on Wednesday, we began working on it.  The records were pulled apart and we have been discussing and testing different solutions to the incorrect merge since then.

Already built into the algorithms are numerous routes to try to discern differences such as those in evidence in this pair of records.  We have already exempted from DDR records for all maps published earlier than 1901 (except for such things as electronic or microform reproductions of them), a decision determined in consultation with MAGIRT's Cataloging and Classification Committee (on which I serve as the OCLC liaison).  Within a certain tolerance, we don't match records that have different scales.  We try to find and parse dates as well as certain types of alphanumeric identifiers that may be buried in quoted 500 notes.  We try to identify statements of distribution (such as oil company distributors for road maps) that may be buried in quoted 500s.  These are just a few of the map-specific things we try to do so as not to merge records incorrectly.  We also compare 086 fields when present to distinguish subtly different government publications, including maps.  Although an automated process such as DDR will never be perfect, especially in a database as large and diverse as WorldCat, we try our best to make it work better all the time.
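
To make the checks listed above concrete, here is a minimal Python sketch. It is not OCLC's DDR code: the MapRecord class, its field names, the 5% scale tolerance, and the exact-title comparison are all assumptions made for illustration, since the message does not state the actual tolerance or matching logic.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration; the real tolerance is not stated.
SCALE_TOLERANCE = 0.05


@dataclass
class MapRecord:
    """Hypothetical, much-simplified stand-in for a bibliographic record."""
    title: str
    pub_year: Optional[int]
    scale_denominator: Optional[int]       # e.g. 250000 for 1:250,000
    is_reproduction: bool = False          # electronic or microform reproduction
    gov_doc_number: Optional[str] = None   # contents of field 086, if present


def exempt_from_merging(rec: MapRecord) -> bool:
    # Maps published earlier than 1901 are exempt, except reproductions.
    return (rec.pub_year is not None
            and rec.pub_year < 1901
            and not rec.is_reproduction)


def scales_compatible(a: MapRecord, b: MapRecord) -> bool:
    if a.scale_denominator is None or b.scale_denominator is None:
        return True  # assumption: a missing scale does not block a match
    larger = max(a.scale_denominator, b.scale_denominator)
    return abs(a.scale_denominator - b.scale_denominator) / larger <= SCALE_TOLERANCE


def may_merge(a: MapRecord, b: MapRecord) -> bool:
    """Return True only if none of the sketched checks rules out a merge."""
    if exempt_from_merging(a) or exempt_from_merging(b):
        return False
    if a.title.casefold() != b.title.casefold():
        return False
    if not scales_compatible(a, b):
        return False
    # Differing 086 values distinguish otherwise similar government publications.
    if a.gov_doc_number and b.gov_doc_number and a.gov_doc_number != b.gov_doc_number:
        return False
    return True
```

Each check can only veto a merge, never force one, so adding a further map-specific test to such a sketch makes merging more conservative rather than less.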

The single example reported by Angie has suggested several improvements to our algorithms that we've already begun to test.  It has also revealed to us ways in which the existing algorithms may not be working quite the way we intended them to.  The quoted note in #793401057 that Angie cites was absolutely correct, and we're not suggesting it wasn't.  In #701552600, the numbering statement and the date statement were in two separate 500 notes, and the date note was missing its opening quotation mark, which has since been corrected.  We tested to see if adding the missing quotation mark made any difference (it didn't, as it turned out).  Dates can be expressed in all sorts of ways, as we all know, and though many of those ways are easily recognizable by a human cataloger, instructing an algorithm to identify all the legitimate variations can be difficult.
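
The difficulty with date variations can be illustrated with a small Python sketch. This is not the DDR implementation: the regular expressions and the note_evidence function are hypothetical, and they recognize only one date style (day, month name, year) and one identifier style ("Map no. NNNN") inside quoted text.

```python
import re

# Text between a pair of double quotation marks.
QUOTED = re.compile(r'"([^"]*)"')
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Recognizes only "10 April 1943"-style dates (an assumption for this sketch).
DATE = re.compile(rf'\b(\d{{1,2}})\s+({MONTHS})\s+(\d{{4}})\b', re.IGNORECASE)
# Recognizes only "Map no. 2272"-style identifiers (also an assumption).
MAP_NO = re.compile(r'\bMap\s+no\.\s*(\d+)', re.IGNORECASE)


def note_evidence(notes):
    """Collect dates and map numbers found inside quoted 500 notes."""
    dates, numbers = set(), set()
    for note in notes:
        for quoted in QUOTED.findall(note):
            for day, month, year in DATE.findall(quoted):
                dates.add((int(day), month.lower(), int(year)))
            numbers.update(MAP_NO.findall(quoted))
    return dates, numbers


# A note in the recognized style yields both pieces of evidence.
print(note_evidence(['"Map no. 2272, 10 April 1943."']))
# A hypothetical variant with the date written "April 10, 1943" yields
# the map number but no date, although a cataloger reads it easily.
print(note_evidence(['"Map no. 2272, April 10, 1943."']))
```

Two records would be kept apart in such a scheme only if the evidence extracted from each differs, so every unrecognized date style is a potential missed distinction.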

One of the most effective ways of assuring that bibliographic records reflecting subtle differences between similar resources are not merged incorrectly by DDR is to use the power that both AACR2 1.2B4 and RDA 2.5.1.4 give to catalogers.  AACR2 1.2B4 (and the corresponding rules in subsequent chapters, including 3.2B3) and their associated LCRIs allow the optional addition of an edition statement:  "If an item lacks an edition statement but is known to contain significant changes from other editions, supply a suitable brief statement in the language and script of the title proper and enclose it in square brackets."  LCRI 1.2B4 further states: "Do not apply this optional rule to any case of merely supposed differences in issues that might make them different editions. Apply the option for manifest differences where the catalog records would otherwise show exactly the same information in the areas beginning with the title and statement of responsibility area and ending with the series area."  RDA 2.5.1.4 allows essentially the same option:  "If a resource lacks an edition statement but is known to contain significant changes from other editions, supply an edition statement, if considered important for identification or access."  If there is a date associated with these different versions, it is fully in keeping with these instructions to include that date as part of the edition statement in field 250.  As I read them, both Cartographic Materials 2B4 and the passages on edition statements in RDA and Cartographic Resources are consonant with these practices.

My colleagues and I here at OCLC share your concerns about duplicates and about the shortcomings of DDR.  We really do work all the time on trying to make DDR work better and encourage you to report all incorrect merges that you find.  We learn something from every one of them and that helps us to improve the process.  Thanks for your understanding, your patience, and your help.

Jay

--

Jay Weitz

OCLC · Senior Consulting Database Specialist, Data Infrastructure and WorldCat Quality Management

6565 Kilgour Place, MC 139, Dublin, Ohio USA 43017-3395

T +1-614-764-6156 · T +1-800-848-5878, ext. 6156 · F +1-614-718-7195


OCLC.org <http://www.oclc.org/home.en.html?cmpid=emailsig_link> · Facebook <http://www.facebook.com/pages/OCLC/20530435726> · Twitter <http://twitter.com/oclc> · YouTube <http://www.youtube.com/OCLCvideo>






From: Maps-L: Map Librarians, etc. [mailto:MAPS-L at LISTSERV.UGA.EDU] On Behalf Of Angela R Cope
Sent: Thursday, October 29, 2015 10:33 AM
To: MAPS-L at LISTSERV.UGA.EDU
Subject: Re: OCLC incorrectly merging map records


Hi Colleen,



Yes, OCLC suggests including the date in quotes as well as in an edition statement. I have noticed a lot of edition statements in brackets and I guess it makes sense now.



The issue is - we had a recat project for many years when we cataloged probably more than 30,000 maps and I conservatively estimate (without checking) that half of them were originals. So, OCLC says my note didn't have a quote around the dates and here is what the note looked like: "Map no. 2272, 10 April 1943."



The cataloger put it all in one quote because it was all in one string of text. Am I supposed to go back through 15,000 OCLC numbers and check whether they were merged? I don't know what rules OCLC is applying to merging map records but they should not be the same rules as for books. Most map catalogers have their share of angst stories of being misunderstood by book catalogers and this incorrect merging situation is a big one.

How about OCLC restores all my institution's map records and stops merging them? Generate a list of potential candidates and let me review a sample or something. I am just shocked at the idea that a LOT of my institution's records have been improperly merged. It makes me sick to think of all that time we put into creating original records ...

Sick in Wisconsin,

Angie


________________________________
From: Maps-L: Map Librarians, etc. <MAPS-L at LISTSERV.UGA.EDU> on behalf of Cahill, Colleen <cstu at LOC.GOV>
Sent: Thursday, October 29, 2015 5:54 AM
To: MAPS-L at LISTSERV.UGA.EDU
Subject: Re: [MAPS-L] OCLC incorrectly merging map records


This problem with OCLC goes way back and is a moving target. When I worked for the State Library of Pennsylvania, we had a slew of booklets that all had the same 5 initial and ending words, something like "The sunset laws for Pennsylvania ... within the commonwealth of the state."  I entered a bunch of these in OCLC and because the first 5 words and last 5 words were identical in each document, OCLC merged them all into one record.  Very annoying, to say the least.  I believe after that OCLC began looking at the entire title.
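
A short Python sketch shows why a match key built from only the first and last five words of a title collides in a case like this. The functions and the two sample titles are hypothetical (the booklet titles above are given only approximately), and nothing here is taken from OCLC's actual matching code.

```python
def first_last_key(title, n=5):
    """Match key built from only the first n and last n words of a title."""
    words = title.casefold().split()
    return tuple(words[:n]), tuple(words[-n:])


def full_title_key(title):
    """Match key built from the entire title, with whitespace normalized."""
    return " ".join(title.casefold().split())


# Hypothetical titles that differ only in the middle.
t1 = "The sunset laws for Pennsylvania concerning barbers within the commonwealth of the state"
t2 = "The sunset laws for Pennsylvania concerning dentists within the commonwealth of the state"

print(first_last_key(t1) == first_last_key(t2))   # the two keys collide
print(full_title_key(t1) == full_title_key(t2))   # the full titles differ
```

Comparing the entire title removes this particular collision, at the cost of being more sensitive to small transcription differences between records for the same item.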



Question for Angie: has OCLC provided any way for both records to exist in their system?  I know their de-duping is automated, so I am guessing some change to a specific field (not a note) needs to be made to keep the records from merging again.



Colleen



Colleen R. Cahill

Digital Conversion Coordinator and

    Recommending Officer for Fantasy and Science Fiction

Geography & Map Division

Library of Congress

101 Independence Ave. SE

Washington, DC 20540-4650

Voice: 202-707-8540

Fax: 202-707-8531

cstu at loc.gov

These opinions are mine, Mine, MINE!







From: Maps-L: Map Librarians, etc. [mailto:MAPS-L at LISTSERV.UGA.EDU] On Behalf Of Elizabeth J Cox
Sent: Wednesday, October 28, 2015 5:34 PM
To: MAPS-L at LISTSERV.UGA.EDU
Subject: Re: OCLC incorrectly merging map records



Hi, Angie. Thank you for bringing this to everyone's attention. As current MAGIRT chair and as a map cataloger, I think this is definitely something that the whole map community should be concerned about. With your permission, I will forward this to the members of the MAGIRT Executive Board and also to the WAML chair.



If others have had this occur, I think we would all be interested in hearing it.



Beth





Beth Cox

MAGIRT Chair (2015-2016)

http://www.ala.org/magirt/



ELIZABETH J. COX

Associate Professor, Coordinator of Cataloging & Metadata



MORRIS LIBRARY

MAIL CODE 6632

SOUTHERN ILLINOIS UNIVERSITY

605 AGRICULTURE DRIVE

CARBONDALE, IL 62901



bcox at lib.siu.edu

P: 618/453-5594

F: 618/453-3452

lib.siu.edu







From: Maps-L: Map Librarians, etc. [mailto:MAPS-L at LISTSERV.UGA.EDU] On Behalf Of Angela R Cope
Sent: Wednesday, October 28, 2015 3:24 PM
To: MAPS-L at LISTSERV.UGA.EDU
Subject: OCLC incorrectly merging map records









I just discovered another record from my catalog that got incorrectly merged in OCLC. The map is an OSS map from 1943 with the same title as another map, but each has a different date (months indicated in a 500 note) and a different map number identifier (also indicated in a 500 note). See OCLC number 793401057, which was merged (incorrectly) with 701552600. The title and date fields (008 and 260/264) match, but the 500 notes described what distinguished the two maps from one another. My library doesn't even hold the map that OCLC now says we hold.



Is there a committee from MAGIRT or WAML that is working with OCLC regarding this incorrect merging of catalog map records? Should we - map catalogers, special collections catalogers - be keeping a record of these incorrect mergers as we discover them and then reporting them as a unified group? I'm sure for every one we discover, there are many others going undetected.



I've reported this error via OCLC's error reporting method. We need to express some concern about the frequency of this problem to OCLC. Why are we entering data into a shared catalog if it's just getting deleted by their computers?





-Angie



Angie Cope

American Geographical Society Library

UW Milwaukee Libraries

2311 E. Hartford Avenue

Milwaukee, Wisconsin 53211



http://www.uwm.edu/Libraries/AGSL

Hours: M-F 8:00am-4:30pm

acope at uwm.edu

(414)229-6282 / (800)558-8993 (US TOLL FREE) / (414)229-3624 (FAX)

43°03'8"N 87°57'21"W



Like us on Facebook: www.facebook.com/agslibrary

Flickr: http://www.flickr.com/photos/agslibrary/

