December 11, 2005
Podcasting and libraries
Miscellaneous
It is interesting to see the growth in podcasting, and in our space there are a couple of interesting venues:
- Matt Pasiewicz of Educause has been interviewing people with an interest in technology in education, including several librarians. Check out his interviews with Jim Michalko and Barbara Taranto, for example.
- Paul Miller has begun Talking with Talis, a series of interviews with people active in Web 2.0, semantic web, and library endeavors. He has just interviewed my colleague George Needham, among others.
I have agreed to speak with both Paul and Matt, and we are looking for dates. Thom and I will talk with Matt; with Paul, I am on my own ...
December 11, 2005
Institutional repositories and research assessment
Digital asset management, Research, learning and scholarly communication
Australia and the UK have frameworks in place for the evaluation of research and the selective allocation of funding based on that evaluation. This has potential implications for institutional repositories, as one factor in the evaluation exercises is the need to better record, and potentially manage, research outputs. The UK exercise is known as the Research Assessment Exercise (RAE). The Australian exercise is known as the Research Quality Framework (RQF). There is a FAQ [pdf] about the RQF, and here is a presentation [pdf] about the ARROW institutional repository initiative and the RQF.
In this context I was interested to see the following project jointly run by the Universities of Southampton and Edinburgh. The Edinburgh and Southampton teams propose a one-year activity to jointly develop solutions for integrating DSpace and EPrints repositories and repository workflows into institutional RAE activities. The resulting software should be easy to install in existing repositories and should be relatively easy to adapt to local circumstances. [IRRA: Institutional Repositories for Research Assessment: a JISC Project] There is a description of the steps that the University of Edinburgh is taking to support the Research Assessment Exercise and the relationship between its institutional repository and the RAE publications repository here.
December 10, 2005
Cool search
Learning and research - systems and technologies, OCLC, Search
Thom has a note about some really nice search systems on his blog. Check there for the details of how these work. The search systems are based on an index of phrases, and suggest results as you type. These, combined with speed of response, give a very good search experience.
Look here for a 'quick' search of the Phoenix Public Library collection. The dynamic presentation of results as you type works very well. Results are ranked by holdings.
Look here for a quick search of the Library of Congress Subject Headings. This is useful as a way of quickly finding an appropriate heading. Again, results are ranked by the number of records in WorldCat with the heading. Potentially a productivity enhancement ...
These are pretty special.
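Purely to make the mechanism concrete, here is a minimal sketch of a phrase index that suggests completions as you type and ranks them by holdings counts. The class, phrases and counts below are invented for illustration; Thom's post describes how the real systems work.

```python
from collections import defaultdict

# Toy phrase index: each phrase (a title, a subject heading) is keyed by
# every prefix of its normalized form, and carries a holdings count.
class PhraseSuggester:
    def __init__(self):
        self._by_prefix = defaultdict(list)  # prefix -> [(holdings, phrase)]

    def add(self, phrase, holdings):
        norm = phrase.lower()
        for i in range(1, len(norm) + 1):
            self._by_prefix[norm[:i]].append((holdings, phrase))

    def suggest(self, typed, limit=5):
        # Return the most widely held phrases matching what has been typed so far.
        matches = self._by_prefix.get(typed.lower(), [])
        return [p for _, p in sorted(matches, reverse=True)[:limit]]

index = PhraseSuggester()
index.add("Pride and prejudice", 12000)
index.add("Primary colors", 3100)
index.add("Principles of economics", 5400)

print(index.suggest("pri"))  # ranked by holdings, most widely held first
```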
December 07, 2005
Identifiers
General - distributed environments, Knowledge organization and representation, Libraries - distributed environments, OCLC
A couple of things come together ...
I was going to do a short post on the renaissance of interest in identifiers based on the approval by the IESG of the Info-URI, and on the growing awareness that we need to consistently identify the entities in our environment (institutions for example) if we are to more effectively tie together data and applications. Stu summarised some of the issues recently [ppt].
However, I came across David Weinberger's note on identifiers earlier, so I will point to that instead and come back to some specifics later. "Web 2.0" is one of those terms with lots of precise meanings, none of them entirely consistent with the others. To me, it refers to the way in which data and applications can be integrated across the Web, building new apps out of snippets of old. (I'm not nearly as fond of the implication that only with Web 2.0 did users come to have a voice on the Web. User voice has driven the Web since it began.) Web 2.0 takes what were monolithic apps and breaks them apart so they can be stitched together in new ways. Tags break apart the world of hyperlinked pages so that we can pull them together around meanings that we, the readers, supply. But none of this restitching is possible without thread. That's where unique IDs come in. [JOHO - March 21, 2000] He goes on to discuss the case of books and Hamlet, and reports on a discussion with my colleague Thom Hickey about Hamlet and FRBR. And, hey, yet another mention of xISBN as a nice example of a service which usefully rolls up identifiers for the different manifestations of works.
December 07, 2005
Link-addressable artifacts
General - distributed environments, Knowledge organization and representation, OCLC
I mentioned the other day that a part of the value of Flickr was that it made images citable, or, in the term suggested by Jon Udell in a nice post about media, 'link-addressable'. Jon's last sentence below (my emphasis) is a nice statement of the importance of 'citability'. In my earlier experiments with MP3 sound bites I showed how seemingly-opaque and statically-served audio files can be made link-addressable, and can therefore be quoted from in situ. Composing on-the-fly remixes is one of the nice benefits that fall out of this approach, but the larger goal is to bring the social effects we see at work in the textual blogosphere into the realm of audio. Linking and quotation drive discovery and shared discourse, but media formats, players, and hosting environments are notoriously hostile to linking and quotation, and I'd really like to see that change. [Jon Udell: Greasemonkeying Google Video] I noticed this because - as he discusses how an item may have multiple identifiers - he draws attention to our proof-of-concept xISBN service. And he finds good words with which to describe it: It's my view that every media player should also be, at least potentially, an authoring tool as well. And every piece of published media content should afford, at least potentially, a canonical address -- indeed, a whole family of them. In the case of Google Video, the classic Doug Engelbart video shown in the embedded screencast has the unique ID -8734787622017763097. My Greasemonkey script uses that information to play back or download some or all of the video. Of course the same video might appear at Ourmedia.org or Brightcove, where it would have different identifiers. If we want to concentrate the discourse about media content, we'll need services that can unify these various identifiers, as the OCLC's xISBN service aims to unify the cloud of ISBNs that represent different expressions of the same work. [Jon Udell: Greasemonkeying Google Video] Incidentally, xISBN has been cropping up in several places recently, as here at libdev for example.
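As a rough sketch of what a manifestation-clustering service like xISBN makes possible, the snippet below asks such a service for the ISBNs related to a given one. The endpoint URL and the response shape are placeholders, not the actual xISBN interface.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint; the real xISBN interface may differ.
XISBN_BASE = "http://example.org/xisbn/"

def related_isbns(isbn):
    """Return the set of ISBNs clustered with the given one, assuming a
    response of the form <idlist><isbn>...</isbn>...</idlist>."""
    with urllib.request.urlopen(XISBN_BASE + isbn) as resp:
        doc = ET.parse(resp)
    return {el.text for el in doc.iter("isbn")}

# A catalog or link checker could then test each related ISBN against local
# holdings, so that a citation to one manifestation finds the others.
```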
December 04, 2005
Blog searching again
Search , Social networking
I posted a little while ago about blog searching options. As others do, I have several 'ego feeds' set up to track posts. No service is miles ahead of the others, although Technorati and Google's Blogsearch are what I tend to watch. That said, others will occasionally have something that those do not.
However, something that I have noticed recently is that PubSub doesn't seem to pull in mentions from outside North America, or in non-English languages, as much as the others do.
December 02, 2005
Service models
Libraries - distributed environments
I mentioned the work of the DLF Framework a while ago. This has been looking at ways of developing service and process models for libraries as they evolve to support research and learning in new environments. The absence of common models undermines our ability to develop and design systems efficiently, to create large-scale collaborative activities, and to communicate the value of libraries to other communities. It makes it less likely that library services will be routinely embedded in system-mediated research and learning workflows (the course management system, for example). It slows the development of common or third-party systems which would reduce the costs involved in redundant development activity across many institutions. And it makes it impossible for the library community to mobilize its collective resources to respond promptly and efficiently to changing needs. [DLF Service Framework for Digital Libraries] This work has been on hold for a while pending recruitment of some dedicated effort to move it forward. I am very pleased to report that Geneva Henry of Rice University will be moving this initiative forward as a DLF/CLIR Distinguished Fellow.
Incidentally, along similar lines readers may find Andy Powell's A 'service oriented' view of the JISC Information Environment interesting.
December 01, 2005
QA briefings
Standards
QA Focus was an activity provided by UKOLN and the AHDS to JISC projects. Although the activity has now finished, its eighty briefing documents on various aspects of information service development and management remain available and look like a useful resource.
Each is a short, pragmatic introduction to a topic. A wide range is covered, including, for example, current topics like podcasting and folksonomies as well as many other management and technical issues.
December 01, 2005
QOTD
Digital asset management
From Timo Hannay, of Nature Publishing Group ... So while Google has to mature, it is publishers and politicians who still have the most to do. They must adapt their businesses and laws to work in a new, unfamiliar land edging into view on the horizon, a place that our children are already colonising. This is a world in which our abilities to find, reinvent and share are being set free from the limits of the physical world. The future is a foreign country; they do things differently there. [..::: EPS Online Debate :::..] From the Googledebate run by EPS. A focus on how Google is impacting the publishing industry with invited contributions from sundry parties. Surprised not to have seen more mention of it.
December 01, 2005
Flickr value
General - distributed environments
I sometimes find it helpful to state the obvious ....
One of the major values of Flickr is that it is a low-entry mechanism for making photos 'citable'.
You can give them a URI, a handle, that can be copied and exchanged, and which makes it that much easier for them to enter the shared network space.
The ability to copy and share a URL for most pages is one of the ways in which Technorati, Google or Amazon differs from earlier generations of services.
November 30, 2005
A handful of presentations
Digital asset management, Knowledge organization and representation, Libraries - distributed environments, Metadata, OCLC, User experience
I just noticed how out-and-about my OR colleagues have been in the last few months. Here is a note about presentations since September, pulling in a variety of things in passing.
System and service architectures
David Bigwood asked recently about the Microsoft Research Pane. We have been using it as a way of providing services into applications which sit in a Windows environment. Our terminology services work is described here. In doing this work we developed a gateway between SRU and the Research Pane, which means that we can bring up any SRU-compatible resource within a Microsoft environment. This is actually quite useful.
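For readers who have not used SRU, the gateway essentially turns a Research Pane request into an SRU searchRetrieve URL along the following lines. The base URL here is a placeholder; the parameters follow standard SRU 1.1 usage.

```python
from urllib.parse import urlencode

# Construct an SRU 1.1 searchRetrieve request with a CQL query.
# The base URL is invented; any SRU-compatible resource would do.
base = "http://example.org/sru/db"
params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "query": 'dc.title = "digital libraries"',  # CQL
    "maximumRecords": "10",
    "recordSchema": "dc",
}
print(base + "?" + urlencode(params))
```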
WikiD is a framework for managing multiple structured datasets and services on them. Built around the OpenURL 1.0 specification, it has a wiki-style interface, and is now being used to support a variety of applications at OCLC.
Diane Vizine-Goetz
Terminology Services Project & DeweyBrowser (PowerPoint:1.95MB/34slides)
32nd Annual ASTED Conference (PDF:1.5MB/52pp.), 12 November 2005, Montréal (Canada) [Presentations [OCLC - Research]]
Jeff Young
WikiD (Wiki/Data) (PowerPoint:397K/30slides)
DLF Fall Forum, 8 November 2005, Charlottesville, Virginia (USA) [Presentations [OCLC - Research]]
Data mining
These presentations talk about some of the ways in which we are trying to extract intelligence from Worldcat - making the accumulated investment in library metadata work harder to answer questions about collections and services. Here there are discussions about trying to bring together publisher names, look at relative patterns of print and digital serial holdings, present world publishing and collections patterns graphically, and, finally, to overview some of the projects looking at what we can say about collections based on aggregate holdings data, of which the analysis of the Google 5 library collections has already received some notice.
Lynn Silipigni Connaway and Akeisha Heard
Publisher Name Authority Project: An Attempt to Enhance Data Mining for Collection Analysis & Comparison (PowerPoint:183K/39slides)
XXV Annual Charleston Conference, 04 November 2005, Charleston, South Carolina (USA) [Presentations [OCLC - Research]]
Chandra Prabha and Carolyn Hank
Journals: Subscriptions, Substitutions, Cancellations (PowerPoint:148K/23slides)
XXV Annual Charleston Conference, 03 November 2005, Charleston, South Carolina (USA) [Presentations [OCLC - Research]]
Lynn Silipigni Connaway and Clifton Snyder
What in the World? Geographical Representation of Library Collections in WorldCat: A Prototype (PowerPoint:332K/18slides)
ASIS&T 2005 Annual Meeting, 01 November 2005, Charlotte, North Carolina (USA) [Presentations [OCLC - Research]]
Brian Lavoie
G5 Paper and Data Mining (PowerPoint:110K/14 slides)
OCLC Members Council Research and New Technologies Interest Group, 24 October 2005, Dublin, Ohio (USA) [Presentations [OCLC - Research]]
Identifiers
A topic whose time has come again. A general overview and some specific work with taxonomy identifiers.
Stu Weibel
Issues in Managing Persistent Identifiers (PowerPoint:145K/27 slides)
OAI 4, 20 October 2005, CERN, Genève (Switzerland) [Presentations [OCLC - Research]]
Eric Childress, Andrew Houghton, and Diane Vizine-Goetz
OCLC and Vocabulary Identifiers (PowerPoint:288K/13 slides)
Presented by Eric Childress at DC-2005: Vocabularies in Practice, the International Conference on Dublin Core and Metadata Applications, 13 September 2005, Madrid (Spain) [Presentations [OCLC - Research]]
Managing bibliographic data ... differently
The library has a large investment in structured metadata - this can be made to work harder to provide rich user experiences. It can also be brought to the user within different service models. Here are some presentations that discuss these issues, alongside others. Specific issues covered are how we are clustering data at the work level (the FRBR model), using classification structures to provide structured browse options, exposing catalog data on the web, and our experiments with user-contributed data. Thom relates these to some wider developments.
Thom Hickey
The Future of the Library Catalog: Open, Interactive, Participatory (PowerPoint:4.3MB/36slides)
FedLink Fall Members Meeting (PDF:77K/1p.), 9 November 2005, Library of Congress, Washington, DC (USA) [Presentations [OCLC - Research]]
Diane Vizine-Goetz
DeweyBrowser & Curiouser (PPT:1.1MB/18slides)
OCLC Members Council Research and New Technologies Interest Group, 24 October 2005, Dublin, Ohio (USA) [Presentations [OCLC - Research]]
Thom Hickey
New approaches to the catalog (PowerPoint:3.6MB/48slides)
Swedish Library Association, 28 October 2005, Stockholm (Sweden) [Presentations [OCLC - Research]]
Automatic cataloging and classification
An overview of approaches and possibilities.
Eric Childress
Auto Classification and Cataloging Topics (PowerPoint:104K/20slides)
OCLC Members Council Research and New Technologies Interest Group, 25 October 2005, Dublin, Ohio (USA) [Presentations [OCLC - Research]]
Preservation metadata
Brian Lavoie
Preservation Metadata: Setting the Scene (PDF:148K/14slides)
PREMIS and Preservation Metadata Standards (PDF:225K/14slides)
DPC Meeting on Preservation Metadata, 8 September 2005, British Library Conference Centre, London (UK) [Presentations [OCLC - Research]]
A couple of general reviews
Eric Childress
Pattern Recognition for Technical Services: Interpreting the OCLC Environmental Scan (PowerPoint:1.6MB/44 slides)
2005 Ohio Library Council Annual Conference & Exposition, 6 October 2005, Columbus, Ohio (USA). [Presentations [OCLC - Research]]
Lorcan Dempsey
The Library and the Network: Flattening the Library and Turning it Inside Out (PowerPoint:4.9MB/43 slides)
Access 2005, 19 October 2005, Edmonton, Alberta (Canada)
An audio recording of this presentation is available. (MP3:15.3MB/67min.) [Presentations [OCLC - Research]]
November 27, 2005
Circulating intentional data
Libraries - organization and services, Marketing, Metadata, OCLC
I have posted a couple of times recently about intentional data, data that records choices and behaviors. I mentioned holdings data, ILL records, circulation records, and database usage records. One could extend this list to any data which records an interaction or choice. We are used to looking at transaction logs of various sorts, and new forms of data are emerging, for example, in the form of questions asked in virtual reference. What types of intelligence could be mined from a comparison of the subject profiles of virtual reference questions to the subject profile of collections? Would it expose gaps in the collection, for example?
In that context I was interested to read a post on the Gordian Knot pointing to some work by David Pattern at the University of Huddersfield which shows a 'people who borrowed this also borrowed ...' feature. And it does look like a good enhancement. (It does not seem to be available on the 'publicly visible' catalogue.)
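A 'people who borrowed this also borrowed ...' list can be derived from nothing more exotic than anonymized loan records. Here is a minimal sketch of the co-occurrence counting involved, with invented data; it illustrates the general idea, not David Pattern's implementation.

```python
from collections import Counter, defaultdict

# Anonymized circulation data: borrower id -> items they have borrowed.
# The ids and titles are invented for illustration.
loans = {
    "u1": {"Dune", "Neuromancer", "Snow Crash"},
    "u2": {"Dune", "Neuromancer"},
    "u3": {"Dune", "Foundation"},
}

# Count how often each pair of items is borrowed by the same person.
co_borrowed = defaultdict(Counter)
for items in loans.values():
    for item in items:
        for other in items:
            if other != item:
                co_borrowed[item][other] += 1

def also_borrowed(item, limit=3):
    """Items most often borrowed by people who also borrowed this one."""
    return [title for title, _ in co_borrowed[item].most_common(limit)]

print(also_borrowed("Dune"))  # 'Neuromancer' ranks first (co-borrowed twice)
```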
Circulation is interesting in this context. We run into a long tail sort of a thing. Amazon is the primary exemplar of this type of 'recommender' service. Amazon aggregates supply (it has a very big database of potential hits in the context of any query, increasing the chances that a person will find something of interest), and it aggregates demand (it is a major gravitational hub on the network, so it assembles lots of eyeballs, increasing the chances that any one book will be found by an interested person). The result of this - the aggregation of supply and the aggregation of demand - is that use is driven down the long tail. More materials are aggregated, and more of them find an audience.
Now, we know that, typically, the smaller part of a library collection circulates (maybe less than 20% in a research library). We also know that, typically, interlibrary lending traffic is very, very much smaller than circulation.
What does this suggest? Well, the former suggests that we have an excess of supply over demand in any library, and we have indeed built 'just in case' collections. However, aggregating demand should make those collections more used, and this appears to be the case in services like OhioLink, for example, which have aggregated demand for institutional collections at the state-wide level, increasing the chances that an item will be found by an interested reader. The latter suggests that we have not aggregated supply across libraries in a systemwide way very efficiently, as library users do not very often go beyond their local collection. There are various reasons for this, including library policy in what is made available, but in general one might say that the transaction costs of discovering, locating, requesting and having delivered resources are high enough to inhibit use. Again, this suggests that we have not aggregated supply as effectively as we might in systemwide situations (this was the focus of another post).
Coming back to recommendations based on circulation, two things occur to me:
- One might imagine a complement to a circulation-based recommender service which recommends other books in the collection which have not circulated, or have not circulated as much. In other words, which ties circulating books to the non-circulating ones. And we know about various 'books like this' measures: by subject, by author, by series. In fact, catalogs were originally designed to make these types of connections. However, there is other data which shares the 'intentional' element which makes circulation interesting, and which represents aggregate choices: things that have appeared on the same reading list, that have been recommended by the same faculty member, and importantly things that cite or have been cited by the selected item. Now, in some of these cases the benefits resulting may not be worth the effort of collecting and manipulating the data; we do not know. In others, citation for example, there clearly are benefits.
- For many of these examples, it may be difficult for a library to generate the data and build services on top of it without better support - in their systems or in services available to them. Furthermore, in many cases the results may be improved by aggregating data across libraries, or across other service environments. The Gordian Knot suggests there may be scope, for example, for services based on aggregated circulation data. (This is not to ignore the real policy questions surrounding the sharing of circulation data. Of course, there are also technical issues of exporting and exchanging in common ways.) Amazon has introduced very useful services based on citation and also associates books based on shared distinctive word patterns. One could imagine those connections being leveraged in a catalog, and Amazon is well placed to do this based on the volume of data it has. In fact, one of the benefits of the mass digitization projects currently under way would be to allow more of that type of connection to be made. Clearly, services based on holdings data depend on aggregations. In WorldCat-based services, OCLC ranks results by volume of holdings, the most widely held first. And there has been interest from time to time from libraries and others in having access to holdings counts to allow them to rank results in their own environments by this measure, on the assumption that the more widely held an item is the more likely it is to meet a need. We do not offer a service like this at the moment, but you can imagine one. We are also experimenting with generating audience levels based on the pattern of holdings (something that lots of high schools hold is likely to be different to something that only a few research libraries hold). And we are seeing growing interest in the sharing of database usage data, based on pooling of Counter-compliant data. One reason that aggregation is potentially beneficial is that it addresses the demand-side issue discussed above: by aggregating data one may make connections that do not get made in the data generated by a smaller group of users.
It is clear that we will see services emerge in the library space which are based on the standardization, consolidation, and syndication of 'intentional' data. We may also see greater systems support for the collection and mining of particular forms of local data. These will supply 'intelligence' to support richer user experiences and better management decisions. Compare how services can already access Amazon's data in this way (see for example the liveplasma service built on top of Amazon data).
As we extend the ways in which users can discover materials, it puts additional emphasis on the need to improve our systemwide apparatus for delivering those materials.
Making data work harder is an integral part of the Web 2.0 discussions, and we certainly have a lot of data to do things with!
November 26, 2005
On Beauty and community
Books, movies and reading ..., Digital asset management, Libraries - organization and services
I see that On Beauty is for now on top of the list of 'hot books' at the justly admired Ann Arbor District Library website.
It must be because of the Open WorldCat reviews ;-)
While I was there, I noticed the Director's Blog, in which there was real exchange about issues. I was intrigued to see the PictureAnnArbor initiative, where library users are invited to contribute their community memories in the form of images. The library will scan them, or accept them in digital format. The mission of the initiative is to "gather, capture and share information and images that reflect everyday life in our community." Like others, this library has initiatives looking at moving the historical record onto the web. What is interesting with PictureAnnArbor is that the library is offering a platform on which library users can offer their materials for sharing.
In conversations about preservation, I often wonder about community information. Materials that were once in the parish newsletter, on the community noticeboard, or in the photocopied club minutes, are moving to the web. Communities are being discussed, presented and recorded through new network tools [a telling Flickr slideshow]. The library may have collected a part of this community record in the print world. What is currently happening in the web world? Are libraries actively looking to select, capture and retain this digital record of community and communities? I was reminded of what I had written over ten years ago: That there is a role for the public library in describing the explicit relationships and objects which are evidence of a community and its sense of itself, or rather of the more or less multiple communities which share any library, is agreed. Local history records; sport, art, culture, social activities: these can be noted and described. At one end this perhaps shades into tourist information, at the other into archives. Individually such services provide value; made available as a network resource and brought into the same context of use as other such resources, that value is much enhanced. [public libraries and the information superhighway] Somewhat prolix maybe, but we are still in early days for this kind of work. It is good to see how AADL are opening up to new ways of capturing community memories. What would happen if these materials were made available for annotation Wiki-style by the community, to provide context and detail?
November 26, 2005
Geek novels
Books, movies and reading ...
Jack Schofield, pioneering technical journalist, and Guardian blogger, has polled his readers for their top 'geek novels'. Check out his results. Here is his top fifteen:
1. The HitchHiker's Guide to the Galaxy -- Douglas Adams 85% (102)
2. Nineteen Eighty-Four -- George Orwell 79% (92)
3. Brave New World -- Aldous Huxley 69% (77)
4. Do Androids Dream of Electric Sheep? -- Philip Dick 64% (67)
5. Neuromancer -- William Gibson 59% (66)
6. Dune -- Frank Herbert 53% (54)
7. I, Robot -- Isaac Asimov 52% (54)
8. Foundation -- Isaac Asimov 47% (47)
9. The Colour of Magic -- Terry Pratchett 46% (46)
10. Microserfs -- Douglas Coupland 43% (44)
11. Snow Crash -- Neal Stephenson 37% (37)
12. Watchmen -- Alan Moore & Dave Gibbons 38% (37)
13. Cryptonomicon -- Neal Stephenson 36% (36)
14. Consider Phlebas -- Iain M Banks 34% (35)
15. Stranger in a Strange Land -- Robert Heinlein 33% (33)
A couple of things about the list. Would the result have been different if carried out from a US base? There are no women novelists. Ursula Le Guin? I am not sure what to make of this.
He has moved on and is looking at movies: you can leave a comment with candidates to go forward to the next stage.
November 25, 2005
Best
Miscellaneous
I was sad to see the announcement of George Best's death. I am not a great sports fan and would only occasionally watch soccer. But growing up when George Best was playing, and playing at his peak, was to see magic. During those years he brought uplift and pleasure into millions of lives; he played transcendently and allowed others to momentarily experience transcendence through his play.
Nothing that he did in his subsequent life subtracts from that magic.
November 23, 2005
Premis prize
Digital asset management
The Premis Working Group has won the Digital Preservation Coalition's award for the best work in digital preservation this year. The award is for the work on the Premis Data Dictionary. Interestingly, the award is sponsored by Paul McCartney - an appropriate song reference escapes me. Maybe 'the long and winding road'.
Congratulations go to Priscilla Caplan and Rebecca Guenther, who chaired the group, and to Brian Lavoie and Robin Dale, who were the OCLC and RLG liaisons, respectively. And to the hardworking members of the Group itself. This initiative marks the latest stage of the collaboration between RLG and OCLC in supporting the development of digital preservation frameworks. The group was made up of active participants from several countries, and this international composition was emphasized by the judging panel. From the DPC press release: The judges were impressed by the work PREMIS has done in compiling a "data dictionary" identifying core digital preservation metadata, which they have supported with practical examples and a software protocol. A key factor in the decision was the international scope of PREMIS, and the consensus building and collaboration that is so crucial in so many digital preservation issues. [Digital Preservation Coalition - Press Releases - Digital Preservation Award 2005] The Digital Preservation Coalition is a UK-based group. We should congratulate them for this award, as it is a nice way to showcase work in digital preservation.
November 23, 2005
Aggregate intentions
Libraries - organization and services, Metadata
I was interested to see the announcement about Ebsco and web services for bringing together Counter data. The wide acceptance of the Project COUNTER Code of Practice has assisted greatly in the standardization of how usage data is counted and presented. Libraries are now looking to consolidate this normalized data as input for collection development decisions. A new and sometimes significant challenge is the actual collection of reports for analysis. SUSHI was specifically introduced to solve the problem of harvesting this data. With SUSHI, the library's usage consolidation application (often tied to a library's e-resource management software) will be able to use the Web service to automatically retrieve data whenever desired. [Library Technology Guides: Display Article] In libraries, we have several clear sources of such 'intentional' data - data that records choices made by libraries and by users.
We have holdings data (a record of choices made by librarians), ILL data (a record of choices made by users and librarians), circulation data (a record of choices made by users), and database usage data (a record of choices made by users). Over time, the standardization, consolidation and syndication of this data is potentially valuable in several library service scenarios: for recommendation services, for collection analysis, and so on.
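SUSHI itself is a SOAP-based protocol and I will not reproduce its schema here, but the consolidation pattern it automates looks roughly like the sketch below. Here fetch_counter_report stands in for a SUSHI retrieval, and the vendors, journals and counts are invented.

```python
from collections import Counter

def fetch_counter_report(vendor, period):
    """Placeholder for a SUSHI retrieval: in practice this would be a SOAP
    request to the vendor's SUSHI server returning a COUNTER usage report.
    Here it just returns invented (journal, full-text requests) rows."""
    sample = {
        ("VendorA", "2005-10"): [("Journal of X", 120), ("Journal of Y", 45)],
        ("VendorB", "2005-10"): [("Journal of X", 30), ("Journal of Z", 80)],
    }
    return sample.get((vendor, period), [])

# Consolidate usage across vendors - the step libraries now often do by hand.
totals = Counter()
for vendor in ["VendorA", "VendorB"]:
    for journal, requests in fetch_counter_report(vendor, "2005-10"):
        totals[journal] += requests

print(totals.most_common())  # input to collection development decisions
```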
November 22, 2005
QOTD
Social networking
Nicholas Carr on Dan Farber on blogging: For all the self-important talk about social networks, couldn't a case be made that the blogosphere, and the internet in general, is basically an anti-social place, a fantasy of community crowded with isolated egos pretending to connect? Sometimes, it seems like we're all climbing up into our own little treehouses and eating jellybeans for breakfast. [Rough Type: Nicholas Carr's Blog]
November 21, 2005
Where is the web?
General - distributed environments, Libraries - distributed environments, User experience
Some commentary on how the experience of the web will move from one where destinations dominate to one where lightweight client-side approaches shape presentation and consumption. This is a world where links matter more: services built on URLs, maybe COinS, ....
From Darwinian Web: The explosion I am talking about is the shifting of a website's content from internal to external. Instead of a website being a "place" where data "is" and other sites "point" to, a website will be a source of data that is in many external databases, including Google. Why "go" to a website when all of its content has already been absorbed and remixed into the collective datastream. [Darwinian Web: Friday, November 18, 2005] And Dion Hinchcliffe quoting from Dan Saffer of Adaptive Path: The tools we'll use to find, read, filter, use, mix, remix, and connect us to the Internet will have to be smarter and do a lot more work than the ones we have now. Part of that work is in formatting. Who and what determines how something looks and works? On the unstructured side of the continuum, perhaps only a veneer of form will remain. "Looks" will be an uneasy mix of the data and the tools we use to view it. Visual design is moving away from its decentralized locations on websites. Indeed, design is becoming centralized in the tools and methods we use to view and interact with content. Firefox users can already use extensions like Adblock, and especially Greasemonkey, to change the look of the Web pages they visit. RSS readers let users customize how they want to view feeds from a variety of sources. Soon, expect to see this type of customization happening with bits of functionality as well as content. [Tolerance and Experience Continuums (web2.wsj2.com)] Initial links via Read/write web which also has an interesting picture about the use of RSS.
November 21, 2005
QOTD
ebooks and other e-resources
The NYT story on Sid Verba has been noted in several places. This quote caught my eye. James Hilton, the interim university librarian at the University of Michigan, for example, said that he asked his staff a year ago to estimate how long it would take to digitize the library's seven million volumes. The answer was more than 1,000 years. [At Harvard, a Man, a Plan and a Scanner - New York Times]
November 21, 2005
Do you mesh here often?
General - distributed environments
Every now and again you read something that shifts your thinking a little. This happened to me just now when I read the following from Ray Ozzie, CTO at Microsoft: As an industry, we have simply not designed our calendaring and directory software and services for this "mesh" model. The websites, services and servers we build seem to all want to be the "owner" and "publisher"; it's really inconsistent with the model that made email so successful, and the loosely-coupled nature of the web. [Ray Ozzie: Really Simple Sharing] This is not new, but something is being said which combines the notion of 'mesh' (what I have been calling 'intrastructure' in less than gripping fashion ;-) and the notion of 'owner' or 'publisher' that takes hold when somebody imagines that their resource is the single focus of a user's attention.
The extension to the library is clear. The broader mesh metaphor is apt: the library - data, services, people - needs to 'mesh' with user environments, rather than standing aloof. More specifically, there are places where one wants to 'synchronize' data - whether with the user environment of reading lists, recommendations, and so on - or within library processes, where similar functionality would be good (whether or not SSE is the mechanism).
Incidentally, Ray Ozzie is writing about SSE - Simple Sharing Extensions - a new specification which extends RSS to support bidirectional flow. Just as RSS enables the aggregation of information from a variety of data sources, SSE enables the replication of information across a variety of data sources. Data sources that implement SSE will be able to exchange data with any other data source that also implements SSE. [XML Developer Center: Frequently Asked Questions for Simple Sharing Extensions (SSE)]
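Without reproducing the SSE elements themselves, the bidirectional idea can be sketched as a merge of two item sets in which each item carries an identifier and a version, and the more recent version wins. This is only an illustration of the replication pattern, not the SSE specification; the calendars and version numbers are invented.

```python
# Each endpoint holds items keyed by id, with a version counter that is
# bumped on local edits. This toy merge keeps whichever copy has the
# higher version, which is the essence of replication between peers.
def merge(local, remote):
    merged = dict(local)
    for item_id, (version, payload) in remote.items():
        if item_id not in merged or version > merged[item_id][0]:
            merged[item_id] = (version, payload)
    return merged

calendar_a = {"evt1": (2, "Staff meeting, 3pm"), "evt2": (1, "Book group")}
calendar_b = {"evt1": (1, "Staff meeting, 2pm"), "evt3": (1, "Storytime")}

# After exchanging feeds in both directions, the two ends converge.
print(merge(calendar_a, calendar_b))
```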
November 20, 2005
Discover, locate, ... vertical and horizontal integration
Libraries - distributed environments, Libraries - organization and services, Metadata
I was involved in some work years ago which developed the discover-locate-request-deliver string of verbs to talk about integrating library services. One emphasis of the work was that discovery was one part only of a whole chain (discovery2delivery - D2D) through which requirements were met. Requiring the user to complete the D2D chain by manual interactions dampened library use: writing down the results from an A&I search and then looking in the catalog to see if the journals were held, for example. As we look at resource sharing environments, we still see that we have imperfectly integrated the D2D verbs. In fact, the integration has been greater with journals as a major focus of the OpenURL resolver is to join up the D2D chain. One wonders whether it will make sense to put the catalog behind the resolver also, and it is certainly interesting to see the importance of resolution in some of the examples below. I now think of the verbs in this way:
- Discover. Discover that a resource exists. Typically, one may have to iterate to complete the discovery experience: search or browse candidate A&I databases, for example, and then search selected ones. The publish/subscribe model is increasingly important to discovery, as users subscribe to syndicated feeds. One of the major issues facing library users is knowing where to search or subscribe to facilitate relevant discovery.
- Locate. Discover services on found resources. A service may be as simple as notifying somebody of a shelf location. Resolvers are important here: an OpenURL resolver will return the services determined to be available on the resource indicated in the OpenURL (a sketch of such a link appears below).
- Request. Request a service. A user may select and initiate a found service.
- Deliver. The service is executed. A book is delivered, a document downloaded, or whatever.
Of course, other services will be deployed along the way: authorization, authentication, tracking, billing, etc.
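To make the locate step a little more concrete, here is a sketch of the kind of OpenURL a discovery environment might hand to a resolver. The resolver address and the article metadata are invented; the key/encoded-value format follows OpenURL 1.0 (Z39.88-2004) conventions.

```python
from urllib.parse import urlencode

# The resolver address is institution-specific; this one is a placeholder.
resolver = "http://resolver.example.edu/openurl"

# A journal-article ContextObject in key/encoded-value form (metadata invented).
context = {
    "url_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.atitle": "An invented article title",
    "rft.jtitle": "An Invented Journal",
    "rft.date": "2005",
    "rft.issn": "1234-5678",  # dummy ISSN
}
print(resolver + "?" + urlencode(context))
# The resolver answers with the services (full text, ILL, catalog lookup)
# actually available to that user on that item.
```

The same key/encoded-value string is what COinS embeds in ordinary web pages, so that a user's own resolver can pick it up.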
What the web does is give us an integrated discover-locate-request-deliver experience. Some sophisticated infrastructure supports this concatenation: crawling and indexing by search engines, DNS resolution, ....
In library services the joins are more visible, and many of the places where one wants integration are precisely at the seams between these processes. Think horizontal and vertical as in the picture. The joins are horizontal where one wants to move between the processes, to traverse process boundaries. Having discovered that an article exists, one wants to find services that will make it available, and select one (or maybe have all of this done for you in the background, just as it does with a web page). The horizontal joins are most likely to be achieved within monolithic systems: the library catalog for example, which may allow you to discover, locate, request and have delivered items. Living in Ohio, one is very aware of the value to faculty and others of OhioLink. OhioLink closely integrates the D2D process for books on a systemwide level within Ohio higher education, and creates great value for its participants and users in so doing.
The joins are vertical where one wants to integrate activities within processes: metasearch is a topical example, where one is trying to integrate discovery across many resources. One may want to locate an item or service in several places - Amazon, the local catalog, a group of catalogs within a consortium - and present back to the user options for purchase or borrowing with indications of cost and/or likely delivery times. A request may be initiated through Inter-Library Lending or through a purchase order, and so on.
Much of the complexity of constructing distributed library systems arises from traversing the boundaries between these processes (horizontal integration) or from having unified interaction with services within a particular process (vertical integration). An example of the former is the difficulty of interrogating local circulation systems for status information; an example of the latter is differences in metadata schema or vocabularies across database boundaries.
I was reminded of the discover-locate-request-deliver string as I have been looking at various publicly available union/group activities recently, and these words crop up from time to time:
- RedLightGreen offers a rich discovery experience, based on aggregate data from the RLG union catalog. It also has a marvelous name ;-) - one of the few library initiatives to have a name worthy of the Internet times we live in. I speculate that it has not had the traction that one might have expected because it does not integrate the locate-request-deliver verbs so well into the discover experience.
- The recently visible Talis Whisper demonstration site gives a nice indication of how one might tie these things together, although not all the joins appear to be working in the available site. Interestingly, it offers the user tabbed access to discover, locate and borrow processes.
- The European Library (TEL) has a facility to search across European national libraries. This somewhat confuses the discovery experience as results are not rolled up into a single set for you. There is little integration of the other services. One can configure it with an OpenURL resolver of choice, but otherwise it does not offer much integration.
- CURL (which appears to have drifted clear of its acronymic mooring to become the Consortium of Research Libraries in the British Isles) lists as part of its vision to allow researchers, "wherever in the world", to "search, locate and request all resources, whatever their format, easily and quickly from the desktop". Some of those verbs again. One vehicle for achieving this vision is COPAC, a union catalog of the national libraries in the UK and 24 research libraries in the UK and Ireland. COPAC offers discovery over its constituent catalogs. Again, it allows outward OpenURL linking through an experimental user interface, using the OpenURL Router to land in the appropriate institutional resolver. (The OpenURL Router is a UK service which provides a central registry of OpenURL Resolvers. It is similar to, and preceded, OCLC's OpenURL Resolver Registry.)
- OCLC's OpenWorldcat does not currently have a destination site; rather, entries may be discovered in Yahoo or Google, or be directly linked to. Where we recognize a user's IP address we offer services (deep link to OPAC, user-initiated ILL, resolver) which we know they are authorized to use.
This cursory overview shows that we have intermittently and imperfectly managed to integrate location, request and delivery into systems whose focus is still largely discovery. However, discovery without fulfilment is of limited interest to an audience which wants D2D services which are quick and convenient, and which hide the system boundaries which need to be traversed in the background. I am also surprised, especially given the linking of discover services to locate services through the resolver in the journals arena, that we have not seen more linking of general discover services (e.g. Amazon) to library locate services (e.g. catalog/circ).
To complete the D2D chain efficiently in open, loosely coupled environments (that is, not within closed communities with tightly integrated systems environments) will require quite a bit of infrastructure development. Much of this relies on better metadata about institutions (libraries, branches), collections (databases, library collections, ...) and services (how to connect to catalogs, ILL systems, resolvers, e-commerce sites, ...), as well as about policies (for example, who can borrow from us and under what conditions) and terms. It is for this reason that we are seeing greater interest in registries and directories which will provide the ability to discover, locate, request and have delivered resources more effectively.
The picture is taken from an early presentation [ppt] I did at OCLC at a seminar organized by Erik Jul.
November 17, 2005
Storage and logistics
Libraries - organization and services
Optimising storage and access in UK research libraries [pdf], a report commissioned by The Consortium of University Research Libraries (CURL), in the UK, and the British Library, has been made available. It is addressing an issue which is arising now across the library community, and is being addressed in various regional, national and other policy regimes: are there sensible ways of managing the accumulating print collections in collective ways? This is becoming critical, both because of the pressure on library space, and the cost of redundantly managing print collections. The focus here is on serials but monographs are also considered. Some notable points:
- The report estimates that by 2015, CURL libraries would need as much as an additional 350 km of shelving to accommodate growth.
- There is a brief review of collaborative storage approaches in other parts of the world. CARM is noted especially: this is a scheme run by Caval in Australia.
- The authors present several scenarios, but come down to recommending that an approach be based on the British Library holdings. They suggest that a National Research Reserve could be built on this basis, with libraries contributing materials that the BL does not have.
The British Library Document Supply Centre was established as a logistics hub for the UK library community. It was built to efficiently process inventory, a large number of requests, and distribution in the mail system. It is interesting to see how this logistics role might be revisited in the context of discussing how best to manage the collective CURL collection.
November 16, 2005
2b?Ntb?=?
Books, movies and reading ...
British student phone service dot mobile is providing a service which compresses classics into a text message idiom to help with revision. Here is the summary of Pride and Prejudice: 5SistrsWntngHsbnds.NwMenInTwn-Bingly&Darcy Fit&Loadd. BigSisJaneFals4B,2ndSisLizH8sDCozHesProud.SlimySoljrWikamSysDHsShadyPast.TrnsOutHesActulyARlyNysGuy&RlyFancysLiz.SheDecydsSheLyksHim.Evry1GtsMaryd. [Guardian Unlimited Books | News | If you don't want to know how Bleak House ends, look away now] For those that are struggling, there is a 'translation' in the article.
November 15, 2005
Tag teams
Search , Social networking
There is an interesting article in Business 2.0 about the 'Flickrization of Yahoo'. Indeed, the Flickr purchase helped ignite a larger strategy. Thanks to a new generation of managers like Butterfield and Fake, Yahoo is starting to see how user-generated content, or "social media," is a key weapon in its war against Google ('GOOG'). That upstart in neighboring Mountain View may have a better reputation for search, it may dominate online advertising, and it may always win when it comes to machines and math. But Yahoo has 191 million registered users. What would happen if it could form deep, lasting, Flickr-like bonds with them -- and get them to apply tags not just to photos, but to the entire Web? [Business 2.0 :: Magazine Article :: Features :: The Flickrization of Yahoo] An example is the new Shoposphere service, where Yahoo has introduced user assembled picklists, tagging, revenue sharing and APIs to encourage the development of 'conversations' which may drive sales. At the same time, Amazon has introduced tagging capability.
The more I see of tagging the more like a conversation it seems. As I say below ... Given all that, I find the most interesting recent development to be the new search capacity on del.icio.us. It allows navigation within the aggregate personal choices of its users - in terms of tags and links. And in fact, as I used it, it struck me that following tags is like conversation in some ways: in each case, you get hints and pointers, you fork in different directions, you are at the intersection of a variety of interests, and you are woven into a fabric of reference which may have more or less to do with the world outside the conversation. [Lorcan Dempsey's weblog: Search results]
Shoposphere and Amazon references via Techcrunch.
November 15, 2005
Calendar
Marketing
The University of Edinburgh has published a calendar with images from its special collections ...
November 14, 2005
FRBR fervor
Knowledge organization and representation, Metadata
FRBR is another of those things that we have given a public-unfriendly name to. This is a pity, because the concept is one that makes a lot of sense to people when they understand it. For example, in discussions about how OpenWorldcat data is surfaced in search engines this is one of the things that quickly comes up. For many people, it just seems natural to roll up the various manifestations and so on under the work, and then to drill down into detail as required. Here is David Weinberger talking about how we mean several things when we talk about a 'book': First, books are way complex. What is Hamlet? Any book of the play? The Signet edition? A reprint of the Signet edition? The Signet edition with a new preface? With errata corrected? The Signet large print edition? The German translation? The original manuscript? Hamlet in the one-volume Collected Works? This matters because when you're looking for a copy of Hamlet, you're acting as if that were unambiguous when in fact there are various forms of the book that will or will not satisfy you. This is the type of complexity that drives people to create ontologies. Short of that, xISBN tries to cluster books in reasonable ways. And there's a standard (I can't lay my hands on it now -- FRBR? -- I'm slightly on the road) that lays out the various levels of abstraction. [Joho the Blog: Two points that didn't fit into The Globe] The Globe article he mentions is here, and he notes the need for more fine-grained identifiers (for chapters, illustrations, etc) and the desirability of much more metadata for books, created by users (reviews, annotations, and so on). We agree about the latter, and are beginning to collect user-generated metadata (we can do more). The former is interesting. Not only is there a granularity issue, there is an abstraction issue which comes back to the question of what a book is. The ISBN, for example, is typically applied at the manifestation level (in FRBR terms). Do we need an identifier for works? What resolution and registration services would be useful so as to be able to tie together identifiers for the multiplying versions (think of the various digitization initiatives)?
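For readers meeting FRBR for the first time, the group 1 entities can be sketched as a simple containment hierarchy. The classes below are a deliberate simplification (items, the physical copies, are omitted), with the ISBN attached at the manifestation level as noted above; the editions and the ISBN value are invented.

```python
from dataclasses import dataclass, field
from typing import List

# A deliberately simplified rendering of the FRBR group 1 entities.
@dataclass
class Manifestation:          # a particular edition or format; ISBNs live here
    description: str
    isbn: str = ""

@dataclass
class Expression:             # a particular text, translation, performance, etc.
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:                   # the abstract work, e.g. Hamlet
    title: str
    expressions: List[Expression] = field(default_factory=list)

hamlet = Work("Hamlet", [
    Expression("English", [
        Manifestation("A paperback edition", "0000000000"),  # dummy ISBN
        Manifestation("A large print edition"),
    ]),
    Expression("German", [Manifestation("A translated edition")]),
])

# Rolling manifestations up under the work is what makes the search result
# 'Hamlet' first, with editions available as drill-down detail.
```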
November 12, 2005
QOTD
Research, learning and scholarly communication
From the Economist: Although search technology is constantly tweaked to provide better performance and more relevant results, studies by Microsoft have shown that around half of all search queries fail to provide the information that users want. "We need to get offline content online. Offline is where trusted content is, and where people who need to answer questions go," explains Danielle Tiedt, manager of search content acquisition at MSN. "Books are only the first step," she says. [Pulp Friction. Economist, November 10 2005] It is interesting to wonder what it will mean for libraries when the bulk of scholarly literature is much more easily available than it is now ...
November 12, 2005
Blogs, media and voice
Books, movies and reading ..., Social networking
Ian Buruma, author and Henry R. Luce Professor of Human Rights and Journalism at Bard College writes an interesting piece about the soundtrack of protest, and finishes with some comments about blogging: As the mainstream media, especially in the US, have become part of the same corporate entertainment empires that own most popular music, there is little or no room for a new Edward Murrow to stick his knife into the powers that be. But the blogosphere is buzzing with life. Subversion of all kinds, much of it mad and malicious, has been privatised, as it were. If hip-hop and rap fill large niche markets, internet journalism fills millions of niches, some of them no bigger than the author him or herself. [Guardian Unlimited | Arts front | Silent protest] A characteristic of the more popular blogs is that they have a 'voice', a very particular personal presence. And it is not surprising that David Brooks and Paul Krugman regularly feature in the top ten Technorati searches, as high-profile print 'voices'. Library blogs are probably more edgy and current than our somewhat dreary mainstream print literature, and they cater for niche interests.
I sometimes wonder if it is significant that Blaise Cronin and Michael Gorman are both strong library 'voices': they are prolific and self-consciously stylish contributors to that print literature, across the scholarly to popular spectrum.