Notes from the UBiRD and INVISQUE meeting (11 August 2009)

Members of both project teams met today (11 Aug 2009, 1400-1600h, Room SB26, Sheppard Library).

Attending: Nazlin Bhimani, Hanna Stelmaszewska, Neesha Kodagoda, and William Wong

1. We reviewed the key information search and retrieval strategies that the UBiRD project is identifying among users. We may use a framework such as Ellis’ (1998) model of information seeking behaviour to organise the data describing users’ search strategies. Such a framework would give us a basis for comparing strategies both across the different categories of users at each stage of the information search process and within each stage.

2. From an INVISQUE perspective, we also reviewed a number of internet-based information search tools, such as Grokker and AllPlus, and publishers’ resource discovery tools such as the one provided by EBSCO. We compared their capabilities with those of an early prototype we developed for another project, which is intended to help users with low literacy find information in complex data sets such as those of the Citizens’ Advice Bureau. The primary purpose of this was to identify how key functionality has been implemented in current search tools.

3. The review made apparent a large difference between the search strategies that users practise to find scholarly information on the internet (UBiRD study) and the strategies supported by current advanced search tools such as AllPlus and Grokker (INVISQUE). The next stage of our work will be (i) to find the evidence needed to establish this difference, using findings from the UBiRD and INVISQUE studies, (ii) to articulate the nature of this gap in a way that is useful for specifying what the INVISQUE system should provide, and (iii) to develop a set of specifications for the design of the future INVISQUE interface.
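To make item 1 concrete, coded observations could be tabulated by user category and by Ellis stage, so that strategies can be compared across groups at each stage. This is only a sketch of one possible organisation: the `Observation` record, the category and strategy labels, and the sample values are hypothetical placeholders rather than UBiRD data, and the stage names are the six features usually cited for Ellis’ model.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple

# The six features usually cited for Ellis' information seeking model.
ELLIS_STAGES = ("starting", "chaining", "browsing",
                "differentiating", "monitoring", "extracting")


class Observation(NamedTuple):
    """One coded observation from a session (hypothetical record format)."""
    user_category: str  # e.g. "undergraduate", "researcher"
    stage: str          # one of ELLIS_STAGES
    strategy: str       # analyst's code for the strategy observed


def tabulate(observations: Iterable[Observation]) -> Counter:
    """Count each (user category, stage, strategy) combination."""
    table: Counter = Counter()
    for obs in observations:
        if obs.stage not in ELLIS_STAGES:
            raise ValueError(f"unknown Ellis stage: {obs.stage!r}")
        table[(obs.user_category, obs.stage, obs.strategy)] += 1
    return table


def compare_stage(table: Counter, stage: str) -> Dict[str, Dict[str, int]]:
    """For one stage, map each user category to its strategy counts."""
    result: Dict[str, Dict[str, int]] = {}
    for (category, obs_stage, strategy), count in table.items():
        if obs_stage == stage:
            result.setdefault(category, {})[strategy] = count
    return result


if __name__ == "__main__":
    # Made-up placeholder observations, for illustration only.
    sample = [
        Observation("undergraduate", "starting", "general web search"),
        Observation("undergraduate", "starting", "general web search"),
        Observation("researcher", "starting", "publisher database"),
    ]
    print(compare_stage(tabulate(sample), "starting"))
```

Keeping the stage as an explicit field means the same records can be sliced either by stage (across groups) or by group (across stages).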

UBiRD and INVISQUE meeting (9 July 2009)

We organised the first combined meeting of the UBiRD and INVISQUE projects at Middlesex University. Both projects are funded by JISC, with INVISQUE funded under the Rapid Innovation Programme.

The purpose of UBiRD is to identify and describe user information search and retrieval behaviours when using library resources and publishers’ databases to locate scholarly materials such as books and journal articles. The preliminary user behaviours identified at this early stage of the UBiRD project were intended to inform the development of innovations in the resource discovery systems to be developed in the INVISQUE project.

We called the day-long meeting an IDEAS Day. It brought the two teams together to hear the UBiRD findings on user behaviour so far and a review of the state of the art in information search tools and practices (e.g. vertical search and 360 search, as well as visual search tools in the context of social networking tools), and then to brainstorm how these might influence and advance our IDEAS for developing INVISQUE prototypes.

We also held a video-conference call with Dave Pattern at the University of Huddersfield to discuss his ideas for enhancing the book search process implemented in Huddersfield’s library catalogue.

Some interesting findings so far, based on field studies with 18 students and researchers from LSE, Cranfield and Middlesex Universities:

Hanna Stelmaszewska presented the initial findings of the UBiRD study. The issues discussed included:
How students and researchers search for scholarly information.

- unless directed to academic sources such as EBSCO or Emerald, students were most likely to turn to tools they are more familiar with, such as Google, expecting to find an answer as quickly as possible (without necessarily verifying that it is correct, just finding an answer).

- in addition to commonly used search tools such as Google and Google Scholar, students and researchers have been using YouTube, personal networks, and social networking software to ask people for help in finding known or not-yet-known links. The use of social networking software is new, although it should not be surprising: in some ways it parallels our own professional behaviour when we ask colleagues for suggestions and leads. The physical library was the source of last resort.

- there seems to be a difference in the strategies used by students from different backgrounds. Their information literacy skills do not appear to correlate with their digital literacy skills (i.e. their ability to use technology and gadgets such as iPhones).

Some thoughts:
- we need to determine whether there are strategies common to each of the different user groups (e.g. students at the same university, international students from the same country, or part-time students), for example in how users decide whether the site they are using is of high scholarly quality.

- although it may be outside the scope of this project, it would be interesting to profile how different institutions teach information literacy, and to assess the uptake of such courses and their effects on students’ and researchers’ information search skills.

Search strategies
- most current systems provide a basic “quick” search and an “advanced” search. Our findings so far suggest that novices or students tend to avoid the “advanced” search, assuming that it is meant for advanced researchers and would be beyond their skill level.

- in addition, those who do use advanced search often use ‘safe’ strategies, i.e. they enter specific information they already have, e.g. a known author, dates or parts of a title. Such searches are unlikely to reveal unanticipated associations.

- users new to this kind of search often type the complete title of the exercise they have been assigned, or a whole sentence from it, into the keyword box.
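One way a system might respond to a pasted assignment sentence is to strip stop words and offer the remaining terms back to the user as candidate keywords. This is a minimal sketch under assumptions: the `suggest_keywords` function and its deliberately small stop-word list (which includes task words such as “discuss”) are hypothetical, and a real system would need a fuller list and probably stemming.

```python
import re
from typing import List

# Hypothetical, deliberately small stop-word list; it includes task words
# such as "discuss" that often start an assignment question.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "by", "discuss", "for", "how", "in",
    "is", "of", "on", "the", "to", "what", "which", "with",
}


def suggest_keywords(text: str, max_terms: int = 5) -> List[str]:
    """Reduce a pasted sentence to a few candidate keywords.

    Lower-cases the text, drops stop words and duplicates, and keeps the
    remaining words in their original order.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    keywords: List[str] = []
    for token in tokens:
        if token not in STOP_WORDS and token not in keywords:
            keywords.append(token)
    return keywords[:max_terms]


print(suggest_keywords(
    "Discuss the impact of social networking on the information seeking of students"
))
# ['impact', 'social', 'networking', 'information', 'seeking']
```

Showing the reduced terms to the user, rather than silently rewriting the query, would also demonstrate what a shorter query looks like.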

Further analysis will be carried out to understand users’ reasoning as they narrow down their searches during query formulation to identify specific candidate documents, and then broaden them out to other relevant documents.

- this line of discussion led us to identify at least three categories of ‘results’ that a user will (or should) be interested in: (i) co-borrowing, (ii) co-citations, and (iii) context-based tags.

- this also has implications for the kind of system architecture that would make this possible. While we discussed the notion of “fusion”, we are more likely to need a database architecture that supports “mapping” and “connecting” rather than “fusing”.

- the lack of a ‘spell checker’ or ‘did you mean this …’ feature in library search systems has significant consequences. Users may type a misspelt term assuming it is correct; the system then responds with something like ‘no books available’ on the (misspelt) topic, redirecting or diverting the search path of the student or researcher.

- session time-outs are a problem: when the system times out, it often loses all trace of the search activity.
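A ‘did you mean …’ prompt can be approximated by matching the user’s term against terms that actually occur in the catalogue, so that a misspelling leads to a suggestion instead of ‘no books available’. The sketch uses Python’s standard `difflib.get_close_matches`, which ranks candidates by a sequence-similarity ratio; this is a swapped-in technique for illustration, and the vocabulary, the `did_you_mean` name and the 0.8 cutoff are assumptions, not features of any system discussed here.

```python
import difflib
from typing import Iterable, List


def did_you_mean(term: str, vocabulary: Iterable[str],
                 n: int = 3, cutoff: float = 0.8) -> List[str]:
    """Suggest up to `n` catalogue terms similar to a possibly misspelt `term`.

    Returns an empty list if the term is already in the vocabulary (nothing
    to correct) or if no candidate reaches the similarity `cutoff`.
    """
    term = term.lower()
    known = sorted({word.lower() for word in vocabulary})
    if term in known:
        return []
    # get_close_matches ranks candidates by difflib's similarity ratio (0..1).
    return difflib.get_close_matches(term, known, n=n, cutoff=cutoff)


catalogue_terms = ["midwifery", "midwife", "nursing", "literacy"]
print(did_you_mean("midwifary", catalogue_terms))  # ['midwifery']
```

Drawing suggestions from the catalogue’s own terms, rather than from a general dictionary, means every suggestion is guaranteed to return something.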

Some thoughts on searching:

- what is a ‘powerful’ search term?
- is there a taxonomy of good / bad search strategies?
- what makes a good query or search? what are its attributes?
- useful insight: perhaps we should be asking “What does a better query look like?”, and how can this be presented to help a novice improve the way they formulate queries?
- others have used “stop words” and other indexing techniques such as TF-IDF (Term Frequency-Inverse Document Frequency), but how can these be made meaningful from a user’s perspective?
- formulating a query requires certain specific knowledge, e.g. structure of the domain, some basic language or knowledge of the domain, and features and functions of the tools available to construct the query.
- providing a trace or discovery path (“oh, this is where I’ve been”) is useful in helping the user ‘see’ where they’ve been or should have been in their search for information.
- we need to articulate the assumptions behind user searches, as what seems obvious to a software developer can confuse users or seem counter-intuitive to them.
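As one illustration of the TF-IDF point, inverse document frequency gives a measurable proxy for a ‘powerful’ search term: a term that appears in every record does little to narrow a search. The sketch assumes records are already tokenised into lists of words; the function names and toy records are hypothetical, and idf here is the plain log(N / df) variant, one of several in use.

```python
import math
from collections import Counter
from typing import List, Sequence

Document = Sequence[str]  # a record reduced to a list of word tokens


def idf(term: str, documents: Sequence[Document]) -> float:
    """Inverse document frequency, log(N / df); 0.0 if the term never occurs."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else 0.0


def tf_idf(term: str, doc: Document, documents: Sequence[Document]) -> float:
    """Term frequency in one document, weighted by the term's idf."""
    if not doc:
        return 0.0
    tf = Counter(doc)[term] / len(doc)
    return tf * idf(term, documents)


def rank_query_terms(query: List[str], documents: Sequence[Document]) -> List[str]:
    """Order query terms from most to least discriminating (highest idf first)."""
    return sorted(query, key=lambda t: idf(t, documents), reverse=True)


# Toy records, for illustration only.
records = [
    ["information", "literacy", "students"],
    ["information", "retrieval"],
    ["information", "seeking", "behaviour"],
]
print(rank_query_terms(["information", "literacy"], records))
# ['literacy', 'information']: "information" occurs in every record, so idf = 0
```

Presenting such a ranking to a novice (“this word narrows your search, that one does not”) is one possible way of making the technique meaningful from the user’s side.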

Huddersfield library catalog system (Dave Pattern)
- the virtual bookshelf idea is a useful one to support serendipity and chance discovery
- based on data mining of high frequency combinations
- the work there is based on a single database (the library catalog) rather than on attempting to locate resources across several data sets
- future systems should encourage or foster ‘discovery’ rather than ‘federated search’.
- techniques employed at Huddersfield for narrowing down queries are based on borrowing profiles, e.g. 1st year midwifery students mainly borrowed books on …
- Dave also advised that a data set of keywords and their associated keywords is available and can be sent to us
- the tag cloud as implemented in the Huddersfield library catalog shows only one layer: previously used search terms that are correlated with the current search.
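Co-borrowing results of the ‘borrowed together’ kind can be derived by counting how often pairs of items appear in the same borrower’s loan history. This is a sketch under assumptions, not a description of how Huddersfield implements its catalog: loans are assumed to be available as (borrower, item) pairs, and the function names and `min_count` threshold are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def co_borrowing_counts(loans: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many borrowers borrowed each pair of items.

    `loans` holds (borrower_id, item_id) pairs; repeat loans of the same
    item by one borrower are counted once.
    """
    items_by_borrower: Dict[str, Set[str]] = defaultdict(set)
    for borrower, item in loans:
        items_by_borrower[borrower].add(item)

    pair_counts: Counter = Counter()
    for items in items_by_borrower.values():
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def also_borrowed(item: str, pair_counts: Counter,
                  min_count: int = 2, top_n: int = 5) -> List[str]:
    """Items most often co-borrowed with `item`, ignoring rare pairings."""
    related: Counter = Counter()
    for (a, b), count in pair_counts.items():
        if count < min_count:
            continue
        if a == item:
            related[b] = count
        elif b == item:
            related[a] = count
    return [other for other, _ in related.most_common(top_n)]


# Made-up loan records, for illustration only.
loans = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"),
         ("u2", "C"), ("u3", "A"), ("u3", "C"), ("u4", "B")]
counts = co_borrowing_counts(loans)
print(also_borrowed("C", counts))  # ['A']: the B-C pairing occurs only once
```

The `min_count` threshold is a simple guard against suggestions driven by a single borrower; filtering `loans` by borrower profile (e.g. course and year) before counting would give the profile-based narrowing mentioned above.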

State of the Art in Information Search Tools and Practices (Nazlin)
- a number of tools were presented that represent the current leading edge in searching for information on the web. These include: Zotero, a powerful bookmarking and note-making system; MultiSearch, which sends search queries to specified databases through a ‘portal’; AuthorMapper; the 360 tool; the Visual Search hierarchy tool; and Twitter.
- Edinburgh’s AquaBrowser is primarily a vertical search engine, but implements what appears to be a “multi-level” tag cloud with a discovery trail. Although an advance on the single-layer tag cloud, it still makes it hard for users to ‘see’ where they have come from or what they have visited, and it is somewhat limited as a means of finding content or documents. Glasgow’s Encore lists books in the university’s library and shows their table of contents and summary information, with links to external sources as well.
- Cluuz, doodlebuzz, quintura, viewzi and others also provide alternative ways of searching.
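The discovery trail mentioned for AquaBrowser, and the “oh, this is where I’ve been” trace raised earlier, could be represented as a tree of visited steps in which each step remembers the step it was reached from, so the path back to the starting query can always be shown. This is a minimal sketch; the `DiscoveryTrail` and `Step` names are hypothetical. Because the trail is plain data, it could in principle also be saved so that a session time-out need not lose it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    label: str             # a query string or the title of an item viewed
    parent: Optional[int]  # index of the step this one was reached from


@dataclass
class DiscoveryTrail:
    steps: List[Step] = field(default_factory=list)
    current: Optional[int] = None

    def visit(self, label: str) -> int:
        """Record a step reached from the current one and make it current."""
        self.steps.append(Step(label, self.current))
        self.current = len(self.steps) - 1
        return self.current

    def go_back_to(self, index: int) -> None:
        """Jump back to an earlier step; other branches are kept, not lost."""
        if not 0 <= index < len(self.steps):
            raise IndexError(f"no step {index}")
        self.current = index

    def path(self) -> List[str]:
        """Labels from the starting step to the current one."""
        labels: List[str] = []
        i = self.current
        while i is not None:
            labels.append(self.steps[i].label)
            i = self.steps[i].parent
        return labels[::-1]


trail = DiscoveryTrail()
start = trail.visit("information literacy")
trail.visit("digital literacy")
trail.go_back_to(start)
trail.visit("search strategies")
print(trail.path())  # ['information literacy', 'search strategies']
```

Keeping abandoned branches, instead of discarding them as a browser’s back button does once a new page is visited, is what would let an interface show both the current path and the places the user turned back from.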

Some thoughts:
- the new searching environment now seems to be a combination of (i) active searches as traditionally practised, and (ii) passive searches such as monitoring or following a Twitter stream, which can provide useful information or leads for further searches in other tools.
- what is important is to make an interface _both_ attractive and intelligent (i.e. show semantic or conceptual associations, rather than just frequency based correlations).