Gsearch corpus search toolkit

Welcome to the Gsearch homepage.

Gsearch is a tool designed to facilitate the selection of sentences from text corpora by syntactic criteria, even where these corpora contain no prior syntactic markup.

The Gsearch developers can be contacted at gsearch-dev@informatics.ed.ac.uk.












Latest News:

Apr 28, 2005

Gsearch has a minor update: version 2.07 contains a bug fix for the BNC filter. This is now the default version for download. Also available is a patch for version 2.06w (instructions are in the file).

Oct 10, 2001

Gsearch has a minor update: version 2.06w is aware of Windows-style line endings, for easier operation under Cygwin. This is now the default version for download. Also available is a patch for version 2.06 (instructions are in the file).

Sep 30, 2001

Steffan Corley has experimented with a port of Gsearch to Windows. He found that Gsearch compiled and ran without modification under Cygwin. This is very encouraging, though we would like to hear from other Windows users to see whether there are any limitations to Gsearch functionality under Cygwin.

Sep 14, 2001

Gsearch 2.06 is now available for download, with a number of new features.

Gsearch 2.07 has been tested on Solaris (5.5.1, 5.6, 5.7, 5.8), Linux (RedHat 6.1, Mandrake 7.2), and MacOS X.

Please note that we do not intend to support versions of Gsearch prior to 2.07 in the future.






Sources and Documentation

Gsearch is designed to run on UNIX-like systems. To date, it has been tested on Solaris, Linux, MacOS X, and Cygwin under Windows.

You can obtain the full Gsearch distribution here (gsearch-2.07.tar.gz). The distribution contains the main Gsearch system, the user manual, a paper describing the Gsearch system, and several example grammars, filters, and pseudofield scripts.

You can also obtain the manual separately here (gzipped postscript).
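
Once the archive has been downloaded, it can be unpacked with any standard tar tool. As an illustration only, the following Python sketch does the same using the standard library. The file name gsearch-2.07.tar.gz is taken from this page; the helper name unpack_distribution, the assumption that the archive sits in the current directory, and the safety check are ours and are not part of Gsearch.

```python
import tarfile
from pathlib import Path


def unpack_distribution(archive="gsearch-2.07.tar.gz", dest="."):
    """Extract a gzipped tar archive into dest and return its member names.

    Hypothetical helper: it assumes the archive has already been downloaded.
    """
    dest_path = Path(dest).resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse entries that would be written outside the destination.
            target = (dest_path / member.name).resolve()
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest_path)
    return sorted(member.name for member in members)


if __name__ == "__main__":
    for name in unpack_distribution():
        print(name)
```

The returned list shows what the archive contained, which is a quick way to locate the manual and the example grammars after unpacking.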

Mailing List

If you download Gsearch, you are strongly encouraged to join the Gsearch users' mailing list. This is a low-traffic list used to discuss Gsearch usage and to announce new releases of Gsearch and associated software.

You can subscribe to the mailing list using this page.






Publications

Gsearch is described in

Note: this paper is included in the Gsearch distribution.

Other Publications which make use of Gsearch:





Gsearch Resources

Most publicly available resources are currently distributed with Gsearch. If you have additional filters, grammars, or pseudofield scripts (or code enhancements), please contribute them to the Gsearch project by mailing gsearch-dev@informatics.ed.ac.uk.

Currently available:

BNCditrans.tar.gz
    Grammar used with the BNC to count NP-NP vs. NP-PP alternations for ditransitive verbs in English. Requires WN_anim.pl (distributed with Gsearch).

An improved raw.pl filter
    The raw.pl filter (distributed with Gsearch) provides simple-minded segmentation of raw text files. With help from Garance Paris, this new version is locale-sensitive, which greatly improves the segmentation of languages other than English.
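
To give a rough idea of what simple-minded segmentation of raw text can look like, here is a minimal Python sketch. It is not a port of raw.pl, which is a Perl script whose actual rules are not described on this page. The function name segment and both splitting rules are assumptions made for illustration. Python's Unicode-aware `\w` stands in for the locale sensitivity mentioned above, so accented letters stay inside words.

```python
import re

# Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Assumed rule: a token is a run of word characters or one punctuation mark.
# In Python 3, \w matches Unicode letters, so non-English words stay intact.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def segment(text):
    """Split raw text into sentences, each a list of word and punctuation tokens."""
    sentences = []
    for chunk in _SENTENCE_END.split(text.strip()):
        tokens = _TOKEN.findall(chunk)
        if tokens:  # skip empty chunks, e.g. for empty input
            sentences.append(tokens)
    return sentences


if __name__ == "__main__":
    for sentence in segment("Das Mädchen läuft. Es regnet!"):
        print(" ".join(sentence))
```

A rule this naive will missplit at abbreviations such as "e.g.", which is the kind of limitation the phrase "simple-minded" suggests.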




The development of Gsearch was funded in part by the Economic and Social Research Council, and in part by collaborative ARC project 1024 of the British Council and the Deutscher Akademischer Austauschdienst.