New instructions for cmake
ustramooner committed Jul 4, 2008
1 parent eb07acb commit 6b6f97a
Showing 2 changed files with 150 additions and 132 deletions.
209 changes: 140 additions & 69 deletions INSTALL
@@ -1,69 +1,140 @@
Linux Build instructions
=======================================

If you downloaded CLucene as a tar ball you should be able to skip straight
to the section titled 'building', otherwise read the next section


Rebuilding the autobuild scripts
--------------------------------
If you made changes to the configure.ac or any of the Makefile.am
files you will also need to run through this process.

Requirements:
GNU autotools is required. I have the following versions installed:
Autoconf 2.57
Automake 1.72
Libtool 1.5a

If you use significantly older versions, I can almost guarantee
issues. This is because each of the autotools is constantly changing
with little regard to backward compatibility or even compatibility
with the other autotools.

Run the autogen.sh file in the root directory of clucene to run the necessary commands.


Building
--------
The following will get you building assuming that you have sufficiently
recent build tools installed.
1.) unpack tarball
2.) cd into clucene
3.) if you downloaded a tar version skip to 5
4.) run ./autogen.sh
5.) run ./configure
6.) run make
7.) things will churn for a very long time, the clucene library will
be built as well as the examples.
8.) check the src/demo, test and src directory

In src/demo you should see:
cl_demo

In test you should see
cl_test

In src you should see:
libclucene.so.0.0.0 libclucene.la libclucene.a
and symbolic links to these files.

9.) If you want, run make install to copy the clucene files into the system
include and lib directories
10.) You may have to run
export LD_LIBRARY_PATH=/path/to/clucene/lib

11.) run ./cl_test in the test directory and check that the tests all run

Alternative (faster) way of building:
-------------------------------------
This method does not create library files, so depending on your needs you may not
find this method useful.

* Do steps 1-5 of the previous build process.
* Change directory into src/
* run make monolithic
* Change directory into test/ (cd ../test/)
* run make monolithic
* You should see cl_test_monolithic in this directory
* run ./cl_test_monolithic and check that the tests all run
* There are packages available for most linux distributions through the usual channels.
* The Clucene Sourceforge website also has some distributions available.

Also in this document is information on how to build from source, troubleshooting,
performance, and how to create a new distribution.


Building from source:
--------------------

Dependencies:
* CMake version 2.4.2 or later.
* A functioning and fairly new C++ compiler. We test mostly on GCC and Visual Studio 6+.
Anything other than that may not work.
* Something to unzip/untar the source code.

Build instructions:
1.) Download the latest sourcecode from http://www.sourceforge.net/projects/clucene
[Choose stable if you want the 'time tested' version of code. However, often
the unstable version will suit your needs more since it is newer and has had
more work put into it. The decision is up to you.]
2.) Unpack the tarball/zip/bzip/whatever
3.) Open a command prompt, terminal window, or cygwin session.
4.) Change directory into the root of the sourcecode (from now on referred to as <clucene>)
# cd <clucene>
5.) Create and change directory into an 'out-of-source' directory for your build.
[This is by far the easiest way to build, it has the benefit of being able to
create different types of builds in the same source-tree.]
# mkdir <clucene>/build-name
# cd <clucene>/build-name
6.) Configure using cmake. This can be done many different ways, but the basic syntax is
# cmake [-G "Script name"] ..
[Where "Script name" is the name of the scripts to build (e.g. Visual Studio 8 2005).
A list of supported build scripts can be found by]
# cmake --help
7.) You can configure several options such as the build type, debugging information,
mmap support, etc, by using the CMake GUI or by calling
# ccmake ..
Make sure you call configure again if you make any changes.
8.) Start the build. This depends on which build script you specified, but it would be something like
# make
or
# nmake
Or open the solution files with your IDE.

[You can also specify to just build a certain target (such as cl_test, cl_demo,
clucene-core (shared library), clucene-core-static (static library).]
9.) The binary files will be available in <clucene>/build-name/build-type/bin
10.)Test the code. (After building the tests - this is done by default, or by calling make cl_test)
# ctest -V
11.)At this point you can install the library:
# make install
[There are options to do this from the IDE, but I find it easier to create a
distribution (see instructions below) and install that instead.]
12.)Now you can develop your own code. This is beyond the scope of this document.
Read the README for information about documentation or to get help on the mailing list.


Troubleshooting:
----------------

'Too many open files'
Some platforms don't provide enough file handles to run CLucene properly.
To solve this, increase the open file limit:

On Solaris:
ulimit -n 1024
set rlim_fd_cur=1024


Code style
--------------

Memory management:
Memory in CLucene has been a bit of a difficult thing to manage because of the
unclear specification about who owns what memory. This was mostly a result of
CLucene's java-esque coding style resulting from porting from java to c++ without
too much re-writing of the API. However, CLucene is slowly improving
in this respect and we try and follow these development and coding rules (though
we don't guarantee that they are all met at this stage):

1. Whenever possible the caller must create the object that is being filled. For example:
IndexReader->getDocument(id, document);
As opposed to the old method of document = IndexReader->getDocument(id);

2. Clone always returns a new object that must be cleaned up manually.
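
The two rules above can be sketched roughly as follows. This is only an
illustration under stated assumptions: Document, Reader and fillAndClone are
hypothetical stand-ins, not CLucene's real classes, and the sketch passes the
document by reference where the example in rule 1 passes it as an argument to
IndexReader->getDocument.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; not CLucene's real classes.
struct Document {
    int id = -1;
    std::string text;

    // Rule 2: clone() returns a new heap object that the caller must delete.
    Document* clone() const { return new Document(*this); }
};

class Reader {
public:
    // Rule 1: the caller creates the Document and the reader only fills it in,
    // so ownership never changes hands. Returns false if the id is unknown.
    bool getDocument(int id, Document& out) const {
        if (id < 0 || id >= kCount) return false;
        out.id = id;
        out.text = "document #" + std::to_string(id);
        return true;
    }

private:
    static constexpr int kCount = 3;  // pretend the index holds three documents
};

// Exercises both rules and returns the text of the cloned document.
std::string fillAndClone(const Reader& reader, int id) {
    Document doc;  // owned by the caller (stack storage)
    if (!reader.getDocument(id, doc)) return "";

    // The clone is a separate object; unique_ptr does the cleanup that would
    // otherwise be a manual delete.
    std::unique_ptr<Document> copy(doc.clone());
    assert(copy.get() != &doc);
    return copy->text;
}
```

With the caller-fills style there is never any doubt about who frees the
document, while every clone() result needs exactly one matching delete
(handled here by std::unique_ptr).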

Questions:
1. What should be the convention for an object taking ownership of memory?
Some documenting is available on this, but not much


Performance
-----------
Very little benchmarking has been done on clucene. Andi Vajda posted some
limited statistics on the clucene list a while ago with the following results.

There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
6108kb of HTML text.
org.apache.lucene.demo.IndexFiles with java and gcj:
on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
. running with java 1.4.1_01-99 : 20379 ms
. running with gcj 3.3.2 -O2 : 17842 ms
. running clucene 0.8.9's demo : 9930 ms

I recently did some more tests and came up with these rough results:
663mb (797 files) of Gutenberg texts
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
• Jlucene: 646453ms. peak mem usage ~72mb, avg ~14mb ram
• Clucene: 232141ms. peak mem usage ~60mb, avg ~4mb ram

Searching the index using 10,000 single word queries
• Jlucene: ~60078ms and used ~13mb ram
• Clucene: ~48359ms and used ~4.2mb ram

Distribution
------------
CPack is used for creating distributions.
* Create an out-of-source build as per usual
* Next, check that the package is compliant using several tests (must be done from a linux terminal, or cygwin):
# cd <clucene>/build-name
# ../dist-check.sh
* Make sure the source directory is clean. Make sure there are no unknown svn files:
# svn stat ..
* Run the tests to make sure that the code is ok (documented above)
* If all tests pass, then run
# make package
for the binary package (and header files). This will only create a tar.gz package.
and/or
# make package_source
for the source package. This will create a ZIP on windows, and tar.bz2 and tar.gz packages on other platforms.

There are also options for creating RPM, Cygwin, NSIS, Debian packages, etc. It depends on your version of CPack.
Call
# cpack --help
to get a list of generators.

Then create a special package by calling
# cpack -G <GENERATOR> CPackConfig.cmake

73 changes: 10 additions & 63 deletions README
@@ -16,11 +16,11 @@ the Apache License, Version 2.0
See the LGPL.license and APACHE.license for the respective license information.
Read COPYING for more about the license.


Installation
------------
* For Linux, MacOSX, cygwin and MinGW build information, read INSTALL.
* Boost.Jam files are provided in the root directory and subdirectories.
* Microsoft Visual Studio (6&7) are provided in the win32 folder.
Read the INSTALL file


Mailing List
------------
@@ -31,57 +31,25 @@ Find subscription instructions at
Suggestions and bug reports can be made on our bug tracking database
(http://sourceforge.net/tracker/?group_id=80013&atid=558446)


The latest version
------------------
Details of the latest version can be found on the CLucene sourceforge project
web site: http://www.sourceforge.net/projects/clucene


Documentation
-------------
Documentation is provided at http://clucene.sourceforge.net/doc/doxygen/html/
You can also build your own documentation by running doxygen from the root directory
of clucene.
You can build your own documentation by running 'make DoxygenDoc' from your
'out-of-source' cmake-configured build directory.
CLucene is a very close port of Java Lucene, so you can also try looking at the
Java Docs on http://lucene.apache.org/java/
There is an online version (which won't be as up to date as if you build your
own) at http://clucene.sourceforge.net/doc/html/


Performance
-----------
Very little benchmarking has been done on clucene. Andi Vajda posted some
limited statistics on the clucene list a while ago with the following results.

There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
6108kb of HTML text.
org.apache.lucene.demo.IndexFiles with java and gcj:
on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
. running with java 1.4.1_01-99 : 20379 ms
. running with gcj 3.3.2 -O2 : 17842 ms
. running clucene 0.8.9's demo : 9930 ms

I recently did some more tests and came up with these rough results:
663mb (797 files) of Gutenberg texts
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
• Jlucene: 646453ms. peak mem usage ~72mb, avg ~14mb ram
• Clucene: 232141ms. peak mem usage ~60mb, avg ~4mb ram

Searching the index using 10,000 single word queries
• Jlucene: ~60078ms and used ~13mb ram
• Clucene: ~48359ms and used ~4.2mb ram

Platform notes
--------------

'Too many open files'
Some platforms don't provide enough file handles to run CLucene properly.
To solve this, increase the open file limit:

On Solaris:
ulimit -n 1024
set rlim_fd_cur=1024

Acknowledgments
----------------

The Apache Lucene project is the basis for this software, so the biggest
acknowledgment goes to that project.

@@ -91,26 +59,5 @@ make up portions of the CLucene software:
This software contains code derived from the RSA Data Security
Inc. MD5 Message-Digest Algorithm.

CLucene relies heavily on the use of autoconf and libtool to provide
a build environment.

Memory Management
------------------
Memory in CLucene has been a bit of a difficult thing to manage because of the
unclear specification about who owns what memory. This was mostly a result of
CLucene's java-esque coding style that was a result of porting from java to
c++ without too much re-writing of the API. However, CLucene is slowly improving
in this respect and we try and follow these development and coding rules (though
we don't guarantee that they are all met at this stage):

1. Whenever possible the caller must create the object that is being filled. For example:
IndexReader->getDocument(id, document);
As opposed to the old method of document = IndexReader->getDocument(id);

2. Clone always returns a new object that must be cleaned up manually.

Questions:
1. What should be the convention for an object taking ownership of memory? Documenting this would be a minimum.


CLucene relies heavily on the use of cmake to provide a stable build environment.
