Monday, September 24, 2007

Running Lucene inside your Oracle JVM

Since October of 2006 a new way to use Lucene is ready to use.
You can run Lucene inside your Oracle JVM because is pure Java Framework and get all benefits of them for your Oracle Application.
Why do I use Lucene inside Oracle?
Oracle include a full feature Enterprise Engine named Oracle Text made in C and fully integrated to the Oracle Engine, but:
  • you can not control which functionality will be included into next release
  • and you can not easily customize it for your needs
So why not load Lucene inside the Oracle JVM and integrate it to the Engine.
Luckily Oracle provides a Data Cartridge API, a powerful connector to implement custom index called Domain Index, Oracle Text is implemented using this API.
So uploading Lucene into the Oracle JVM and implementing a Domain Index is good option for many applications. OJVMDirectory is this option, the name is related to the component used to replace Lucene File System Storage by BLOB storage, here a simple list of points take into account to choose this storage:
  • Using traditional File System for storing the inverted index is not a good option for some users.
  • Using BLOB for storing the inverted index running Lucene outside the Oracle database has a bad performance because there are a lot of network round trips and data marshallings.
  • Indexing relational data stores such as tables with VARCHAR2, CLOB or XMLType with Lucene running outside the database has the same problem as the previous point.
  • The JVM included inside the Oracle database can scale up to 10.000+ concurrent threads without memory leaks or deadlock and all the operations on tables are in the same memory space!!
Also there are some important points integrating Lucene by using Oracle Data Cartridge API:
  • Changes on rows are automatically notified to Lucene, now these changes are enqueued using Oracle AQ. User can control if these changes are applied On Line or Deferred using Sync operation in batch mode.
  • Oracle optimizer can choose a proper execution plan if there is Domain Index created
  • You can mix lcontains() and lscore() operators in your queries.
Latest addition sponsored by Lending Club provides many other functionalities such as:
  • Synchronized with latest Lucene 2.2.0 production
  • Replaced in memory storage using Vector based implementation by direct BLOB IO, reducing memory usage for large index.
  • Support for User Data Stores, it means you can not only index one column at time (limited by Data Cartridge API on 10g), now you can index multiples columns at base table and columns on related tabled joined together.
  • User Data Stores can be customized by users, it means writing a simple Java Class users can control which column are indexed, padding functionality, storage and so on.
  • There is a DefaultUserDataStore which gets all columns of the query and built a Lucene Document with Fields representing each database columns, these fields are automatically padded if they have NUMBER or rounded if they have DATE data, for example.
  • lcontains() SQL operator support full Lucene's QueryParser syntax to provide access to all columns indexed.
  • Support for DOMAIN_INDEX_SORT and FIRST_ROWS hint, it means that if you want to get rows order by lscore() operator (ascending,descending) the optimizer hint will assume that Lucene Domain Index will returns rowids in proper order avoided an inline-view to sort it.
  • Automatic index synchronization by using AQ's Call Back.
  • Lucene Domain Index creates extra tables named IndexName$T and an Oracle AQ named IndexName$Q with his storage table IndexName$QT at user's schema, so you can alter storage's preference if you want.
Now we are working in another additions:
  • Implementing Array DML API, it means faster response time with operation such as insert into .. select .. from DML and faster dequeuing operations.
  • Independent column format for each indexed columns including left padding with 0 for NUMBER, left padding with any char for VARCHAR2 (specially for columns which represent codes), XPath expressions for XMLType and rounding for DATE/TIMESTAMP.
  • In Line pagination information at lcontains() operator using Lucene Query Parser's syntax, for example select .. from ... where lcontains('rownum:[1 TO 10] AND f2:xx AND f3:yy') instead of using Top-N query syntax.
Well anybody can join to this Open Source efforts :)

44 comments:

Jhovanny said...

Hi Marcelo,

I am testing OJVM and comparing the performance compared to Oracle Text. For a 1.3MM rows table i indexed a column that contains an average of 26 characters per row. Running "Select * From LARGETABLE Where LCONTAINS(name,'word1')>0" takes 10.3 seconds against 0.8 seconds when replacing LCONTAINS for CONTAINS. In this sample, 'word1' appears in 23877 rows. I'm not using the native compilation, but i'll try that next.

Marcelo Ochoa said...

Hi Jhovanny:
First thanks for your feedback.
Oracle Data Cartridge API performs these steps in your query example:
Call ODCIStart passing all query info, such as index info and obviously the query "word1"
Then call to ODCIFetch with an output array of 2000 rowids, here is the overhead because we need to populate this array with 23000 rowids.
Finally call to ODCIClose.
Latest commited code has two special improvement:
- a LuceneDomainIndex.countHits function to use instead of select count ... from .. where lcontains
- a inline pagination syntax inside lcontains to avoid Top-N query, for example select ... from .. where lcontains(text,"ronum:[20 TO 40] AND word1",1) order by lscore(1) DESC;
Typically free text searching applications count a total number of hits and the paginate it in batch of 20 rows, for example Google does not accept query larger than 1000 hits, try it by your self ;)
Best regards, Marcelo.

Sujit said...

Hi marcelo,
I want to integrate OJVMDirectory into our database for providing the text search solution.

Can you please guide me what are the steps i need to perform for integrating this.

Thanks

Marcelo Ochoa said...

Hi Sujit:
You can download Lucene OJVM code source from SourceForge.net CVS using anonymous user (module ojvm):
http://sourceforge.net/cvs/?group_id=56183
Latest documentation is at:
http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg
Or you can download a binary distribution for 10g and 11g databases which is an straightforward installation option at:
http://sourceforge.net/project/showfiles.php?group_id=56183&package_id=255524&release_id=561159
Best regards, Marcelo.
PD: Remember that if you want to check performance at 10g databases you need to use NCOMPed installation, see docs for more detail.

vincent said...

Hi Marcelo,
I am very interested in using lucene within oracle as we have other requirements such as spatial integration. What I was interested in was whether lucene filters can be implemented within the embedded oracle lucene index and if so how would this be achieved.

Marcelo Ochoa said...

Hi vincent:
Basically you can put into Lucene Domain Index any functionality supported by Lucene and Java.
For example to use Lucene's sort we expanded lcontains() operator with an extra argument to inject Lucene's field used for sorting during the query and adding DOMAIN_INDEX_SORT optimizer hints Oracle returns a set of rows using Lucene sorting values.
If you want, just drop me an email with more information about your needs and I can provide more hints for integrating it with Lucene Domain Index.
Best regards, Marcelo.

Richard said...

Hi Marcelo,

Firstly, many thanks for the effort & time you have put into Lucene!

Here's my question:
I have an Oracle 10g database that already has CLOBs in tables (i.e. the text objects are stored in the database as columns in tables). Does Lucene work with data stored like this, or must it be BFILE data (i.e. text files stored in the filesystem outside the database)?

Marcelo Ochoa said...

@Richard:
Sure you can index your CLOB columns directly with Lucene Domain Index.
For production system we recomend to install Lucene Domain Index native compiled by NCOMP, there is a target on Lucene Domain Index binary distribution to automatically do that.
Lucene Domain Index works by default with any standard SQL types and with Oracle XMLType/CLOB.
See complete information and examples at:
http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg
Lucene Domain Index replaces Lucene's inverted index storage, by default in files, by an OJVMDirectory which replaces files by Oracle BLOBs stored in a table.
Best regards, Marcelo.

Mark said...

Hi Marcelo

I have the Lucene index working, and can run a query like this:

select id from x where lcontains (y,'AAA',0)>0)

But have you succesfully tried to use it in PL/SQL ?
I'm just disconnected from the database and get some hopeless error if I try the query in pl/sql:

begin for r in (select id from x where lcontains (y,'AAA',0)>0) loop null; end loop; end;
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 30054
Session ID: 138 Serial number: 76

thanks.
Mark

Marcelo Ochoa said...

Hi Mark:
Which database version are you using?
There is similar error on 11g which requires a patch 6445561 available at Metalink. Here a complete description:
6445561 - ORA-00600 [26599] [62] DUE TO INCORRECT PERSISTENCE OF BY INVOKER PIN
There is code example at the file db/testSourceBig.sql which reproduce the problem.
If you are not using 11g, a .trc information will be helpful.
To continue with the thread we can move the question/answer to this help list:
http://sourceforge.net/forum/forum.php?forum_id=187896
Best regards, Marcelo

Mark said...

Hi Marcelo,

indeed I'm using 11g. I'll try the patch and will let you know if that fixes the problem.

Thanks!
Mark

Anonymous said...

Hi Marcelo,

We are using oracle 9i, please let me know how we can use the lucene with oracle 9i.

Thanks,
Radhakrishna

Marcelo Ochoa said...

Hi Radhakrishna:
Oracle 9i is bundled with a JDK 1.3 compatible JVM.
Lucene 2.x is JDK 1.4+ compliant, and will be 1.5+ in 3.0 release.
I don't know if the code of Lucene can be downgrade to 1.3 :(
There is a tool named retrotranslator which do this job, but I never tried it.
Also there are PLSQL code which is tested against 10g Data Cartridge API and I didn't checked them in 9i.
Best regards, Marcelo.

sanok said...

Hi Marcelo!
First of all, thank you for your Lucene Domain Index project.

Can Lucene Domain Index be partitioned? When I tried to create partitioned lucene index

------------------------------------------
create index fts_lidx on luc_part_tbl(txt)
indextype is lucene.LuceneIndex local
------------------------------------------

I have got error "ORA-29855. Error occured in ODCINDEXCREATE routine. Cause: Failed to successfully execute the ODCIndexCreate routine. Action: Check to see if routine has been coded correctly."

It works fine without "local" keyword at the end. Here is my test table:

CREATE TABLE LUC_PART_TBL
( ID NUMBER,
TXT VARCHAR2(200),
TYP NUMBER
)
PARTITION BY RANGE (TYP)
(PARTITION "X" VALUES LESS THAN (1) ,
PARTITION "Y" VALUES LESS THAN (2));

I tried to examine the problem by browsing source code. I found that lucene indextype created as partitioned...

----[File LuceneDomainIndex.sql]---------
create indextype LuceneIndex
...
with local range partition --<<<< It seems that partitioning should be supported
...
-----------------------------------------

...but methods ODCIIndexExchangePartition(), ODCIIndexMergePartition() and ODCIIndexSplitPartition() are
not implemented. Isn't this feature supported in current Lucene Domain Index version?

Best regards, Alexander Savochkin.

Marcelo Ochoa said...

Hi Alexander:
Partitioning index is not supported with current implementation, at the time of latest available implementation I had tested the code with Oracle Standard Edition One which doesn't include partitioning support :(
I am starting evaluating this functionality but first, I need to implement some kind of parallel searching to collect the result of all partitions and get a new one.
This is in my TODO list, but I am working on that :)
Best regards, Marcelo.

David Webb said...

Marcelo,

I am try to using Lucene and get off of Oracle Text altogether. There are 2 functions of Oracle Text that I use a lot, SNIPPET and HIGHLIGHT.

Are those available in your implementation?

Thank you.

Marcelo Ochoa said...

Hi David:
I am working a new binary distribution based on Lucene 2.4 release.
This new version already included a lhighlight() ancillary operator and a preliminary pipeline function phighlight() for a general purpose highlighting operation on any query.
lhighligh() performs highlighting operation on the default column of the domain index. For example:

SELECT /*+ DOMAIN_INDEX_SORT */ lhighlight(1) txt,lscore(1) sc,subject FROM emails where lcontains(bodytext,'security OR mysql','subject:ASC',1)>0

But with phighlight you can do something like this:

SELECT * FROM
TABLE(highlight_view(
'EMAIL_LIDX',
'lucene OR mysql',
'subject,bodytext',
'select lscore(1) sc,e.* from eMails e where lcontains(bodytext,''security OR mysql'',''subject:ASC'',1)>0'
));

will highlight the columns subject and bodytext using Lucene Domain Index EMAIL_LIDX index information and the query 'lucene OR mysql'.
If you want to test this preliminary release just drop me an email or send an email using SourceForge help list.
Best regards, Marcelo.

Sirko said...

Hi Marcelo,
select /*+ DOMAIN_INDEX_SORT */ count(*) from t where lcontains(name,'man','id:ASC',1)>0
*
ERROR at line 1:
ORA-29902: error in executing ODCIIndexStart() routine
ORA-29532: Java call terminated by uncaught Java exception: java.lang.RuntimeException: field "id" does not appear to be indexed

Index was created as simple and as ExtraCols:id "id"

where is wrong?

Marcelo Ochoa said...

Hi Sirko:
Columns used in sort should be indexed as UN_TOKENIZED is they are of type VARCHAR2,CLOB or XMLType.
Columns of type NUMBER,TIMESTAMP,DATE, etc. are automatically considered as UN_TOKENIZED by DefaultUserDataStore class.
Also you can use LogLevel:INFO parameter when you create the index to see the exact SQL query performed by Lucene Domain Index to extract the columns.
Please use SF.net Discussion Forums for more questions.
Best regards, Marcelo.

Anonymous said...

Hello Marcelo,

first of all thank you for your effort in implementing lucene into Oracle. I tried to install it into my Oracle 10g database instance. Everything seems to be ok during installation process. As suggested by documentation (http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg#_1_2_What_is_Lucene_Domain_Ind_508262077334089) I've created a test table and an associated index as follow:



create table t1 (

f1 number,

f2 varchar2(200),

f3 varchar2(200),

f4 number unique);





create index it1 on t1(f2) indextype is lucene.LuceneIndex

parameters('Stemmer:English');


but the response is the following error:

create index it1 on t1(f2) indextype is lucene.LuceneIndex
parameters('Stemmer:English')
Error at line 1
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-29532: Java call terminated by uncaught Java exception: java.lang.ExceptionInInitializerError

Thanks in advance.

Best Regards,
Davide Raccagni

Marcelo Ochoa said...

Hi Davide:
A complete explanation of the error is at user dump destination file, once you get the error look for latest .trc file at this directory.
Best regards, Marcelo.

PD: Did you execute this?:
-- connected as sysdba
begin
dbms_java.grant_permission('SCOTT','SYS:java.util.logging.LoggingPermission', 'control', '' );
commit;
end;
/
replacing SCOTT with your DB user, except LUCENE which already have this grant.

Anonymous said...

Hi Marcelo,
thank you for your answer. Now the user has been properly granted. Looking at the trace file the relevant error seems to be:

*** 2009-03-25 14:54:35.771
*** ACTION NAME:() 2009-03-25 14:54:35.770
*** MODULE NAME:(TOAD 8.5.3.2) 2009-03-25 14:54:35.770
*** SERVICE NAME:(SYS$USERS) 2009-03-25 14:54:35.770
*** SESSION ID:(251.764) 2009-03-25 14:54:35.770
kwqalqt2: Warning !! queue table FAO_FAOTERM.IT3$QT missing
kwqalqt2: Warning !! queue table FAO_FAOTERM.IT4$QT missing
*** 2009-03-25 15:05:50.768
kwqalqt2: Warning !! queue table FAO_FAOTERM.IT3$QT missing
kwqalqt2: Warning !! queue table FAO_FAOTERM.IT4$QT missing
*** 2009-03-25 15:07:37.246
=====================
PARSING IN CURSOR #26 len=35 dep=0 uid=51 oct=42 lid=51 tim=1208974665279966 hv=4002676834 ad='38a478cc'
ALTER SESSION SET sql_trace = TRUE
END OF STMT
EXEC #26:c=0,e=405,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1208974665279956
*** 2009-03-25 15:07:48.486
=====================
PARSE ERROR #25:len=101 dep=0 uid=51 oct=1 lid=51 tim=1208974676256042 err=911
CREATE TABLE T2
(
F1 NUMBER,
F2 VARCHAR2(200 BYTE),
F3 VARCHAR2(200 BYTE),
F4 NUMBER
);
XCTEND rlbk=0, rd_only=1
=====================
PARSING IN CURSOR #20 len=99 dep=0 uid=51 oct=1 lid=51 tim=1208974676299362 hv=3570727389 ad='3a66669c'
CREATE TABLE T2
(
F1 NUMBER,
F2 VARCHAR2(200 BYTE),
F3 VARCHAR2(200 BYTE),
F4 NUMBER
)
END OF STMT
PARSE #20:c=10000,e=2585,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1208974676299347
=====================
PARSING IN CURSOR #8 len=198 dep=1 uid=0 oct=3 lid=0 tim=1208974676303373 hv=4125641360 ad='3fad7b80'
select obj#,type#,ctime,mtime,stime,status,dataobj#,flags,oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
END OF STMT
PARSE #8:c=0,e=3567,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676303361
EXEC #8:c=0,e=16761,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676320303
FETCH #8:c=0,e=57,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,tim=1208974676320440
STAT #8 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=0 pw=0 time=65 us)'
STAT #8 id=2 cnt=0 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=0 pw=0 time=57 us)'
=====================
PARSING IN CURSOR #4 len=116 dep=1 uid=0 oct=3 lid=0 tim=1208974676323260 hv=854877822 ad='3fad5df4'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname,o.dataobj#,o.flags from obj$ o where o.obj#=:1
END OF STMT
PARSE #4:c=0,e=2586,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676323250
EXEC #4:c=0,e=3456,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676326868
FETCH #4:c=0,e=38,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,tim=1208974676326985
STAT #4 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=2 pr=0 pw=0 time=45 us)'
STAT #4 id=2 cnt=0 pid=1 pos=1 obj=36 op='INDEX UNIQUE SCAN I_OBJ1 (cr=2 pr=0 pw=0 time=35 us)'
=====================
PARSING IN CURSOR #2 len=205 dep=1 uid=0 oct=2 lid=0 tim=1208974676329933 hv=3861105107 ad='38baf314'
insert into obj$(owner#,name,namespace,obj#,type#,ctime,mtime,stime,status,remoteowner,linkname,subname,dataobj#,flags,oid$,spare1,spare2)values(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16, :17)
END OF STMT
PARSE #2:c=10000,e=2781,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676329921
EXEC #2:c=0,e=3581,p=0,cr=1,cu=7,mis=1,r=1,dep=1,og=4,tim=1208974676333696
=====================
PARSING IN CURSOR #16 len=36 dep=1 uid=0 oct=3 lid=0 tim=1208974676336624 hv=1570213724 ad='3fabedb8'
select file# from file$ where ts#=:1
END OF STMT
PARSE #16:c=0,e=2567,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676336612
EXEC #16:c=10000,e=3341,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676340093
FETCH #16:c=0,e=96,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=4,tim=1208974676340289
FETCH #16:c=0,e=8,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=4,tim=1208974676340316
STAT #16 id=1 cnt=1 pid=0 pos=1 obj=17 op='TABLE ACCESS FULL FILE$ (cr=4 pr=0 pw=0 time=86 us)'
=====================
PARSING IN CURSOR #27 len=206 dep=1 uid=0 oct=3 lid=0 tim=1208974676343431 hv=2888850659 ad='3fac5ff0'
select type#,blocks,extents,minexts,maxexts,extsize,extpct,user#,iniexts,NVL(lists,65535),NVL(groups,65535),cachehint,hwmincr, NVL(spare1,0),NVL(scanhint,0) from seg$ where ts#=:1 and file#=:2 and block#=:3
END OF STMT
PARSE #27:c=0,e=2760,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676343419
EXEC #27:c=10000,e=3727,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676347324
FETCH #27:c=0,e=93,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,tim=1208974676347498
STAT #27 id=1 cnt=0 pid=0 pos=1 obj=14 op='TABLE ACCESS CLUSTER SEG$ (cr=3 pr=0 pw=0 time=99 us)'
STAT #27 id=2 cnt=1 pid=1 pos=1 obj=9 op='INDEX UNIQUE SCAN I_FILE#_BLOCK# (cr=2 pr=0 pw=0 time=37 us)'
=====================
PARSING IN CURSOR #24 len=256 dep=1 uid=0 oct=2 lid=0 tim=1208974676350592 hv=2325978899 ad='3a172c2c'
insert into seg$ (file#,block#,type#,ts#,blocks,extents,minexts,maxexts,extsize,extpct,user#,iniexts,lists,groups,cachehint,bitmapranges,hwmincr, spare1, scanhint) values (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,0,:16,DECODE(:17,0,NULL,:17),:18)
END OF STMT
PARSE #24:c=0,e=2749,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676350582
EXEC #24:c=0,e=3271,p=0,cr=2,cu=1,mis=1,r=1,dep=1,og=4,tim=1208974676354036
=====================
PARSING IN CURSOR #4 len=494 dep=1 uid=0 oct=2 lid=0 tim=1208974676357299 hv=2283232350 ad='3a153e2c'
insert into tab$(obj#,ts#,file#,block#,bobj#,tab#,intcols,kernelcols,clucols,audit$,flags,pctfree$,pctused$,initrans,maxtrans,rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,analyzetime,samplesize,cols,property,degree,instances,dataobj#,avgspc_flb,flbcnt,trigflag,spare1,spare6)values(:1,:2,:3,:4,decode(:5,0,null,:5),decode(:6,0,null,:6),:7,:8,decode(:9,0,null,:9),:10,:11,:12,:13,:14,:15,:16,:17,:18,:19,:20,:21,:22,:23,:24,:25,decode(:26,1,null,:26),decode(:27,1,null,:27),:28,:29,:30,:31,:32,:33)
END OF STMT
PARSE #4:c=10000,e=2862,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676357289
EXEC #4:c=0,e=4560,p=0,cr=3,cu=14,mis=1,r=1,dep=1,og=4,tim=1208974676362093
=====================
PARSING IN CURSOR #2 len=453 dep=1 uid=0 oct=2 lid=0 tim=1208974676365160 hv=224718466 ad='38890864'
insert into col$(obj#,name,intcol#,segcol#,type#,length,precision#,scale,null$,offset,fixedstorage,segcollength,deflength,default$,col#,property,charsetid,charsetform,spare1,spare2,spare3)values(:1,:2,:3,:4,:5,:6,decode(:5,182/*DTYIYM*/,:7,183/*DTYIDS*/,:7,decode(:7,0,null,:7)),decode(:5,2,decode(:8,-127/*MAXSB1MINAL*/,null,:8),178,:8,179,:8,180,:8,181,:8,182,:8,183,:8,231,:8,null),:9,0,:10,:11,decode(:12,0,null,:12),:13,:14,:15,:16,:17,:18,:19,:20)
END OF STMT
PARSE #2:c=0,e=2852,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676365148
EXEC #2:c=10000,e=3978,p=0,cr=2,cu=8,mis=1,r=1,dep=1,og=4,tim=1208974676369336
=====================
PARSING IN CURSOR #16 len=453 dep=1 uid=0 oct=2 lid=0 tim=1208974676369868 hv=224718466 ad='38890864'
insert into col$(obj#,name,intcol#,segcol#,type#,length,precision#,scale,null$,offset,fixedstorage,segcollength,deflength,default$,col#,property,charsetid,charsetform,spare1,spare2,spare3)values(:1,:2,:3,:4,:5,:6,decode(:5,182/*DTYIYM*/,:7,183/*DTYIDS*/,:7,decode(:7,0,null,:7)),decode(:5,2,decode(:8,-127/*MAXSB1MINAL*/,null,:8),178,:8,179,:8,180,:8,181,:8,182,:8,183,:8,231,:8,null),:9,0,:10,:11,decode(:12,0,null,:12),:13,:14,:15,:16,:17,:18,:19,:20)
END OF STMT
EXEC #16:c=0,e=288,p=0,cr=2,cu=8,mis=0,r=1,dep=1,og=4,tim=1208974676369862
=====================
PARSING IN CURSOR #27 len=453 dep=1 uid=0 oct=2 lid=0 tim=1208974676370286 hv=224718466 ad='38890864'
insert into col$(obj#,name,intcol#,segcol#,type#,length,precision#,scale,null$,offset,fixedstorage,segcollength,deflength,default$,col#,property,charsetid,charsetform,spare1,spare2,spare3)values(:1,:2,:3,:4,:5,:6,decode(:5,182/*DTYIYM*/,:7,183/*DTYIDS*/,:7,decode(:7,0,null,:7)),decode(:5,2,decode(:8,-127/*MAXSB1MINAL*/,null,:8),178,:8,179,:8,180,:8,181,:8,182,:8,183,:8,231,:8,null),:9,0,:10,:11,decode(:12,0,null,:12),:13,:14,:15,:16,:17,:18,:19,:20)
END OF STMT
EXEC #27:c=0,e=277,p=0,cr=2,cu=8,mis=0,r=1,dep=1,og=4,tim=1208974676370280
=====================
PARSING IN CURSOR #24 len=453 dep=1 uid=0 oct=2 lid=0 tim=1208974676370784 hv=224718466 ad='38890864'
insert into col$(obj#,name,intcol#,segcol#,type#,length,precision#,scale,null$,offset,fixedstorage,segcollength,deflength,default$,col#,property,charsetid,charsetform,spare1,spare2,spare3)values(:1,:2,:3,:4,:5,:6,decode(:5,182/*DTYIYM*/,:7,183/*DTYIDS*/,:7,decode(:7,0,null,:7)),decode(:5,2,decode(:8,-127/*MAXSB1MINAL*/,null,:8),178,:8,179,:8,180,:8,181,:8,182,:8,183,:8,231,:8,null),:9,0,:10,:11,decode(:12,0,null,:12),:13,:14,:15,:16,:17,:18,:19,:20)
END OF STMT
EXEC #24:c=0,e=346,p=0,cr=2,cu=10,mis=0,r=1,dep=1,og=4,tim=1208974676370779
=====================
PARSING IN CURSOR #25 len=150 dep=1 uid=0 oct=3 lid=0 tim=1208974676373404 hv=3988833529 ad='38becaa4'
select 1 from sys.streams$_prepare_ddl p where ((p.global_flag = 1 and :1 is null) or (p.global_flag = 0 and p.usrid = :2)) and rownum = 1
END OF STMT
PARSE #25:c=0,e=2450,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676373391
EXEC #25:c=10000,e=3334,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676376816
FETCH #25:c=0,e=42,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=4,tim=1208974676376912
STAT #25 id=1 cnt=0 pid=0 pos=1 obj=0 op='COUNT STOPKEY (cr=1 pr=0 pw=0 time=49 us)'
STAT #25 id=2 cnt=0 pid=1 pos=1 obj=624 op='INDEX FULL SCAN I_STREAMS_PREPARE_DDL (cr=1 pr=0 pw=0 time=42 us)'
XCTEND rlbk=0, rd_only=0
=====================
PARSING IN CURSOR #8 len=296 dep=1 uid=0 oct=6 lid=0 tim=1208974676380036 hv=2379717279 ad='3898b7a8'
update seg$ set type#=:4,blocks=:5,extents=:6,minexts=:7,maxexts=:8,extsize=:9,extpct=:10,user#=:11,iniexts=:12,lists=decode(:13, 65535, NULL, :13),groups=decode(:14, 65535, NULL, :14), cachehint=:15, hwmincr=:16, spare1=DECODE(:17,0,NULL,:17),scanhint=:18 where ts#=:1 and file#=:2 and block#=:3
END OF STMT
PARSE #8:c=0,e=2856,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1208974676380025
EXEC #8:c=0,e=4287,p=0,cr=5,cu=1,mis=1,r=1,dep=1,og=4,tim=1208974676384496
STAT #8 id=1 cnt=0 pid=0 pos=1 obj=0 op='UPDATE SEG$ (cr=5 pr=0 pw=0 time=369 us)'
STAT #8 id=2 cnt=1 pid=1 pos=1 obj=14 op='TABLE ACCESS CLUSTER SEG$ (cr=5 pr=0 pw=0 time=192 us)'
STAT #8 id=3 cnt=1 pid=2 pos=1 obj=9 op='INDEX UNIQUE SCAN I_FILE#_BLOCK# (cr=2 pr=0 pw=0 time=36 us)'
EXEC #20:c=60000,e=85430,p=0,cr=33,cu=69,mis=0,r=0,dep=0,og=1,tim=1208974676384890
=====================
PARSE ERROR #4:len=61 dep=0 uid=51 oct=9 lid=51 tim=1208974681159861 err=911
CREATE INDEX IT4 ON T2(F3)
INDEXTYPE IS LUCENE.LUCENEINDEX;
XCTEND rlbk=0, rd_only=1
=====================
PARSING IN CURSOR #16 len=210 dep=1 uid=0 oct=3 lid=0 tim=1208974681235610 hv=864012087 ad='3fa387dc'
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
PARSE #16:c=0,e=2973,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=3,tim=1208974681235598
EXEC #16:c=0,e=3197,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=3,tim=1208974681238985
FETCH #16:c=0,e=51,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=3,tim=1208974681239117
STAT #16 id=1 cnt=0 pid=0 pos=1 obj=255 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=2 pr=0 pw=0 time=59 us)'
STAT #16 id=2 cnt=0 pid=1 pos=1 obj=257 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=0 pw=0 time=51 us)'
=====================


Thanks in advance,
Davide

Marcelo Ochoa said...

Hi Davide:
I don't see anything about JVM problem on the .trc file.
Also you don't need to execute ALTER SESSION SET sql_trace = TRUE to get OJVM stack trace information in your .trc files.
Latest log information reflects information about JVM system's view and no more than that.
Lucene Domain Index shows information using Java logging package and look like:

Trace file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_13800.trc
Oracle Database 11g Release 11.1.0.6.0 - Production
ORACLE_HOME = /u01/app/oracle/product/11.1.0.6.0/db_1
System name: Linux
Node name: mochoa
Release: 2.6.27.19-desktop-1mnb
Version: #1 SMP Wed Mar 4 23:59:13 EST 2009
Machine: i686
Instance name: test
Redo thread mounted by this instance: 1
Oracle process number: 25
Unix process pid: 13800, image: oracle@mochoa


*** 2009-03-25 08:04:02.004
*** SESSION ID:(140.46664) 2009-03-25 08:04:02.004
*** CLIENT ID:() 2009-03-25 08:04:02.004
*** SERVICE NAME:(SYS$USERS) 2009-03-25 08:04:02.004
*** MODULE NAME:(JDeveloper) 2009-03-25 08:04:02.004
*** ACTION NAME:() 2009-03-25 08:04:02.004

Mar 25, 2009 8:04:01 AM org.apache.lucene.search.facets.ojvm.HitCounter ODCIInitialize
INFO: key '1' main category '-ALL-'
Mar 25, 2009 8:04:02 AM org.apache.lucene.search.facets.ojvm.HitCounter ODCIIterate
INFO: sub category 'text:(Ciencias naturales y formales)'
....
INFO logging is shown if you defined a parameter "LogLevel:INFO" when you create the index.
In the other hand could you start a discussion at SF Discussion forums?
Blog comments have no good support for threads :(
Best regards, Marcelo.

Vikas said...

Hi
I downloaded the OJVM binary lucene-ojvm-bin-10g-2.4.1.1.0 and we are trying to install that on our 10g database.

When we execute "ant install-ojvm "

we get the following set of errors
the pl/sql procedures prior to this work fine.


------------------------------------
[exec] Elapsed: 00:00:00.34
[exec] dbms_java.set_native_compiler_option('optimizerLoopPagedConversion','false');
[exec] *
[exec] ERROR at line 4:
[exec] ORA-06550: line 4, column 13:
[exec] PLS-00302: component 'SET_NATIVE_COMPILER_OPTION' must be declared
[exec] ORA-06550: line 4, column 3:
[exec] PL/SQL: Statement ignored
[exec]
[exec]
[exec] Elapsed: 00:00:00.00
[exec] Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production
[exec] With the Partitioning, OLAP and Data Mining options

load-junit:
[java] arguments: '-nodefiner' '-v' '-s' '-g' 'public' '-u' 'lucene/LUCENE@gpdev04' 'lib/junit-3.8.2.jar'
[java] Error while connecting with oci8 driver to gpdev04: oracle.aurora.server.tools.loadjava.ToolsException: The JDBC OCI8 Driver is not installed properly, use -thin to try the JDBC Thin driver: /u01/app/oracle/product/10.2.0.2/db_1/lib/libocijdbc10.so: /u01/app/oracle/product/10.2.0.2/db_1/lib/libocijdbc10.so: cannot open shared object file: No such file or directory
[java] exiting : could not open connection

BUILD FAILED
/home/lucene-ojvm/build.xml:70: The following error occurred while executing this line:
/home/lucene-ojvm/loadjava.xml:44: Java returned: 1
------------------------

help appreciated

Anonymous said...

The 10g scripts seems to be using set_native_compiler_option which is a method available in 11g. Any idea?

Marcelo Ochoa said...

Hi Vikas:
You can ignore the exception of DBMS_COMPILER package, you are right is only for 11g and I'll fix the script into the next release.
Regarding to the JDBC OCI problem edit your LD_LIBRARY_PATH variable with:
export LD_LIBRARY_PATH=$ORACLE_HOME/lib32:$LD_LIBRARY_PATH This is known problem with 64 bits installations, see this Metalink Note 468153.1.
Further questions please use SF Forums instead blog comments, it have a better editor ;)
Best regards, Marcelo.

Nickname unavailable said...

Error while installing:
Hi, I am getting an error while running LUCENEDomainIndex.sql. More specifically at the lines

ALTER JAVA CLASS "org.apache.lucene.indexer.LuceneDomainContext" RESOLVE;

The errors are:
Warning: Java altered with compilation errors.

Also when i look at dba_errors
I see a bunch of java classes unresolved.
like
ORA-29521: referenced name oracle/xml/parser/v2/SAXParser could not be found ORA-29534: referenced object LUCENE2.org/apache/lucene/store/OJVMDirectory could not be resolved etc.
Any idea what could be going wrong.
Thanks!!

Harpreet Chopra said...

Hi Marcelo,

We have create a LuceneDomain index containing 2 million rows of data (each row containing about 100 chars worth of data).

I have also ncomped the classes. However while executing queries generating a large volume of results (> 1 million), we are getting an out of mem error.

How should we proceed to remedy this. We are using Oracle 10g (rel 2).

Thanks
Harpreet

Marcelo Ochoa said...

@Nickname unavailable:
oracle/xml/parser/v2/SAXParser is system class installed when you choose OJVM option on Oracle 10g/11g.
If this class can not be resolved may be was a problem during the installation of the OJVM feature.
Also an small SGA configuration hits problems during class compilation.
@Harpreet
If you hit an OOM Java exception is due to an small SGA configuration.
Also you can use lcontains() pagination operator (rownum:[n TO m]) to reduce how many Lucene hits lcontains operator will return to the RDBMS optimizer.
Note that if you paginate your 1M rows result using standard Top-N syntax, Lucene Domain Index needs to fetch 1M rowid to return to the optimizer which then it will discard based on the Top-N pagination information.
See the section 3.4.3 Pagination at Lucene Domain Index documentation.
Please for further questions use Source Forge Help forums :)
Best regards, Marcelo.

Anonymous said...

Hai marcelo...
where luceneindex oracle storage????????????

Marcelo Ochoa said...

Hi Anonymous:
What do you mean with Lucene Domain Index storage?
If you mean how Lucene inverted index is stored the answer is there an specific table named yourIndexName$T which have several columns one of these columns is of type BLOB.
In this BLOB the lucene file are stored.
Using Oracle 11g FILESYSTEM_LIKE_LOGGING BLOB option you will have a performance on Blobs similar to a filesystem, but with additional features like transactions, compression and encription.
More on this, in a RAC enviroment, BLOBs are available to all nodes, so operation on Lucene Domain Index is like an infraestructure of Lucene over SAN/NAS storage with 0 cost of configration and management.
Best regards, Marcelo.

Anonymous said...

hi marcelo! I am a student of computer engineering . I'm doing an university project in italy and I'm going crazy to configure Lucene with Oracle.



My goal is to run Lucene in Oracle JVM.
How Do i run lucene in Oracle ojvm?

i need to edit the file build.properties indicating the parameters of my Oracle DB.

I don't found build.properties file
i found build.xml but i don't belive that build.xml is like buld.properties

I installed Ant to create the environment variable ANT_HOME and changing the path.
Ant works correctly.

My problems are:

1) I do not understand how and where to modify the build.properties


I'm working with Windows XP.
The version installed on my computer is free distribution Oracle 10g Express.

thaks marcelo

my e-mail is weareone@hotmail.it
i don't have a google account

Anonymous said...

Marcelo,

is "Lucene domain index" compatible with Oracle 10g R1 (10.1.0.5.0)?

Marcelo Ochoa said...

Hi:
I never tested LDI on Oracle 10gR1.
But 10gR1 is a JDK1.4 compatible runtime so it should be run.
May a problem could be changes on the ODCI API (extensible indexing).
Best regards, Marcelo.

Sigbert said...

Hi Marcelo,
We are trying to do a POC on using Lucene Domain Index on Oracle 10G.we were able to install lucene successfully on DB. Followed the steps and examples in https://docs.google.com/Doc?docid=0AVfwyTi0k18ZZGRndzdzanBfNTRmZ2o5a2c&pli=1.
Eg:1 - Created a table t1 having columns F1(Num),F2(Var),F3(Var),F4(Num) with domain index it1(f2).
create index it1 on t1(f2) indextype is system.LuceneIndex
parameters('Stemmer:English;FormatCols:F2(zzzzzzzzzzzzzzz),F3(00.00);ExtraCols:F3');

However when i run this query
select lscore(1),f2 from t1 where lcontains(f1, 'F2:ravi','f1:ASC',1) > 0; the result is no rows.
Question 1 - When the first parameter in lcontains is F1,the query works but when i use F2 it gives me below error "ORA-29902: error in executing ODCIIndexStart() routine
ORA-00932: inconsistent datatypes: expected an OUT argument at position 1 that is an instance of a user defined Java class convertible to an Oracle type got an object that could not be converted".
Question 2- Why does it return 0 rows when table has 2 rows in it.
Would really appreciate if you can help us out,we are almost there,yet so far.

You can mail me siggys198127@gmail.com if required.

Thanks,
Sigbert

Marcelo Ochoa said...

Hi Sigbert:
If you use FormatCols:F2(zzzzzzzzzzzzzzz),F3(00.00);ExtraCols:F3');
means that F2 columns is left padding with 'z' character, so to get a positive hit lcontains should be:
select lscore(1),f2 from t1 where lcontains(f2, 'F2:zzzzzzzzzzzravi','f1:ASC',1) > 0;
this kind of left padding is for character codes such as postal codes not for general purpose texts.
On the other hand Do you have the .trc file generated for the error ORA-29902?
Best regards, Marcelo.

PD: if you other question just drop me an email to marcelo.ochoa gmail.

Sigbert said...

Hi Marcelo,
Thanks for the quick turnaround.Really appreciate this.Have few doubts/queries which i have mailed on your gmail account.
Regards,
Sigbert

LazyDBA said...

Hi Marcelo,

We are working on lucene POC and installed Lucene domain index successfully. I did try below example
-----------------
1 create table t1 (
2 f1 number,
3 f2 varchar2(200),
4 f3 varchar2(200),
5 f4 number unique);
6
7 create index it1 on t1(f2) indextype is lucene.LuceneIndex
8 parameters('Analyzer:org.apache.lucene.analysis.SimpleAnalyzer');
-------------------------

It created 2 tables.IT1$QT, IT1$T
Can you please help me with description of these 2 table. Any link or doc which gives details explanation will also work.

Marcelo Ochoa said...

@LazyDBA
The table IT1$T is the storage for Lucene inverted index as BLOB using OJVMDirectory.
The IT1$QT is the table associated to the Oracle AQ IT1$Q associated to the index for storing DML changes on the table.
For further comments please refer to the Forums at Sf.net and Docs.
Best regards, Marcelo.

Anonymous said...

Hi Marcelo,

first of all I would like to thank you for your great job getting Lucene work in Oracle db.

I have a problem creating an index on multiple tables with multiple columns. When one column in my table has null values the creation of the index fails with:

Exception in thread "Root Thread" java.lang.NullPointerException
at org.apache.lucene.indexer.DefaultUserDataStore.valueExtractor(DefaultUserDataStore.java)
at org.apache.lucene.indexer.DefaultUserDataStore.getDocument(DefaultUserDataStore.java)
at org.apache.lucene.indexer.TableIndexer.index(TableIndexer.java)
at org.apache.lucene.indexer.LuceneDomainIndex.ODCIIndexCreate(LuceneDomainIndex.java:726)

For example:

create index idx_1 on t1(f1) indextype is lucene.LuceneIndex
parameters('ExtraCols:L$MT.f2,t2.f3;ExtraTabs:t2;WhereCondition:t2.f1 = L$MT.f1');

fails with the error message above.

The statement:

create index idx_1 on t1(f1) indextype is lucene.LuceneIndex
parameters('ExtraCols:L$MT.f2,nvl(t2.f3,0) "t2";ExtraTabs:t2;WhereCondition:t2.f1 = L$MT.f1');

works.
Is it right, that only not null columns can be indexed? That would be a great problem for us.

Greetings from Germany,
Dirk

Marcelo Ochoa said...

Hi Dirk:
Which table definition are you using at the example?
A simple test case with null works fine, for example:

SQL> create table t1(a varchar2(10),b number);
Table created.
SQL> insert into t1 values ('aa',1);
1 row created.
SQL> insert into t1 values ('bb',null);
1 row created.
SQL> commit;
Commit complete.
SQL> create index it1 on t1(a) indextype is
2 Lucene.LuceneIndex parameters('ExtraCols:b;LogLevel:ALL');
Index created.

Best regards, Marcelo.
PD: Please post comments, bugs, suggestion at SF.net forums

Andrea said...

Hi Marcelo, We have been using OJVM Lucene in our production environment since 2009. Recently we are getting errors from the Lucene domain indexes. At client level we get error ORA-03113 and in the alert log the following lines are logged:
ORA-07445: exception encountered: core dump [kglpim()+234] [SIGSEGV] [Address not mapped to object] [0x78] [] []
ORA-29532: Java call terminated by uncaught Java exception: java.lang.NullPointerException
ORA-06512: at "LUCENE.LUCENEDOMAININDEX", line 13
ORA-06512: at "LUCENE.LUCENEDOMAININDEX", line 21


The query that produces that error is:
SELECT FT.CONCEPT_ID, FT.TERM_TEXT, FT.LANG_ID, FCS.CONC_SUBJ_TEXT, FC.DATABASE_ID AS LOCATION FROM FT_CONCEPTS FC, FT_TERMS FT, FT_CONC_VS_CONC_SUBJ FCVCS, FT_CONCEPT_SUBJECTS FCS , FT_TERM_DEFINITIONS FTD WHERE FC.CONCEPT_ID = FT.CONCEPT_ID AND FC.CONCEPT_ID = FCVCS.CONCEPT_ID AND FCS.CONC_SUBJ_ID = FCVCS.CONC_SUBJ_ID AND FT.TERM_ID = FTD.TERM_ID(+) AND FC.DATABASE_ID iN (4,1) AND LOWER(FT.LANG_ID) = LOWER('en') AND FT.CONCEPT_REF_ID = '0' AND lcontains(FTD.TERM_DEFINITION_TEXT, '"flowering"~0.7', 1) > 0 ORDER BY NLSSORT(Upper(FT.TERM_TEXT), 'NLS_SORT=generic_m'), FC.CONCEPT_ID

Would you be so kind to help us debugging this issue?
Thank you
Andrea

Anonymous said...

Hi Marcelo, We have been using OJVM Lucene in our production environment since 2009. Recently we are getting errors from the Lucene domain indexes. At client level we get error ORA-03113 and in the alert log the following lines are logged:
ORA-07445: exception encountered: core dump [kglpim()+234] [SIGSEGV] [Address not mapped to object] [0x78] [] []
ORA-29532: Java call terminated by uncaught Java exception: java.lang.NullPointerException
ORA-06512: at "LUCENE.LUCENEDOMAININDEX", line 13
ORA-06512: at "LUCENE.LUCENEDOMAININDEX", line 21


The query that produces that error is:
SELECT FT.CONCEPT_ID, FT.TERM_TEXT, FT.LANG_ID, FCS.CONC_SUBJ_TEXT, FC.DATABASE_ID AS LOCATION FROM FT_CONCEPTS FC, FT_TERMS FT, FT_CONC_VS_CONC_SUBJ FCVCS, FT_CONCEPT_SUBJECTS FCS , FT_TERM_DEFINITIONS FTD WHERE FC.CONCEPT_ID = FT.CONCEPT_ID AND FC.CONCEPT_ID = FCVCS.CONCEPT_ID AND FCS.CONC_SUBJ_ID = FCVCS.CONC_SUBJ_ID AND FT.TERM_ID = FTD.TERM_ID(+) AND FC.DATABASE_ID iN (4,1) AND LOWER(FT.LANG_ID) = LOWER('en') AND FT.CONCEPT_REF_ID = '0' AND lcontains(FTD.TERM_DEFINITION_TEXT, '"flowering"~0.7', 1) > 0 ORDER BY NLSSORT(Upper(FT.TERM_TEXT), 'NLS_SORT=generic_m'), FC.CONCEPT_ID

Would you be so kind to help us debugging this issue?
Thank you
Andrea

Marcelo Ochoa said...

Hi Andrea:
Usually at .trc file there is more info to see which is exactly the error which throws the exception.
If you have the complete file just send me a direct mail to my google account marcelo.ochoa gmail and I can help you to realize which is the problem.
Best regards, Marcelo.