Index: 3rdParty_sources/lucene/org/apache/lucene/LucenePackage.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/LucenePackage.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/LucenePackage.java 17 Aug 2012 14:55:14 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/LucenePackage.java 16 Dec 2014 11:32:21 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/Analyzer.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/Analyzer.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/Analyzer.java 17 Aug 2012 14:55:08 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/Analyzer.java 16 Dec 2014 11:31:58 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.analysis;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -17,65 +17,458 @@
* limitations under the License.
*/
-import java.io.Reader;
+import org.apache.lucene.store.AlreadyClosedException;
+import org.apache.lucene.util.CloseableThreadLocal;
+import org.apache.lucene.util.Version;
+
+import java.io.Closeable;
import java.io.IOException;
+import java.io.Reader;
+import java.util.HashMap;
+import java.util.Map;
-/** An Analyzer builds TokenStreams, which analyze text. It thus represents a
- * policy for extracting index terms from text.
- *
- * Typical implementations first build a Tokenizer, which breaks the stream of
- * characters from the Reader into raw Tokens. One or more TokenFilters may
- * then be applied to the output of the Tokenizer.
+/**
+ * An Analyzer builds TokenStreams, which analyze text. It thus represents a
+ * policy for extracting index terms from text.
+ *
+ * In order to define what analysis is done, subclasses must define their
+ * {@link TokenStreamComponents TokenStreamComponents} in {@link #createComponents(String, Reader)}.
+ * The components are then reused in each call to {@link #tokenStream(String, Reader)}.
+ *
+ * For some concrete implementations bundled with Lucene, look in the analysis modules:
+ *
+ */
+public abstract class Analyzer implements Closeable {
+
+  private final ReuseStrategy reuseStrategy;
+  private Version version = Version.LUCENE_CURRENT;
+
+  /* Non-final because close() nulls it; package-private so ReuseStrategy's
+     final helper methods can reach it. */
+  CloseableThreadLocal<Object> storedValue = new CloseableThreadLocal<>();
+
+ /**
+ * Create a new Analyzer, reusing the same set of components per-thread
+ * across calls to {@link #tokenStream(String, Reader)}.
*/
- public abstract TokenStream tokenStream(String fieldName, Reader reader);
+ public Analyzer() {
+ this(GLOBAL_REUSE_STRATEGY);
+ }
- /** Creates a TokenStream that is allowed to be re-used
- * from the previous time that the same thread called
- * this method. Callers that do not need to use more
- * than one TokenStream at the same time from this
- * analyzer should use this method for better
- * performance.
+ /**
+ * Expert: create a new Analyzer with a custom {@link ReuseStrategy}.
+ *
+ * NOTE: if you just want to reuse on a per-field basis, it's easier to
+ * use a subclass of {@link AnalyzerWrapper} such as
+ * {@link PerFieldAnalyzerWrapper} instead.
*/
- public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
- return tokenStream(fieldName, reader);
+ public Analyzer(ReuseStrategy reuseStrategy) {
+ this.reuseStrategy = reuseStrategy;
}
- private ThreadLocal tokenStreams = new ThreadLocal();
+ /**
+ * Creates a new {@link TokenStreamComponents} instance for this analyzer.
+ *
+ * @param fieldName
+ * the name of the field's content passed to the
+ * {@link TokenStreamComponents} sink as a reader
+ * @param reader
+ * the reader passed to the {@link Tokenizer} constructor
+ * @return the {@link TokenStreamComponents} for this analyzer.
+ */
+ protected abstract TokenStreamComponents createComponents(String fieldName,
+ Reader reader);
- /** Used by Analyzers that implement reusableTokenStream
- * to retrieve previously saved TokenStreams for re-use
- * by the same thread. */
- protected Object getPreviousTokenStream() {
- return tokenStreams.get();
+ /**
+ * Returns a TokenStream suitable for fieldName, tokenizing
+ * the contents of reader.
+ *
+ * This method uses {@link #createComponents(String, Reader)} to obtain an
+ * instance of {@link TokenStreamComponents}. It returns the sink of the
+ * components and stores the components internally. Subsequent calls to this
+ * method will reuse the previously stored components after resetting them
+ * through {@link TokenStreamComponents#setReader(Reader)}.
+ *
+ * NOTE: After calling this method, the consumer must follow the
+ * workflow described in {@link TokenStream} to properly consume its contents.
+ * See the {@link org.apache.lucene.analysis Analysis package documentation} for
+ * some examples demonstrating this.
+ *
+ * NOTE: If your data is available as a {@code String}, use
+ * {@link #tokenStream(String, String)} which reuses a {@code StringReader}-like
+ * instance internally.
+ *
+ * @param fieldName the name of the field the created TokenStream is used for
+ * @param reader the reader the stream's source reads from
+ * @return TokenStream for iterating the analyzed content of reader
+ * @throws AlreadyClosedException if the Analyzer is closed.
+ * @throws IOException if an i/o error occurs.
+ * @see #tokenStream(String, String)
+ */
+ public final TokenStream tokenStream(final String fieldName,
+ final Reader reader) throws IOException {
+ TokenStreamComponents components = reuseStrategy.getReusableComponents(this, fieldName);
+ final Reader r = initReader(fieldName, reader);
+ if (components == null) {
+ components = createComponents(fieldName, r);
+ reuseStrategy.setReusableComponents(this, fieldName, components);
+ } else {
+ components.setReader(r);
+ }
+ return components.getTokenStream();
}
-
- /** Used by Analyzers that implement reusableTokenStream
- * to save a TokenStream for later re-use by the same
- * thread. */
- protected void setPreviousTokenStream(Object obj) {
- tokenStreams.set(obj);
+
+ /**
+ * Returns a TokenStream suitable for fieldName, tokenizing
+ * the contents of text.
+ *
+ * This method uses {@link #createComponents(String, Reader)} to obtain an
+ * instance of {@link TokenStreamComponents}. It returns the sink of the
+ * components and stores the components internally. Subsequent calls to this
+ * method will reuse the previously stored components after resetting them
+ * through {@link TokenStreamComponents#setReader(Reader)}.
+ *
+ * NOTE: After calling this method, the consumer must follow the
+ * workflow described in {@link TokenStream} to properly consume its contents.
+ * See the {@link org.apache.lucene.analysis Analysis package documentation} for
+ * some examples demonstrating this.
+ *
+ * @param fieldName the name of the field the created TokenStream is used for
+ * @param text the String the stream's source reads from
+ * @return TokenStream for iterating the analyzed content of text
+ * @throws AlreadyClosedException if the Analyzer is closed.
+ * @throws IOException if an i/o error occurs (may rarely happen for strings).
+ * @see #tokenStream(String, Reader)
+ */
+ public final TokenStream tokenStream(final String fieldName, final String text) throws IOException {
+ TokenStreamComponents components = reuseStrategy.getReusableComponents(this, fieldName);
+ @SuppressWarnings("resource") final ReusableStringReader strReader =
+ (components == null || components.reusableStringReader == null) ?
+ new ReusableStringReader() : components.reusableStringReader;
+ strReader.setValue(text);
+ final Reader r = initReader(fieldName, strReader);
+ if (components == null) {
+ components = createComponents(fieldName, r);
+ reuseStrategy.setReusableComponents(this, fieldName, components);
+ } else {
+ components.setReader(r);
+ }
+ components.reusableStringReader = strReader;
+ return components.getTokenStream();
}
+
+ /**
+ * Override this if you want to add a CharFilter chain.
+ *
+ * The default implementation returns reader
+ * unchanged.
+ *
+ * @param fieldName IndexableField name being indexed
+ * @param reader original Reader
+ * @return reader, optionally decorated with CharFilter(s)
+ */
+ protected Reader initReader(String fieldName, Reader reader) {
+ return reader;
+ }
-
/**
- * Invoked before indexing a Fieldable instance if
+ * Invoked before indexing an IndexableField instance if
* terms have already been added to that field. This allows custom
* analyzers to place an automatic position increment gap between
- * Fieldable instances using the same field name. The default value
+ * IndexableField instances using the same field name. The default value
* position increment gap is 0. With a 0 position increment gap and
* the typical default token position increment of 1, all terms in a field,
- * including across Fieldable instances, are in successive positions, allowing
- * exact PhraseQuery matches, for instance, across Fieldable instance boundaries.
+ * including across IndexableField instances, are in successive positions, allowing
+ * exact PhraseQuery matches, for instance, across IndexableField instance boundaries.
*
- * @param fieldName Fieldable name being indexed.
- * @return position increment gap, added to the next token emitted from {@link #tokenStream(String,Reader)}
+ * @param fieldName IndexableField name being indexed.
+ * @return position increment gap, added to the next token emitted from {@link #tokenStream(String,Reader)}.
+ * This value must be {@code >= 0}.
*/
- public int getPositionIncrementGap(String fieldName)
- {
+ public int getPositionIncrementGap(String fieldName) {
return 0;
}
+
+ /**
+ * Just like {@link #getPositionIncrementGap}, except for
+ * Token offsets instead. By default this returns 1.
+ * This method is only called if the field
+ * produced at least one token for indexing.
+ *
+ * @param fieldName the field just indexed
+ * @return offset gap, added to the next token emitted from {@link #tokenStream(String,Reader)}.
+ * This value must be {@code >= 0}.
+ */
+ public int getOffsetGap(String fieldName) {
+ return 1;
+ }
+
+ /**
+ * Returns the used {@link ReuseStrategy}.
+ */
+ public final ReuseStrategy getReuseStrategy() {
+ return reuseStrategy;
+ }
+
+ /**
+ * Set the version of Lucene whose behavior this analyzer should mimic for analysis.
+ */
+ public void setVersion(Version v) {
+ version = v; // TODO: make write once?
+ }
+
+ /**
+ * Return the version of Lucene whose behavior this analyzer will mimic for analysis.
+ */
+ public Version getVersion() {
+ return version;
+ }
+
+ /** Frees persistent resources used by this Analyzer */
+ @Override
+ public void close() {
+ if (storedValue != null) {
+ storedValue.close();
+ storedValue = null;
+ }
+ }
+
+ /**
+ * This class encapsulates the outer components of a token stream. It provides
+ * access to the source ({@link Tokenizer}) and the outer end (sink), an
+ * instance of {@link TokenFilter} which also serves as the
+ * {@link TokenStream} returned by
+ * {@link Analyzer#tokenStream(String, Reader)}.
+ */
+ public static class TokenStreamComponents {
+ /**
+ * Original source of the tokens.
+ */
+ protected final Tokenizer source;
+ /**
+ * Sink tokenstream, such as the outer tokenfilter decorating
+ * the chain. This can be the source if there are no filters.
+ */
+ protected final TokenStream sink;
+
+ /** Internal cache only used by {@link Analyzer#tokenStream(String, String)}. */
+ transient ReusableStringReader reusableStringReader;
+
+ /**
+ * Creates a new {@link TokenStreamComponents} instance.
+ *
+ * @param source
+ * the analyzer's tokenizer
+ * @param result
+ * the analyzer's resulting token stream
+ */
+ public TokenStreamComponents(final Tokenizer source,
+ final TokenStream result) {
+ this.source = source;
+ this.sink = result;
+ }
+
+ /**
+ * Creates a new {@link TokenStreamComponents} instance.
+ *
+ * @param source
+ * the analyzer's tokenizer
+ */
+ public TokenStreamComponents(final Tokenizer source) {
+ this.source = source;
+ this.sink = source;
+ }
+
+ /**
+ * Resets the encapsulated components with the given reader. If the components
+ * cannot be reset, an Exception should be thrown.
+ *
+ * @param reader
+ * a reader to reset the source component
+ * @throws IOException
+ * if the component's reset method throws an {@link IOException}
+ */
+ protected void setReader(final Reader reader) throws IOException {
+ source.setReader(reader);
+ }
+
+ /**
+ * Returns the sink {@link TokenStream}
+ *
+ * @return the sink {@link TokenStream}
+ */
+ public TokenStream getTokenStream() {
+ return sink;
+ }
+
+ /**
+ * Returns the component's {@link Tokenizer}
+ *
+ * @return Component's {@link Tokenizer}
+ */
+ public Tokenizer getTokenizer() {
+ return source;
+ }
+ }
+
+ /**
+ * Strategy defining how TokenStreamComponents are reused per call to
+ * {@link Analyzer#tokenStream(String, java.io.Reader)}.
+ */
+ public static abstract class ReuseStrategy {
+
+ /** Sole constructor. (For invocation by subclass constructors, typically implicit.) */
+ public ReuseStrategy() {}
+
+ /**
+ * Gets the reusable TokenStreamComponents for the field with the given name.
+ *
+ * @param analyzer Analyzer from which to get the reused components. Use
+ * {@link #getStoredValue(Analyzer)} and {@link #setStoredValue(Analyzer, Object)}
+ * to access the data on the Analyzer.
+ * @param fieldName Name of the field whose reusable TokenStreamComponents
+ * are to be retrieved
+ * @return Reusable TokenStreamComponents for the field, or {@code null}
+ * if there were no previous components for the field
+ */
+ public abstract TokenStreamComponents getReusableComponents(Analyzer analyzer, String fieldName);
+
+ /**
+ * Stores the given TokenStreamComponents as the reusable components for the
+ * field with the given name.
+ *
+ * @param analyzer Analyzer on which the reusable TokenStreamComponents are being stored
+ * @param fieldName Name of the field whose TokenStreamComponents are being set
+ * @param components TokenStreamComponents which are to be reused for the field
+ */
+ public abstract void setReusableComponents(Analyzer analyzer, String fieldName, TokenStreamComponents components);
+
+ /**
+ * Returns the currently stored value.
+ *
+ * @return Currently stored value or {@code null} if no value is stored
+ * @throws AlreadyClosedException if the Analyzer is closed.
+ */
+ protected final Object getStoredValue(Analyzer analyzer) {
+ if (analyzer.storedValue == null) {
+ throw new AlreadyClosedException("this Analyzer is closed");
+ }
+ return analyzer.storedValue.get();
+ }
+
+ /**
+ * Sets the stored value.
+ *
+ * @param storedValue Value to store
+ * @throws AlreadyClosedException if the Analyzer is closed.
+ */
+ protected final void setStoredValue(Analyzer analyzer, Object storedValue) {
+ if (analyzer.storedValue == null) {
+ throw new AlreadyClosedException("this Analyzer is closed");
+ }
+ analyzer.storedValue.set(storedValue);
+ }
+
+ }
+
+ /**
+ * A predefined {@link ReuseStrategy} that reuses the same components for
+ * every field.
+ */
+ public static final ReuseStrategy GLOBAL_REUSE_STRATEGY = new GlobalReuseStrategy();
+
+ /**
+ * Implementation of {@link ReuseStrategy} that reuses the same components for
+ * every field.
+ * @deprecated This implementation class will be hidden in Lucene 5.0.
+ * Use {@link Analyzer#GLOBAL_REUSE_STRATEGY} instead!
+ */
+ @Deprecated
+ public final static class GlobalReuseStrategy extends ReuseStrategy {
+
+ /** Sole constructor. (For invocation by subclass constructors, typically implicit.)
+ * @deprecated Don't create instances of this class, use {@link Analyzer#GLOBAL_REUSE_STRATEGY} */
+ @Deprecated
+ public GlobalReuseStrategy() {}
+
+ @Override
+ public TokenStreamComponents getReusableComponents(Analyzer analyzer, String fieldName) {
+ return (TokenStreamComponents) getStoredValue(analyzer);
+ }
+
+ @Override
+ public void setReusableComponents(Analyzer analyzer, String fieldName, TokenStreamComponents components) {
+ setStoredValue(analyzer, components);
+ }
+ }
+
+ /**
+ * A predefined {@link ReuseStrategy} that reuses components per-field by
+ * maintaining a Map of TokenStreamComponent per field name.
+ */
+ public static final ReuseStrategy PER_FIELD_REUSE_STRATEGY = new PerFieldReuseStrategy();
+
+ /**
+ * Implementation of {@link ReuseStrategy} that reuses components per-field by
+ * maintaining a Map of TokenStreamComponent per field name.
+ * @deprecated This implementation class will be hidden in Lucene 5.0.
+ * Use {@link Analyzer#PER_FIELD_REUSE_STRATEGY} instead!
+ */
+ @Deprecated
+ public static class PerFieldReuseStrategy extends ReuseStrategy {
+
+ /** Sole constructor. (For invocation by subclass constructors, typically implicit.)
+ * @deprecated Don't create instances of this class, use {@link Analyzer#PER_FIELD_REUSE_STRATEGY} */
+ @Deprecated
+ public PerFieldReuseStrategy() {}
+
+ @SuppressWarnings("unchecked")
+ @Override
+ public TokenStreamComponents getReusableComponents(Analyzer analyzer, String fieldName) {
+ Map<String, TokenStreamComponents> componentsPerField = (Map<String, TokenStreamComponents>) getStoredValue(analyzer);
+ return componentsPerField != null ? componentsPerField.get(fieldName) : null;
+ }
+
+ @SuppressWarnings("unchecked")
+ @Override
+ public void setReusableComponents(Analyzer analyzer, String fieldName, TokenStreamComponents components) {
+ Map<String, TokenStreamComponents> componentsPerField = (Map<String, TokenStreamComponents>) getStoredValue(analyzer);
+ if (componentsPerField == null) {
+ componentsPerField = new HashMap<>();
+ setStoredValue(analyzer, componentsPerField);
+ }
+ componentsPerField.put(fieldName, components);
+ }
+ }
+
}
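The reuse machinery in the Analyzer diff above is easier to see outside the diff. The sketch below is a standalone, hypothetical analogue in plain Java with no Lucene dependency (all class names here are illustrative stand-ins, not Lucene APIs): it mimics how `tokenStream()` first asks the ReuseStrategy for cached components, creating and storing them only on the first call, and contrasts the global strategy (one components instance shared by every field) with the per-field strategy (a Map keyed by field name).

```java
import java.util.HashMap;
import java.util.Map;

// Standalone analogue of Analyzer's reuse policies. "Components" stands in
// for Lucene's TokenStreamComponents; a plain ThreadLocal stands in for the
// CloseableThreadLocal-backed storedValue.
class ReuseSketch {
  static class Components {
    final String field;
    Components(String field) { this.field = field; }
  }

  interface ReuseStrategy {
    Components get(String field);
    void set(String field, Components c);
  }

  /** One shared components instance per thread, whatever the field name. */
  static class GlobalReuse implements ReuseStrategy {
    private final ThreadLocal<Components> stored = new ThreadLocal<>();
    public Components get(String field) { return stored.get(); }
    public void set(String field, Components c) { stored.set(c); }
  }

  /** One components instance per field name, per thread. */
  static class PerFieldReuse implements ReuseStrategy {
    private final ThreadLocal<Map<String, Components>> stored =
        ThreadLocal.withInitial(HashMap::new);
    public Components get(String field) { return stored.get().get(field); }
    public void set(String field, Components c) { stored.get().put(field, c); }
  }

  /** Mirrors the shape of Analyzer.tokenStream: create on first use, reuse after. */
  static Components tokenStream(ReuseStrategy strategy, String field) {
    Components c = strategy.get(field);
    if (c == null) {
      c = new Components(field);
      strategy.set(field, c);
    }
    return c;
  }
}
```

Note the design consequence this models: under the global strategy a lookup for one field can hand back components built for another, which is only safe for analyzers that behave identically for every field. This is also why the diff exposes the shared `GLOBAL_REUSE_STRATEGY` and `PER_FIELD_REUSE_STRATEGY` constants and deprecates constructing the strategy classes directly.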
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/AnalyzerWrapper.java'.
Fisheye: No comparison available. Pass `N' to diff?
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/CachingTokenFilter.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/CachingTokenFilter.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/CachingTokenFilter.java 17 Aug 2012 14:55:08 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/CachingTokenFilter.java 16 Dec 2014 11:31:58 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.analysis;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -22,52 +22,77 @@
import java.util.LinkedList;
import java.util.List;
+import org.apache.lucene.util.AttributeSource;
+
/**
- * This class can be used if the Tokens of a TokenStream
+ * This class can be used if the token attributes of a TokenStream
* are intended to be consumed more than once. It caches
- * all Tokens locally in a List.
+ * all token attribute states locally in a List.
*
- * CachingTokenFilter implements the optional method
+ * CachingTokenFilter implements the optional method
* {@link TokenStream#reset()}, which repositions the
* stream to the first Token.
- *
*/
-public class CachingTokenFilter extends TokenFilter {
- private List cache;
- private Iterator iterator;
+public final class CachingTokenFilter extends TokenFilter {
+ private List<AttributeSource.State> cache = null;
+ private Iterator<AttributeSource.State> iterator = null;
+ private AttributeSource.State finalState;
+ /**
+ * Create a new CachingTokenFilter around input,
+ * caching its token attributes, which can be replayed again
+ * after a call to {@link #reset()}.
+ */
public CachingTokenFilter(TokenStream input) {
super(input);
}
- public Token next(final Token reusableToken) throws IOException {
- assert reusableToken != null;
+ @Override
+ public final boolean incrementToken() throws IOException {
if (cache == null) {
// fill cache lazily
- cache = new LinkedList();
- fillCache(reusableToken);
+ cache = new LinkedList<>();
+ fillCache();
iterator = cache.iterator();
}
if (!iterator.hasNext()) {
- // the cache is exhausted, return null
- return null;
+ // the cache is exhausted, return false
+ return false;
}
// Since the TokenFilter can be reset, the tokens need to be preserved as immutable.
- Token nextToken = (Token) iterator.next();
- return (Token) nextToken.clone();
+ restoreState(iterator.next());
+ return true;
}
- public void reset() throws IOException {
+ @Override
+ public final void end() {
+ if (finalState != null) {
+ restoreState(finalState);
+ }
+ }
+
+ /**
+ * Rewinds the iterator to the beginning of the cached list.
+ *
+ * Note that this does not call reset() on the wrapped tokenstream ever, even
+ * the first time. You should reset() the inner tokenstream before wrapping
+ * it with CachingTokenFilter.
+ */
+ @Override
+ public void reset() {
if(cache != null) {
- iterator = cache.iterator();
+ iterator = cache.iterator();
}
}
- private void fillCache(final Token reusableToken) throws IOException {
- for (Token nextToken = input.next(reusableToken); nextToken != null; nextToken = input.next(reusableToken)) {
- cache.add(nextToken.clone());
+ private void fillCache() throws IOException {
+ while(input.incrementToken()) {
+ cache.add(captureState());
}
+ // capture final state
+ input.end();
+ finalState = captureState();
}
}
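The CachingTokenFilter rewrite above replaces per-Token cloning with captured attribute states. The following standalone sketch (plain Java, no Lucene types; strings stand in for captured AttributeSource.State objects, and the class is hypothetical) shows the contract the diff documents: the cache fills lazily by draining the input on the first `incrementToken()` pass, and `reset()` only rewinds the cache iterator, never the wrapped input, which is why the new javadoc tells callers to reset() the inner stream before wrapping it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Standalone analogue of the caching/replay pattern in CachingTokenFilter.
class CachingSketch {
  private final Iterator<String> input; // wrapped token source
  private List<String> cache;           // filled lazily on first pass
  private Iterator<String> iterator;    // replay position in the cache
  private String current;               // last "restored" state

  CachingSketch(Iterator<String> input) { this.input = input; }

  boolean incrementToken() {
    if (cache == null) {                // fill cache lazily
      cache = new ArrayList<>();
      while (input.hasNext()) cache.add(input.next());
      iterator = cache.iterator();
    }
    if (!iterator.hasNext()) return false; // cache exhausted
    current = iterator.next();             // replay a captured state
    return true;
  }

  /** Rewinds the cache iterator only; the wrapped input is never reset. */
  void reset() {
    if (cache != null) iterator = cache.iterator();
  }

  String current() { return current; }
}
```

A consumer can therefore iterate the same tokens any number of times, separated by reset() calls, while the underlying source is consumed exactly once.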
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/CharArraySet.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/CharFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/CharTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/DelegatingAnalyzerWrapper.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ISOLatin1AccentFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/KeywordAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/KeywordTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/LengthFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/LetterTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/LowerCaseFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/LowerCaseTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/NumericTokenStream.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/PorterStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/PorterStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ReusableStringReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/SimpleAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/SinkTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/StopAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/StopFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/TeeTokenFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/Token.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/Token.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/Token.java 17 Aug 2012 14:55:07 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/Token.java 16 Dec 2014 11:31:57 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.analysis;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -17,165 +17,63 @@
* limitations under the License.
*/
-import org.apache.lucene.index.Payload;
-import org.apache.lucene.index.TermPositions; // for javadoc
-import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;
+import org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl;
+import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
+import org.apache.lucene.index.DocsAndPositionsEnum; // for javadoc
+import org.apache.lucene.util.Attribute;
+import org.apache.lucene.util.AttributeFactory;
+import org.apache.lucene.util.AttributeImpl;
+import org.apache.lucene.util.AttributeReflector;
+import org.apache.lucene.util.BytesRef;
-/** A Token is an occurrence of a term from the text of a field. It consists of
+/**
+ A Token is an occurrence of a term from the text of a field. It consists of
a term's text, the start and end offset of the term in the text of the field,
and a type string.
The start and end offsets permit applications to re-associate a token with
its source text, e.g., to display highlighted query terms in a document
- browser, or to show matching text fragments in a KWIC (KeyWord In Context)
+ browser, or to show matching text fragments in a KWIC
display, etc.
The type is a string, assigned by a lexical analyzer
(a.k.a. tokenizer), naming the lexical or syntactic class that the token
belongs to. For example an end of sentence marker token might be implemented
with type "eos". The default token type is "word".
- A Token can optionally have metadata (a.k.a. Payload) in the form of a variable
- length byte array. Use {@link TermPositions#getPayloadLength()} and
- {@link TermPositions#getPayload(byte[], int)} to retrieve the payloads from the index.
+ A Token can optionally have metadata (a.k.a. payload) in the form of a variable
+ length byte array. Use {@link DocsAndPositionsEnum#getPayload()} to retrieve the
+ payloads from the index.
-
- WARNING: The status of the Payloads feature is experimental.
- The APIs introduced here might change in the future and will not be
- supported anymore in such a case.
-
-
-
-
NOTE: As of 2.3, Token stores the term text
- internally as a malleable char[] termBuffer instead of
- String termText. The indexing code and core tokenizers
- have been changed to re-use a single Token instance, changing
- its buffer and other fields in-place as the Token is
- processed. This provides substantially better indexing
- performance as it saves the GC cost of new'ing a Token and
- String for every term. The APIs that accept String
- termText are still available but a warning about the
- associated performance cost has been added (below). The
- {@link #termText()} method has been deprecated.
- Tokenizers and filters should try to re-use a Token
- instance when possible for best performance, by
- implementing the {@link TokenStream#next(Token)} API.
- Failing that, to create a new Token you should first use
- one of the constructors that starts with null text. To load
- the token from a char[] use {@link #setTermBuffer(char[], int, int)}.
- To load from a String use {@link #setTermBuffer(String)} or {@link #setTermBuffer(String, int, int)}.
- Alternatively you can get the Token's termBuffer by calling either {@link #termBuffer()},
- if you know that your text is shorter than the capacity of the termBuffer
- or {@link #resizeTermBuffer(int)}, if there is any possibility
- that you may need to grow the buffer. Fill in the characters of your term into this
- buffer, with {@link String#getChars(int, int, char[], int)} if loading from a string,
- or with {@link System#arraycopy(Object, int, Object, int, int)}, and finally call {@link #setTermLength(int)} to
- set the length of the term text. See LUCENE-969
- for details.
- Typical reuse patterns:
-
- Copying text from a string (type is reset to #DEFAULT_TYPE if not specified):
-
- return reusableToken.reinit(string, startOffset, endOffset[, type]);
-
-
- Copying some text from a string (type is reset to #DEFAULT_TYPE if not specified):
-
- return reusableToken.reinit(string, 0, string.length(), startOffset, endOffset[, type]);
-
-
-
- Copying text from char[] buffer (type is reset to #DEFAULT_TYPE if not specified):
-
- return reusableToken.reinit(buffer, 0, buffer.length, startOffset, endOffset[, type]);
-
-
- Copying some text from a char[] buffer (type is reset to #DEFAULT_TYPE if not specified):
-
- return reusableToken.reinit(buffer, start, end - start, startOffset, endOffset[, type]);
-
-
- Copying from one one Token to another (type is reset to #DEFAULT_TYPE if not specified):
-
- return reusableToken.reinit(source.termBuffer(), 0, source.termLength(), source.startOffset(), source.endOffset()[, source.type()]);
-
-
-
+ NOTE: As of 2.9, Token implements all {@link Attribute} interfaces
+ that are part of core Lucene and can be found in the {@code tokenattributes} subpackage.
+ Even though it is not necessary to use Token anymore, with the new TokenStream API it can
+ be used as convenience class that implements all {@link Attribute}s, which is especially useful
+ to easily switch from the old to the new TokenStream API.
+
A few things to note:
- clear() initializes most of the fields to default values, but not startOffset, endOffset and type.
+ clear() initializes all of the fields to default values. This was changed in contrast to Lucene 2.4, but should affect no one.
Because TokenStreams
can be chained, one cannot assume that the Token's
current type is correct.
- The startOffset and endOffset represent the start and offset in the source text. So be careful in adjusting them.
+ The startOffset and endOffset represent the start and end offset in the source text, so be careful in adjusting them.
When caching a reusable token, clone it. When injecting a cached token into a stream that can be reset, clone it again.
-
- @see org.apache.lucene.index.Payload
+
+ Please note: With Lucene 3.1, the {@linkplain #toString toString()} method had to be changed to match the
+ {@link CharSequence} interface introduced by the interface {@link org.apache.lucene.analysis.tokenattributes.CharTermAttribute}.
+ This method now only prints the term text, with no additional information.
+
+ @deprecated This class is outdated and no longer used since Lucene 2.9. Nuke it finally!
*/
-public class Token implements Cloneable {
+@Deprecated
+public class Token extends PackedTokenAttributeImpl implements FlagsAttribute, PayloadAttribute {
- public static final String DEFAULT_TYPE = "word";
-
- private static int MIN_BUFFER_SIZE = 10;
-
- /** @deprecated We will remove this when we remove the
- * deprecated APIs */
- private String termText;
-
- /**
- * Characters for the term text.
- * @deprecated This will be made private. Instead, use:
- * {@link termBuffer()},
- * {@link #setTermBuffer(char[], int, int)},
- * {@link #setTermBuffer(String)}, or
- * {@link #setTermBuffer(String, int, int)}
- */
- char[] termBuffer;
-
- /**
- * Length of term text in the buffer.
- * @deprecated This will be made private. Instead, use:
- * {@link termLength()}, or @{link setTermLength(int)}.
- */
- int termLength;
-
- /**
- * Start in source text.
- * @deprecated This will be made private. Instead, use:
- * {@link startOffset()}, or @{link setStartOffset(int)}.
- */
- int startOffset;
-
- /**
- * End in source text.
- * @deprecated This will be made private. Instead, use:
- * {@link endOffset()}, or @{link setEndOffset(int)}.
- */
- int endOffset;
-
- /**
- * The lexical type of the token.
- * @deprecated This will be made private. Instead, use:
- * {@link type()}, or @{link setType(String)}.
- */
- String type = DEFAULT_TYPE;
-
private int flags;
-
- /**
- * @deprecated This will be made private. Instead, use:
- * {@link getPayload()}, or @{link setPayload(Payload)}.
- */
- Payload payload;
-
- /**
- * @deprecated This will be made private. Instead, use:
- * {@link getPositionIncrement()}, or @{link setPositionIncrement(String)}.
- */
- int positionIncrement = 1;
+ private BytesRef payload;
/** Constructs a Token with null text. */
public Token() {
@@ -186,8 +84,7 @@
* @param start start offset in the source text
* @param end end offset in the source text */
public Token(int start, int end) {
- startOffset = start;
- endOffset = end;
+ setOffset(start, end);
}
/** Constructs a Token with null text and start & end
@@ -196,9 +93,8 @@
* @param end end offset in the source text
* @param typ the lexical type of this Token */
public Token(int start, int end, String typ) {
- startOffset = start;
- endOffset = end;
- type = typ;
+ setOffset(start, end);
+ setType(typ);
}
/**
@@ -209,9 +105,8 @@
* @param flags The bits to set for this token
*/
public Token(int start, int end, int flags) {
- startOffset = start;
- endOffset = end;
- this.flags = flags;
+ setOffset(start, end);
+ setFlags(flags);
}
/** Constructs a Token with the given term text, and start
@@ -220,640 +115,273 @@
* instead use the char[] termBuffer methods to set the
* term text.
* @param text term text
- * @param start start offset
- * @param end end offset
- * @deprecated
+ * @param start start offset in the source text
+ * @param end end offset in the source text
*/
- public Token(String text, int start, int end) {
- termText = text;
- startOffset = start;
- endOffset = end;
+ public Token(CharSequence text, int start, int end) {
+ append(text);
+ setOffset(start, end);
}
/** Constructs a Token with the given text, start and end
* offsets, & type. NOTE: for better indexing
* speed you should instead use the char[] termBuffer
* methods to set the term text.
* @param text term text
- * @param start start offset
- * @param end end offset
+ * @param start start offset in the source text
+ * @param end end offset in the source text
* @param typ token type
- * @deprecated
*/
public Token(String text, int start, int end, String typ) {
- termText = text;
- startOffset = start;
- endOffset = end;
- type = typ;
+ append(text);
+ setOffset(start, end);
+ setType(typ);
}
/**
* Constructs a Token with the given text, start and end
* offsets, & type. NOTE: for better indexing
* speed you should instead use the char[] termBuffer
* methods to set the term text.
- * @param text
- * @param start
- * @param end
+ * @param text term text
+ * @param start start offset in the source text
+ * @param end end offset in the source text
* @param flags token type bits
- * @deprecated
*/
public Token(String text, int start, int end, int flags) {
- termText = text;
- startOffset = start;
- endOffset = end;
- this.flags = flags;
+ append(text);
+ setOffset(start, end);
+ setFlags(flags);
}
/**
* Constructs a Token with the given term buffer (offset
* & length), start and end
* offsets
- * @param startTermBuffer
- * @param termBufferOffset
- * @param termBufferLength
- * @param start
- * @param end
+ * @param startTermBuffer buffer containing term text
+ * @param termBufferOffset the index in the buffer of the first character
+ * @param termBufferLength number of valid characters in the buffer
+ * @param start start offset in the source text
+ * @param end end offset in the source text
*/
public Token(char[] startTermBuffer, int termBufferOffset, int termBufferLength, int start, int end) {
- setTermBuffer(startTermBuffer, termBufferOffset, termBufferLength);
- startOffset = start;
- endOffset = end;
+ copyBuffer(startTermBuffer, termBufferOffset, termBufferLength);
+ setOffset(start, end);
}
- /** Set the position increment. This determines the position of this token
- * relative to the previous Token in a {@link TokenStream}, used in phrase
- * searching.
- *
- * The default value is one.
- *
- *
- * Some common uses for this are:
- *
- * Set it to zero to put multiple terms in the same position. This is
- * useful if, e.g., a word has multiple stems. Searches for phrases
- * including either stem will match. In this case, all but the first stem's
- * increment should be set to zero: the increment of the first instance
- * should be one. Repeating a token with an increment of zero can also be
- * used to boost the scores of matches on that token.
- *
- * Set it to values greater than one to inhibit exact phrase matches.
- * If, for example, one does not want phrases to match across removed stop
- * words, then one could build a stop word filter that removes stop words and
- * also sets the increment to the number of stop words removed before each
- * non-stop word. Then exact phrase queries will only match when the terms
- * occur with no intervening stop words.
- *
- *
- * @param positionIncrement the distance from the prior term
- * @see org.apache.lucene.index.TermPositions
- */
- public void setPositionIncrement(int positionIncrement) {
- if (positionIncrement < 0)
- throw new IllegalArgumentException
- ("Increment must be zero or greater: " + positionIncrement);
- this.positionIncrement = positionIncrement;
- }
-
- /** Returns the position increment of this Token.
- * @see #setPositionIncrement
- */
- public int getPositionIncrement() {
- return positionIncrement;
- }
-
- /** Sets the Token's term text. NOTE: for better
- * indexing speed you should instead use the char[]
- * termBuffer methods to set the term text.
- * @deprecated use {@link #setTermBuffer(char[], int, int)} or
- * {@link #setTermBuffer(String)} or
- * {@link #setTermBuffer(String, int, int)}.
- */
- public void setTermText(String text) {
- termText = text;
- termBuffer = null;
- }
-
- /** Returns the Token's term text.
- *
- * @deprecated This method now has a performance penalty
- * because the text is stored internally in a char[]. If
- * possible, use {@link #termBuffer()} and {@link
- * #termLength()} directly instead. If you really need a
- * String, use {@link #term()}
- */
- public final String termText() {
- if (termText == null && termBuffer != null)
- termText = new String(termBuffer, 0, termLength);
- return termText;
- }
-
- /** Returns the Token's term text.
- *
- * This method has a performance penalty
- * because the text is stored internally in a char[]. If
- * possible, use {@link #termBuffer()} and {@link
- * #termLength()} directly instead. If you really need a
- * String, use this method, which is nothing more than
- * a convenience call to new String(token.termBuffer(), 0, token.termLength())
- */
- public final String term() {
- if (termText != null)
- return termText;
- initTermBuffer();
- return new String(termBuffer, 0, termLength);
- }
-
- /** Copies the contents of buffer, starting at offset for
- * length characters, into the termBuffer array.
- * @param buffer the buffer to copy
- * @param offset the index in the buffer of the first character to copy
- * @param length the number of characters to copy
- */
- public final void setTermBuffer(char[] buffer, int offset, int length) {
- termText = null;
- char[] newCharBuffer = growTermBuffer(length);
- if (newCharBuffer != null) {
- termBuffer = newCharBuffer;
- }
- System.arraycopy(buffer, offset, termBuffer, 0, length);
- termLength = length;
- }
-
- /** Copies the contents of buffer into the termBuffer array.
- * @param buffer the buffer to copy
- */
- public final void setTermBuffer(String buffer) {
- termText = null;
- int length = buffer.length();
- char[] newCharBuffer = growTermBuffer(length);
- if (newCharBuffer != null) {
- termBuffer = newCharBuffer;
- }
- buffer.getChars(0, length, termBuffer, 0);
- termLength = length;
- }
-
- /** Copies the contents of buffer, starting at offset and continuing
- * for length characters, into the termBuffer array.
- * @param buffer the buffer to copy
- * @param offset the index in the buffer of the first character to copy
- * @param length the number of characters to copy
- */
- public final void setTermBuffer(String buffer, int offset, int length) {
- assert offset <= buffer.length();
- assert offset + length <= buffer.length();
- termText = null;
- char[] newCharBuffer = growTermBuffer(length);
- if (newCharBuffer != null) {
- termBuffer = newCharBuffer;
- }
- buffer.getChars(offset, offset + length, termBuffer, 0);
- termLength = length;
- }
-
- /** Returns the internal termBuffer character array which
- * you can then directly alter. If the array is too
- * small for your token, use {@link
- * #resizeTermBuffer(int)} to increase it. After
- * altering the buffer be sure to call {@link
- * #setTermLength} to record the number of valid
- * characters that were placed into the termBuffer. */
- public final char[] termBuffer() {
- initTermBuffer();
- return termBuffer;
- }
-
- /** Grows the termBuffer to at least size newSize, preserving the
- * existing content. Note: If the next operation is to change
- * the contents of the term buffer use
- * {@link #setTermBuffer(char[], int, int)},
- * {@link #setTermBuffer(String)}, or
- * {@link #setTermBuffer(String, int, int)}
- * to optimally combine the resize with the setting of the termBuffer.
- * @param newSize minimum size of the new termBuffer
- * @return newly created termBuffer with length >= newSize
- */
- public char[] resizeTermBuffer(int newSize) {
- char[] newCharBuffer = growTermBuffer(newSize);
- if (termBuffer == null) {
- // If there were termText, then preserve it.
- // note that if termBuffer is null then newCharBuffer cannot be null
- assert newCharBuffer != null;
- if (termText != null) {
- termText.getChars(0, termText.length(), newCharBuffer, 0);
- }
- termBuffer = newCharBuffer;
- } else if (newCharBuffer != null) {
- // Note: if newCharBuffer != null then termBuffer needs to grow.
- // If there were a termBuffer, then preserve it
- System.arraycopy(termBuffer, 0, newCharBuffer, 0, termBuffer.length);
- termBuffer = newCharBuffer;
- }
- termText = null;
- return termBuffer;
- }
-
- /** Allocates a buffer char[] of at least newSize
- * @param newSize minimum size of the buffer
- * @return newly created buffer with length >= newSize or null if the current termBuffer is big enough
- */
- private char[] growTermBuffer(int newSize) {
- if (termBuffer != null) {
- if (termBuffer.length >= newSize)
- // Already big enough
- return null;
- else
- // Not big enough; create a new array with slight
- // over allocation:
- return new char[ArrayUtil.getNextSize(newSize)];
- } else {
-
- // determine the best size
- // The buffer is always at least MIN_BUFFER_SIZE
- if (newSize < MIN_BUFFER_SIZE) {
- newSize = MIN_BUFFER_SIZE;
- }
-
- // If there is already a termText, then the size has to be at least that big
- if (termText != null) {
- int ttLength = termText.length();
- if (newSize < ttLength) {
- newSize = ttLength;
- }
- }
-
- return new char[newSize];
- }
- }
-
- // TODO: once we remove the deprecated termText() method
- // and switch entirely to char[] termBuffer we don't need
- // to use this method anymore
- private void initTermBuffer() {
- if (termBuffer == null) {
- if (termText == null) {
- termBuffer = new char[MIN_BUFFER_SIZE];
- termLength = 0;
- } else {
- int length = termText.length();
- if (length < MIN_BUFFER_SIZE) length = MIN_BUFFER_SIZE;
- termBuffer = new char[length];
- termLength = termText.length();
- termText.getChars(0, termText.length(), termBuffer, 0);
- termText = null;
- }
- } else if (termText != null)
- termText = null;
- }
-
- /** Return number of valid characters (length of the term)
- * in the termBuffer array. */
- public final int termLength() {
- initTermBuffer();
- return termLength;
- }
-
- /** Set number of valid characters (length of the term) in
- * the termBuffer array. Use this to truncate the termBuffer
- * or to synchronize with external manipulation of the termBuffer.
- * Note: to grow the size of the array,
- * use {@link #resizeTermBuffer(int)} first.
- * @param length the truncated length
- */
- public final void setTermLength(int length) {
- initTermBuffer();
- if (length > termBuffer.length)
- throw new IllegalArgumentException("length " + length + " exceeds the size of the termBuffer (" + termBuffer.length + ")");
- termLength = length;
- }
-
- /** Returns this Token's starting offset, the position of the first character
- corresponding to this token in the source text.
-
- Note that the difference between endOffset() and startOffset() may not be
- equal to termText.length(), as the term text may have been altered by a
- stemmer or some other filter. */
- public final int startOffset() {
- return startOffset;
- }
-
- /** Set the starting offset.
- @see #startOffset() */
- public void setStartOffset(int offset) {
- this.startOffset = offset;
- }
-
- /** Returns this Token's ending offset, one greater than the position of the
- last character corresponding to this token in the source text. The length
- of the token in the source text is (endOffset - startOffset). */
- public final int endOffset() {
- return endOffset;
- }
-
- /** Set the ending offset.
- @see #endOffset() */
- public void setEndOffset(int offset) {
- this.endOffset = offset;
- }
-
- /** Returns this Token's lexical type. Defaults to "word". */
- public final String type() {
- return type;
- }
-
- /** Set the lexical type.
- @see #type() */
- public final void setType(String type) {
- this.type = type;
- }
-
/**
- * EXPERIMENTAL: While we think this is here to stay, we may want to change it to be a long.
- *
- *
- * Get the bitset for any bits that have been set. This is completely distinct from {@link #type()}, although they do share similar purposes.
- * The flags can be used to encode information about the token for use by other {@link org.apache.lucene.analysis.TokenFilter}s.
- *
- *
- * @return The bits
+ * {@inheritDoc}
+ * @see FlagsAttribute
*/
+ @Override
public int getFlags() {
return flags;
}
/**
- * @see #getFlags()
+ * {@inheritDoc}
+ * @see FlagsAttribute
*/
+ @Override
public void setFlags(int flags) {
this.flags = flags;
}
/**
- * Returns this Token's payload.
- */
- public Payload getPayload() {
+ * {@inheritDoc}
+ * @see PayloadAttribute
+ */
+ @Override
+ public BytesRef getPayload() {
return this.payload;
}
- /**
- * Sets this Token's payload.
+ /**
+ * {@inheritDoc}
+ * @see PayloadAttribute
*/
- public void setPayload(Payload payload) {
+ @Override
+ public void setPayload(BytesRef payload) {
this.payload = payload;
}
- public String toString() {
- StringBuffer sb = new StringBuffer();
- sb.append('(');
- initTermBuffer();
- if (termBuffer == null)
- sb.append("null");
- else
- sb.append(termBuffer, 0, termLength);
- sb.append(',').append(startOffset).append(',').append(endOffset);
- if (!type.equals("word"))
- sb.append(",type=").append(type);
- if (positionIncrement != 1)
- sb.append(",posIncr=").append(positionIncrement);
- sb.append(')');
- return sb.toString();
- }
-
- /** Resets the term text, payload, flags, and positionIncrement to default.
- * Other fields such as startOffset, endOffset and the token type are
- * not reset since they are normally overwritten by the tokenizer. */
+ /** Resets the term text, payload, flags, positionIncrement, positionLength,
+ * startOffset, endOffset and token type to default.
+ */
+ @Override
public void clear() {
- payload = null;
- // Leave termBuffer to allow re-use
- termLength = 0;
- termText = null;
- positionIncrement = 1;
+ super.clear();
flags = 0;
- // startOffset = endOffset = 0;
- // type = DEFAULT_TYPE;
+ payload = null;
}
- public Object clone() {
- try {
- Token t = (Token)super.clone();
- // Do a deep clone
- if (termBuffer != null) {
- t.termBuffer = (char[]) termBuffer.clone();
- }
- if (payload != null) {
- t.setPayload((Payload) payload.clone());
- }
- return t;
- } catch (CloneNotSupportedException e) {
- throw new RuntimeException(e); // shouldn't happen
+ @Override
+ public Token clone() {
+ Token t = (Token)super.clone();
+ // Do a deep clone
+ if (payload != null) {
+ t.payload = payload.clone();
}
- }
-
- /** Makes a clone, but replaces the term buffer &
- * start/end offset in the process. This is more
- * efficient than doing a full clone (and then calling
- * setTermBuffer) because it saves a wasted copy of the old
- * termBuffer. */
- public Token clone(char[] newTermBuffer, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset) {
- final Token t = new Token(newTermBuffer, newTermOffset, newTermLength, newStartOffset, newEndOffset);
- t.positionIncrement = positionIncrement;
- t.flags = flags;
- t.type = type;
- if (payload != null)
- t.payload = (Payload) payload.clone();
return t;
}
+ @Override
public boolean equals(Object obj) {
if (obj == this)
return true;
if (obj instanceof Token) {
- Token other = (Token) obj;
-
- initTermBuffer();
- other.initTermBuffer();
-
- if (termLength == other.termLength &&
- startOffset == other.startOffset &&
- endOffset == other.endOffset &&
- flags == other.flags &&
- positionIncrement == other.positionIncrement &&
- subEqual(type, other.type) &&
- subEqual(payload, other.payload)) {
- for(int i=0;iToken as implementation for the basic
+ * attributes and return the default impl (with "Impl" appended) for all other
+ * attributes.
+ * @since 3.0
+ */
+ public static final AttributeFactory TOKEN_ATTRIBUTE_FACTORY =
+ AttributeFactory.getStaticImplementation(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, Token.class);
}
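The Token javadoc above warns: "When caching a reusable token, clone it. When injecting a cached token into a stream that can be reset, clone it again." The sketch below illustrates why, using a hypothetical `MiniToken` stand-in (not Lucene's actual `Token` class): the producer mutates one reused instance, so a consumer that stores references without cloning would end up with N copies of the last token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative analog of Lucene's reuse-and-clone contract; MiniToken is
// hypothetical, not the real org.apache.lucene.analysis.Token.
public class MiniTokenDemo {

  /** A mutable token that a producer reuses across calls. */
  static final class MiniToken implements Cloneable {
    char[] buffer = new char[16];
    int length, startOffset, endOffset;

    void reinit(String text, int start, int end) {
      if (buffer.length < text.length()) buffer = new char[text.length()];
      text.getChars(0, text.length(), buffer, 0);
      length = text.length();
      startOffset = start;
      endOffset = end;
    }

    String term() { return new String(buffer, 0, length); }

    @Override
    protected MiniToken clone() {
      try {
        MiniToken t = (MiniToken) super.clone();
        t.buffer = buffer.clone();   // deep-clone the term buffer
        return t;
      } catch (CloneNotSupportedException e) {
        throw new RuntimeException(e); // cannot happen: we implement Cloneable
      }
    }
  }

  /** Caching consumer: each stored token must be a clone, because the
   *  producer keeps overwriting the same reusable instance. */
  static List<String> cacheTerms(String text) {
    List<MiniToken> cache = new ArrayList<>();
    MiniToken reusable = new MiniToken();      // one instance, reused per token
    int pos = 0;
    for (String word : text.split(" ")) {
      reusable.reinit(word, pos, pos + word.length());
      cache.add(reusable.clone());             // clone before storing away
      pos += word.length() + 1;
    }
    List<String> terms = new ArrayList<>();
    for (MiniToken t : cache) terms.add(t.term());
    return terms;
  }

  public static void main(String[] args) {
    System.out.println(cacheTerms("quick brown fox"));
  }
}
```

Dropping the `clone()` call would make every cached reference point at the same mutated object, mirroring the bug the javadoc's cloning rule prevents.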
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/TokenFilter.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/TokenFilter.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/TokenFilter.java 17 Aug 2012 14:55:08 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/TokenFilter.java 16 Dec 2014 11:31:57 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.analysis;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -19,30 +19,54 @@
import java.io.IOException;
-/** A TokenFilter is a TokenStream whose input is another token stream.
+/** A TokenFilter is a TokenStream whose input is another TokenStream.
- This is an abstract class.
- NOTE: subclasses must override {@link #next(Token)}. It's
- also OK to instead override {@link #next()} but that
- method is now deprecated in favor of {@link #next(Token)}.
+ This is an abstract class; subclasses must override {@link #incrementToken()}.
+ @see TokenStream
*/
public abstract class TokenFilter extends TokenStream {
/** The source of tokens for this filter. */
- protected TokenStream input;
+ protected final TokenStream input;
/** Construct a token stream filtering the given input. */
protected TokenFilter(TokenStream input) {
+ super(input);
this.input = input;
}
-
- /** Close the input TokenStream. */
+
+ /**
+ * {@inheritDoc}
+ *
+ * NOTE:
+ * The default implementation chains the call to the input TokenStream, so
+ * be sure to call super.end() first when overriding this method.
+ */
+ @Override
+ public void end() throws IOException {
+ input.end();
+ }
+
+ /**
+ * {@inheritDoc}
+ *
+ * NOTE:
+ * The default implementation chains the call to the input TokenStream, so
+ * be sure to call super.close() when overriding this method.
+ */
+ @Override
public void close() throws IOException {
input.close();
}
- /** Reset the filter as well as the input TokenStream. */
+ /**
+ * {@inheritDoc}
+ *
+ * NOTE:
+ * The default implementation chains the call to the input TokenStream, so
+ * be sure to call super.reset() when overriding this method.
+ */
+ @Override
public void reset() throws IOException {
- super.reset();
input.reset();
}
}
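The chaining contract documented in this TokenFilter diff (end(), close(), and reset() each forward to the input stream) can be sketched with self-contained stand-ins. The `Mini*` classes below are hypothetical illustrations of the decorator pattern, not the real Lucene `TokenStream`/`TokenFilter` API.

```java
import java.util.Arrays;
import java.util.Iterator;

// Illustrative decorator chain: a filter is a stream whose input is another
// stream, and lifecycle calls (reset/close) propagate down the chain.
public class MiniFilterDemo {

  static abstract class MiniStream {
    abstract String next();              // null at end of stream
    void reset() {}
    void close() {}
  }

  static class MiniTokenizer extends MiniStream {
    private final String[] words;
    private Iterator<String> it;
    MiniTokenizer(String text) { words = text.split(" "); reset(); }
    @Override void reset() { it = Arrays.asList(words).iterator(); }
    @Override String next() { return it.hasNext() ? it.next() : null; }
  }

  /** Analog of TokenFilter: holds a final input and chains lifecycle calls. */
  static abstract class MiniFilter extends MiniStream {
    protected final MiniStream input;
    MiniFilter(MiniStream input) { this.input = input; }
    @Override void reset() { input.reset(); }   // chain, like TokenFilter.reset()
    @Override void close() { input.close(); }   // chain, like TokenFilter.close()
  }

  static class UpperCaseFilter extends MiniFilter {
    UpperCaseFilter(MiniStream in) { super(in); }
    @Override String next() {
      String t = input.next();
      return t == null ? null : t.toUpperCase();
    }
  }

  /** Consumer: resetting the outermost filter resets the whole chain. */
  static String consumeAll(MiniStream s) {
    s.reset();
    StringBuilder sb = new StringBuilder();
    for (String t = s.next(); t != null; t = s.next()) sb.append(t).append(' ');
    s.close();
    return sb.toString().trim();
  }

  static String demo(String text) {
    return consumeAll(new UpperCaseFilter(new MiniTokenizer(text)));
  }

  public static void main(String[] args) {
    System.out.println(demo("a b c"));
  }
}
```

A subclass that overrides `reset()` without calling the superclass version would leave the wrapped tokenizer un-reset, which is exactly why the patched javadoc insists on `super.reset()`.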
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/TokenStream.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/TokenStream.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/TokenStream.java 17 Aug 2012 14:55:07 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/TokenStream.java 16 Dec 2014 11:31:57 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.analysis;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -17,94 +17,192 @@
* limitations under the License.
*/
-import org.apache.lucene.index.Payload;
-
import java.io.IOException;
+import java.io.Closeable;
+import java.lang.reflect.Modifier;
-/** A TokenStream enumerates the sequence of tokens, either from
- fields of a document or from query text.
-
- This is an abstract class. Concrete subclasses are:
-
- {@link Tokenizer}, a TokenStream
- whose input is a Reader; and
- {@link TokenFilter}, a TokenStream
- whose input is another TokenStream.
-
- NOTE: subclasses must override {@link #next(Token)}. It's
- also OK to instead override {@link #next()} but that
- method is now deprecated in favor of {@link #next(Token)}.
- */
+import org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl;
+import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.util.Attribute;
+import org.apache.lucene.util.AttributeFactory;
+import org.apache.lucene.util.AttributeImpl;
+import org.apache.lucene.util.AttributeSource;
-public abstract class TokenStream {
+/**
+ * A TokenStream enumerates the sequence of tokens, either from
+ * {@link Field}s of a {@link Document} or from query text.
+ *
+ * This is an abstract class; concrete subclasses are:
+ *
+ * {@link Tokenizer}, a TokenStream whose input is a Reader; and
+ * {@link TokenFilter}, a TokenStream whose input is another TokenStream.
+ *
+ * A new TokenStream API has been introduced with Lucene 2.9. This API
+ * has moved from being {@link Token}-based to {@link Attribute}-based. While
+ * {@link Token} still exists in 2.9 as a convenience class, the preferred way
+ * to store the information of a {@link Token} is to use {@link AttributeImpl}s.
+ *
+ * TokenStream now extends {@link AttributeSource}, which provides
+ * access to all of the token {@link Attribute}s for the TokenStream.
+ * Note that only one instance per {@link AttributeImpl} is created and reused
+ * for every token. This approach reduces object creation and allows local
+ * caching of references to the {@link AttributeImpl}s. See
+ * {@link #incrementToken()} for further details.
+ *
+ * The workflow of the new TokenStream API is as follows:
+ *
+ * Instantiation of TokenStream/{@link TokenFilter}s which add/get
+ * attributes to/from the {@link AttributeSource}.
+ * The consumer calls {@link TokenStream#reset()}.
+ * The consumer retrieves attributes from the stream and stores local
+ * references to all attributes it wants to access.
+ * The consumer calls {@link #incrementToken()} until it returns false,
+ * consuming the attributes after each call.
+ * The consumer calls {@link #end()} so that any end-of-stream operations
+ * can be performed.
+ * The consumer calls {@link #close()} to release any resource when finished
+ * using the TokenStream.
+ *
+ * To make sure that filters and consumers know which attributes are available,
+ * the attributes must be added during instantiation. Filters and consumers are
+ * not required to check for availability of attributes in
+ * {@link #incrementToken()}.
+ *
+ * You can find some example code for the new API in the analysis package level
+ * Javadoc.
+ *
+ * Sometimes it is desirable to capture a current state of a TokenStream,
+ * e.g., for buffering purposes (see {@link CachingTokenFilter},
+ * TeeSinkTokenFilter). For this use case
+ * {@link AttributeSource#captureState} and {@link AttributeSource#restoreState}
+ * can be used.
+ *
+ * The {@code TokenStream}-API in Lucene is based on the decorator pattern.
+ * Therefore all non-abstract subclasses must be final or have at least a final
+ * implementation of {@link #incrementToken}! This is checked when Java
+ * assertions are enabled.
+ */
+public abstract class TokenStream extends AttributeSource implements Closeable {
+
+ /** Default {@link AttributeFactory} instance that should be used for TokenStreams. */
+ public static final AttributeFactory DEFAULT_TOKEN_ATTRIBUTE_FACTORY =
+ AttributeFactory.getStaticImplementation(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, PackedTokenAttributeImpl.class);
- /** Returns the next token in the stream, or null at EOS.
- * @deprecated The returned Token is a "full private copy" (not
- * re-used across calls to next()) but will be slower
- * than calling {@link #next(Token)} instead.. */
- public Token next() throws IOException {
- final Token reusableToken = new Token();
- Token nextToken = next(reusableToken);
-
- if (nextToken != null) {
- Payload p = nextToken.getPayload();
- if (p != null) {
- nextToken.setPayload((Payload) p.clone());
- }
+ /**
+ * A TokenStream using the default attribute factory.
+ */
+ protected TokenStream() {
+ super(DEFAULT_TOKEN_ATTRIBUTE_FACTORY);
+ assert assertFinal();
+ }
+
+ /**
+ * A TokenStream that uses the same attributes as the supplied one.
+ */
+ protected TokenStream(AttributeSource input) {
+ super(input);
+ assert assertFinal();
+ }
+
+ /**
+ * A TokenStream using the supplied AttributeFactory for creating new {@link Attribute} instances.
+ */
+ protected TokenStream(AttributeFactory factory) {
+ super(factory);
+ assert assertFinal();
+ }
+
+ private boolean assertFinal() {
+ try {
+ final Class<?> clazz = getClass();
+ if (!clazz.desiredAssertionStatus())
+ return true;
+ assert clazz.isAnonymousClass() ||
+ (clazz.getModifiers() & (Modifier.FINAL | Modifier.PRIVATE)) != 0 ||
+ Modifier.isFinal(clazz.getMethod("incrementToken").getModifiers()) :
+ "TokenStream implementation classes or at least their incrementToken() implementation must be final";
+ return true;
+ } catch (NoSuchMethodException nsme) {
+ return false;
}
-
- return nextToken;
}
-
- /** Returns the next token in the stream, or null at EOS.
- * When possible, the input Token should be used as the
- * returned Token (this gives fastest tokenization
- * performance), but this is not required and a new Token
- * may be returned. Callers may re-use a single Token
- * instance for successive calls to this method.
- *
- * This implicitly defines a "contract" between
- * consumers (callers of this method) and
- * producers (implementations of this method
- * that are the source for tokens):
- *
- * A consumer must fully consume the previously
- * returned Token before calling this method again.
- * A producer must call {@link Token#clear()}
- * before setting the fields in it & returning it
- *
- * Also, the producer must make no assumptions about a
- * Token after it has been returned: the caller may
- * arbitrarily change it. If the producer needs to hold
- * onto the token for subsequent calls, it must clone()
- * it before storing it.
- * Note that a {@link TokenFilter} is considered a consumer.
- * @param reusableToken a Token that may or may not be used to
- * return; this parameter should never be null (the callee
- * is not required to check for null before using it, but it is a
- * good idea to assert that it is not null.)
- * @return next token in the stream or null if end-of-stream was hit
+
+ /**
+ * Consumers (i.e., {@link IndexWriter}) use this method to advance the stream to
+ * the next token. Implementing classes must implement this method and update
+ * the appropriate {@link AttributeImpl}s with the attributes of the next
+ * token.
+ *
+ * The producer must make no assumptions about the attributes after the method
+ * has returned: the caller may arbitrarily change them. If the producer
+ * needs to preserve the state for subsequent calls, it can use
+ * {@link #captureState} to create a copy of the current attribute state.
+ *
+ * This method is called for every token of a document, so an efficient
+ * implementation is crucial for good performance. To avoid calls to
+ * {@link #addAttribute(Class)} and {@link #getAttribute(Class)},
+ * references to all {@link AttributeImpl}s that this stream uses should be
+ * retrieved during instantiation.
+ *
+ * To ensure that filters and consumers know which attributes are available,
+ * the attributes must be added during instantiation. Filters and consumers
+ * are not required to check for availability of attributes in
+ * {@link #incrementToken()}.
+ *
+ * @return false for end of stream; true otherwise
*/
- public Token next(final Token reusableToken) throws IOException {
- // We don't actually use inputToken, but still add this assert
- assert reusableToken != null;
- return next();
+ public abstract boolean incrementToken() throws IOException;
+
+ /**
+ * This method is called by the consumer after the last token has been
+ * consumed, after {@link #incrementToken()} returned false
+ * (using the new TokenStream
API). Streams implementing the old API
+ * should upgrade to use this feature.
+ *
+ * This method can be used to perform any end-of-stream operations, such as
+ * setting the final offset of a stream. The final offset of a stream might
+ * differ from the offset of the last token eg in case one or more whitespaces
+ * followed after the last token, but a WhitespaceTokenizer was used.
+ *
+ * Additionally any skipped positions (such as those removed by a stopfilter)
+ * can be applied to the position increment, or any adjustment of other
+ * attributes where the end-of-stream value may be important.
+ *
+ * If you override this method, always call {@code super.end()}.
+ *
+ * @throws IOException If an I/O error occurs
+ */
+ public void end() throws IOException {
+ clearAttributes(); // LUCENE-3849: don't consume dirty atts
+ PositionIncrementAttribute posIncAtt = getAttribute(PositionIncrementAttribute.class);
+ if (posIncAtt != null) {
+ posIncAtt.setPositionIncrement(0);
+ }
}
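As a toy illustration of the incrementToken()/end() contract described above (plain Java, no Lucene dependency; the class and field names here are invented for the sketch, not Lucene API):

```java
// Toy analogue of the TokenStream contract: incrementToken() advances,
// end() reports the final offset past the last token (trailing whitespace
// included). Names are invented for this sketch; this is not Lucene API.
public class ToyWhitespaceStream {
    private final String text;
    private int pos = 0;
    public int startOffset, endOffset, finalOffset;

    public ToyWhitespaceStream(String text) { this.text = text; }

    /** Returns false at end of stream, true otherwise. */
    public boolean incrementToken() {
        while (pos < text.length() && text.charAt(pos) == ' ') pos++;
        if (pos >= text.length()) return false;
        startOffset = pos;
        while (pos < text.length() && text.charAt(pos) != ' ') pos++;
        endOffset = pos;
        return true;
    }

    /** Called once by the consumer after incrementToken() returns false. */
    public void end() {
        finalOffset = text.length(); // may exceed the last token's endOffset
    }
}
```

With input "sky  " the last token ends at offset 3, but end() reports a final offset of 5, mirroring the WhitespaceTokenizer case mentioned above.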
- /** Resets this stream to the beginning. This is an
- * optional operation, so subclasses may or may not
- * implement this method. Reset() is not needed for
- * the standard indexing process. However, if the Tokens
- * of a TokenStream are intended to be consumed more than
- * once, it is necessary to implement reset(). Note that
- * if your TokenStream caches tokens and feeds them back
- * again after a reset, it is imperative that you
- * clone the tokens when you store them away (on the
- * first pass) as well as when you return them (on future
- * passes after reset()).
+ /**
+ * This method is called by a consumer before it begins consumption using
+ * {@link #incrementToken()}.
+ *
+ * Resets this stream to a clean state. Stateful implementations must implement
+ * this method so that they can be reused, just as if they had been created fresh.
+ *
+ * If you override this method, always call {@code super.reset()}, otherwise
+ * some internal state will not be correctly reset (e.g., {@link Tokenizer} will
+ * throw {@link IllegalStateException} on further usage).
*/
public void reset() throws IOException {}
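The reset() contract can be pictured with a toy stateful stream (plain Java sketch with invented names, not Lucene API): after reset(), the stream must behave as if freshly constructed so it can be consumed again.

```java
// Toy sketch of the reset() contract: a stateful stream must return to a
// clean state so it can be reused. Invented names, not Lucene API.
public class ToyCountingStream {
    private final int tokenCount;
    private int emitted = 0;

    public ToyCountingStream(int tokenCount) { this.tokenCount = tokenCount; }

    public boolean incrementToken() { return emitted++ < tokenCount; }

    /** Rewind all internal state, as if freshly constructed. */
    public void reset() { emitted = 0; }
}
```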
- /** Releases resources associated with this stream. */
+ /** Releases resources associated with this stream.
+ *
+ * If you override this method, always call {@code super.close()}, otherwise
+ * some internal state will not be correctly reset (e.g., {@link Tokenizer} will
+ * throw {@link IllegalStateException} on reuse).
+ */
+ @Override
public void close() throws IOException {}
+
}
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/TokenStreamToAutomaton.java'.
Fisheye: No comparison available. Pass `N' to diff?
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/Tokenizer.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/Tokenizer.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/Tokenizer.java 17 Aug 2012 14:55:08 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/Tokenizer.java 16 Dec 2014 11:31:57 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.analysis;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -17,43 +17,104 @@
* limitations under the License.
*/
+import org.apache.lucene.util.AttributeFactory;
+import org.apache.lucene.util.AttributeSource;
+
import java.io.Reader;
import java.io.IOException;
/** A Tokenizer is a TokenStream whose input is a Reader.
- This is an abstract class.
+ This is an abstract class; subclasses must override {@link #incrementToken()}
- NOTE: subclasses must override {@link #next(Token)}. It's
- also OK to instead override {@link #next()} but that
- method is now deprecated in favor of {@link #next(Token)}.
-
- NOTE: subclasses overriding {@link #next(Token)} must
- call {@link Token#clear()}.
+ NOTE: Subclasses overriding {@link #incrementToken()} must
+ call {@link AttributeSource#clearAttributes()} before
+ setting attributes.
*/
-
-public abstract class Tokenizer extends TokenStream {
+public abstract class Tokenizer extends TokenStream {
/** The text source for this Tokenizer. */
- protected Reader input;
+ protected Reader input = ILLEGAL_STATE_READER;
+
+ /** Pending reader: not actually assigned to input until reset() */
+ private Reader inputPending = ILLEGAL_STATE_READER;
- /** Construct a tokenizer with null input. */
- protected Tokenizer() {}
-
/** Construct a token stream processing the given input. */
protected Tokenizer(Reader input) {
- this.input = input;
+ if (input == null) {
+ throw new NullPointerException("input must not be null");
+ }
+ this.inputPending = input;
}
+
+ /** Construct a token stream processing the given input using the given AttributeFactory. */
+ protected Tokenizer(AttributeFactory factory, Reader input) {
+ super(factory);
+ if (input == null) {
+ throw new NullPointerException("input must not be null");
+ }
+ this.inputPending = input;
+ }
- /** By default, closes the input Reader. */
+ /**
+ * {@inheritDoc}
+ *
+ * NOTE:
+ * The default implementation closes the input Reader, so
+ * be sure to call super.close() when overriding this method.
+ */
+ @Override
public void close() throws IOException {
input.close();
+ // LUCENE-2387: don't hold onto Reader after close, so
+ // GC can reclaim
+ inputPending = input = ILLEGAL_STATE_READER;
}
+
+ /** Return the corrected offset. If {@link #input} is a {@link CharFilter} subclass
+ * this method calls {@link CharFilter#correctOffset}, else returns currentOff.
+ * @param currentOff offset as seen in the output
+ * @return corrected offset based on the input
+ * @see CharFilter#correctOffset
+ */
+ protected final int correctOffset(int currentOff) {
+ return (input instanceof CharFilter) ? ((CharFilter) input).correctOffset(currentOff) : currentOff;
+ }
- /** Expert: Reset the tokenizer to a new reader. Typically, an
- * analyzer (in its reusableTokenStream method) will use
+ /** Expert: Set a new reader on the Tokenizer. Typically, an
+ * analyzer (in its tokenStream method) will use
* this to re-use a previously created tokenizer. */
- public void reset(Reader input) throws IOException {
- this.input = input;
+ public final void setReader(Reader input) throws IOException {
+ if (input == null) {
+ throw new NullPointerException("input must not be null");
+ } else if (this.input != ILLEGAL_STATE_READER) {
+ throw new IllegalStateException("TokenStream contract violation: close() call missing");
+ }
+ this.inputPending = input;
+ assert setReaderTestPoint();
}
+
+ @Override
+ public void reset() throws IOException {
+ super.reset();
+ input = inputPending;
+ inputPending = ILLEGAL_STATE_READER;
+ }
+
+ // only used by assert, for testing
+ boolean setReaderTestPoint() {
+ return true;
+ }
+
+ private static final Reader ILLEGAL_STATE_READER = new Reader() {
+ @Override
+ public int read(char[] cbuf, int off, int len) {
+ throw new IllegalStateException("TokenStream contract violation: reset()/close() call missing, " +
+ "reset() called multiple times, or subclass does not call super.reset(). " +
+ "Please see Javadocs of TokenStream class for more information about the correct consuming workflow.");
+ }
+
+ @Override
+ public void close() {}
+ };
}
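The pending-reader state machine in the Tokenizer code above can be sketched in a few lines of plain Java (invented names, a String standing in for the Reader; not Lucene API): setReader() only stages the input, reset() makes it active, and reading without reset() (or reusing without close()) fails fast.

```java
// Toy sketch of Tokenizer's setReader()/reset()/close() state machine.
// Invented names; a String stands in for java.io.Reader.
public class ToyReaderHolder {
    private String active;   // null = no active input (illegal to read)
    private String pending;  // staged by setReader(), activated by reset()

    public void setReader(String input) {
        if (input == null) throw new NullPointerException("input must not be null");
        if (active != null) throw new IllegalStateException("close() call missing");
        pending = input;
    }

    public void reset() {
        active = pending;
        pending = null;
    }

    public String read() {
        if (active == null)
            throw new IllegalStateException("reset()/close() call missing");
        return active;
    }

    public void close() {
        active = null;
        pending = null;
    }
}
```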
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/WhitespaceAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/WhitespaceTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/WordlistLoader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/package.html
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/package.html,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/package.html 17 Aug 2012 14:55:08 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/package.html 16 Dec 2014 11:31:58 -0000 1.1.2.1
@@ -18,13 +18,12 @@
-
API and code to convert text into indexable/searchable tokens. Covers {@link org.apache.lucene.analysis.Analyzer} and related classes.
Parsing? Tokenization? Analysis!
-Lucene, indexing and search library, accepts only plain text input.
+Lucene, an indexing and search library, accepts only plain text input.
Parsing
@@ -34,20 +33,29 @@
Tokenization
-Plain text passed to Lucene for indexing goes through a process generally called tokenization – namely breaking of the
-input text into small indexing elements –
-{@link org.apache.lucene.analysis.Token Tokens}.
-The way input text is broken into tokens very
-much dictates further capabilities of search upon that text.
+Plain text passed to Lucene for indexing goes through a process generally called tokenization. Tokenization is the process
+of breaking input text into small indexing elements – tokens.
+The way input text is broken into tokens heavily influences how people will then be able to search for that text.
For instance, sentence beginnings and endings can be identified to provide for more accurate phrase
and proximity searches (though sentence identification is not provided by Lucene).
-In some cases simply breaking the input text into tokens is not enough – a deeper Analysis is needed,
-providing for several functions, including (but not limited to):
+ In some cases simply breaking the input text into tokens is not enough
+ – a deeper Analysis may be needed. Lucene includes both
+ pre- and post-tokenization analysis facilities.
+
+
+ Pre-tokenization analysis can include (but is not limited to) stripping
+ HTML markup, and transforming or removing text matching arbitrary patterns
+ or sets of fixed strings.
+
+
+ There are many post-tokenization steps that can be done, including
+ (but not limited to):
+
+
+
+ {@link org.apache.lucene.analysis.Analyzer} – An Analyzer is
+ responsible for building a
+ {@link org.apache.lucene.analysis.TokenStream} which can be consumed
+ by the indexing and searching processes. See below for more information
+ on implementing your own Analyzer.
+
+
+ CharFilter – CharFilter extends
+ {@link java.io.Reader} to perform pre-tokenization substitutions,
+ deletions, and/or insertions on an input Reader's text, while providing
+ corrected character offsets to account for these modifications. This
+ capability allows highlighting to function over the original text when
+ indexed tokens are created from CharFilter-modified text with offsets
+ that are not the same as those in the original text. Tokenizers'
+ constructors and reset() methods accept a CharFilter. CharFilters may
+ be chained to perform multiple pre-tokenization modifications.
+
+
+ {@link org.apache.lucene.analysis.Tokenizer} – A Tokenizer is a
+ {@link org.apache.lucene.analysis.TokenStream} and is responsible for
+ breaking up incoming text into tokens. In most cases, an Analyzer will
+ use a Tokenizer as the first step in the analysis process. However,
+ to modify text prior to tokenization, use a CharFilter subclass (see
+ above).
+
+
+ {@link org.apache.lucene.analysis.TokenFilter} – A TokenFilter is
+ also a {@link org.apache.lucene.analysis.TokenStream} and is responsible
+ for modifying tokens that have been created by the Tokenizer. Common
+ modifications performed by a TokenFilter are: deletion, stemming, synonym
+ injection, and down casing. Not all Analyzers require TokenFilters.
+
+
Hints, Tips and Traps
- The synergy between {@link org.apache.lucene.analysis.Analyzer} and {@link org.apache.lucene.analysis.Tokenizer}
- is sometimes confusing. To ease on this confusion, some clarifications:
-
- The {@link org.apache.lucene.analysis.Analyzer} is responsible for the entire task of
- creating tokens out of the input text, while the {@link org.apache.lucene.analysis.Tokenizer}
- is only responsible for breaking the input text into tokens. Very likely, tokens created
- by the {@link org.apache.lucene.analysis.Tokenizer} would be modified or even omitted
- by the {@link org.apache.lucene.analysis.Analyzer} (via one or more
- {@link org.apache.lucene.analysis.TokenFilter}s) before being returned.
-
- {@link org.apache.lucene.analysis.Tokenizer} is a {@link org.apache.lucene.analysis.TokenStream},
- but {@link org.apache.lucene.analysis.Analyzer} is not.
-
- {@link org.apache.lucene.analysis.Analyzer} is "field aware", but
- {@link org.apache.lucene.analysis.Tokenizer} is not.
-
-
+ The synergy between {@link org.apache.lucene.analysis.Analyzer} and
+ {@link org.apache.lucene.analysis.Tokenizer} is sometimes confusing. To ease
+ this confusion, some clarifications:
+
+
+ The {@link org.apache.lucene.analysis.Analyzer} is responsible for the entire task of
+ creating tokens out of the input text, while the {@link org.apache.lucene.analysis.Tokenizer}
+ is only responsible for breaking the input text into tokens. Very likely, tokens created
+ by the {@link org.apache.lucene.analysis.Tokenizer} would be modified or even omitted
+ by the {@link org.apache.lucene.analysis.Analyzer} (via one or more
+ {@link org.apache.lucene.analysis.TokenFilter}s) before being returned.
+
+
+ {@link org.apache.lucene.analysis.Tokenizer} is a {@link org.apache.lucene.analysis.TokenStream},
+ but {@link org.apache.lucene.analysis.Analyzer} is not.
+
+
+ {@link org.apache.lucene.analysis.Analyzer} is "field aware", but
+ {@link org.apache.lucene.analysis.Tokenizer} is not.
+
+
- Lucene Java provides a number of analysis capabilities, the most commonly used one being the {@link
- org.apache.lucene.analysis.standard.StandardAnalyzer}. Many applications will have a long and industrious life with nothing more
+ Lucene Java provides a number of analysis capabilities, the most commonly used one being the StandardAnalyzer.
+ Many applications will have a long and industrious life with nothing more
than the StandardAnalyzer. However, there are a few other classes/packages that are worth mentioning:
-
- {@link org.apache.lucene.analysis.PerFieldAnalyzerWrapper} – Most Analyzers perform the same operation on all
- {@link org.apache.lucene.document.Field}s. The PerFieldAnalyzerWrapper can be used to associate a different Analyzer with different
- {@link org.apache.lucene.document.Field}s.
- The contrib/analyzers library located at the root of the Lucene distribution has a number of different Analyzer implementations to solve a variety
- of different problems related to searching. Many of the Analyzers are designed to analyze non-English languages.
- The contrib/snowball library
- located at the root of the Lucene distribution has Analyzer and TokenFilter
- implementations for a variety of Snowball stemmers.
- See http://snowball.tartarus.org
- for more information on Snowball stemmers.
- There are a variety of Tokenizer and TokenFilter implementations in this package. Take a look around, chances are someone has implemented what you need.
-
+
+
+ PerFieldAnalyzerWrapper – Most Analyzers perform the same operation on all
+ {@link org.apache.lucene.document.Field}s. The PerFieldAnalyzerWrapper can be used to associate a different Analyzer with different
+ {@link org.apache.lucene.document.Field}s.
+
+
+ The analysis library located at the root of the Lucene distribution has a number of different Analyzer implementations to solve a variety
+ of different problems related to searching. Many of the Analyzers are designed to analyze non-English languages.
+
+
+ There are a variety of Tokenizer and TokenFilter implementations in this package. Take a look around, chances are someone has implemented what you need.
+
+
Analysis is one of the main causes of performance degradation during indexing. Simply put, the more you analyze the slower the indexing (in most cases).
- Perhaps your application would be just fine using the simple {@link org.apache.lucene.analysis.WhitespaceTokenizer} combined with a
- {@link org.apache.lucene.analysis.StopFilter}. The contrib/benchmark library can be useful for testing out the speed of the analysis process.
+ Perhaps your application would be just fine using the simple WhitespaceTokenizer combined with a StopFilter. The benchmark/ library can be useful
+ for testing out the speed of the analysis process.
Invoking the Analyzer
Applications usually do not invoke analysis – Lucene does it for them:
-
- At indexing, as a consequence of
- {@link org.apache.lucene.index.IndexWriter#addDocument(org.apache.lucene.document.Document) addDocument(doc)},
- the Analyzer in effect for indexing is invoked for each indexed field of the added document.
-
- At search, as a consequence of
- {@link org.apache.lucene.queryParser.QueryParser#parse(java.lang.String) QueryParser.parse(queryText)},
- the QueryParser may invoke the Analyzer in effect.
- Note that for some queries analysis does not take place, e.g. wildcard queries.
-
-
+
+
+
+ At indexing, as a consequence of
+ {@link org.apache.lucene.index.IndexWriter#addDocument(Iterable) addDocument(doc)},
+ the Analyzer in effect for indexing is invoked for each indexed field of the added document.
+
+
+ At search, a QueryParser may invoke the Analyzer during parsing. Note that for some queries, analysis does not
+ take place, e.g. wildcard queries.
+
+
+
However an application might invoke Analysis of any text for testing or for any other purpose, something like:
-
- Analyzer analyzer = new StandardAnalyzer(); // or any other analyzer
- TokenStream ts = analyzer.tokenStream("myfield",new StringReader("some text goes here"));
- Token t = ts.next();
- while (t!=null) {
- System.out.println("token: "+t));
- t = ts.next();
- }
-
+
+ Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
+ Analyzer analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer
+ TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
+ OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
+
+ try {
+ ts.reset(); // Resets this stream to the beginning. (Required)
+ while (ts.incrementToken()) {
+ // Use {@link org.apache.lucene.util.AttributeSource#reflectAsString(boolean)}
+ // for token stream debugging.
+ System.out.println("token: " + ts.reflectAsString(true));
+
+ System.out.println("token start offset: " + offsetAtt.startOffset());
+ System.out.println(" token end offset: " + offsetAtt.endOffset());
+ }
+ ts.end(); // Perform end-of-stream operations, e.g. set the final offset.
+ } finally {
+ ts.close(); // Release resources associated with this stream.
+ }
+
Indexing Analysis vs. Search Analysis
Selecting the "correct" analyzer is crucial
@@ -170,18 +223,25 @@
Implementing your own Analyzer
-Creating your own Analyzer is straightforward. It usually involves either wrapping an existing Tokenizer and set of TokenFilters to create a new Analyzer
-or creating both the Analyzer and a Tokenizer or TokenFilter. Before pursuing this approach, you may find it worthwhile
-to explore the contrib/analyzers library and/or ask on the java-user@lucene.apache.org mailing list first to see if what you need already exists.
-If you are still committed to creating your own Analyzer or TokenStream derivation (Tokenizer or TokenFilter) have a look at
-the source code of any one of the many samples located in this package.
+
+ Creating your own Analyzer is straightforward. Your Analyzer can wrap
+ existing analysis components — CharFilter(s) (optional), a
+ Tokenizer, and TokenFilter(s) (optional) — or components you
+ create, or a combination of existing and newly created components. Before
+ pursuing this approach, you may find it worthwhile to explore the
+ analyzers-common library and/or ask on the
+ java-user@lucene.apache.org mailing list first to see if what you
+ need already exists. If you are still committed to creating your own
+ Analyzer, have a look at the source code of any one of the many samples
+ located in this package.
The following sections discuss some aspects of implementing your own analyzer.
-Field Section Boundaries
+Field Section Boundaries
- When {@link org.apache.lucene.document.Document#add(org.apache.lucene.document.Fieldable) document.add(field)}
+ When {@link org.apache.lucene.document.Document#add(org.apache.lucene.index.IndexableField) document.add(field)}
is called multiple times for the same field name, we could say that each such call creates a new
section for that field in that document.
In fact, a separate call to
@@ -191,82 +251,722 @@
This allows phrase search and proximity search to seamlessly cross
boundaries between these "sections".
In other words, if a certain field "f" is added like this:
-
- document.add(new Field("f","first ends",...);
- document.add(new Field("f","starts two",...);
- indexWriter.addDocument(document);
-
+
+
+ document.add(new Field("f","first ends",...);
+ document.add(new Field("f","starts two",...);
+ indexWriter.addDocument(document);
+
+
Then, a phrase search for "ends starts" would find that document.
Where desired, this behavior can be modified by introducing a "position gap" between consecutive field "sections",
simply by overriding
{@link org.apache.lucene.analysis.Analyzer#getPositionIncrementGap(java.lang.String) Analyzer.getPositionIncrementGap(fieldName)}:
-
- Analyzer myAnalyzer = new StandardAnalyzer() {
- public int getPositionIncrementGap(String fieldName) {
- return 10;
- }
- };
-
-Token Position Increments
+
+ Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
+ Analyzer myAnalyzer = new StandardAnalyzer(matchVersion) {
+ public int getPositionIncrementGap(String fieldName) {
+ return 10;
+ }
+ };
+
+Token Position Increments
By default, all tokens created by Analyzers and Tokenizers have a
- {@link org.apache.lucene.analysis.Token#getPositionIncrement() position increment} of one.
+ {@link org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute#getPositionIncrement() position increment} of one.
This means that the position stored for that token in the index would be one more than
that of the previous token.
Recall that phrase and proximity searches rely on position info.
If the selected analyzer filters the stop words "is" and "the", then for a document
containing the string "blue is the sky", only the tokens "blue", "sky" are indexed,
- with position("sky") = 1 + position("blue"). Now, a phrase query "blue is the sky"
+ with position("sky") = 3 + position("blue"). Now, a phrase query "blue is the sky"
would find that document, because the same analyzer filters the same stop words from
- that query. But also the phrase query "blue sky" would find that document.
+ that query. But the phrase query "blue sky" would not find that document, because in
+ that query the position increment between "blue" and "sky" is only 1, while in the
+ indexed document it is 3.
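The position arithmetic for "blue is the sky" can be checked with a few lines of plain Java (a sketch, not Lucene code): each removed stop word carries its increment over to the next surviving token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of stop-word position accounting: removed tokens contribute their
// increment to the next kept token, so "sky" lands at position(blue) + 3.
// Plain Java, not Lucene API.
public class StopPositionDemo {
    /** Returns the position of each kept token after stop-word removal. */
    public static int[] positions(String text, Set<String> stopWords) {
        List<Integer> kept = new ArrayList<>();
        int position = -1, increment = 1;
        for (String term : text.split(" ")) {
            if (stopWords.contains(term)) {
                increment++;          // carry the removed token's increment
                continue;
            }
            position += increment;
            increment = 1;
            kept.add(position);
        }
        int[] result = new int[kept.size()];
        for (int i = 0; i < result.length; i++) result[i] = kept.get(i);
        return result;
    }
}
```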
- If this behavior does not fit the application needs,
- a modified analyzer can be used, that would increment further the positions of
- tokens following a removed stop word, using
- {@link org.apache.lucene.analysis.Token#setPositionIncrement(int)}.
- This can be done with something like:
-
- public TokenStream tokenStream(final String fieldName, Reader reader) {
- final TokenStream ts = someAnalyzer.tokenStream(fieldName, reader);
- TokenStream res = new TokenStream() {
- public Token next() throws IOException {
- int extraIncrement = 0;
- while (true) {
- Token t = ts.next();
- if (t!=null) {
- if (stopWords.contains(t.termText())) {
- extraIncrement++; // filter this word
- continue;
- }
- if (extraIncrement>0) {
- t.setPositionIncrement(t.getPositionIncrement()+extraIncrement);
- }
- }
- return t;
+ If this behavior does not fit the application needs, the query parser needs to be
+ configured to not take position increments into account when generating phrase queries.
+
+
+ Note that a StopFilter MUST increment the position increment in order not to generate corrupt
+ tokenstream graphs. Here is the logic used by StopFilter to increment positions when filtering out tokens:
+
+
+ public TokenStream tokenStream(final String fieldName, Reader reader) {
+ final TokenStream ts = someAnalyzer.tokenStream(fieldName, reader);
+ TokenStream res = new TokenStream() {
+ CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+ PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
+
+ public boolean incrementToken() throws IOException {
+ int extraIncrement = 0;
+ while (true) {
+ boolean hasNext = ts.incrementToken();
+ if (hasNext) {
+ if (stopWords.contains(termAtt.toString())) {
+ extraIncrement += posIncrAtt.getPositionIncrement(); // filter this word
+ continue;
+ }
+ if (extraIncrement>0) {
+ posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement()+extraIncrement);
}
}
- };
- return res;
+ return hasNext;
+ }
}
-
- Now, with this modified analyzer, the phrase query "blue sky" would find that document.
- But note that this is yet not a perfect solution, because any phrase query "blue w1 w2 sky"
- where both w1 and w2 are stop words would match that document.
+ };
+ return res;
+ }
+
+
+ A few more use cases for modifying position increments are:
+
+ Inhibiting phrase and proximity matches in sentence boundaries – for this, a tokenizer that
+ identifies a new sentence can add 1 to the position increment of the first token of the new sentence.
+ Injecting synonyms – here, synonyms of a token should be added after that token,
+ and their position increment should be set to 0.
+ As a result, all synonyms of a token would be considered to appear in exactly the
+ same position as that token, and so would be seen by phrase and proximity searches.
+
+
+Token Position Length
- Few more use cases for modifying position increments are:
-
- Inhibiting phrase and proximity matches in sentence boundaries – for this, a tokenizer that
- identifies a new sentence can add 1 to the position increment of the first token of the new sentence.
- Injecting synonyms – here, synonyms of a token should be added after that token,
- and their position increment should be set to 0.
- As result, all synonyms of a token would be considered to appear in exactly the
- same position as that token, and so would they be seen by phrase and proximity searches.
-
+ By default, all tokens created by Analyzers and Tokenizers have a
+ {@link org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#getPositionLength() position length} of one.
+ This means that the token occupies a single position. This attribute is not indexed
+ and thus not taken into account for positional queries, but is used by e.g. suggesters.
+
+ The main use case for position lengths is multi-word synonyms. With single-word
+ synonyms, setting the position increment to 0 is enough to denote the fact that two
+ words are synonyms, for example:
+
+
+Term red magenta
+Position increment 1 0
+
+
+ Given that position(magenta) = 0 + position(red), they are at the same position, so anything
+ working with analyzers will return the exact same result if you replace "magenta" with "red"
+ in the input. However, multi-word synonyms are more tricky. Let's say that you want to build
+ a TokenStream where "IBM" is a synonym of "International Business Machines". Position increments
+ are not enough anymore:
+
+
+Term IBM International Business Machines
+Position increment 1 0 1 1
+
+
+ The problem with this token stream is that "IBM" is at the same position as "International"
+ although it is a synonym with "International Business Machines" as a whole. Setting
+ the position increment of "Business" and "Machines" to 0 wouldn't help as it would mean
+ that "International" is a synonym of "Business". The only way to solve this issue is to
+ make "IBM" span across 3 positions; this is where position lengths come to the rescue.
+
+
+Term IBM International Business Machines
+Position increment 1 0 1 1
+Position length 3 1 1 1
+
+
+ This new attribute makes clear that "IBM" and "International Business Machines" start and end
+ at the same positions.
+
+
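The two tables above can be encoded directly (a plain-Java sketch, not Lucene API): given (positionIncrement, positionLength) pairs, "IBM" and the phrase "International Business Machines" start and end at the same positions.

```java
// Sketch: compute start/end positions from (positionIncrement, positionLength)
// pairs for the IBM example above. Plain Java, not Lucene API.
public class PositionLengthDemo {
    /** Returns {start, end} for each token, given {increment, length} pairs. */
    public static int[][] span(int[][] tokens) {
        int[][] spans = new int[tokens.length][2];
        int position = -1;
        for (int i = 0; i < tokens.length; i++) {
            position += tokens[i][0];              // advance by the increment
            spans[i][0] = position;                // start position
            spans[i][1] = position + tokens[i][1]; // end = start + length
        }
        return spans;
    }
}
```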
+How to not write corrupt token streams
+
+ There are a few rules to observe when writing custom Tokenizers and TokenFilters:
+
+
+ The first position increment must be > 0.
+ Positions must not go backward.
+ Tokens that have the same start position must have the same start offset.
+ Tokens that have the same end position (taking into account the
+ position length) must have the same end offset.
+ Tokenizers must call {@link
+ org.apache.lucene.util.AttributeSource#clearAttributes()} in
+ incrementToken().
+ Tokenizers must override {@link
+ org.apache.lucene.analysis.TokenStream#end()}, and pass the final
+ offset (the total number of input characters processed) to both
+ parameters of {@link org.apache.lucene.analysis.tokenattributes.OffsetAttribute#setOffset(int, int)}.
+
+
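Two of the rules above lend themselves to a small checker (an illustrative plain-Java sketch, not Lucene's own validation):

```java
// Illustrative checker for two of the rules above: the first position
// increment must be > 0, and positions must never go backward. Each array
// entry is a token's position increment. Plain Java sketch, not Lucene API.
public class TokenGraphChecker {
    public static boolean isValid(int[] increments) {
        if (increments.length == 0) return true;
        if (increments[0] <= 0) return false;    // first increment must be > 0
        for (int i = 1; i < increments.length; i++) {
            if (increments[i] < 0) return false; // positions must not go backward
        }
        return true;
    }
}
```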
+ Although these rules might seem easy to follow, problems can quickly happen when chaining
+ badly implemented filters that play with positions and offsets, such as synonym or n-grams
+ filters. Here are good practices for writing correct filters:
+
+
+ Token filters should not modify offsets. If you feel that your filter would need to modify offsets, then it should probably be implemented as a tokenizer.
+ Token filters should not insert positions. If a filter needs to add tokens, then they should all have a position increment of 0.
+ When they add tokens, token filters should call {@link org.apache.lucene.util.AttributeSource#clearAttributes()} first.
+ When they remove tokens, token filters should increment the position increment of the following token.
+ Token filters should preserve position lengths.
+
+TokenStream API
+
+ "Flexible Indexing" summarizes the effort of making the Lucene indexer
+ pluggable and extensible for custom index formats. A fully customizable
+ indexer means that users will be able to store custom data structures on
+ disk. Therefore an API is necessary that can transport custom types of
+ data from the documents to the indexer.
+
+Attribute and AttributeSource
+
+ Classes {@link org.apache.lucene.util.Attribute} and
+ {@link org.apache.lucene.util.AttributeSource} serve as the basis upon which
+ the analysis elements of "Flexible Indexing" are implemented. An Attribute
+ holds a particular piece of information about a text token. For example,
+ {@link org.apache.lucene.analysis.tokenattributes.CharTermAttribute}
+ contains the term text of a token, and
+ {@link org.apache.lucene.analysis.tokenattributes.OffsetAttribute} contains
+ the start and end character offsets of a token. An AttributeSource is a
+ collection of Attributes with a restriction: there may be only one instance
+ of each attribute type. TokenStream now extends AttributeSource, which means
+ that one can add Attributes to a TokenStream. Since TokenFilter extends
+ TokenStream, all filters are also AttributeSources.
+
+
+ Lucene provides seven Attributes out of the box:
+
+
+
+ {@link org.apache.lucene.analysis.tokenattributes.CharTermAttribute}
+
+ The term text of a token. Implements {@link java.lang.CharSequence}
+ (providing methods length() and charAt(), and allowing e.g. for direct
+ use with regular expression {@link java.util.regex.Matcher}s) and
+ {@link java.lang.Appendable} (allowing the term text to be appended to).
+
+
+
+ {@link org.apache.lucene.analysis.tokenattributes.OffsetAttribute}
+ The start and end offset of a token in characters.
+
+
+ {@link org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute}
+ See above for detailed information about position increment.
+
+
+ {@link org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute}
+ The number of positions occupied by a token.
+
+
+ {@link org.apache.lucene.analysis.tokenattributes.PayloadAttribute}
+ The payload that a Token can optionally have.
+
+
+ {@link org.apache.lucene.analysis.tokenattributes.TypeAttribute}
+ The type of the token. Default is 'word'.
+
+
+ {@link org.apache.lucene.analysis.tokenattributes.FlagsAttribute}
+ Optional flags a token can have.
+
+
+ {@link org.apache.lucene.analysis.tokenattributes.KeywordAttribute}
+
+ Keyword-aware TokenStreams/-Filters skip modification of tokens that
+ return true from this attribute's isKeyword() method.
+
+
+
+More Requirements for Analysis Component Classes
+Due to the historical development of the API, there are some perhaps
+less-than-obvious requirements for implementing analysis component
+classes.
+Token Stream Lifetime
+The code fragment of the analysis workflow
+protocol above shows a token stream being obtained, used, and then
+left for garbage. However, that does not mean that the components of
+that token stream will, in fact, be discarded. The default is just the
+opposite. {@link org.apache.lucene.analysis.Analyzer} applies a reuse
+strategy to the tokenizer and the token filters. It will reuse
+them. For each new input, it calls {@link org.apache.lucene.analysis.Tokenizer#setReader(java.io.Reader)}
+to set the input. Your components must be prepared for this scenario,
+as described below.
+Tokenizer
+
+
+ You should create your tokenizer class by extending {@link org.apache.lucene.analysis.Tokenizer}.
+
+
+ Your tokenizer must never make direct use of the
+ {@link java.io.Reader} supplied to its constructor(s). (A future
+ release of Apache Lucene may remove the reader parameters from the
+ Tokenizer constructors.)
+ {@link org.apache.lucene.analysis.Tokenizer} wraps the reader in an
+ object that helps enforce that applications comply with the analysis workflow. Thus, your class
+ should only reference the input via the protected 'input' field
+ of Tokenizer.
+
+
+ Your tokenizer must override {@link org.apache.lucene.analysis.TokenStream#end()}.
+ Your implementation must call super.end(). It must set a correct final offset into
+ the offset attribute, and finish up any other attributes to reflect
+ the end of the stream.
+
+
+ If your tokenizer overrides {@link org.apache.lucene.analysis.TokenStream#reset()}
+ or {@link org.apache.lucene.analysis.TokenStream#close()}, it
+ must call the corresponding superclass method.
+
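Taken together, the requirements above can be sketched as a deliberately trivial, hypothetical tokenizer that emits one token per character. SingleCharTokenizer is not part of Lucene (a real tokenizer would usually extend CharTokenizer instead); it is shown only to make the contract concrete: the reader is touched only through the protected 'input' field, and end() and reset() call their superclass methods.

```java
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public final class SingleCharTokenizer extends Tokenizer {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private int pos = 0;

  public SingleCharTokenizer(Reader reader) {
    super(reader);
  }

  @Override
  public boolean incrementToken() throws IOException {
    clearAttributes();
    int c = input.read();            // only ever touch the protected 'input'
    if (c == -1) {
      return false;
    }
    termAtt.setEmpty().append((char) c);
    offsetAtt.setOffset(correctOffset(pos), correctOffset(pos + 1));
    pos++;
    return true;
  }

  @Override
  public void end() throws IOException {
    super.end();                     // mandatory
    int finalOffset = correctOffset(pos);
    offsetAtt.setOffset(finalOffset, finalOffset);  // correct final offset
  }

  @Override
  public void reset() throws IOException {
    super.reset();                   // mandatory: prepares 'input' for reuse
    pos = 0;
  }
}
```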
+
+Token Filter
+ You should create your token filter class by extending {@link org.apache.lucene.analysis.TokenFilter}.
+ If your token filter overrides {@link org.apache.lucene.analysis.TokenStream#reset()},
+ {@link org.apache.lucene.analysis.TokenStream#end()}
+ or {@link org.apache.lucene.analysis.TokenStream#close()}, it
+ must call the corresponding superclass method.
+Creating delegates
+ Forwarding classes (those which extend {@link org.apache.lucene.analysis.Tokenizer} but delegate
+ selected logic to another tokenizer) must also set the reader to the delegate in the overridden
+ {@link org.apache.lucene.analysis.Tokenizer#reset()} method, e.g.:
+
+ public class ForwardingTokenizer extends Tokenizer {
+ private Tokenizer delegate;
+ ...
+ {@literal @Override}
+ public void reset() {
+ super.reset();
+ delegate.setReader(this.input);
+ delegate.reset();
+ }
+ }
+
+Testing Your Analysis Component
+
+ The lucene-test-framework component defines
+ BaseTokenStreamTestCase. By extending
+ this class, you can create JUnit tests that validate that your
+ Analyzer and/or analysis components correctly implement the
+ protocol. The checkRandomData methods of that class are particularly effective at flushing out errors.
+
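A minimal sketch of such a test, assuming JUnit and lucene-test-framework are on the classpath and reusing the MyAnalyzer class developed in the example later in this document:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.BaseTokenStreamTestCase;

public class TestMyAnalyzer extends BaseTokenStreamTestCase {

  public void testSimple() throws Exception {
    Analyzer a = new MyAnalyzer(TEST_VERSION_CURRENT);
    // Checks the produced tokens, and that the stream obeys the workflow protocol.
    assertAnalyzesTo(a, "This is a demo of the TokenStream API",
        new String[] { "This", "is", "a", "demo", "of", "the", "TokenStream", "API" });
  }

  public void testRandomStrings() throws Exception {
    Analyzer a = new MyAnalyzer(TEST_VERSION_CURRENT);
    // Hammers the analyzer with random text to flush out contract violations.
    checkRandomData(random(), a, 1000);
  }
}
```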
+Using the TokenStream API
+There are a few important things to know in order to use the new API efficiently which are summarized here. You may want
+to walk through the example below first and come back to this section afterwards.
+
+Please keep in mind that an AttributeSource can only have one instance of a particular Attribute. Furthermore, if
+a chain of a TokenStream and multiple TokenFilters is used, then all TokenFilters in that chain share the Attributes
+with the TokenStream.
+
+
+
+Attribute instances are reused for all tokens of a document. Thus, a TokenStream/-Filter needs to update
+the appropriate Attribute(s) in incrementToken(). The consumer, commonly the Lucene indexer, consumes the data in the
+Attributes and then calls incrementToken() again until it returns false, which indicates that the end of the stream
+was reached. This means that in each call of incrementToken() a TokenStream/-Filter can safely overwrite the data in
+the Attribute instances.
+
+
+
+For performance reasons a TokenStream/-Filter should add/get Attributes during instantiation; i.e., create an attribute in the
+constructor and store references to it in an instance variable. Using an instance variable instead of calling addAttribute()/getAttribute()
+in incrementToken() will avoid attribute lookups for every token in the document.
+
+
+
+All methods in AttributeSource are idempotent, which means calling them multiple times always yields the same
+result. This is especially important to know for addAttribute(). The method takes the type (Class)
+of an Attribute as an argument and returns an instance. If an Attribute of the same type was previously added, then
+the already existing instance is returned; otherwise a new instance is created and returned. Therefore TokenStreams/-Filters
+can safely call addAttribute() with the same Attribute type multiple times. Even consumers of TokenStreams should
+normally call addAttribute() instead of getAttribute(), because it will not fail if the TokenStream is missing this
+Attribute (getAttribute() would throw an IllegalArgumentException if the Attribute is missing). More advanced code
+could simply check with hasAttribute() whether a TokenStream has the desired Attribute, and may conditionally leave out processing for
+extra performance.
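The idempotency of addAttribute() can be observed directly. This small sketch again assumes Lucene 4.x, with Version.LUCENE_XY as the usual placeholder:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class IdempotencyDemo {
  public static void main(String[] args) throws Exception {
    WhitespaceTokenizer ts = new WhitespaceTokenizer(
        Version.LUCENE_XY, // substitute the desired Lucene version for XY
        new StringReader("demo"));
    // Two calls with the same type return the very same instance:
    CharTermAttribute first = ts.addAttribute(CharTermAttribute.class);
    CharTermAttribute second = ts.addAttribute(CharTermAttribute.class);
    System.out.println(first == second);
    // hasAttribute() is the cheap existence check for advanced consumers:
    System.out.println(ts.hasAttribute(CharTermAttribute.class));
    ts.close();
  }
}
```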
+
+Example
+
+ In this example we will create a WhiteSpaceTokenizer and use a LengthFilter to suppress all words that have
+ only two or fewer characters. The LengthFilter is part of the Lucene core and its implementation will be explained
+ here to illustrate the usage of the TokenStream API.
+
+
+ Then we will develop a custom Attribute, a PartOfSpeechAttribute, and add another filter to the chain which
+ utilizes the new custom attribute, and call it PartOfSpeechTaggingFilter.
+
+Whitespace tokenization
+
+public class MyAnalyzer extends Analyzer {
+
+ private Version matchVersion;
+
+ public MyAnalyzer(Version matchVersion) {
+ this.matchVersion = matchVersion;
+ }
+
+ {@literal @Override}
+ protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
+ return new TokenStreamComponents(new WhitespaceTokenizer(matchVersion, reader));
+ }
+
+ public static void main(String[] args) throws IOException {
+ // text to tokenize
+ final String text = "This is a demo of the TokenStream API";
+
+ Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
+ MyAnalyzer analyzer = new MyAnalyzer(matchVersion);
+ TokenStream stream = analyzer.tokenStream("field", new StringReader(text));
+
+ // get the CharTermAttribute from the TokenStream
+ CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
+
+ try {
+ stream.reset();
+
+ // print all tokens until stream is exhausted
+ while (stream.incrementToken()) {
+ System.out.println(termAtt.toString());
+ }
+
+ stream.end();
+ } finally {
+ stream.close();
+ }
+ }
+}
+
+In this simple example, plain whitespace tokenization is performed. In main() a loop consumes the stream and
+prints the term text of the tokens by accessing the CharTermAttribute that the WhitespaceTokenizer provides.
+Here is the output:
+
+This
+is
+a
+demo
+of
+the
+TokenStream
+API
+
+Adding a LengthFilter
+We want to suppress all tokens that have two or fewer characters. We can do that
+easily by adding a LengthFilter to the chain. Only the
+createComponents() method in our analyzer needs to be changed:
+
+ {@literal @Override}
+ protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
+ final Tokenizer source = new WhitespaceTokenizer(matchVersion, reader);
+ TokenStream result = new LengthFilter(matchVersion, source, 3, Integer.MAX_VALUE);
+ return new TokenStreamComponents(source, result);
+ }
+
+Note how now only words with 3 or more characters are contained in the output:
+
+This
+demo
+the
+TokenStream
+API
+
+Now let's take a look at how LengthFilter is implemented:
+
+public final class LengthFilter extends FilteringTokenFilter {
+
+ private final int min;
+ private final int max;
+
+ private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+
+ /**
+ * Create a new LengthFilter. This will filter out tokens whose
+ * CharTermAttribute is either too short
+ * (< min) or too long (> max).
+ * @param version the Lucene match version
+ * @param in the TokenStream to consume
+ * @param min the minimum length
+ * @param max the maximum length
+ */
+ public LengthFilter(Version version, TokenStream in, int min, int max) {
+ super(version, in);
+ this.min = min;
+ this.max = max;
+ }
+
+ {@literal @Override}
+ public boolean accept() {
+ final int len = termAtt.length();
+ return (len >= min && len <= max);
+ }
+
+}
+
+
+ In LengthFilter, the CharTermAttribute is added and stored in the instance
+ variable termAtt. Remember that there can only be a single
+ instance of CharTermAttribute in the chain, so in our example the
+ addAttribute() call in LengthFilter returns the
+ CharTermAttribute that the WhitespaceTokenizer already added.
+
+
+ The tokens are retrieved from the input stream in FilteringTokenFilter's
+ incrementToken() method (see below), which calls LengthFilter's
+ accept() method. By looking at the term text in the
+ CharTermAttribute, the length of the term can be determined and tokens that
+ are either too short or too long are skipped. Note how
+ accept() can efficiently access the instance variable; no
+ attribute lookup is necessary. The same is true for the consumer, which can
+ simply use local references to the Attributes.
+
+
+ LengthFilter extends FilteringTokenFilter:
+
+
+
+public abstract class FilteringTokenFilter extends TokenFilter {
+
+ private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
+
+ /**
+ * Create a new FilteringTokenFilter.
+ * @param in the TokenStream to consume
+ */
+ public FilteringTokenFilter(Version version, TokenStream in) {
+ super(in);
+ }
+
+ /** Override this method and return whether the current input token should be returned by incrementToken. */
+ protected abstract boolean accept() throws IOException;
+
+ {@literal @Override}
+ public final boolean incrementToken() throws IOException {
+ int skippedPositions = 0;
+ while (input.incrementToken()) {
+ if (accept()) {
+ if (skippedPositions != 0) {
+ posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
+ }
+ return true;
+ }
+ skippedPositions += posIncrAtt.getPositionIncrement();
+ }
+ // reached EOS -- return false
+ return false;
+ }
+
+ {@literal @Override}
+ public void reset() throws IOException {
+ super.reset();
+ }
+
+}
+
+
+Adding a custom Attribute
+Now we're going to implement our own custom Attribute for part-of-speech tagging and accordingly call it
+PartOfSpeechAttribute. First we need to define the interface of the new Attribute:
+
+ public interface PartOfSpeechAttribute extends Attribute {
+ public static enum PartOfSpeech {
+ Noun, Verb, Adjective, Adverb, Pronoun, Preposition, Conjunction, Article, Unknown
+ }
+
+ public void setPartOfSpeech(PartOfSpeech pos);
+
+ public PartOfSpeech getPartOfSpeech();
+ }
+
+
+ Now we also need to write the implementing class. The name of that class is important here: By default, Lucene
+ checks if there is a class with the name of the Attribute with the suffix 'Impl'. In this example, we would
+ consequently call the implementing class PartOfSpeechAttributeImpl.
+
+
+ This should be the usual behavior. However, there is also an expert-API that allows changing these naming conventions:
+ {@link org.apache.lucene.util.AttributeFactory}. The factory accepts an Attribute interface as argument
+ and returns an actual instance. You can implement your own factory if you need to change the default behavior.
+
+
+ Now here is the actual class that implements our new Attribute. Notice that the class has to extend
+ {@link org.apache.lucene.util.AttributeImpl}:
+
+
+public final class PartOfSpeechAttributeImpl extends AttributeImpl
+ implements PartOfSpeechAttribute {
+
+ private PartOfSpeech pos = PartOfSpeech.Unknown;
+
+ public void setPartOfSpeech(PartOfSpeech pos) {
+ this.pos = pos;
+ }
+
+ public PartOfSpeech getPartOfSpeech() {
+ return pos;
+ }
+
+ {@literal @Override}
+ public void clear() {
+ pos = PartOfSpeech.Unknown;
+ }
+
+ {@literal @Override}
+ public void copyTo(AttributeImpl target) {
+ ((PartOfSpeechAttribute) target).setPartOfSpeech(pos);
+ }
+}
+
+
+ This simple Attribute implementation has only a single variable that
+ stores the part-of-speech of a token. It extends the
+ AttributeImpl class and therefore implements its abstract methods
+ clear() and copyTo(). Now we need a TokenFilter that
+ can set this new PartOfSpeechAttribute for each token. In this example we
+ show a very naive filter that tags every word with a leading upper-case letter
+ as a 'Noun' and all other words as 'Unknown'.
+
+
+ public static class PartOfSpeechTaggingFilter extends TokenFilter {
+ PartOfSpeechAttribute posAtt = addAttribute(PartOfSpeechAttribute.class);
+ CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+
+ protected PartOfSpeechTaggingFilter(TokenStream input) {
+ super(input);
+ }
+
+ public boolean incrementToken() throws IOException {
+ if (!input.incrementToken()) {return false;}
+ posAtt.setPartOfSpeech(determinePOS(termAtt.buffer(), 0, termAtt.length()));
+ return true;
+ }
+
+ // determine the part of speech for the given term
+ protected PartOfSpeech determinePOS(char[] term, int offset, int length) {
+ // naive implementation that tags every uppercased word as noun
+ if (length > 0 && Character.isUpperCase(term[0])) {
+ return PartOfSpeech.Noun;
+ }
+ return PartOfSpeech.Unknown;
+ }
+ }
+
+
+ Just like the LengthFilter, this new filter stores references to the
+ attributes it needs in instance variables. Notice how you only need to pass
+ in the interface of the new Attribute and instantiating the correct class
+ is automatically taken care of.
+
+Now we need to add the filter to the chain in MyAnalyzer:
+
+ {@literal @Override}
+ protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
+ final Tokenizer source = new WhitespaceTokenizer(matchVersion, reader);
+ TokenStream result = new LengthFilter(matchVersion, source, 3, Integer.MAX_VALUE);
+ result = new PartOfSpeechTaggingFilter(result);
+ return new TokenStreamComponents(source, result);
+ }
+
+Now let's look at the output:
+
+This
+demo
+the
+TokenStream
+API
+
+Apparently it hasn't changed, which shows that adding a custom attribute to a TokenStream/Filter chain does not
+affect any existing consumers, simply because they don't know the new Attribute. Now let's change the consumer
+to make use of the new PartOfSpeechAttribute and print it out:
+
+ public static void main(String[] args) throws IOException {
+ // text to tokenize
+ final String text = "This is a demo of the TokenStream API";
+
+ Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
+ MyAnalyzer analyzer = new MyAnalyzer(matchVersion);
+ TokenStream stream = analyzer.tokenStream("field", new StringReader(text));
+
+ // get the CharTermAttribute from the TokenStream
+ CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
+
+ // get the PartOfSpeechAttribute from the TokenStream
+ PartOfSpeechAttribute posAtt = stream.addAttribute(PartOfSpeechAttribute.class);
+
+ try {
+ stream.reset();
+
+ // print all tokens until stream is exhausted
+ while (stream.incrementToken()) {
+ System.out.println(termAtt.toString() + ": " + posAtt.getPartOfSpeech());
+ }
+
+ stream.end();
+ } finally {
+ stream.close();
+ }
+ }
+
+The change that was made is to get the PartOfSpeechAttribute from the TokenStream and print out its contents in
+the while loop that consumes the stream. Here is the new output:
+
+This: Noun
+demo: Unknown
+the: Unknown
+TokenStream: Noun
+API: Noun
+
+Each word is now followed by its assigned PartOfSpeech tag. Of course this is naive
+part-of-speech tagging. The word 'This' should not even be tagged as a noun; it is only capitalized because it
+is the first word of a sentence. Actually this is a good opportunity for an exercise. To practice the usage of the new
+API, the reader could now write an Attribute and TokenFilter that can specify for each word whether it was the first token
+of a sentence or not. Then the PartOfSpeechTaggingFilter can make use of this knowledge and only tag capitalized words
+as nouns if they are not the first word of a sentence (we know, this is still not correct behavior, but hey, it's a good exercise).
+As a small hint, this is how the new Attribute class could begin:
+
+ public class FirstTokenOfSentenceAttributeImpl extends AttributeImpl
+ implements FirstTokenOfSentenceAttribute {
+
+ private boolean firstToken;
+
+ public void setFirstToken(boolean firstToken) {
+ this.firstToken = firstToken;
+ }
+
+ public boolean getFirstToken() {
+ return firstToken;
+ }
+
+ {@literal @Override}
+ public void clear() {
+ firstToken = false;
+ }
+
+ ...
+
+Adding a CharFilter chain
+Analyzers take Java {@link java.io.Reader}s as input. Of course you can wrap your Readers with {@link java.io.FilterReader}s
+to manipulate content, but this would have the big disadvantage that character offsets might be inconsistent with your original
+text.
+
+{@link org.apache.lucene.analysis.CharFilter} is designed to allow you to pre-process input like a FilterReader would, but also
+preserve the original offsets associated with those characters. This way mechanisms like highlighting still work correctly.
+CharFilters can be chained.
+
+Example:
+
+public class MyAnalyzer extends Analyzer {
+
+ {@literal @Override}
+ protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
+ return new TokenStreamComponents(new MyTokenizer(reader));
+ }
+
+ {@literal @Override}
+ protected Reader initReader(String fieldName, Reader reader) {
+ // wrap the Reader in a CharFilter chain.
+ return new SecondCharFilter(new FirstCharFilter(reader));
+ }
+}
+
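The FirstCharFilter and SecondCharFilter above are hypothetical placeholders. As a concrete illustration, Lucene's own MappingCharFilter together with NormalizeCharMap can serve as such a pre-processing step. This sketch assumes Lucene 4.x, where NormalizeCharMap.Builder is available; it rewrites "ph" to "f" before tokenization while the filter keeps track of the original offsets internally:

```java
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;

public class CharFilterDemo {
  public static void main(String[] args) throws Exception {
    NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    builder.add("ph", "f");        // map every occurrence of "ph" to "f"
    Reader filtered = new MappingCharFilter(builder.build(), new StringReader("phone"));
    int c;
    while ((c = filtered.read()) != -1) {
      System.out.print((char) c);  // the filtered stream reads as "fone"
    }
    filtered.close();
  }
}
```

A tokenizer consuming this reader would see "fone", yet highlighting would still point at the original "phone", because the CharFilter corrects offsets back to the unfiltered input.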
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/LetterTokenizerFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/LowerCaseFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/LowerCaseFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/LowerCaseTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/LowerCaseTokenizerFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/SimpleAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/StopAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/StopFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/StopFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/TypeTokenFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/TypeTokenFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/UpperCaseFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/UpperCaseFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/WhitespaceAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/WhitespaceTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/WhitespaceTokenizerFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/core/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/cz/CzechAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/cz/CzechStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/cz/CzechStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/cz/CzechStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/cz/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/da/DanishAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/da/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanLightStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanLightStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanLightStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanMinimalStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanMinimalStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanMinimalStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanNormalizationFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanNormalizationFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/GermanStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/de/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/el/GreekAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/el/GreekLowerCaseFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/el/GreekLowerCaseFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/el/GreekStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/el/GreekStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/el/GreekStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/el/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/EnglishAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/EnglishMinimalStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/EnglishMinimalStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/EnglishMinimalStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/EnglishPossessiveFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/EnglishPossessiveFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/KStemData1.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/KStemData2.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/KStemData3.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/KStemData4.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/KStemData5.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/KStemData6.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/KStemData7.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/KStemData8.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/KStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/KStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/KStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/PorterStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/PorterStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/PorterStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/en/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/es/SpanishAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/es/SpanishLightStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/es/SpanishLightStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/es/SpanishLightStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/es/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/eu/BasqueAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/eu/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fa/PersianAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fa/PersianCharFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fa/PersianCharFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fa/PersianNormalizationFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fa/PersianNormalizationFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fa/PersianNormalizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fa/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fi/FinnishAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fi/FinnishLightStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fi/FinnishLightStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fi/FinnishLightStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fi/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fr/FrenchAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fr/FrenchLightStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fr/FrenchLightStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fr/FrenchLightStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fr/FrenchMinimalStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fr/FrenchMinimalStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fr/FrenchMinimalStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fr/FrenchStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fr/FrenchStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/fr/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ga/IrishAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ga/IrishLowerCaseFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ga/IrishLowerCaseFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ga/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/gl/GalicianAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/gl/GalicianMinimalStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/gl/GalicianMinimalStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/gl/GalicianMinimalStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/gl/GalicianStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/gl/GalicianStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/gl/GalicianStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/gl/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hi/HindiAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hi/HindiNormalizationFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hi/HindiNormalizationFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hi/HindiNormalizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hi/HindiStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hi/HindiStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hi/HindiStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hi/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hu/HungarianAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hu/HungarianLightStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hu/HungarianLightStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hu/HungarianLightStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hu/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hunspell/Dictionary.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hunspell/HunspellStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hunspell/HunspellStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hunspell/ISO8859_14Decoder.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hunspell/Stemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hunspell/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hy/ArmenianAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/hy/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/id/IndonesianAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/id/IndonesianStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/id/IndonesianStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/id/IndonesianStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/id/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/in/IndicNormalizationFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/in/IndicNormalizationFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/in/IndicNormalizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/in/IndicTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/in/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/it/ItalianAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/it/ItalianLightStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/it/ItalianLightStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/it/ItalianLightStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/it/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/lv/LatvianAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/lv/LatvianStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/lv/LatvianStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/lv/LatvianStemmer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/lv/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/CapitalizationFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/CapitalizationFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/CodepointCountFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/CodepointCountFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/EmptyTokenStream.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/HyphenatedWordsFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/HyphenatedWordsFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/KeepWordFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/KeywordMarkerFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/KeywordMarkerFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/KeywordRepeatFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/KeywordRepeatFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/LengthFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/LengthFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/LimitTokenCountAnalyzer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/LimitTokenPositionFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/LimitTokenPositionFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/Lucene47WordDelimiterFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/PatternAnalyzer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/PatternKeywordMarkerFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/PerFieldAnalyzerWrapper.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/PrefixAndSuffixAwareTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/PrefixAwareTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/RemoveDuplicatesTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/RemoveDuplicatesTokenFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/SetKeywordMarkerFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/SingleTokenTokenStream.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/TrimFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/TrimFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/WordDelimiterIterator.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/miscellaneous/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/EdgeNGramFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/EdgeNGramTokenizerFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/Lucene43EdgeNGramTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/Lucene43EdgeNGramTokenizer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/Lucene43NGramTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/Lucene43NGramTokenizer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/NGramFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/NGramTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/NGramTokenizer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/NGramTokenizerFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ngram/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/nl/DutchAnalyzer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/nl/DutchStemFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/nl/DutchStemmer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/nl/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/no/NorwegianAnalyzer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/no/NorwegianLightStemFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/no/NorwegianLightStemFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/no/NorwegianLightStemmer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/no/NorwegianMinimalStemFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/no/NorwegianMinimalStemFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/no/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/path/PathHierarchyTokenizer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/path/ReversePathHierarchyTokenizer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/path/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pattern/PatternCaptureGroupFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pattern/PatternCaptureGroupTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pattern/PatternReplaceCharFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pattern/PatternReplaceCharFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pattern/PatternReplaceFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pattern/PatternReplaceFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pattern/PatternTokenizer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pattern/PatternTokenizerFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pattern/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/AbstractEncoder.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/DelimitedPayloadTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/DelimitedPayloadTokenFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/FloatEncoder.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/IdentityEncoder.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/IntegerEncoder.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/NumericPayloadTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/NumericPayloadTokenFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/PayloadEncoder.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/PayloadHelper.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/TokenOffsetPayloadTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/TokenOffsetPayloadTokenFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/TypeAsPayloadTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/TypeAsPayloadTokenFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/payloads/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/position/PositionFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/position/PositionFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/position/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/PortugueseAnalyzer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/PortugueseLightStemFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/PortugueseLightStemFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/PortugueseLightStemmer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/PortugueseMinimalStemFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/PortugueseMinimalStemFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/PortugueseMinimalStemmer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/PortugueseStemFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/PortugueseStemFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/PortugueseStemmer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/RSLPStemmerBase.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/pt/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/query/QueryAutoStopWordAnalyzer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/query/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/reverse/ReverseStringFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/reverse/ReverseStringFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/reverse/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ro/RomanianAnalyzer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ro/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ru/RussianAnalyzer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ru/RussianLetterTokenizer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ru/RussianLetterTokenizerFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ru/RussianLightStemFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ru/RussianLightStemFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ru/RussianLightStemmer.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/ru/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/shingle/ShingleAnalyzerWrapper.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/shingle/ShingleFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/shingle/ShingleFilterFactory.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/shingle/package.html'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/sinks/DateRecognizerSinkFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/sinks/TeeSinkTokenFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/sinks/TokenRangeSinkFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/sinks/TokenTypeSinkFilter.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/sinks/package.html'.
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/SnowballAnalyzer.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/SnowballAnalyzer.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/SnowballAnalyzer.java 17 Aug 2012 14:55:15 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/SnowballAnalyzer.java 16 Dec 2014 11:32:17 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.analysis.snowball;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -18,42 +18,71 @@
*/
import org.apache.lucene.analysis.*;
+import org.apache.lucene.analysis.core.LowerCaseFilter;
+import org.apache.lucene.analysis.core.StopFilter;
+import org.apache.lucene.analysis.en.EnglishPossessiveFilter;
import org.apache.lucene.analysis.standard.*;
+import org.apache.lucene.analysis.tr.TurkishLowerCaseFilter;
+import org.apache.lucene.analysis.util.CharArraySet;
+import org.apache.lucene.util.Version;
import java.io.Reader;
-import java.util.Set;
/** Filters {@link StandardTokenizer} with {@link StandardFilter}, {@link
* LowerCaseFilter}, {@link StopFilter} and {@link SnowballFilter}.
*
- * Available stemmers are listed in {@link net.sf.snowball.ext}. The name of a
+ * Available stemmers are listed in org.tartarus.snowball.ext. The name of a
* stemmer is the part of the class name before "Stemmer", e.g., the stemmer in
* {@link org.tartarus.snowball.ext.EnglishStemmer} is named "English".
+ *
+ * NOTE: This class uses the same {@link Version}
+ * dependent settings as {@link StandardAnalyzer}, with the following addition:
+ *
+ * As of 3.1, uses {@link TurkishLowerCaseFilter} for Turkish language.
+ *
+ *
+ * @deprecated (3.1) Use the language-specific analyzer in modules/analysis instead.
+ * This analyzer will be removed in Lucene 5.0
*/
-public class SnowballAnalyzer extends Analyzer {
+@Deprecated
+public final class SnowballAnalyzer extends Analyzer {
private String name;
- private Set stopSet;
+ private CharArraySet stopSet;
+ private final Version matchVersion;
/** Builds the named analyzer with no stop words. */
- public SnowballAnalyzer(String name) {
+ public SnowballAnalyzer(Version matchVersion, String name) {
this.name = name;
+ this.matchVersion = matchVersion;
}
/** Builds the named analyzer with the given stop words. */
- public SnowballAnalyzer(String name, String[] stopWords) {
- this(name);
- stopSet = StopFilter.makeStopSet(stopWords);
+ public SnowballAnalyzer(Version matchVersion, String name, CharArraySet stopWords) {
+ this(matchVersion, name);
+ stopSet = CharArraySet.unmodifiableSet(CharArraySet.copy(matchVersion,
+ stopWords));
}
/** Constructs a {@link StandardTokenizer} filtered by a {@link
- StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. */
- public TokenStream tokenStream(String fieldName, Reader reader) {
- TokenStream result = new StandardTokenizer(reader);
- result = new StandardFilter(result);
- result = new LowerCaseFilter(result);
+ StandardFilter}, a {@link LowerCaseFilter}, a {@link StopFilter},
+ and a {@link SnowballFilter} */
+ @Override
+ public TokenStreamComponents createComponents(String fieldName, Reader reader) {
+ Tokenizer tokenizer = new StandardTokenizer(matchVersion, reader);
+ TokenStream result = new StandardFilter(matchVersion, tokenizer);
+ // remove the possessive 's for english stemmers
+ if (matchVersion.onOrAfter(Version.LUCENE_3_1) &&
+ (name.equals("English") || name.equals("Porter") || name.equals("Lovins")))
+ result = new EnglishPossessiveFilter(result);
+ // Use a special lowercase filter for turkish, the stemmer expects it.
+ if (matchVersion.onOrAfter(Version.LUCENE_3_1) && name.equals("Turkish"))
+ result = new TurkishLowerCaseFilter(result);
+ else
+ result = new LowerCaseFilter(matchVersion, result);
if (stopSet != null)
- result = new StopFilter(result, stopSet);
+ result = new StopFilter(matchVersion,
+ result, stopSet);
result = new SnowballFilter(result, name);
- return result;
+ return new TokenStreamComponents(tokenizer, result);
}
}
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/SnowballFilter.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/SnowballFilter.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/SnowballFilter.java 17 Aug 2012 14:55:14 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/SnowballFilter.java 16 Dec 2014 11:32:17 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.analysis.snowball;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -19,20 +19,44 @@
import java.io.IOException;
-import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.core.LowerCaseFilter;
+import org.apache.lucene.analysis.tokenattributes.KeywordAttribute;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tr.TurkishLowerCaseFilter; // javadoc @link
import org.tartarus.snowball.SnowballProgram;
/**
* A filter that stems words using a Snowball-generated stemmer.
*
* Available stemmers are listed in {@link org.tartarus.snowball.ext}.
+ * NOTE: SnowballFilter expects lowercased text.
+ *
+ * For the Turkish language, see {@link TurkishLowerCaseFilter}.
+ * For other languages, see {@link LowerCaseFilter}.
+ *
+ * Note: This filter is aware of the {@link KeywordAttribute}. To prevent
+ * certain terms from being passed to the stemmer
+ * {@link KeywordAttribute#isKeyword()} should be set to true
+ * in a previous {@link TokenStream}.
+ *
+ * Note: For including the original term as well as the stemmed version, see
+ * {@link org.apache.lucene.analysis.miscellaneous.KeywordRepeatFilterFactory}
+ *
*/
-public class SnowballFilter extends TokenFilter {
+public final class SnowballFilter extends TokenFilter {
- private SnowballProgram stemmer;
+ private final SnowballProgram stemmer;
+ private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+ private final KeywordAttribute keywordAttr = addAttribute(KeywordAttribute.class);
+
public SnowballFilter(TokenStream input, SnowballProgram stemmer) {
super(input);
this.stemmer = stemmer;
@@ -50,27 +74,36 @@
*/
public SnowballFilter(TokenStream in, String name) {
super(in);
- try {
- Class stemClass = Class.forName("org.tartarus.snowball.ext." + name + "Stemmer");
- stemmer = (SnowballProgram) stemClass.newInstance();
+ // Class.forName is frowned upon in place of the ResourceLoader, but in this case
+ // the factory will use the other constructor so that the program is already loaded.
+ try {
+ Class<? extends SnowballProgram> stemClass =
+ Class.forName("org.tartarus.snowball.ext." + name + "Stemmer").asSubclass(SnowballProgram.class);
+ stemmer = stemClass.newInstance();
} catch (Exception e) {
- throw new RuntimeException(e.toString());
+ throw new IllegalArgumentException("Invalid stemmer class specified: " + name, e);
}
}
/** Returns the next input Token, after being stemmed */
- public final Token next(final Token reusableToken) throws IOException {
- assert reusableToken != null;
- Token nextToken = input.next(reusableToken);
- if (nextToken == null)
- return null;
- String originalTerm = nextToken.term();
- stemmer.setCurrent(originalTerm);
- stemmer.stem();
- String finalTerm = stemmer.getCurrent();
- // Don't bother updating, if it is unchanged.
- if (!originalTerm.equals(finalTerm))
- nextToken.setTermBuffer(finalTerm);
- return nextToken;
+ @Override
+ public final boolean incrementToken() throws IOException {
+ if (input.incrementToken()) {
+ if (!keywordAttr.isKeyword()) {
+ char termBuffer[] = termAtt.buffer();
+ final int length = termAtt.length();
+ stemmer.setCurrent(termBuffer, length);
+ stemmer.stem();
+ final char finalTerm[] = stemmer.getCurrentBuffer();
+ final int newLength = stemmer.getCurrentBufferLength();
+ if (finalTerm != termBuffer)
+ termAtt.copyBuffer(finalTerm, 0, newLength);
+ else
+ termAtt.setLength(newLength);
+ }
+ return true;
+ } else {
+ return false;
+ }
}
}
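The String-name constructor above resolves the stemmer class reflectively and wraps any failure in an IllegalArgumentException. A minimal standalone sketch of the same Class.forName(...).asSubclass(...) pattern, with toy stand-ins for the org.tartarus.snowball classes (all names and the "strip -ing" rule here are illustrative, not the real Snowball code):

```java
public class StemmerLoader {
    // Stand-in for org.tartarus.snowball.SnowballProgram.
    public static abstract class SnowballProgram {
        public abstract String stemWord(String word);
    }

    // Stand-in for org.tartarus.snowball.ext.EnglishStemmer, with a toy rule.
    public static class EnglishStemmer extends SnowballProgram {
        @Override public String stemWord(String word) {
            return word.endsWith("ing") ? word.substring(0, word.length() - 3) : word;
        }
    }

    // Mirrors SnowballFilter(TokenStream, String): resolve "<name>Stemmer"
    // by reflection, narrowing with asSubclass to keep the cast checked.
    static SnowballProgram load(String name) {
        try {
            Class<? extends SnowballProgram> stemClass =
                Class.forName("StemmerLoader$" + name + "Stemmer").asSubclass(SnowballProgram.class);
            return stemClass.getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid stemmer class specified: " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load("English").stemWord("running")); // runn
    }
}
```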
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/SnowballPorterFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/package.html
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/package.html,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/package.html 17 Aug 2012 14:55:14 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/snowball/package.html 16 Dec 2014 11:32:17 -0000 1.1.2.1
@@ -1,7 +1,53 @@
+
+
{@link org.apache.lucene.analysis.TokenFilter} and {@link
org.apache.lucene.analysis.Analyzer} implementations that use Snowball
stemmers.
+
+This project provides pre-compiled versions of the Snowball stemmers
+based on revision 500 of the Tartarus Snowball repository,
+together with classes integrating them with the Lucene search engine.
+
+
+A few changes have been made to the static Snowball code and compiled stemmers:
+
+<ul>
+ <li>Class SnowballProgram is made abstract and contains a new abstract method stem() to avoid reflection in the Lucene filter class SnowballFilter.</li>
+ <li>All use of StringBuffer has been refactored to StringBuilder for speed.</li>
+ <li>The Snowball BSD license header has been added to the Java classes to avoid having RAT add ASL headers.</li>
+</ul>
+
+See the Snowball home page for more information about the algorithms.
+
+
+
+IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY!
+
+
+An index created using the Snowball module in Lucene 2.3.2 and below
+might not be compatible with the Snowball module in Lucene 2.4 or greater.
+
+
+For more information about this issue see:
+https://issues.apache.org/jira/browse/LUCENE-1142
+
+
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/ASCIITLD.jflex-macro'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/ClassicAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/ClassicFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/ClassicFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/ClassicTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/ClassicTokenizerFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.jflex'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/READ_BEFORE_REGENERATING.txt'.
Fisheye: No comparison available. Pass `N' to diff?
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardAnalyzer.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardAnalyzer.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardAnalyzer.java 17 Aug 2012 14:55:14 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardAnalyzer.java 16 Dec 2014 11:32:10 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.analysis.standard;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -17,198 +17,96 @@
* limitations under the License.
*/
-import org.apache.lucene.analysis.*;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.core.LowerCaseFilter;
+import org.apache.lucene.analysis.core.StopAnalyzer;
+import org.apache.lucene.analysis.core.StopFilter;
+import org.apache.lucene.analysis.util.CharArraySet;
+import org.apache.lucene.analysis.util.StopwordAnalyzerBase;
+import org.apache.lucene.analysis.util.WordlistLoader;
+import org.apache.lucene.util.Version;
-import java.io.File;
import java.io.IOException;
import java.io.Reader;
-import java.util.Set;
/**
* Filters {@link StandardTokenizer} with {@link StandardFilter}, {@link
- * LowerCaseFilter} and {@link StopFilter}, using a list of English stop words.
+ * LowerCaseFilter} and {@link StopFilter}, using a list of
+ * English stop words.
*
- * @version $Id$
+ *
+ * You may specify the {@link Version}
+ * compatibility when creating StandardAnalyzer:
+ *
+ * As of 3.4, Hiragana and Han characters are no longer wrongly split
+ * from their combining characters. If you use a previous version number,
+ * you get the exact broken behavior for backwards compatibility.
+ * As of 3.1, StandardTokenizer implements Unicode text segmentation,
+ * and StopFilter correctly handles Unicode 4.0 supplementary characters
+ * in stopwords. {@link ClassicTokenizer} and {@link ClassicAnalyzer}
+ * are the pre-3.1 implementations of StandardTokenizer and
+ * StandardAnalyzer.
+ *
*/
-public class StandardAnalyzer extends Analyzer {
- private Set stopSet;
+public final class StandardAnalyzer extends StopwordAnalyzerBase {
+
+ /** Default maximum allowed token length */
+ public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
- /**
- * Specifies whether deprecated acronyms should be replaced with HOST type.
- * This is false by default to support backward compatibility.
- *
- * @deprecated this should be removed in the next release (3.0).
- *
- * See https://issues.apache.org/jira/browse/LUCENE-1068
- */
- private boolean replaceInvalidAcronym = defaultReplaceInvalidAcronym;
+ private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;
- private static boolean defaultReplaceInvalidAcronym;
+ /** An unmodifiable set containing some common English words that are usually not
+ useful for searching. */
+ public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;
- // Default to true (fixed the bug), unless the system prop is set
- static {
- final String v = System.getProperty("org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym");
- if (v == null || v.equals("true"))
- defaultReplaceInvalidAcronym = true;
- else
- defaultReplaceInvalidAcronym = false;
+ /** Builds an analyzer with the given stop words.
+ * @param stopWords stop words */
+ public StandardAnalyzer(CharArraySet stopWords) {
+ super(stopWords);
}
/**
- *
- * @return true if new instances of StandardTokenizer will
- * replace mischaracterized acronyms
- *
- * See https://issues.apache.org/jira/browse/LUCENE-1068
- * @deprecated This will be removed (hardwired to true) in 3.0
+ * @deprecated Use {@link #StandardAnalyzer(CharArraySet)}
*/
- public static boolean getDefaultReplaceInvalidAcronym() {
- return defaultReplaceInvalidAcronym;
+ @Deprecated
+ public StandardAnalyzer(Version matchVersion, CharArraySet stopWords) {
+ super(matchVersion, stopWords);
}
- /**
- *
- * @param replaceInvalidAcronym Set to true to have new
- * instances of StandardTokenizer replace mischaracterized
- * acronyms by default. Set to false to preseve the
- * previous (before 2.4) buggy behavior. Alternatively,
- * set the system property
- * org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym
- * to false.
- *
- * See https://issues.apache.org/jira/browse/LUCENE-1068
- * @deprecated This will be removed (hardwired to true) in 3.0
+ /** Builds an analyzer with the default stop words ({@link #STOP_WORDS_SET}).
*/
- public static void setDefaultReplaceInvalidAcronym(boolean replaceInvalidAcronym) {
- defaultReplaceInvalidAcronym = replaceInvalidAcronym;
- }
-
-
- /** An array containing some common English words that are usually not
- useful for searching. */
- public static final String[] STOP_WORDS = StopAnalyzer.ENGLISH_STOP_WORDS;
-
- /** Builds an analyzer with the default stop words ({@link #STOP_WORDS}). */
public StandardAnalyzer() {
- this(STOP_WORDS);
+ this(STOP_WORDS_SET);
}
- /** Builds an analyzer with the given stop words. */
- public StandardAnalyzer(Set stopWords) {
- stopSet = stopWords;
- }
-
- /** Builds an analyzer with the given stop words. */
- public StandardAnalyzer(String[] stopWords) {
- stopSet = StopFilter.makeStopSet(stopWords);
- }
-
- /** Builds an analyzer with the stop words from the given file.
- * @see WordlistLoader#getWordSet(File)
+ /**
+ * @deprecated Use {@link #StandardAnalyzer()}
*/
- public StandardAnalyzer(File stopwords) throws IOException {
- stopSet = WordlistLoader.getWordSet(stopwords);
+ @Deprecated
+ public StandardAnalyzer(Version matchVersion) {
+ this(matchVersion, STOP_WORDS_SET);
}
/** Builds an analyzer with the stop words from the given reader.
* @see WordlistLoader#getWordSet(Reader)
- */
+ * @param stopwords Reader to read stop words from */
public StandardAnalyzer(Reader stopwords) throws IOException {
- stopSet = WordlistLoader.getWordSet(stopwords);
+ this(loadStopwordSet(stopwords));
}
/**
- *
- * @param replaceInvalidAcronym Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer
- *
- * See https://issues.apache.org/jira/browse/LUCENE-1068
- *
- * @deprecated Remove in 3.X and make true the only valid value
+ * @deprecated Use {@link #StandardAnalyzer()}
*/
- public StandardAnalyzer(boolean replaceInvalidAcronym) {
- this(STOP_WORDS);
- this.replaceInvalidAcronym = replaceInvalidAcronym;
+ @Deprecated
+ public StandardAnalyzer(Version matchVersion, Reader stopwords) throws IOException {
+ this(matchVersion, loadStopwordSet(stopwords, matchVersion));
}
/**
- * @param stopwords The stopwords to use
- * @param replaceInvalidAcronym Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer
- *
- * See https://issues.apache.org/jira/browse/LUCENE-1068
- *
- * @deprecated Remove in 3.X and make true the only valid value
- */
- public StandardAnalyzer(Reader stopwords, boolean replaceInvalidAcronym) throws IOException{
- this(stopwords);
- this.replaceInvalidAcronym = replaceInvalidAcronym;
- }
-
- /**
- * @param stopwords The stopwords to use
- * @param replaceInvalidAcronym Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer
- *
- * See https://issues.apache.org/jira/browse/LUCENE-1068
- *
- * @deprecated Remove in 3.X and make true the only valid value
- */
- public StandardAnalyzer(File stopwords, boolean replaceInvalidAcronym) throws IOException{
- this(stopwords);
- this.replaceInvalidAcronym = replaceInvalidAcronym;
- }
-
- /**
- *
- * @param stopwords The stopwords to use
- * @param replaceInvalidAcronym Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer
- *
- * See https://issues.apache.org/jira/browse/LUCENE-1068
- *
- * @deprecated Remove in 3.X and make true the only valid value
- */
- public StandardAnalyzer(String [] stopwords, boolean replaceInvalidAcronym) throws IOException{
- this(stopwords);
- this.replaceInvalidAcronym = replaceInvalidAcronym;
- }
-
- /**
- * @param stopwords The stopwords to use
- * @param replaceInvalidAcronym Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer
- *
- * See https://issues.apache.org/jira/browse/LUCENE-1068
- *
- * @deprecated Remove in 3.X and make true the only valid value
- */
- public StandardAnalyzer(Set stopwords, boolean replaceInvalidAcronym) throws IOException{
- this(stopwords);
- this.replaceInvalidAcronym = replaceInvalidAcronym;
- }
-
- /** Constructs a {@link StandardTokenizer} filtered by a {@link
- StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. */
- public TokenStream tokenStream(String fieldName, Reader reader) {
- StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
- tokenStream.setMaxTokenLength(maxTokenLength);
- TokenStream result = new StandardFilter(tokenStream);
- result = new LowerCaseFilter(result);
- result = new StopFilter(result, stopSet);
- return result;
- }
-
- private static final class SavedStreams {
- StandardTokenizer tokenStream;
- TokenStream filteredTokenStream;
- }
-
- /** Default maximum allowed token length */
- public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
-
- private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;
-
- /**
* Set maximum allowed token length. If a token is seen
* that exceeds this length then it is discarded. This
* setting only takes effect the next time tokenStream or
- * reusableTokenStream is called.
+ * tokenStream is called.
*/
public void setMaxTokenLength(int length) {
maxTokenLength = length;
@@ -220,45 +118,20 @@
public int getMaxTokenLength() {
return maxTokenLength;
}
-
- public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
- SavedStreams streams = (SavedStreams) getPreviousTokenStream();
- if (streams == null) {
- streams = new SavedStreams();
- setPreviousTokenStream(streams);
- streams.tokenStream = new StandardTokenizer(reader);
- streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
- streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
- streams.filteredTokenStream = new StopFilter(streams.filteredTokenStream, stopSet);
- } else {
- streams.tokenStream.reset(reader);
- }
- streams.tokenStream.setMaxTokenLength(maxTokenLength);
-
- streams.tokenStream.setReplaceInvalidAcronym(replaceInvalidAcronym);
- return streams.filteredTokenStream;
+ @Override
+ protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
+ final StandardTokenizer src = new StandardTokenizer(getVersion(), reader);
+ src.setMaxTokenLength(maxTokenLength);
+ TokenStream tok = new StandardFilter(getVersion(), src);
+ tok = new LowerCaseFilter(getVersion(), tok);
+ tok = new StopFilter(getVersion(), tok, stopwords);
+ return new TokenStreamComponents(src, tok) {
+ @Override
+ protected void setReader(final Reader reader) throws IOException {
+ src.setMaxTokenLength(StandardAnalyzer.this.maxTokenLength);
+ super.setReader(reader);
+ }
+ };
}
-
- /**
- *
- * @return true if this Analyzer is replacing mischaracterized acronyms in the StandardTokenizer
- *
- * See https://issues.apache.org/jira/browse/LUCENE-1068
- * @deprecated This will be removed (hardwired to true) in 3.0
- */
- public boolean isReplaceInvalidAcronym() {
- return replaceInvalidAcronym;
- }
-
- /**
- *
- * @param replaceInvalidAcronym Set to true if this Analyzer is replacing mischaracterized acronyms in the StandardTokenizer
- *
- * See https://issues.apache.org/jira/browse/LUCENE-1068
- * @deprecated This will be removed (hardwired to true) in 3.0
- */
- public void setReplaceInvalidAcronym(boolean replaceInvalidAcronym) {
- this.replaceInvalidAcronym = replaceInvalidAcronym;
- }
}
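createComponents above wires StandardTokenizer through StandardFilter, LowerCaseFilter, and StopFilter. A rough, dependency-free sketch of that pipeline shape (MiniAnalyzer and its crude regex word-break tokenizer are illustrative stand-ins, not Lucene classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class MiniAnalyzer {
    // Tiny stand-in for an English stop-word set like STOP_WORDS_SET.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the", "of"));

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.split("\\W+")) {            // crude word-break tokenizer
            if (tok.isEmpty()) continue;
            String lowered = tok.toLowerCase(Locale.ROOT); // LowerCaseFilter step
            if (!STOP_WORDS.contains(lowered))             // StopFilter step
                out.add(lowered);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick Fox of Lucene")); // [quick, fox, lucene]
    }
}
```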
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardFilter.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardFilter.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardFilter.java 17 Aug 2012 14:55:14 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardFilter.java 16 Dec 2014 11:32:10 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.analysis.standard;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -17,54 +17,73 @@
* limitations under the License.
*/
+import java.io.IOException;
+
import org.apache.lucene.analysis.TokenFilter;
-import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
+import org.apache.lucene.util.Version;
-/** Normalizes tokens extracted with {@link StandardTokenizer}. */
-
-public final class StandardFilter extends TokenFilter {
-
-
- /** Construct filtering in. */
+/**
+ * Normalizes tokens extracted with {@link StandardTokenizer}.
+ */
+public class StandardFilter extends TokenFilter {
+ private final Version matchVersion;
+
public StandardFilter(TokenStream in) {
- super(in);
+ this(Version.LATEST, in);
}
- private static final String APOSTROPHE_TYPE = StandardTokenizerImpl.TOKEN_TYPES[StandardTokenizerImpl.APOSTROPHE];
- private static final String ACRONYM_TYPE = StandardTokenizerImpl.TOKEN_TYPES[StandardTokenizerImpl.ACRONYM];
-
- /** Returns the next token in the stream, or null at EOS.
- * Removes 's from the end of words.
- *
- * Removes dots from acronyms.
+ /**
+ * @deprecated Use {@link #StandardFilter(TokenStream)}
*/
- public final Token next(final Token reusableToken) throws java.io.IOException {
- assert reusableToken != null;
- Token nextToken = input.next(reusableToken);
+ @Deprecated
+ public StandardFilter(Version matchVersion, TokenStream in) {
+ super(in);
+ this.matchVersion = matchVersion;
+ }
+
+ private static final String APOSTROPHE_TYPE = ClassicTokenizer.TOKEN_TYPES[ClassicTokenizer.APOSTROPHE];
+ private static final String ACRONYM_TYPE = ClassicTokenizer.TOKEN_TYPES[ClassicTokenizer.ACRONYM];
- if (nextToken == null)
- return null;
+ // this filter uses the type and term attributes
+ private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
+ private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+
+ @Override
+ public final boolean incrementToken() throws IOException {
+ if (matchVersion.onOrAfter(Version.LUCENE_3_1))
+ return input.incrementToken(); // TODO: add some niceties for the new grammar
+ else
+ return incrementTokenClassic();
+ }
+
+ public final boolean incrementTokenClassic() throws IOException {
+ if (!input.incrementToken()) {
+ return false;
+ }
- char[] buffer = nextToken.termBuffer();
- final int bufferLength = nextToken.termLength();
- final String type = nextToken.type();
+ final char[] buffer = termAtt.buffer();
+ final int bufferLength = termAtt.length();
+ final String type = typeAtt.type();
- if (type == APOSTROPHE_TYPE && // remove 's
- bufferLength >= 2 &&
+ if (type == APOSTROPHE_TYPE && // remove 's
+ bufferLength >= 2 &&
buffer[bufferLength-2] == '\'' &&
(buffer[bufferLength-1] == 's' || buffer[bufferLength-1] == 'S')) {
// Strip last 2 characters off
- nextToken.setTermLength(bufferLength - 2);
- } else if (type == ACRONYM_TYPE) { // remove dots
+ termAtt.setLength(bufferLength - 2);
+ } else if (type == ACRONYM_TYPE) { // remove dots
int upto = 0;
      for(int i=0;i<bufferLength;i++) {
- * This should be a good tokenizer for most European-language documents:
- *
- *
- * Splits words at punctuation characters, removing punctuation. However, a
- * dot that's not followed by whitespace is considered part of a token.
- * Splits words at hyphens, unless there's a number in the token, in which case
- * the whole token is interpreted as a product number and is not split.
- * Recognizes email addresses and internet hostnames as one token.
- *
- *
+/** A grammar-based tokenizer constructed with JFlex.
+ *
+ * As of Lucene version 3.1, this class implements the Word Break rules from the
+ * Unicode Text Segmentation algorithm, as specified in
+ * Unicode Standard Annex #29.
+ *
* Many applications have specific tokenizer needs. If this tokenizer does
* not suit your application, please consider copying this source code
* directory to your project and maintaining your own grammar-based tokenizer.
+ *
+ *
+ *
+ * You must specify the required {@link Version}
+ * compatibility when creating StandardTokenizer:
+ *
+ * As of 3.4, Hiragana and Han characters are no longer wrongly split
+ * from their combining characters. If you use a previous version number,
+ * you get the exact broken behavior for backwards compatibility.
+ * As of 3.1, StandardTokenizer implements Unicode text segmentation.
+ * If you use a previous version number, you get the exact behavior of
+ * {@link ClassicTokenizer} for backwards compatibility.
+ *
*/
-public class StandardTokenizer extends Tokenizer {
+public final class StandardTokenizer extends Tokenizer {
/** A private instance of the JFlex-constructed scanner */
- private final StandardTokenizerImpl scanner;
+ private StandardTokenizerInterface scanner;
public static final int ALPHANUM = 0;
+ /** @deprecated (3.1) */
+ @Deprecated
public static final int APOSTROPHE = 1;
+ /** @deprecated (3.1) */
+ @Deprecated
public static final int ACRONYM = 2;
+ /** @deprecated (3.1) */
+ @Deprecated
public static final int COMPANY = 3;
public static final int EMAIL = 4;
+ /** @deprecated (3.1) */
+ @Deprecated
public static final int HOST = 5;
public static final int NUM = 6;
+ /** @deprecated (3.1) */
+ @Deprecated
public static final int CJ = 7;
- /**
- * @deprecated this solves a bug where HOSTs that end with '.' are identified
- * as ACRONYMs. It is deprecated and will be removed in the next
- * release.
- */
+ /** @deprecated (3.1) */
+ @Deprecated
public static final int ACRONYM_DEP = 8;
+ public static final int SOUTHEAST_ASIAN = 9;
+ public static final int IDEOGRAPHIC = 10;
+ public static final int HIRAGANA = 11;
+ public static final int KATAKANA = 12;
+ public static final int HANGUL = 13;
+
/** String token types that correspond to token type int constants */
public static final String [] TOKEN_TYPES = new String [] {
"<ALPHANUM>",
@@ -70,141 +97,152 @@
"<HOST>",
"<NUM>",
"<CJ>",
- "<ACRONYM_DEP>"
+ "<ACRONYM_DEP>",
+ "<SOUTHEAST_ASIAN>",
+ "<IDEOGRAPHIC>",
+ "<HIRAGANA>",
+ "<KATAKANA>",
+ "<HANGUL>"
};
+
+ private int skippedPositions;
- /** @deprecated Please use {@link #TOKEN_TYPES} instead */
- public static final String [] tokenImage = TOKEN_TYPES;
-
- /**
- * Specifies whether deprecated acronyms should be replaced with HOST type.
- * This is false by default to support backward compatibility.
- *
- * See http://issues.apache.org/jira/browse/LUCENE-1068
- *
- * @deprecated this should be removed in the next release (3.0).
- */
- private boolean replaceInvalidAcronym = false;
-
- void setInput(Reader reader) {
- this.input = reader;
- }
-
private int maxTokenLength = StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH;
/** Set the max allowed token length. Any token longer
* than this is skipped. */
public void setMaxTokenLength(int length) {
+ if (length < 1) {
+ throw new IllegalArgumentException("maxTokenLength must be greater than zero");
+ }
this.maxTokenLength = length;
+ if (scanner instanceof StandardTokenizerImpl) {
+ scanner.setBufferSize(Math.min(length, 1024 * 1024)); // limit buffer size to 1M chars
+ }
}
/** @see #setMaxTokenLength */
public int getMaxTokenLength() {
return maxTokenLength;
}
- /**
- * Creates a new instance of the {@link StandardTokenizer}. Attaches the
- * input
- * to a newly created JFlex scanner.
- */
- public StandardTokenizer(Reader input) {
- this.input = input;
- this.scanner = new StandardTokenizerImpl(input);
- }
-
/**
* Creates a new instance of the {@link org.apache.lucene.analysis.standard.StandardTokenizer}. Attaches
* the input
* to the newly created JFlex scanner.
*
* @param input The input reader
- * @param replaceInvalidAcronym Set to true to replace mischaracterized acronyms with HOST.
*
* See http://issues.apache.org/jira/browse/LUCENE-1068
*/
- public StandardTokenizer(Reader input, boolean replaceInvalidAcronym) {
- this.replaceInvalidAcronym = replaceInvalidAcronym;
- this.input = input;
- this.scanner = new StandardTokenizerImpl(input);
+ public StandardTokenizer(Reader input) {
+ this(Version.LATEST, input);
}
+ /**
+ * @deprecated Use {@link #StandardTokenizer(Reader)}
+ */
+ @Deprecated
+ public StandardTokenizer(Version matchVersion, Reader input) {
+ super(input);
+ init(matchVersion);
+ }
+
+ /**
+ * Creates a new StandardTokenizer with a given {@link org.apache.lucene.util.AttributeFactory}
+ */
+ public StandardTokenizer(AttributeFactory factory, Reader input) {
+ this(Version.LATEST, factory, input);
+ }
+
+ /**
+ * @deprecated Use {@link #StandardTokenizer(AttributeFactory, Reader)}
+ */
+ @Deprecated
+ public StandardTokenizer(Version matchVersion, AttributeFactory factory, Reader input) {
+ super(factory, input);
+ init(matchVersion);
+ }
+
+ private final void init(Version matchVersion) {
+ if (matchVersion.onOrAfter(Version.LUCENE_4_7)) {
+ this.scanner = new StandardTokenizerImpl(input);
+ } else if (matchVersion.onOrAfter(Version.LUCENE_4_0)) {
+ this.scanner = new StandardTokenizerImpl40(input);
+ } else if (matchVersion.onOrAfter(Version.LUCENE_3_4)) {
+ this.scanner = new StandardTokenizerImpl34(input);
+ } else if (matchVersion.onOrAfter(Version.LUCENE_3_1)) {
+ this.scanner = new StandardTokenizerImpl31(input);
+ } else {
+ this.scanner = new ClassicTokenizerImpl(input);
+ }
+ }
+
+ // this tokenizer generates three attributes:
+ // term offset, positionIncrement and type
+ private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+ private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
+ private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
+ private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
+
/*
* (non-Javadoc)
*
* @see org.apache.lucene.analysis.TokenStream#next()
*/
- public Token next(final Token reusableToken) throws IOException {
- assert reusableToken != null;
- int posIncr = 1;
+ @Override
+ public final boolean incrementToken() throws IOException {
+ clearAttributes();
+ skippedPositions = 0;
- while(true) {
- int tokenType = scanner.getNextToken();
+ while(true) {
+ int tokenType = scanner.getNextToken();
- if (tokenType == StandardTokenizerImpl.YYEOF) {
- return null;
- }
-
- if (scanner.yylength() <= maxTokenLength) {
- reusableToken.clear();
- reusableToken.setPositionIncrement(posIncr);
- scanner.getText(reusableToken);
- final int start = scanner.yychar();
- reusableToken.setStartOffset(start);
- reusableToken.setEndOffset(start+reusableToken.termLength());
- // This 'if' should be removed in the next release. For now, it converts
- // invalid acronyms to HOST. When removed, only the 'else' part should
- // remain.
- if (tokenType == StandardTokenizerImpl.ACRONYM_DEP) {
- if (replaceInvalidAcronym) {
- reusableToken.setType(StandardTokenizerImpl.TOKEN_TYPES[StandardTokenizerImpl.HOST]);
- reusableToken.setTermLength(reusableToken.termLength() - 1); // remove extra '.'
- } else {
- reusableToken.setType(StandardTokenizerImpl.TOKEN_TYPES[StandardTokenizerImpl.ACRONYM]);
- }
- } else {
- reusableToken.setType(StandardTokenizerImpl.TOKEN_TYPES[tokenType]);
- }
- return reusableToken;
- } else
- // When we skip a too-long term, we still increment the
- // position increment
- posIncr++;
+ if (tokenType == StandardTokenizerInterface.YYEOF) {
+ return false;
}
- }
- /*
- * (non-Javadoc)
- *
- * @see org.apache.lucene.analysis.TokenStream#reset()
- */
- public void reset() throws IOException {
- super.reset();
- scanner.yyreset(input);
+ if (scanner.yylength() <= maxTokenLength) {
+ posIncrAtt.setPositionIncrement(skippedPositions+1);
+ scanner.getText(termAtt);
+ final int start = scanner.yychar();
+ offsetAtt.setOffset(correctOffset(start), correctOffset(start+termAtt.length()));
+ // This 'if' should be removed in the next release. For now, it converts
+ // invalid acronyms to HOST. When removed, only the 'else' part should
+ // remain.
+ if (tokenType == StandardTokenizer.ACRONYM_DEP) {
+ typeAtt.setType(StandardTokenizer.TOKEN_TYPES[StandardTokenizer.HOST]);
+ termAtt.setLength(termAtt.length() - 1); // remove extra '.'
+ } else {
+ typeAtt.setType(StandardTokenizer.TOKEN_TYPES[tokenType]);
+ }
+ return true;
+ } else
+ // When we skip a too-long term, we still increment the
+ // position increment
+ skippedPositions++;
}
+ }
+
+ @Override
+ public final void end() throws IOException {
+ super.end();
+ // set final offset
+ int finalOffset = correctOffset(scanner.yychar() + scanner.yylength());
+ offsetAtt.setOffset(finalOffset, finalOffset);
+ // adjust any skipped tokens
+ posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement()+skippedPositions);
+ }
- public void reset(Reader reader) throws IOException {
- input = reader;
- reset();
- }
-
- /**
- * Prior to https://issues.apache.org/jira/browse/LUCENE-1068, StandardTokenizer mischaracterized as acronyms tokens like www.abc.com
- * when they should have been labeled as hosts instead.
- * @return true if StandardTokenizer now returns these tokens as Hosts, otherwise false
- *
- * @deprecated Remove in 3.X and make true the only valid value
- */
- public boolean isReplaceInvalidAcronym() {
- return replaceInvalidAcronym;
+ @Override
+ public void close() throws IOException {
+ super.close();
+ scanner.yyreset(input);
}
- /**
- *
- * @param replaceInvalidAcronym Set to true to replace mischaracterized acronyms as HOST.
- * @deprecated Remove in 3.X and make true the only valid value
- *
- * See https://issues.apache.org/jira/browse/LUCENE-1068
- */
- public void setReplaceInvalidAcronym(boolean replaceInvalidAcronym) {
- this.replaceInvalidAcronym = replaceInvalidAcronym;
+ @Override
+ public void reset() throws IOException {
+ super.reset();
+ scanner.yyreset(input);
+ skippedPositions = 0;
}
}
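The incrementToken changes in the hunk above fold every skipped over-long term into the position increment of the next emitted token (`posIncrAtt.setPositionIncrement(skippedPositions+1)`). A hypothetical standalone sketch of that accounting, under the assumption that tokens are characterized only by their lengths (`SkipIncrementSketch` is my name, not a Lucene class):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the skipped-position accounting in the diff above: a term longer
// than maxTokenLength is dropped, but its position still counts, so the next
// kept token's increment is 1 plus the number of terms skipped before it.
class SkipIncrementSketch {
    static List<Integer> increments(int[] termLengths, int maxTokenLength) {
        List<Integer> result = new ArrayList<>();
        int skippedPositions = 0;
        for (int len : termLengths) {
            if (len <= maxTokenLength) {
                result.add(skippedPositions + 1);  // emit, absorbing skips
                skippedPositions = 0;
            } else {
                skippedPositions++;  // too long: drop term, keep its position
            }
        }
        return result;
    }
}
```

This is why phrase queries still line up across a dropped huge token: the gap survives as a larger increment rather than disappearing.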
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardTokenizerFactory.java'.
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardTokenizerImpl.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardTokenizerImpl.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardTokenizerImpl.java 17 Aug 2012 14:55:14 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardTokenizerImpl.java 16 Dec 2014 11:32:10 -0000 1.1.2.1
@@ -1,8 +1,8 @@
-/* The following code was generated by JFlex 1.4.1 on 9/4/08 6:49 PM */
+/* The following code was generated by JFlex 1.6.0 */
package org.apache.lucene.analysis.standard;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -19,98 +19,192 @@
* limitations under the License.
*/
-/*
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
-NOTE: if you change this file and need to regenerate the tokenizer,
- remember to use JRE 1.4 when running jflex (before Lucene 3.0).
- This grammar now uses constructs (eg :digit:) whose meaning can
- vary according to the JRE used to run jflex. See
- https://issues.apache.org/jira/browse/LUCENE-1126 for details
-
-*/
-
-import org.apache.lucene.analysis.Token;
-
-
/**
- * This class is a scanner generated by
- * JFlex 1.4.1
- * on 9/4/08 6:49 PM from the specification file
- * /tango/mike/src/lucene.standarddigit/src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex
+ * This class implements Word Break rules from the Unicode Text Segmentation
+ * algorithm, as specified in
+ * Unicode Standard Annex #29.
+ *
+ * Tokens produced are of the following types:
+ *
+ * <ALPHANUM>: A sequence of alphabetic and numeric characters
+ * <NUM>: A number
+ * <SOUTHEAST_ASIAN>: A sequence of characters from South and Southeast
+ * Asian languages, including Thai, Lao, Myanmar, and Khmer
+ * <IDEOGRAPHIC>: A single CJKV ideographic character
+ * <HIRAGANA>: A single hiragana character
+ * <KATAKANA>: A sequence of katakana characters
+ * <HANGUL>: A sequence of Hangul characters
+ *
*/
-class StandardTokenizerImpl {
+public final class StandardTokenizerImpl implements StandardTokenizerInterface {
+
/** This character denotes the end of file */
public static final int YYEOF = -1;
/** initial size of the lookahead buffer */
- private static final int ZZ_BUFFERSIZE = 16384;
+ private int ZZ_BUFFERSIZE = 255;
/** lexical states */
public static final int YYINITIAL = 0;
+ /**
+ * ZZ_LEXSTATE[l] is the state in the DFA for the lexical state l
+ * ZZ_LEXSTATE[l+1] is the state in the DFA for the lexical state l
+ * at the beginning of a line
+ * l is of the form l = 2*k, k a non negative integer
+ */
+ private static final int ZZ_LEXSTATE[] = {
+ 0, 0
+ };
+
/**
* Translates characters to character classes
*/
private static final String ZZ_CMAP_PACKED =
- "\11\0\1\0\1\15\1\0\1\0\1\14\22\0\1\0\5\0\1\5"+
- "\1\3\4\0\1\11\1\7\1\4\1\11\12\2\6\0\1\6\32\12"+
- "\4\0\1\10\1\0\32\12\57\0\1\12\12\0\1\12\4\0\1\12"+
- "\5\0\27\12\1\0\37\12\1\0\u0128\12\2\0\22\12\34\0\136\12"+
- "\2\0\11\12\2\0\7\12\16\0\2\12\16\0\5\12\11\0\1\12"+
- "\213\0\1\12\13\0\1\12\1\0\3\12\1\0\1\12\1\0\24\12"+
- "\1\0\54\12\1\0\10\12\2\0\32\12\14\0\202\12\12\0\71\12"+
- "\2\0\2\12\2\0\2\12\3\0\46\12\2\0\2\12\67\0\46\12"+
- "\2\0\1\12\7\0\47\12\110\0\33\12\5\0\3\12\56\0\32\12"+
- "\5\0\13\12\25\0\12\2\7\0\143\12\1\0\1\12\17\0\2\12"+
- "\11\0\12\2\3\12\23\0\1\12\1\0\33\12\123\0\46\12\u015f\0"+
- "\65\12\3\0\1\12\22\0\1\12\7\0\12\12\4\0\12\2\25\0"+
- "\10\12\2\0\2\12\2\0\26\12\1\0\7\12\1\0\1\12\3\0"+
- "\4\12\42\0\2\12\1\0\3\12\4\0\12\2\2\12\23\0\6\12"+
- "\4\0\2\12\2\0\26\12\1\0\7\12\1\0\2\12\1\0\2\12"+
- "\1\0\2\12\37\0\4\12\1\0\1\12\7\0\12\2\2\0\3\12"+
- "\20\0\7\12\1\0\1\12\1\0\3\12\1\0\26\12\1\0\7\12"+
- "\1\0\2\12\1\0\5\12\3\0\1\12\22\0\1\12\17\0\1\12"+
- "\5\0\12\2\25\0\10\12\2\0\2\12\2\0\26\12\1\0\7\12"+
- "\1\0\2\12\2\0\4\12\3\0\1\12\36\0\2\12\1\0\3\12"+
- "\4\0\12\2\25\0\6\12\3\0\3\12\1\0\4\12\3\0\2\12"+
- "\1\0\1\12\1\0\2\12\3\0\2\12\3\0\3\12\3\0\10\12"+
- "\1\0\3\12\55\0\11\2\25\0\10\12\1\0\3\12\1\0\27\12"+
- "\1\0\12\12\1\0\5\12\46\0\2\12\4\0\12\2\25\0\10\12"+
- "\1\0\3\12\1\0\27\12\1\0\12\12\1\0\5\12\44\0\1\12"+
- "\1\0\2\12\4\0\12\2\25\0\10\12\1\0\3\12\1\0\27\12"+
- "\1\0\20\12\46\0\2\12\4\0\12\2\25\0\22\12\3\0\30\12"+
- "\1\0\11\12\1\0\1\12\2\0\7\12\71\0\1\1\60\12\1\1"+
- "\2\12\14\1\7\12\11\1\12\2\47\0\2\12\1\0\1\12\2\0"+
- "\2\12\1\0\1\12\2\0\1\12\6\0\4\12\1\0\7\12\1\0"+
- "\3\12\1\0\1\12\1\0\1\12\2\0\2\12\1\0\4\12\1\0"+
- "\2\12\11\0\1\12\2\0\5\12\1\0\1\12\11\0\12\2\2\0"+
- "\2\12\42\0\1\12\37\0\12\2\26\0\10\12\1\0\42\12\35\0"+
- "\4\12\164\0\42\12\1\0\5\12\1\0\2\12\25\0\12\2\6\0"+
- "\6\12\112\0\46\12\12\0\47\12\11\0\132\12\5\0\104\12\5\0"+
- "\122\12\6\0\7\12\1\0\77\12\1\0\1\12\1\0\4\12\2\0"+
- "\7\12\1\0\1\12\1\0\4\12\2\0\47\12\1\0\1\12\1\0"+
- "\4\12\2\0\37\12\1\0\1\12\1\0\4\12\2\0\7\12\1\0"+
- "\1\12\1\0\4\12\2\0\7\12\1\0\7\12\1\0\27\12\1\0"+
- "\37\12\1\0\1\12\1\0\4\12\2\0\7\12\1\0\47\12\1\0"+
- "\23\12\16\0\11\2\56\0\125\12\14\0\u026c\12\2\0\10\12\12\0"+
- "\32\12\5\0\113\12\225\0\64\12\54\0\12\2\46\0\12\2\6\0"+
- "\130\12\10\0\51\12\u0557\0\234\12\4\0\132\12\6\0\26\12\2\0"+
- "\6\12\2\0\46\12\2\0\6\12\2\0\10\12\1\0\1\12\1\0"+
- "\1\12\1\0\1\12\1\0\37\12\2\0\65\12\1\0\7\12\1\0"+
- "\1\12\3\0\3\12\1\0\7\12\3\0\4\12\2\0\6\12\4\0"+
- "\15\12\5\0\3\12\1\0\7\12\202\0\1\12\202\0\1\12\4\0"+
- "\1\12\2\0\12\12\1\0\1\12\3\0\5\12\6\0\1\12\1\0"+
- "\1\12\1\0\1\12\1\0\4\12\1\0\3\12\1\0\7\12\u0ecb\0"+
- "\2\12\52\0\5\12\12\0\1\13\124\13\10\13\2\13\2\13\132\13"+
- "\1\13\3\13\6\13\50\13\3\13\1\0\136\12\21\0\30\12\70\0"+
- "\20\13\u0100\0\200\13\200\0\u19b6\13\12\13\100\0\u51a6\13\132\13\u048d\12"+
- "\u0773\0\u2ba4\12\u215c\0\u012e\13\322\13\7\12\14\0\5\12\5\0\1\12"+
- "\1\0\12\12\1\0\15\12\1\0\5\12\1\0\1\12\1\0\2\12"+
- "\1\0\2\12\1\0\154\12\41\0\u016b\12\22\0\100\12\2\0\66\12"+
- "\50\0\14\12\164\0\3\12\1\0\1\12\1\0\207\12\23\0\12\2"+
- "\7\0\32\12\6\0\32\12\12\0\1\13\72\13\37\12\3\0\6\12"+
- "\2\0\6\12\2\0\6\12\2\0\3\12\43\0";
+ "\42\0\1\15\4\0\1\14\4\0\1\7\1\0\1\10\1\0\12\4"+
+ "\1\6\1\7\5\0\32\1\4\0\1\11\1\0\32\1\57\0\1\1"+
+ "\2\0\1\3\7\0\1\1\1\0\1\6\2\0\1\1\5\0\27\1"+
+ "\1\0\37\1\1\0\u01ca\1\4\0\14\1\5\0\1\6\10\0\5\1"+
+ "\7\0\1\1\1\0\1\1\21\0\160\3\5\1\1\0\2\1\2\0"+
+ "\4\1\1\7\7\0\1\1\1\6\3\1\1\0\1\1\1\0\24\1"+
+ "\1\0\123\1\1\0\213\1\1\0\7\3\236\1\11\0\46\1\2\0"+
+ "\1\1\7\0\47\1\1\0\1\7\7\0\55\3\1\0\1\3\1\0"+
+ "\2\3\1\0\2\3\1\0\1\3\10\0\33\16\5\0\3\16\1\1"+
+ "\1\6\13\0\5\3\7\0\2\7\2\0\13\3\1\0\1\3\3\0"+
+ "\53\1\25\3\12\4\1\0\1\4\1\7\1\0\2\1\1\3\143\1"+
+ "\1\0\1\1\10\3\1\0\6\3\2\1\2\3\1\0\4\3\2\1"+
+ "\12\4\3\1\2\0\1\1\17\0\1\3\1\1\1\3\36\1\33\3"+
+ "\2\0\131\1\13\3\1\1\16\0\12\4\41\1\11\3\2\1\2\0"+
+ "\1\7\1\0\1\1\5\0\26\1\4\3\1\1\11\3\1\1\3\3"+
+ "\1\1\5\3\22\0\31\1\3\3\104\0\1\1\1\0\13\1\67\0"+
+ "\33\3\1\0\4\3\66\1\3\3\1\1\22\3\1\1\7\3\12\1"+
+ "\2\3\2\0\12\4\1\0\7\1\1\0\7\1\1\0\3\3\1\0"+
+ "\10\1\2\0\2\1\2\0\26\1\1\0\7\1\1\0\1\1\3\0"+
+ "\4\1\2\0\1\3\1\1\7\3\2\0\2\3\2\0\3\3\1\1"+
+ "\10\0\1\3\4\0\2\1\1\0\3\1\2\3\2\0\12\4\2\1"+
+ "\17\0\3\3\1\0\6\1\4\0\2\1\2\0\26\1\1\0\7\1"+
+ "\1\0\2\1\1\0\2\1\1\0\2\1\2\0\1\3\1\0\5\3"+
+ "\4\0\2\3\2\0\3\3\3\0\1\3\7\0\4\1\1\0\1\1"+
+ "\7\0\12\4\2\3\3\1\1\3\13\0\3\3\1\0\11\1\1\0"+
+ "\3\1\1\0\26\1\1\0\7\1\1\0\2\1\1\0\5\1\2\0"+
+ "\1\3\1\1\10\3\1\0\3\3\1\0\3\3\2\0\1\1\17\0"+
+ "\2\1\2\3\2\0\12\4\21\0\3\3\1\0\10\1\2\0\2\1"+
+ "\2\0\26\1\1\0\7\1\1\0\2\1\1\0\5\1\2\0\1\3"+
+ "\1\1\7\3\2\0\2\3\2\0\3\3\10\0\2\3\4\0\2\1"+
+ "\1\0\3\1\2\3\2\0\12\4\1\0\1\1\20\0\1\3\1\1"+
+ "\1\0\6\1\3\0\3\1\1\0\4\1\3\0\2\1\1\0\1\1"+
+ "\1\0\2\1\3\0\2\1\3\0\3\1\3\0\14\1\4\0\5\3"+
+ "\3\0\3\3\1\0\4\3\2\0\1\1\6\0\1\3\16\0\12\4"+
+ "\21\0\3\3\1\0\10\1\1\0\3\1\1\0\27\1\1\0\12\1"+
+ "\1\0\5\1\3\0\1\1\7\3\1\0\3\3\1\0\4\3\7\0"+
+ "\2\3\1\0\2\1\6\0\2\1\2\3\2\0\12\4\22\0\2\3"+
+ "\1\0\10\1\1\0\3\1\1\0\27\1\1\0\12\1\1\0\5\1"+
+ "\2\0\1\3\1\1\7\3\1\0\3\3\1\0\4\3\7\0\2\3"+
+ "\7\0\1\1\1\0\2\1\2\3\2\0\12\4\1\0\2\1\17\0"+
+ "\2\3\1\0\10\1\1\0\3\1\1\0\51\1\2\0\1\1\7\3"+
+ "\1\0\3\3\1\0\4\3\1\1\10\0\1\3\10\0\2\1\2\3"+
+ "\2\0\12\4\12\0\6\1\2\0\2\3\1\0\22\1\3\0\30\1"+
+ "\1\0\11\1\1\0\1\1\2\0\7\1\3\0\1\3\4\0\6\3"+
+ "\1\0\1\3\1\0\10\3\22\0\2\3\15\0\60\20\1\21\2\20"+
+ "\7\21\5\0\7\20\10\21\1\0\12\4\47\0\2\20\1\0\1\20"+
+ "\2\0\2\20\1\0\1\20\2\0\1\20\6\0\4\20\1\0\7\20"+
+ "\1\0\3\20\1\0\1\20\1\0\1\20\2\0\2\20\1\0\4\20"+
+ "\1\21\2\20\6\21\1\0\2\21\1\20\2\0\5\20\1\0\1\20"+
+ "\1\0\6\21\2\0\12\4\2\0\4\20\40\0\1\1\27\0\2\3"+
+ "\6\0\12\4\13\0\1\3\1\0\1\3\1\0\1\3\4\0\2\3"+
+ "\10\1\1\0\44\1\4\0\24\3\1\0\2\3\5\1\13\3\1\0"+
+ "\44\3\11\0\1\3\71\0\53\20\24\21\1\20\12\4\6\0\6\20"+
+ "\4\21\4\20\3\21\1\20\3\21\2\20\7\21\3\20\4\21\15\20"+
+ "\14\21\1\20\1\21\12\4\4\21\2\20\46\1\1\0\1\1\5\0"+
+ "\1\1\2\0\53\1\1\0\4\1\u0100\2\111\1\1\0\4\1\2\0"+
+ "\7\1\1\0\1\1\1\0\4\1\2\0\51\1\1\0\4\1\2\0"+
+ "\41\1\1\0\4\1\2\0\7\1\1\0\1\1\1\0\4\1\2\0"+
+ "\17\1\1\0\71\1\1\0\4\1\2\0\103\1\2\0\3\3\40\0"+
+ "\20\1\20\0\125\1\14\0\u026c\1\2\0\21\1\1\0\32\1\5\0"+
+ "\113\1\3\0\3\1\17\0\15\1\1\0\4\1\3\3\13\0\22\1"+
+ "\3\3\13\0\22\1\2\3\14\0\15\1\1\0\3\1\1\0\2\3"+
+ "\14\0\64\20\40\21\3\0\1\20\4\0\1\20\1\21\2\0\12\4"+
+ "\41\0\4\3\1\0\12\4\6\0\130\1\10\0\51\1\1\3\1\1"+
+ "\5\0\106\1\12\0\35\1\3\0\14\3\4\0\14\3\12\0\12\4"+
+ "\36\20\2\0\5\20\13\0\54\20\4\0\21\21\7\20\2\21\6\0"+
+ "\12\4\1\20\3\0\2\20\40\0\27\1\5\3\4\0\65\20\12\21"+
+ "\1\0\35\21\2\0\1\3\12\4\6\0\12\4\6\0\16\20\122\0"+
+ "\5\3\57\1\21\3\7\1\4\0\12\4\21\0\11\3\14\0\3\3"+
+ "\36\1\15\3\2\1\12\4\54\1\16\3\14\0\44\1\24\3\10\0"+
+ "\12\4\3\0\3\1\12\4\44\1\122\0\3\3\1\0\25\3\4\1"+
+ "\1\3\4\1\3\3\2\1\11\0\300\1\47\3\25\0\4\3\u0116\1"+
+ "\2\0\6\1\2\0\46\1\2\0\6\1\2\0\10\1\1\0\1\1"+
+ "\1\0\1\1\1\0\1\1\1\0\37\1\2\0\65\1\1\0\7\1"+
+ "\1\0\1\1\3\0\3\1\1\0\7\1\3\0\4\1\2\0\6\1"+
+ "\4\0\15\1\5\0\3\1\1\0\7\1\17\0\4\3\10\0\2\10"+
+ "\12\0\1\10\2\0\1\6\2\0\5\3\20\0\2\11\3\0\1\7"+
+ "\17\0\1\11\13\0\5\3\1\0\12\3\1\0\1\1\15\0\1\1"+
+ "\20\0\15\1\63\0\41\3\21\0\1\1\4\0\1\1\2\0\12\1"+
+ "\1\0\1\1\3\0\5\1\6\0\1\1\1\0\1\1\1\0\1\1"+
+ "\1\0\4\1\1\0\13\1\2\0\4\1\5\0\5\1\4\0\1\1"+
+ "\21\0\51\1\u032d\0\64\1\u0716\0\57\1\1\0\57\1\1\0\205\1"+
+ "\6\0\4\1\3\3\2\1\14\0\46\1\1\0\1\1\5\0\1\1"+
+ "\2\0\70\1\7\0\1\1\17\0\1\3\27\1\11\0\7\1\1\0"+
+ "\7\1\1\0\7\1\1\0\7\1\1\0\7\1\1\0\7\1\1\0"+
+ "\7\1\1\0\7\1\1\0\40\3\57\0\1\1\120\0\32\12\1\0"+
+ "\131\12\14\0\326\12\57\0\1\1\1\0\1\12\31\0\11\12\6\3"+
+ "\1\0\5\5\2\0\3\12\1\1\1\1\4\0\126\13\2\0\2\3"+
+ "\2\5\3\13\133\5\1\0\4\5\5\0\51\1\3\0\136\2\21\0"+
+ "\33\1\65\0\20\5\320\0\57\5\1\0\130\5\250\0\u19b6\12\112\0"+
+ "\u51cd\12\63\0\u048d\1\103\0\56\1\2\0\u010d\1\3\0\20\1\12\4"+
+ "\2\1\24\0\57\1\4\3\1\0\12\3\1\0\31\1\7\0\1\3"+
+ "\120\1\2\3\45\0\11\1\2\0\147\1\2\0\4\1\1\0\4\1"+
+ "\14\0\13\1\115\0\12\1\1\3\3\1\1\3\4\1\1\3\27\1"+
+ "\5\3\30\0\64\1\14\0\2\3\62\1\21\3\13\0\12\4\6\0"+
+ "\22\3\6\1\3\0\1\1\4\0\12\4\34\1\10\3\2\0\27\1"+
+ "\15\3\14\0\35\2\3\0\4\3\57\1\16\3\16\0\1\1\12\4"+
+ "\46\0\51\1\16\3\11\0\3\1\1\3\10\1\2\3\2\0\12\4"+
+ "\6\0\33\20\1\21\4\0\60\20\1\21\1\20\3\21\2\20\2\21"+
+ "\5\20\2\21\1\20\1\21\1\20\30\0\5\20\13\1\5\3\2\0"+
+ "\3\1\2\3\12\0\6\1\2\0\6\1\2\0\6\1\11\0\7\1"+
+ "\1\0\7\1\221\0\43\1\10\3\1\0\2\3\2\0\12\4\6\0"+
+ "\u2ba4\2\14\0\27\2\4\0\61\2\u2104\0\u016e\12\2\0\152\12\46\0"+
+ "\7\1\14\0\5\1\5\0\1\16\1\3\12\16\1\0\15\16\1\0"+
+ "\5\16\1\0\1\16\1\0\2\16\1\0\2\16\1\0\12\16\142\1"+
+ "\41\0\u016b\1\22\0\100\1\2\0\66\1\50\0\14\1\4\0\20\3"+
+ "\1\7\2\0\1\6\1\7\13\0\7\3\14\0\2\11\30\0\3\11"+
+ "\1\7\1\0\1\10\1\0\1\7\1\6\32\0\5\1\1\0\207\1"+
+ "\2\0\1\3\7\0\1\10\4\0\1\7\1\0\1\10\1\0\12\4"+
+ "\1\6\1\7\5\0\32\1\4\0\1\11\1\0\32\1\13\0\70\5"+
+ "\2\3\37\2\3\0\6\2\2\0\6\2\2\0\6\2\2\0\3\2"+
+ "\34\0\3\3\4\0\14\1\1\0\32\1\1\0\23\1\1\0\2\1"+
+ "\1\0\17\1\2\0\16\1\42\0\173\1\105\0\65\1\210\0\1\3"+
+ "\202\0\35\1\3\0\61\1\57\0\37\1\21\0\33\1\65\0\36\1"+
+ "\2\0\44\1\4\0\10\1\1\0\5\1\52\0\236\1\2\0\12\4"+
+ "\u0356\0\6\1\2\0\1\1\1\0\54\1\1\0\2\1\3\0\1\1"+
+ "\2\0\27\1\252\0\26\1\12\0\32\1\106\0\70\1\6\0\2\1"+
+ "\100\0\1\1\3\3\1\0\2\3\5\0\4\3\4\1\1\0\3\1"+
+ "\1\0\33\1\4\0\3\3\4\0\1\3\40\0\35\1\203\0\66\1"+
+ "\12\0\26\1\12\0\23\1\215\0\111\1\u03b7\0\3\3\65\1\17\3"+
+ "\37\0\12\4\20\0\3\3\55\1\13\3\2\0\1\3\22\0\31\1"+
+ "\7\0\12\4\6\0\3\3\44\1\16\3\1\0\12\4\100\0\3\3"+
+ "\60\1\16\3\4\1\13\0\12\4\u04a6\0\53\1\15\3\10\0\12\4"+
+ "\u0936\0\u036f\1\221\0\143\1\u0b9d\0\u042f\1\u33d1\0\u0239\1\u04c7\0\105\1"+
+ "\13\0\1\1\56\3\20\0\4\3\15\1\u4060\0\1\5\1\13\u2163\0"+
+ "\5\3\3\0\26\3\2\0\7\3\36\0\4\3\224\0\3\3\u01bb\0"+
+ "\125\1\1\0\107\1\1\0\2\1\2\0\1\1\2\0\2\1\2\0"+
+ "\4\1\1\0\14\1\1\0\1\1\1\0\7\1\1\0\101\1\1\0"+
+ "\4\1\2\0\10\1\1\0\7\1\1\0\34\1\1\0\4\1\1\0"+
+ "\5\1\1\0\1\1\3\0\7\1\1\0\u0154\1\2\0\31\1\1\0"+
+ "\31\1\1\0\37\1\1\0\31\1\1\0\37\1\1\0\31\1\1\0"+
+ "\37\1\1\0\31\1\1\0\37\1\1\0\31\1\1\0\10\1\2\0"+
+ "\62\4\u1600\0\4\1\1\0\33\1\1\0\2\1\1\0\1\1\2\0"+
+ "\1\1\1\0\12\1\1\0\4\1\1\0\1\1\1\0\1\1\6\0"+
+ "\1\1\4\0\1\1\1\0\1\1\1\0\1\1\1\0\3\1\1\0"+
+ "\2\1\1\0\1\1\2\0\1\1\1\0\1\1\1\0\1\1\1\0"+
+ "\1\1\1\0\1\1\1\0\2\1\1\0\1\1\2\0\4\1\1\0"+
+ "\7\1\1\0\4\1\1\0\4\1\1\0\1\1\1\0\12\1\1\0"+
+ "\21\1\5\0\3\1\1\0\5\1\1\0\21\1\u032a\0\32\17\1\13"+
+ "\u0dff\0\ua6d7\12\51\0\u1035\12\13\0\336\12\u3fe2\0\u021e\12\uffff\0\uffff\0\uffff\0\uffff\0\uffff\0\uffff\0\uffff\0\uffff\0\uffff\0\uffff\0\uffff\0\u05ee\0"+
+ "\1\3\36\0\140\3\200\0\360\3\uffff\0\uffff\0\ufe12\0";
/**
* Translates characters to character classes
@@ -123,13 +217,12 @@
private static final int [] ZZ_ACTION = zzUnpackAction();
private static final String ZZ_ACTION_PACKED_0 =
- "\1\0\1\1\3\2\1\3\1\1\13\0\1\2\3\4"+
- "\2\0\1\5\1\0\1\5\3\4\6\5\1\6\1\4"+
- "\2\7\1\10\1\0\1\10\3\0\2\10\1\11\1\12"+
- "\1\4";
+ "\1\0\1\1\1\2\1\3\1\4\1\5\1\1\1\6"+
+ "\1\7\1\2\1\1\1\10\1\2\1\0\1\2\1\0"+
+ "\1\4\1\0\2\2\2\0\1\1\1\0";
private static int [] zzUnpackAction() {
- int [] result = new int[51];
+ int [] result = new int[24];
int offset = 0;
offset = zzUnpackAction(ZZ_ACTION_PACKED_0, offset, result);
return result;
@@ -154,16 +247,12 @@
private static final int [] ZZ_ROWMAP = zzUnpackRowMap();
private static final String ZZ_ROWMAP_PACKED_0 =
- "\0\0\0\16\0\34\0\52\0\70\0\16\0\106\0\124"+
- "\0\142\0\160\0\176\0\214\0\232\0\250\0\266\0\304"+
- "\0\322\0\340\0\356\0\374\0\u010a\0\u0118\0\u0126\0\u0134"+
- "\0\u0142\0\u0150\0\u015e\0\u016c\0\u017a\0\u0188\0\u0196\0\u01a4"+
- "\0\u01b2\0\u01c0\0\u01ce\0\u01dc\0\u01ea\0\u01f8\0\322\0\u0206"+
- "\0\u0214\0\u0222\0\u0230\0\u023e\0\u024c\0\u025a\0\124\0\214"+
- "\0\u0268\0\u0276\0\u0284";
+ "\0\0\0\22\0\44\0\66\0\110\0\132\0\154\0\176"+
+ "\0\220\0\242\0\264\0\306\0\330\0\352\0\374\0\u010e"+
+ "\0\u0120\0\154\0\u0132\0\u0144\0\u0156\0\264\0\u0168\0\u017a";
private static int [] zzUnpackRowMap() {
- int [] result = new int[51];
+ int [] result = new int[24];
int offset = 0;
offset = zzUnpackRowMap(ZZ_ROWMAP_PACKED_0, offset, result);
return result;
@@ -186,49 +275,33 @@
private static final int [] ZZ_TRANS = zzUnpackTrans();
private static final String ZZ_TRANS_PACKED_0 =
- "\1\2\1\3\1\4\7\2\1\5\1\6\1\7\1\2"+
- "\17\0\2\3\1\0\1\10\1\0\1\11\2\12\1\13"+
- "\1\3\4\0\1\3\1\4\1\0\1\14\1\0\1\11"+
- "\2\15\1\16\1\4\4\0\1\3\1\4\1\17\1\20"+
- "\1\21\1\22\2\12\1\13\1\23\20\0\1\2\1\0"+
- "\1\24\1\25\7\0\1\26\4\0\2\27\7\0\1\27"+
- "\4\0\1\30\1\31\7\0\1\32\5\0\1\33\7\0"+
- "\1\13\4\0\1\34\1\35\7\0\1\36\4\0\1\37"+
- "\1\40\7\0\1\41\4\0\1\42\1\43\7\0\1\44"+
- "\15\0\1\45\4\0\1\24\1\25\7\0\1\46\15\0"+
- "\1\47\4\0\2\27\7\0\1\50\4\0\1\3\1\4"+
- "\1\17\1\10\1\21\1\22\2\12\1\13\1\23\4\0"+
- "\2\24\1\0\1\51\1\0\1\11\2\52\1\0\1\24"+
- "\4\0\1\24\1\25\1\0\1\53\1\0\1\11\2\54"+
- "\1\55\1\25\4\0\1\24\1\25\1\0\1\51\1\0"+
- "\1\11\2\52\1\0\1\26\4\0\2\27\1\0\1\56"+
- "\2\0\1\56\2\0\1\27\4\0\2\30\1\0\1\52"+
- "\1\0\1\11\2\52\1\0\1\30\4\0\1\30\1\31"+
- "\1\0\1\54\1\0\1\11\2\54\1\55\1\31\4\0"+
- "\1\30\1\31\1\0\1\52\1\0\1\11\2\52\1\0"+
- "\1\32\5\0\1\33\1\0\1\55\2\0\3\55\1\33"+
- "\4\0\2\34\1\0\1\57\1\0\1\11\2\12\1\13"+
- "\1\34\4\0\1\34\1\35\1\0\1\60\1\0\1\11"+
- "\2\15\1\16\1\35\4\0\1\34\1\35\1\0\1\57"+
- "\1\0\1\11\2\12\1\13\1\36\4\0\2\37\1\0"+
- "\1\12\1\0\1\11\2\12\1\13\1\37\4\0\1\37"+
- "\1\40\1\0\1\15\1\0\1\11\2\15\1\16\1\40"+
- "\4\0\1\37\1\40\1\0\1\12\1\0\1\11\2\12"+
- "\1\13\1\41\4\0\2\42\1\0\1\13\2\0\3\13"+
- "\1\42\4\0\1\42\1\43\1\0\1\16\2\0\3\16"+
- "\1\43\4\0\1\42\1\43\1\0\1\13\2\0\3\13"+
- "\1\44\6\0\1\17\6\0\1\45\4\0\1\24\1\25"+
- "\1\0\1\61\1\0\1\11\2\52\1\0\1\26\4\0"+
- "\2\27\1\0\1\56\2\0\1\56\2\0\1\50\4\0"+
- "\2\24\7\0\1\24\4\0\2\30\7\0\1\30\4\0"+
- "\2\34\7\0\1\34\4\0\2\37\7\0\1\37\4\0"+
- "\2\42\7\0\1\42\4\0\2\62\7\0\1\62\4\0"+
- "\2\24\7\0\1\63\4\0\2\62\1\0\1\56\2\0"+
- "\1\56\2\0\1\62\4\0\2\24\1\0\1\61\1\0"+
- "\1\11\2\52\1\0\1\24\3\0";
+ "\1\2\1\3\1\4\1\2\1\5\1\6\3\2\1\7"+
+ "\1\10\1\11\2\2\1\12\1\13\2\14\23\0\3\3"+
+ "\1\15\1\0\1\16\1\0\1\16\1\17\2\0\1\16"+
+ "\1\0\1\12\2\0\1\3\1\0\1\3\2\4\1\15"+
+ "\1\0\1\16\1\0\1\16\1\17\2\0\1\16\1\0"+
+ "\1\12\2\0\1\4\1\0\2\3\2\5\2\0\2\20"+
+ "\1\21\2\0\1\20\1\0\1\12\2\0\1\5\3\0"+
+ "\1\6\1\0\1\6\3\0\1\17\7\0\1\6\1\0"+
+ "\2\3\1\22\1\5\1\23\3\0\1\22\4\0\1\12"+
+ "\2\0\1\22\3\0\1\10\15\0\1\10\3\0\1\11"+
+ "\15\0\1\11\1\0\2\3\1\12\1\15\1\0\1\16"+
+ "\1\0\1\16\1\17\2\0\1\24\1\25\1\12\2\0"+
+ "\1\12\3\0\1\26\13\0\1\27\1\0\1\26\3\0"+
+ "\1\14\14\0\2\14\1\0\2\3\2\15\2\0\2\30"+
+ "\1\17\2\0\1\30\1\0\1\12\2\0\1\15\1\0"+
+ "\2\3\1\16\12\0\1\3\2\0\1\16\1\0\2\3"+
+ "\1\17\1\15\1\23\3\0\1\17\4\0\1\12\2\0"+
+ "\1\17\3\0\1\20\1\5\14\0\1\20\1\0\2\3"+
+ "\1\21\1\5\1\23\3\0\1\21\4\0\1\12\2\0"+
+ "\1\21\3\0\1\23\1\0\1\23\3\0\1\17\7\0"+
+ "\1\23\1\0\2\3\1\24\1\15\4\0\1\17\4\0"+
+ "\1\12\2\0\1\24\3\0\1\25\12\0\1\24\2\0"+
+ "\1\25\3\0\1\27\13\0\1\27\1\0\1\27\3\0"+
+ "\1\30\1\15\14\0\1\30";
private static int [] zzUnpackTrans() {
- int [] result = new int[658];
+ int [] result = new int[396];
int offset = 0;
offset = zzUnpackTrans(ZZ_TRANS_PACKED_0, offset, result);
return result;
@@ -266,11 +339,11 @@
private static final int [] ZZ_ATTRIBUTE = zzUnpackAttribute();
private static final String ZZ_ATTRIBUTE_PACKED_0 =
- "\1\0\1\11\3\1\1\11\1\1\13\0\4\1\2\0"+
- "\1\1\1\0\17\1\1\0\1\1\3\0\5\1";
+ "\1\0\1\11\13\1\1\0\1\1\1\0\1\1\1\0"+
+ "\2\1\2\0\1\1\1\0";
private static int [] zzUnpackAttribute() {
- int [] result = new int[51];
+ int [] result = new int[24];
int offset = 0;
offset = zzUnpackAttribute(ZZ_ATTRIBUTE_PACKED_0, offset, result);
return result;
@@ -304,9 +377,6 @@
/** the textposition at the last accepting state */
private int zzMarkedPos;
- /** the textposition at the last state to be included in yytext */
- private int zzPushbackPos;
-
/** the current text position in the buffer */
private int zzCurrentPos;
@@ -337,57 +407,74 @@
/** zzAtEOF == true <=> the scanner is at the EOF */
private boolean zzAtEOF;
+ /** denotes if the user-EOF-code has already been executed */
+ private boolean zzEOFDone;
+
+ /**
+ * The number of occupied positions in zzBuffer beyond zzEndRead.
+ * When a lead/high surrogate has been read from the input stream
+ * into the final zzBuffer position, this will have a value of 1;
+ * otherwise, it will have a value of 0.
+ */
+ private int zzFinalHighSurrogate = 0;
+
/* user code: */
+ /** Alphanumeric sequences */
+ public static final int WORD_TYPE = StandardTokenizer.ALPHANUM;
+
+ /** Numbers */
+ public static final int NUMERIC_TYPE = StandardTokenizer.NUM;
+
+ /**
+ * Chars in class \p{Line_Break = Complex_Context} are from South East Asian
+ * scripts (Thai, Lao, Myanmar, Khmer, etc.). Sequences of these are kept
+ * together as a single token rather than broken up, because the logic
+ * required to break them at word boundaries is too complex for UAX#29.
+ *
+ * See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA
+ */
+ public static final int SOUTH_EAST_ASIAN_TYPE = StandardTokenizer.SOUTHEAST_ASIAN;
+
+ public static final int IDEOGRAPHIC_TYPE = StandardTokenizer.IDEOGRAPHIC;
+
+ public static final int HIRAGANA_TYPE = StandardTokenizer.HIRAGANA;
+
+ public static final int KATAKANA_TYPE = StandardTokenizer.KATAKANA;
+
+ public static final int HANGUL_TYPE = StandardTokenizer.HANGUL;
-public static final int ALPHANUM = StandardTokenizer.ALPHANUM;
-public static final int APOSTROPHE = StandardTokenizer.APOSTROPHE;
-public static final int ACRONYM = StandardTokenizer.ACRONYM;
-public static final int COMPANY = StandardTokenizer.COMPANY;
-public static final int EMAIL = StandardTokenizer.EMAIL;
-public static final int HOST = StandardTokenizer.HOST;
-public static final int NUM = StandardTokenizer.NUM;
-public static final int CJ = StandardTokenizer.CJ;
-/**
- * @deprecated this solves a bug where HOSTs that end with '.' are identified
- * as ACRONYMs. It is deprecated and will be removed in the next
- * release.
- */
-public static final int ACRONYM_DEP = StandardTokenizer.ACRONYM_DEP;
-
-public static final String [] TOKEN_TYPES = StandardTokenizer.TOKEN_TYPES;
-
-public final int yychar()
-{
+ public final int yychar()
+ {
return yychar;
-}
+ }
-/**
- * Fills Lucene token with the current token text.
- */
-final void getText(Token t) {
- t.setTermBuffer(zzBuffer, zzStartRead, zzMarkedPos-zzStartRead);
-}
+ /**
+ * Fills CharTermAttribute with the current token text.
+ */
+ public final void getText(CharTermAttribute t) {
+ t.copyBuffer(zzBuffer, zzStartRead, zzMarkedPos-zzStartRead);
+ }
+
+ /**
+ * Sets the scanner buffer size in chars
+ */
+ public final void setBufferSize(int numChars) {
+ ZZ_BUFFERSIZE = numChars;
+ char[] newZzBuffer = new char[ZZ_BUFFERSIZE];
+ System.arraycopy(zzBuffer, 0, newZzBuffer, 0, Math.min(zzBuffer.length, ZZ_BUFFERSIZE));
+ zzBuffer = newZzBuffer;
+ }
/**
* Creates a new scanner
- * There is also a java.io.InputStream version of this constructor.
*
* @param in the java.io.Reader to read input from.
*/
- StandardTokenizerImpl(java.io.Reader in) {
+ public StandardTokenizerImpl(java.io.Reader in) {
this.zzReader = in;
}
- /**
- * Creates a new scanner.
- * There is also java.io.Reader version of this constructor.
- *
- * @param in the java.io.Inputstream to read input from.
- */
- StandardTokenizerImpl(java.io.InputStream in) {
- this(new java.io.InputStreamReader(in));
- }
/**
* Unpacks the compressed character translation table.
@@ -396,10 +483,10 @@
* @return the unpacked character translation table
*/
private static char [] zzUnpackCMap(String packed) {
- char [] map = new char[0x10000];
+ char [] map = new char[0x110000];
int i = 0; /* index in packed string */
int j = 0; /* index in unpacked array */
- while (i < 1154) {
+ while (i < 2836) {
int count = packed.charAt(i++);
char value = packed.charAt(i++);
do map[j++] = value; while (--count > 0);
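The zzUnpackCMap hunk above decodes the packed character-class table: the packed string is a run-length encoding where chars alternate as (count, value) pairs. A minimal sketch of the same decode loop, parameterized by output size (`CMapUnpackSketch` is an illustrative name, not part of the generated scanner):

```java
// Run-length decoding as used by zzUnpackCMap above: each (count, value)
// char pair in the packed string expands to `count` copies of `value`.
class CMapUnpackSketch {
    static char[] unpack(String packed, int size) {
        char[] map = new char[size];
        int i = 0;  // index into the packed string
        int j = 0;  // index into the unpacked map
        while (i < packed.length()) {
            int count = packed.charAt(i++);
            char value = packed.charAt(i++);
            do { map[j++] = value; } while (--count > 0);
        }
        return map;
    }
}
```

The diff's change from `new char[0x10000]` to `new char[0x110000]` reflects the move from a char-indexed to a code-point-indexed map, covering all 17 Unicode planes.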
@@ -419,6 +506,8 @@
/* first: make room (if you can) */
if (zzStartRead > 0) {
+ zzEndRead += zzFinalHighSurrogate;
+ zzFinalHighSurrogate = 0;
System.arraycopy(zzBuffer, zzStartRead,
zzBuffer, 0,
zzEndRead-zzStartRead);
@@ -427,29 +516,35 @@
zzEndRead-= zzStartRead;
zzCurrentPos-= zzStartRead;
zzMarkedPos-= zzStartRead;
- zzPushbackPos-= zzStartRead;
zzStartRead = 0;
}
- /* is the buffer big enough? */
- if (zzCurrentPos >= zzBuffer.length) {
- /* if not: blow it up */
- char newBuffer[] = new char[zzCurrentPos*2];
- System.arraycopy(zzBuffer, 0, newBuffer, 0, zzBuffer.length);
- zzBuffer = newBuffer;
- }
- /* finally: fill the buffer with new input */
- int numRead = zzReader.read(zzBuffer, zzEndRead,
- zzBuffer.length-zzEndRead);
-
- if (numRead < 0) {
- return true;
+ /* fill the buffer with new input */
+ int requested = zzBuffer.length - zzEndRead - zzFinalHighSurrogate;
+ int totalRead = 0;
+ while (totalRead < requested) {
+ int numRead = zzReader.read(zzBuffer, zzEndRead + totalRead, requested - totalRead);
+ if (numRead == -1) {
+ break;
+ }
+ totalRead += numRead;
}
- else {
- zzEndRead+= numRead;
+
+ if (totalRead > 0) {
+ zzEndRead += totalRead;
+ if (totalRead == requested) { /* possibly more input available */
+ if (Character.isHighSurrogate(zzBuffer[zzEndRead - 1])) {
+ --zzEndRead;
+ zzFinalHighSurrogate = 1;
+ if (totalRead == 1) { return true; }
+ }
+ }
return false;
}
+
+ // totalRead = 0: End of stream
+ return true;
}
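The zzFinalHighSurrogate logic added to zzRefill above guards against a read that ends exactly on a high (lead) surrogate: that char is held back so the scan loop's `Character.codePointAt` never sees half a surrogate pair. A simplified sketch of just that guard (`SurrogateGuardSketch` is my name for illustration):

```java
// Sketch of the trailing-surrogate guard in zzRefill above: if the filled
// region ends on a high surrogate, defer that char to the next refill so
// code-point reads always see complete surrogate pairs.
class SurrogateGuardSketch {
    // Returns how many chars of the filled region are safe to scan now.
    static int safeLength(char[] buffer, int filled) {
        if (filled > 0 && Character.isHighSurrogate(buffer[filled - 1])) {
            return filled - 1;  // hold the lead surrogate back
        }
        return filled;
    }
}
```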
@@ -473,16 +568,22 @@
* cannot be reused (internal buffer is discarded and lost).
* Lexical state is set to ZZ_INITIAL .
*
+ * Internal scan buffer is resized down to its initial length, if it has grown.
+ *
* @param reader the new input stream
*/
public final void yyreset(java.io.Reader reader) {
zzReader = reader;
zzAtBOL = true;
zzAtEOF = false;
+ zzEOFDone = false;
zzEndRead = zzStartRead = 0;
- zzCurrentPos = zzMarkedPos = zzPushbackPos = 0;
+ zzCurrentPos = zzMarkedPos = 0;
+ zzFinalHighSurrogate = 0;
yyline = yychar = yycolumn = 0;
zzLexicalState = YYINITIAL;
+ if (zzBuffer.length > ZZ_BUFFERSIZE)
+ zzBuffer = new char[ZZ_BUFFERSIZE];
}
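The yyreset hunk above adds a shrink-on-reset step: a scan buffer that grew to hold one oversized token is trimmed back to ZZ_BUFFERSIZE so a reused tokenizer does not pin a large array between documents. A hypothetical sketch of that policy in isolation (class and method names are mine):

```java
// Sketch of the shrink-on-reset policy added to yyreset above: grow the
// buffer on demand, trim it back to the initial size on reset.
class ShrinkOnResetSketch {
    private static final int INITIAL = 255;  // mirrors ZZ_BUFFERSIZE
    private char[] buffer = new char[INITIAL];

    void grow(int needed) {
        if (needed > buffer.length) buffer = new char[needed];
    }

    void reset() {
        if (buffer.length > INITIAL) buffer = new char[INITIAL];
    }

    int capacity() { return buffer.length; }
}
```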
@@ -610,14 +711,22 @@
zzCurrentPosL = zzCurrentPos = zzStartRead = zzMarkedPosL;
- zzState = zzLexicalState;
+ zzState = ZZ_LEXSTATE[zzLexicalState];
+ // set up zzAction for empty match case:
+ int zzAttributes = zzAttrL[zzState];
+ if ( (zzAttributes & 1) == 1 ) {
+ zzAction = zzState;
+ }
+
zzForAction: {
while (true) {
- if (zzCurrentPosL < zzEndReadL)
- zzInput = zzBufferL[zzCurrentPosL++];
+ if (zzCurrentPosL < zzEndReadL) {
+ zzInput = Character.codePointAt(zzBufferL, zzCurrentPosL, zzEndReadL);
+ zzCurrentPosL += Character.charCount(zzInput);
+ }
else if (zzAtEOF) {
zzInput = YYEOF;
break zzForAction;
@@ -637,14 +746,15 @@
break zzForAction;
}
else {
- zzInput = zzBufferL[zzCurrentPosL++];
+ zzInput = Character.codePointAt(zzBufferL, zzCurrentPosL, zzEndReadL);
+ zzCurrentPosL += Character.charCount(zzInput);
}
}
int zzNext = zzTransL[ zzRowMapL[zzState] + zzCMapL[zzInput] ];
if (zzNext == -1) break zzForAction;
zzState = zzNext;
- int zzAttributes = zzAttrL[zzState];
+ zzAttributes = zzAttrL[zzState];
if ( (zzAttributes & 1) == 1 ) {
zzAction = zzState;
zzMarkedPosL = zzCurrentPosL;
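The scan-loop hunk above replaces per-char reads with code-point reads: `Character.codePointAt` decodes a full code point (one or two chars) and `Character.charCount` advances the cursor accordingly. A self-contained sketch of the same iteration pattern (`CodePointIterSketch` is an illustrative name):

```java
// The code-point iteration pattern used in the scan loop above:
// codePointAt reads a full code point, charCount advances by 1 or 2.
class CodePointIterSketch {
    static int countCodePoints(char[] buffer, int end) {
        int n = 0;
        for (int pos = 0; pos < end; ) {
            int cp = Character.codePointAt(buffer, pos, end);
            pos += Character.charCount(cp);  // 2 for supplementary chars
            n++;
        }
        return n;
    }
}
```

Paired with the `0x110000`-entry character map elsewhere in this diff, this is what lets the generated DFA classify supplementary-plane characters directly.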
@@ -658,50 +768,44 @@
zzMarkedPos = zzMarkedPosL;
switch (zzAction < 0 ? zzAction : ZZ_ACTION[zzAction]) {
- case 4:
- { return HOST;
+ case 1:
+ { /* Break so we don't hit fall-through warning: */ break; /* Not numeric, word, ideographic, hiragana, or SE Asian -- ignore it. */
}
+ case 9: break;
+ case 2:
+ { return WORD_TYPE;
+ }
+ case 10: break;
+ case 3:
+ { return HANGUL_TYPE;
+ }
case 11: break;
- case 9:
- { return ACRONYM;
+ case 4:
+ { return NUMERIC_TYPE;
}
case 12: break;
- case 8:
- { return ACRONYM_DEP;
+ case 5:
+ { return KATAKANA_TYPE;
}
case 13: break;
- case 1:
- { /* ignore */
+ case 6:
+ { return IDEOGRAPHIC_TYPE;
}
case 14: break;
- case 5:
- { return NUM;
+ case 7:
+ { return HIRAGANA_TYPE;
}
case 15: break;
- case 3:
- { return CJ;
+ case 8:
+ { return SOUTH_EAST_ASIAN_TYPE;
}
case 16: break;
- case 2:
- { return ALPHANUM;
- }
- case 17: break;
- case 7:
- { return COMPANY;
- }
- case 18: break;
- case 6:
- { return APOSTROPHE;
- }
- case 19: break;
- case 10:
- { return EMAIL;
- }
- case 20: break;
default:
if (zzInput == YYEOF && zzStartRead == zzCurrentPos) {
zzAtEOF = true;
- return YYEOF;
+ {
+ return StandardTokenizerInterface.YYEOF;
+ }
}
else {
zzScanError(ZZ_NO_MATCH);
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex 17 Aug 2012 14:55:14 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex 16 Dec 2014 11:32:10 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.analysis.standard;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -17,120 +17,186 @@
* limitations under the License.
*/
-/*
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
-NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
- the tokenizer, remember to use JRE 1.4 to run jflex (before
- Lucene 3.0). This grammar now uses constructs (eg :digit:,
- :letter:) whose meaning can vary according to the JRE used to
- run jflex. See
- https://issues.apache.org/jira/browse/LUCENE-1126 for details.
-
-*/
-
-import org.apache.lucene.analysis.Token;
-
+/**
+ * This class implements Word Break rules from the Unicode Text Segmentation
+ * algorithm, as specified in
+ * Unicode Standard Annex #29.
+ *
+ * Tokens produced are of the following types:
+ *
+ * <ALPHANUM>: A sequence of alphabetic and numeric characters
+ * <NUM>: A number
+ * <SOUTHEAST_ASIAN>: A sequence of characters from South and Southeast
+ * Asian languages, including Thai, Lao, Myanmar, and Khmer
+ * <IDEOGRAPHIC>: A single CJKV ideographic character
+ * <HIRAGANA>: A single hiragana character
+ * <KATAKANA>: A sequence of katakana characters
+ * <HANGUL>: A sequence of Hangul characters
+ *
+ */
%%
-%class StandardTokenizerImpl
-%unicode
+%unicode 6.3
%integer
+%final
+%public
+%class StandardTokenizerImpl
+%implements StandardTokenizerInterface
%function getNextToken
-%pack
%char
+%buffer 255
+// UAX#29 WB4. X (Extend | Format)* --> X
+//
+HangulEx = [\p{Script:Hangul}&&[\p{WB:ALetter}\p{WB:Hebrew_Letter}]] [\p{WB:Format}\p{WB:Extend}]*
+HebrewOrALetterEx = [\p{WB:HebrewLetter}\p{WB:ALetter}] [\p{WB:Format}\p{WB:Extend}]*
+NumericEx = [\p{WB:Numeric}[\p{Blk:HalfAndFullForms}&&\p{Nd}]] [\p{WB:Format}\p{WB:Extend}]*
+KatakanaEx = \p{WB:Katakana} [\p{WB:Format}\p{WB:Extend}]*
+MidLetterEx = [\p{WB:MidLetter}\p{WB:MidNumLet}\p{WB:SingleQuote}] [\p{WB:Format}\p{WB:Extend}]*
+MidNumericEx = [\p{WB:MidNum}\p{WB:MidNumLet}\p{WB:SingleQuote}] [\p{WB:Format}\p{WB:Extend}]*
+ExtendNumLetEx = \p{WB:ExtendNumLet} [\p{WB:Format}\p{WB:Extend}]*
+HanEx = \p{Script:Han} [\p{WB:Format}\p{WB:Extend}]*
+HiraganaEx = \p{Script:Hiragana} [\p{WB:Format}\p{WB:Extend}]*
+SingleQuoteEx = \p{WB:Single_Quote} [\p{WB:Format}\p{WB:Extend}]*
+DoubleQuoteEx = \p{WB:Double_Quote} [\p{WB:Format}\p{WB:Extend}]*
+HebrewLetterEx = \p{WB:Hebrew_Letter} [\p{WB:Format}\p{WB:Extend}]*
+RegionalIndicatorEx = \p{WB:RegionalIndicator} [\p{WB:Format}\p{WB:Extend}]*
+ComplexContextEx = \p{LB:Complex_Context} [\p{WB:Format}\p{WB:Extend}]*
+
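The `*Ex` macros above all apply UAX#29 rule WB4, `X (Extend | Format)* --> X`: trailing Format and Extend characters are absorbed into the preceding grapheme class. A simplified Java-regex illustration of the same absorption, using `\p{Mn}` (combining marks) as a stand-in because `java.util.regex` cannot name the `WB:Extend` property directly:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified analogue of the WB4 macros above: a base letter absorbs any
// trailing combining marks, so "e" + U+0301 matches as one unit, not two.
class Wb4Sketch {
    static final Pattern LETTER_EX = Pattern.compile("\\p{L}\\p{Mn}*");

    static int unitCount(String s) {
        Matcher m = LETTER_EX.matcher(s);
        int n = 0;
        while (m.find()) n++;
        return n;
    }
}
```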
%{
+ /** Alphanumeric sequences */
+ public static final int WORD_TYPE = StandardTokenizer.ALPHANUM;
+
+ /** Numbers */
+ public static final int NUMERIC_TYPE = StandardTokenizer.NUM;
+
+ /**
+ * Chars in class \p{Line_Break = Complex_Context} are from South East Asian
+ * scripts (Thai, Lao, Myanmar, Khmer, etc.). Sequences of these are kept
+ * together as a single token rather than broken up, because the logic
+ * required to break them at word boundaries is too complex for UAX#29.
+ *
+ * See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA
+ */
+ public static final int SOUTH_EAST_ASIAN_TYPE = StandardTokenizer.SOUTHEAST_ASIAN;
+
+ public static final int IDEOGRAPHIC_TYPE = StandardTokenizer.IDEOGRAPHIC;
+
+ public static final int HIRAGANA_TYPE = StandardTokenizer.HIRAGANA;
+
+ public static final int KATAKANA_TYPE = StandardTokenizer.KATAKANA;
+
+ public static final int HANGUL_TYPE = StandardTokenizer.HANGUL;
-public static final int ALPHANUM = StandardTokenizer.ALPHANUM;
-public static final int APOSTROPHE = StandardTokenizer.APOSTROPHE;
-public static final int ACRONYM = StandardTokenizer.ACRONYM;
-public static final int COMPANY = StandardTokenizer.COMPANY;
-public static final int EMAIL = StandardTokenizer.EMAIL;
-public static final int HOST = StandardTokenizer.HOST;
-public static final int NUM = StandardTokenizer.NUM;
-public static final int CJ = StandardTokenizer.CJ;
-/**
- * @deprecated this solves a bug where HOSTs that end with '.' are identified
- * as ACRONYMs. It is deprecated and will be removed in the next
- * release.
- */
-public static final int ACRONYM_DEP = StandardTokenizer.ACRONYM_DEP;
-
-public static final String [] TOKEN_TYPES = StandardTokenizer.TOKEN_TYPES;
-
-public final int yychar()
-{
+ public final int yychar()
+ {
return yychar;
-}
+ }
-/**
- * Fills Lucene token with the current token text.
- */
-final void getText(Token t) {
- t.setTermBuffer(zzBuffer, zzStartRead, zzMarkedPos-zzStartRead);
-}
+ /**
+ * Fills CharTermAttribute with the current token text.
+ */
+ public final void getText(CharTermAttribute t) {
+ t.copyBuffer(zzBuffer, zzStartRead, zzMarkedPos-zzStartRead);
+ }
+
+ /**
+ * Sets the scanner buffer size in chars
+ */
+ public final void setBufferSize(int numChars) {
+ ZZ_BUFFERSIZE = numChars;
+ char[] newZzBuffer = new char[ZZ_BUFFERSIZE];
+ System.arraycopy(zzBuffer, 0, newZzBuffer, 0, Math.min(zzBuffer.length, ZZ_BUFFERSIZE));
+ zzBuffer = newZzBuffer;
+ }
%}
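As an aside, the `setBufferSize` helper above uses the standard copy-on-resize idiom: allocate the new array, then copy the overlapping prefix. A minimal standalone sketch of that idiom (class and method names here are illustrative, not part of the generated scanner):

```java
// Standalone sketch of the copy-on-resize idiom used by setBufferSize above:
// allocate the new buffer, then copy over whatever fits.
public class BufferResize {
  public static char[] resize(char[] buffer, int newSize) {
    char[] newBuffer = new char[newSize];
    // Copy the overlapping prefix; truncates when shrinking, zero-pads when growing.
    System.arraycopy(buffer, 0, newBuffer, 0, Math.min(buffer.length, newSize));
    return newBuffer;
  }
}
```

Note that shrinking silently truncates buffered characters, which is why the scanner only honors `setBufferSize` between tokens.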
-THAI = [\u0E00-\u0E59]
+%%
-// basic word: a sequence of digits & letters (includes Thai to enable ThaiAnalyzer to function)
-ALPHANUM = ({LETTER}|{THAI}|[:digit:])+
+// UAX#29 WB1. sot ÷
+// WB2. ÷ eot
+//
+<<EOF>> { return StandardTokenizerInterface.YYEOF; }
-// internal apostrophes: O'Reilly, you're, O'Reilly's
-// use a post-filter to remove possesives
-APOSTROPHE = {ALPHA} ("'" {ALPHA})+
+// UAX#29 WB8. Numeric × Numeric
+// WB11. Numeric (MidNum | MidNumLet | Single_Quote) × Numeric
+// WB12. Numeric × (MidNum | MidNumLet | Single_Quote) Numeric
+// WB13a. (ALetter | Hebrew_Letter | Numeric | Katakana | ExtendNumLet) × ExtendNumLet
+// WB13b. ExtendNumLet × (ALetter | Hebrew_Letter | Numeric | Katakana)
+//
+{ExtendNumLetEx}* {NumericEx} ( ( {ExtendNumLetEx}* | {MidNumericEx} ) {NumericEx} )* {ExtendNumLetEx}*
+ { return NUMERIC_TYPE; }
-// acronyms: U.S.A., I.B.M., etc.
-// use a post-filter to remove dots
-ACRONYM = {LETTER} "." ({LETTER} ".")+
+// subset of the below for typing purposes only!
+{HangulEx}+
+ { return HANGUL_TYPE; }
+
+{KatakanaEx}+
+ { return KATAKANA_TYPE; }
-ACRONYM_DEP = {ALPHANUM} "." ({ALPHANUM} ".")+
+// UAX#29 WB5. (ALetter | Hebrew_Letter) × (ALetter | Hebrew_Letter)
+// WB6. (ALetter | Hebrew_Letter) × (MidLetter | MidNumLet | Single_Quote) (ALetter | Hebrew_Letter)
+// WB7. (ALetter | Hebrew_Letter) (MidLetter | MidNumLet | Single_Quote) × (ALetter | Hebrew_Letter)
+// WB7a. Hebrew_Letter × Single_Quote
+// WB7b. Hebrew_Letter × Double_Quote Hebrew_Letter
+// WB7c. Hebrew_Letter Double_Quote × Hebrew_Letter
+// WB9. (ALetter | Hebrew_Letter) × Numeric
+// WB10. Numeric × (ALetter | Hebrew_Letter)
+// WB13. Katakana × Katakana
+// WB13a. (ALetter | Hebrew_Letter | Numeric | Katakana | ExtendNumLet) × ExtendNumLet
+// WB13b. ExtendNumLet × (ALetter | Hebrew_Letter | Numeric | Katakana)
+//
+{ExtendNumLetEx}* ( {KatakanaEx} ( {ExtendNumLetEx}* {KatakanaEx} )*
+ | ( {HebrewLetterEx} ( {SingleQuoteEx} | {DoubleQuoteEx} {HebrewLetterEx} )
+ | {NumericEx} ( ( {ExtendNumLetEx}* | {MidNumericEx} ) {NumericEx} )*
+ | {HebrewOrALetterEx} ( ( {ExtendNumLetEx}* | {MidLetterEx} ) {HebrewOrALetterEx} )*
+ )+
+ )
+({ExtendNumLetEx}+ ( {KatakanaEx} ( {ExtendNumLetEx}* {KatakanaEx} )*
+ | ( {HebrewLetterEx} ( {SingleQuoteEx} | {DoubleQuoteEx} {HebrewLetterEx} )
+ | {NumericEx} ( ( {ExtendNumLetEx}* | {MidNumericEx} ) {NumericEx} )*
+ | {HebrewOrALetterEx} ( ( {ExtendNumLetEx}* | {MidLetterEx} ) {HebrewOrALetterEx} )*
+ )+
+ )
+)*
+{ExtendNumLetEx}*
+ { return WORD_TYPE; }
-// company names like AT&T and Excite@Home.
-COMPANY = {ALPHA} ("&"|"@") {ALPHA}
-// email addresses
-EMAIL = {ALPHANUM} (("."|"-"|"_") {ALPHANUM})* "@" {ALPHANUM} (("."|"-") {ALPHANUM})+
+// From UAX #29:
+//
+// [C]haracters with the Line_Break property values of Contingent_Break (CB),
+// Complex_Context (SA/South East Asian), and XX (Unknown) are assigned word
+// boundary property values based on criteria outside of the scope of this
+// annex. That means that satisfactory treatment of languages like Chinese
+// or Thai requires special handling.
+//
+// In Unicode 6.3, only one character has the \p{Line_Break = Contingent_Break}
+// property: U+FFFC OBJECT REPLACEMENT CHARACTER.
+//
+// In the ICU implementation of UAX#29, \p{Line_Break = Complex_Context}
+// character sequences (from South East Asian scripts like Thai, Myanmar, Khmer,
+// Lao, etc.) are kept together. This grammar does the same below.
+//
+// See also the Unicode Line Breaking Algorithm:
+//
+// http://www.unicode.org/reports/tr14/#SA
+//
+{ComplexContextEx}+ { return SOUTH_EAST_ASIAN_TYPE; }
-// hostname
-HOST = {ALPHANUM} ((".") {ALPHANUM})+
+// UAX#29 WB14. Any ÷ Any
+//
+{HanEx} { return IDEOGRAPHIC_TYPE; }
+{HiraganaEx} { return HIRAGANA_TYPE; }
-// floating point, serial, model numbers, ip addresses, etc.
-// every other segment must have at least one digit
-NUM = ({ALPHANUM} {P} {HAS_DIGIT}
- | {HAS_DIGIT} {P} {ALPHANUM}
- | {ALPHANUM} ({P} {HAS_DIGIT} {P} {ALPHANUM})+
- | {HAS_DIGIT} ({P} {ALPHANUM} {P} {HAS_DIGIT})+
- | {ALPHANUM} {P} {HAS_DIGIT} ({P} {ALPHANUM} {P} {HAS_DIGIT})+
- | {HAS_DIGIT} {P} {ALPHANUM} ({P} {HAS_DIGIT} {P} {ALPHANUM})+)
-// punctuation
-P = ("_"|"-"|"/"|"."|",")
-
-// at least one digit
-HAS_DIGIT = ({LETTER}|[:digit:])* [:digit:] ({LETTER}|[:digit:])*
-
-ALPHA = ({LETTER})+
-
-// From the JFlex manual: "the expression that matches everything of <a> not matched by <b> is !(!<a>|<b>)"
-LETTER = !(![:letter:]|{CJ})
-
-// Chinese and Japanese (but NOT Korean, which is included in [:letter:])
-CJ = [\u3100-\u312f\u3040-\u309F\u30A0-\u30FF\u31F0-\u31FF\u3300-\u337f\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff65-\uff9f]
-
-WHITESPACE = \r\n | [ \r\n\t\f]
-
-%%
-
-{ALPHANUM} { return ALPHANUM; }
-{APOSTROPHE} { return APOSTROPHE; }
-{ACRONYM} { return ACRONYM; }
-{COMPANY} { return COMPANY; }
-{EMAIL} { return EMAIL; }
-{HOST} { return HOST; }
-{NUM} { return NUM; }
-{CJ} { return CJ; }
-{ACRONYM_DEP} { return ACRONYM_DEP; }
-
-/** Ignore the rest */
-. | {WHITESPACE} { /* ignore */ }
+// UAX#29 WB3. CR × LF
+// WB3a. (Newline | CR | LF) ÷
+// WB3b. ÷ (Newline | CR | LF)
+// WB13c. Regional_Indicator × Regional_Indicator
+// WB14. Any ÷ Any
+//
+{RegionalIndicatorEx} {RegionalIndicatorEx}+ | [^]
+ { /* Break so we don't hit fall-through warning: */ break; /* Not numeric, word, ideographic, hiragana, or SE Asian -- ignore it. */ }
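The grammar above hand-encodes the UAX#29 word break rules (WB3 through WB14). As an illustrative aside, not Lucene's scanner, the JDK's `java.text.BreakIterator` word instance implements the same UAX#29 segmentation, so a small self-contained sketch can show the behavior the rules produce, e.g. WB11/WB12 keeping a decimal number like "3.14" together as one numeric token (class and method names here are hypothetical):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: java.text.BreakIterator also implements UAX#29 word
// segmentation, so it can be used to observe the effect of rules like
// WB11/WB12 (MidNum keeps "3.14" together) that the grammar above encodes.
public class WordBreakDemo {
  public static List<String> tokenize(String text) {
    BreakIterator wb = BreakIterator.getWordInstance(Locale.ROOT);
    wb.setText(text);
    List<String> tokens = new ArrayList<>();
    int start = wb.first();
    for (int end = wb.next(); end != BreakIterator.DONE; start = end, end = wb.next()) {
      String segment = text.substring(start, end);
      // Keep only segments containing a letter or digit, mirroring the
      // grammar's final rule that silently discards everything else.
      if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
        tokens.add(segment);
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("Pi is roughly 3.14"));
  }
}
```

BreakIterator reports every boundary, including around whitespace and punctuation, which is why the filter step is needed; the JFlex grammar instead matches only the token-forming rules and drops the rest in its catch-all rule.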
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/StandardTokenizerInterface.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/UAX29URLEmailAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerImpl.jflex'.
Fisheye: No comparison available. Pass `N' to diff?
Index: 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/package.html
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/analysis/standard/package.html,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/package.html 17 Aug 2012 14:55:14 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/analysis/standard/package.html 16 Dec 2014 11:32:10 -0000 1.1.2.1
@@ -17,10 +17,53 @@
-->
-
-
+
-A fast grammar-based tokenizer constructed with JFlex.
+
+Fast, general-purpose grammar-based tokenizers.
+
+The <code>org.apache.lucene.analysis.standard</code> package contains three
+ fast grammar-based tokenizers constructed with JFlex:
+
+ {@link org.apache.lucene.analysis.standard.StandardTokenizer}:
+ as of Lucene 3.1, implements the Word Break rules from the Unicode Text
+ Segmentation algorithm, as specified in
+ <a href="http://unicode.org/reports/tr29/">Unicode Standard Annex #29</a>.
+ Unlike <code>UAX29URLEmailTokenizer</code>, URLs and email addresses are
+ not tokenized as single tokens, but are instead split up into
+ tokens according to the UAX#29 word break rules.
+
+ {@link org.apache.lucene.analysis.standard.StandardAnalyzer StandardAnalyzer} includes
+ {@link org.apache.lucene.analysis.standard.StandardTokenizer StandardTokenizer},
+ {@link org.apache.lucene.analysis.standard.StandardFilter StandardFilter},
+ {@link org.apache.lucene.analysis.core.LowerCaseFilter LowerCaseFilter}
+ and {@link org.apache.lucene.analysis.core.StopFilter StopFilter}.
+ When the <code>Version</code> specified in the constructor is lower than
+ 3.1, the {@link org.apache.lucene.analysis.standard.ClassicTokenizer ClassicTokenizer}
+ implementation is invoked.
+ {@link org.apache.lucene.analysis.standard.ClassicTokenizer ClassicTokenizer}:
+ this class was formerly (prior to Lucene 3.1) named
+ <code>StandardTokenizer</code>. (Its tokenization rules are not
+ based on the Unicode Text Segmentation algorithm.)
+ {@link org.apache.lucene.analysis.standard.ClassicAnalyzer ClassicAnalyzer} includes
+ {@link org.apache.lucene.analysis.standard.ClassicTokenizer ClassicTokenizer},
+ {@link org.apache.lucene.analysis.standard.StandardFilter StandardFilter},
+ {@link org.apache.lucene.analysis.core.LowerCaseFilter LowerCaseFilter}
+ and {@link org.apache.lucene.analysis.core.StopFilter StopFilter}.
+
+ {@link org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer UAX29URLEmailTokenizer}:
+ implements the Word Break rules from the Unicode Text Segmentation
+ algorithm, as specified in
+ <a href="http://unicode.org/reports/tr29/">Unicode Standard Annex #29</a>.
+ URLs and email addresses are also tokenized according to the relevant RFCs.
+
+ {@link org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer UAX29URLEmailAnalyzer} includes
+ {@link org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer UAX29URLEmailTokenizer},
+ {@link org.apache.lucene.analysis.standard.StandardFilter StandardFilter},
+ {@link org.apache.lucene.analysis.core.LowerCaseFilter LowerCaseFilter}
+ and {@link org.apache.lucene.analysis.core.StopFilter StopFilter}.
+
+
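The package documentation above describes each analyzer as a fixed chain: a tokenizer followed by StandardFilter, LowerCaseFilter, and StopFilter. A hedged sketch of that pipeline shape, with no Lucene dependencies and purely hypothetical class and method names, might look like:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the analyzer chains described above: tokens from a
// tokenizer flow through a lowercasing stage and then a stop-word stage.
// The names here are illustrative, not Lucene's actual API.
public class MiniAnalyzer {
  private final Set<String> stopWords;

  public MiniAnalyzer(Set<String> stopWords) {
    this.stopWords = stopWords;
  }

  public List<String> analyze(List<String> tokens) {
    List<String> out = new ArrayList<>();
    for (String token : tokens) {
      String lowered = token.toLowerCase(Locale.ROOT); // LowerCaseFilter stage
      if (!stopWords.contains(lowered)) {              // StopFilter stage
        out.add(lowered);
      }
    }
    return out;
  }
}
```

In Lucene itself the stages are streaming TokenFilters chained over a Tokenizer rather than list transformations, but the order of operations (tokenize, normalize case, then drop stop words) is the same as described above.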
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std31/ASCIITLD.jflex-macro'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std31/SUPPLEMENTARY.jflex-macro'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std31/StandardTokenizerImpl31.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std31/StandardTokenizerImpl31.jflex'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std31/UAX29URLEmailTokenizerImpl31.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std31/UAX29URLEmailTokenizerImpl31.jflex'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std31/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std34/ASCIITLD.jflex-macro'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std34/SUPPLEMENTARY.jflex-macro'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std34/StandardTokenizerImpl34.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std34/StandardTokenizerImpl34.jflex'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std34/UAX29URLEmailTokenizerImpl34.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std34/UAX29URLEmailTokenizerImpl34.jflex'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std34/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std36/ASCIITLD.jflex-macro'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std36/SUPPLEMENTARY.jflex-macro'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std36/UAX29URLEmailTokenizerImpl36.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std36/UAX29URLEmailTokenizerImpl36.jflex'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std36/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std40/ASCIITLD.jflex-macro'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std40/SUPPLEMENTARY.jflex-macro'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std40/StandardTokenizerImpl40.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std40/StandardTokenizerImpl40.jflex'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std40/UAX29URLEmailTokenizerImpl40.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std40/UAX29URLEmailTokenizerImpl40.jflex'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/standard/std40/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/sv/SwedishAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/sv/SwedishLightStemFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/sv/SwedishLightStemFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/sv/SwedishLightStemmer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/sv/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/synonym/FSTSynonymFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/synonym/SlowSynonymFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/synonym/SlowSynonymFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/synonym/SlowSynonymMap.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/synonym/SolrSynonymParser.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/synonym/SynonymFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/synonym/SynonymFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/synonym/SynonymMap.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/synonym/WordnetSynonymParser.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/synonym/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/th/ThaiAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/th/ThaiTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/th/ThaiTokenizerFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/th/ThaiWordFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/th/ThaiWordFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/th/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/CharTermAttribute.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/CharTermAttributeImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/FlagsAttribute.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/FlagsAttributeImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/KeywordAttribute.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/KeywordAttributeImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/OffsetAttribute.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/OffsetAttributeImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/PackedTokenAttributeImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/PayloadAttribute.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/PayloadAttributeImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/PositionIncrementAttribute.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/PositionIncrementAttributeImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/PositionLengthAttribute.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/PositionLengthAttributeImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/TermToBytesRefAttribute.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/TypeAttribute.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/TypeAttributeImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tokenattributes/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tr/ApostropheFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tr/ApostropheFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tr/TurkishAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tr/TurkishLowerCaseFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tr/TurkishLowerCaseFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/tr/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/AbstractAnalysisFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/AnalysisSPILoader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/CharArrayIterator.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/CharArrayMap.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/CharArraySet.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/CharFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/CharTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/CharacterUtils.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/ClasspathResourceLoader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/ElisionFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/ElisionFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/FilesystemResourceLoader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/FilteringTokenFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/MultiTermAwareComponent.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/OpenStringBuilder.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/ResourceLoader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/ResourceLoaderAware.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/RollingCharBuffer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/SegmentingTokenizerBase.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/StemmerUtil.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/StopwordAnalyzerBase.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/TokenFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/TokenizerFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/WordlistLoader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/util/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/wikipedia/WikipediaTokenizer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/wikipedia/WikipediaTokenizerFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/wikipedia/WikipediaTokenizerImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/wikipedia/WikipediaTokenizerImpl.jflex'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/analysis/wikipedia/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/BlockTermState.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/Codec.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/CodecUtil.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/DocValuesConsumer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/DocValuesFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/DocValuesProducer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/FieldInfosFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/FieldInfosReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/FieldInfosWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/FieldsConsumer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/FieldsProducer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/FilterCodec.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/LiveDocsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/MappingMultiDocsAndPositionsEnum.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/MappingMultiDocsEnum.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/MultiLevelSkipListReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/MultiLevelSkipListWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/NormsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/PostingsBaseFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/PostingsConsumer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/PostingsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/PostingsReaderBase.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/PostingsWriterBase.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/SegmentInfoFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/SegmentInfoReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/SegmentInfoWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/StoredFieldsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/StoredFieldsReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/StoredFieldsWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/TermStats.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/TermVectorsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/TermVectorsReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/TermVectorsWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/TermsConsumer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/blocktree/BlockTreeTermsReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/blocktree/FieldReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/blocktree/IntersectTermsEnum.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/blocktree/IntersectTermsEnumFrame.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/blocktree/SegmentTermsEnum.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/blocktree/SegmentTermsEnumFrame.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/blocktree/Stats.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/blocktree/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/CompressingStoredFieldsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/CompressingStoredFieldsIndexReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/CompressingStoredFieldsIndexWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/CompressingStoredFieldsReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/CompressingStoredFieldsWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/CompressingTermVectorsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/CompressingTermVectorsReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/CompressingTermVectorsWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/CompressionMode.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/Compressor.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/Decompressor.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/LZ4.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/compressing/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xCodec.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xFieldInfosFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xFieldInfosReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xFields.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xNormsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xNormsProducer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xPostingsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xSegmentInfoFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xSegmentInfoReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xSkipListReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xStoredFieldsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xStoredFieldsReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xTermVectorsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/Lucene3xTermVectorsReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/SegmentTermDocs.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/SegmentTermEnum.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/SegmentTermPositions.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/TermBuffer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/TermInfo.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/TermInfosReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/TermInfosReaderIndex.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene3x/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/BitVector.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40Codec.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40DocValuesFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40DocValuesReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40FieldInfosFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40FieldInfosReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40LiveDocsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40NormsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40PostingsBaseFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40PostingsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40PostingsReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40SegmentInfoFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40SegmentInfoReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40SegmentInfoWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40SkipListReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40StoredFieldsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40StoredFieldsReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40StoredFieldsWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40TermVectorsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40TermVectorsReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/Lucene40TermVectorsWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene40/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene41/ForUtil.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene41/Lucene41Codec.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene41/Lucene41PostingsBaseFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene41/Lucene41PostingsReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene41/Lucene41PostingsWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene41/Lucene41SkipReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene41/Lucene41SkipWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene41/Lucene41StoredFieldsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene41/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene410/Lucene410Codec.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene410/Lucene410DocValuesConsumer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene410/Lucene410DocValuesFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene410/Lucene410DocValuesProducer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene410/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene42/Lucene42Codec.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene42/Lucene42DocValuesProducer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene42/Lucene42FieldInfosFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene42/Lucene42FieldInfosReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene42/Lucene42NormsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene42/Lucene42TermVectorsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene42/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene45/Lucene45Codec.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene45/Lucene45DocValuesConsumer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene45/Lucene45DocValuesFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene45/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene46/Lucene46Codec.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene46/Lucene46FieldInfosFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene46/Lucene46FieldInfosReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene46/Lucene46FieldInfosWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene46/Lucene46SegmentInfoFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene46/Lucene46SegmentInfoReader.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene46/Lucene46SegmentInfoWriter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene46/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene49/Lucene49Codec.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene49/Lucene49DocValuesConsumer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene49/Lucene49DocValuesFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene49/Lucene49DocValuesProducer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene49/Lucene49NormsConsumer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene49/Lucene49NormsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene49/Lucene49NormsProducer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/lucene49/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/perfield/PerFieldDocValuesFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/perfield/PerFieldPostingsFormat.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/codecs/perfield/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/collation/CollationAttributeFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/collation/CollationKeyAnalyzer.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/collation/CollationKeyFilter.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/collation/CollationKeyFilterFactory.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/collation/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/collation/tokenattributes/CollatedTermAttributeImpl.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/collation/tokenattributes/package.html'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/AbstractField.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/BinaryDocValuesField.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/ByteDocValuesField.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/CompressionTools.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/DateField.java'.
Index: 3rdParty_sources/lucene/org/apache/lucene/document/DateTools.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/document/DateTools.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/document/DateTools.java 17 Aug 2012 14:54:53 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/document/DateTools.java 16 Dec 2014 11:31:59 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.document;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -17,10 +17,16 @@
* limitations under the License.
*/
+import org.apache.lucene.search.NumericRangeQuery; // for javadocs
+import org.apache.lucene.search.PrefixQuery;
+import org.apache.lucene.search.TermRangeQuery;
+import org.apache.lucene.util.NumericUtils; // for javadocs
+
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
+import java.util.Locale;
import java.util.TimeZone;
/**
@@ -31,35 +37,40 @@
*
* This class also helps you to limit the resolution of your dates. Do not
* save dates with a finer resolution than you really need, as then
- * RangeQuery and PrefixQuery will require more memory and become slower.
+ * {@link TermRangeQuery} and {@link PrefixQuery} will require more memory and become slower.
*
- * <P>Compared to {@link DateField} the strings generated by the methods
- * in this class take slightly more space, unless your selected resolution
- * is set to <code>Resolution.DAY</code> or lower.
+ *
+ * Another approach is {@link NumericUtils}, which provides
+ * a sortable binary representation (prefix encoded) of numeric values, which
+ * date/time are.
+ * For indexing a {@link Date} or {@link Calendar}, just get the unix timestamp as
+ * <code>long</code> using {@link Date#getTime} or {@link Calendar#getTimeInMillis} and
+ * index this as a numeric value with {@link LongField}
+ * and use {@link NumericRangeQuery} to query it.
*/
public class DateTools {
- private final static TimeZone GMT = TimeZone.getTimeZone("GMT");
+ final static TimeZone GMT = TimeZone.getTimeZone("GMT");
- private static final SimpleDateFormat YEAR_FORMAT = new SimpleDateFormat("yyyy");
- private static final SimpleDateFormat MONTH_FORMAT = new SimpleDateFormat("yyyyMM");
- private static final SimpleDateFormat DAY_FORMAT = new SimpleDateFormat("yyyyMMdd");
- private static final SimpleDateFormat HOUR_FORMAT = new SimpleDateFormat("yyyyMMddHH");
- private static final SimpleDateFormat MINUTE_FORMAT = new SimpleDateFormat("yyyyMMddHHmm");
- private static final SimpleDateFormat SECOND_FORMAT = new SimpleDateFormat("yyyyMMddHHmmss");
- private static final SimpleDateFormat MILLISECOND_FORMAT = new SimpleDateFormat("yyyyMMddHHmmssSSS");
- static {
- // times need to be normalized so the value doesn't depend on the
- // location the index is created/used:
- YEAR_FORMAT.setTimeZone(GMT);
- MONTH_FORMAT.setTimeZone(GMT);
- DAY_FORMAT.setTimeZone(GMT);
- HOUR_FORMAT.setTimeZone(GMT);
- MINUTE_FORMAT.setTimeZone(GMT);
- SECOND_FORMAT.setTimeZone(GMT);
- MILLISECOND_FORMAT.setTimeZone(GMT);
- }
+ private static final ThreadLocal<Calendar> TL_CAL = new ThreadLocal<Calendar>() {
+ @Override
+ protected Calendar initialValue() {
+ return Calendar.getInstance(GMT, Locale.ROOT);
+ }
+ };
+ //indexed by format length
+ private static final ThreadLocal<SimpleDateFormat[]> TL_FORMATS = new ThreadLocal<SimpleDateFormat[]>() {
+ @Override
+ protected SimpleDateFormat[] initialValue() {
+ SimpleDateFormat[] arr = new SimpleDateFormat[Resolution.MILLISECOND.formatLen+1];
+ for (Resolution resolution : Resolution.values()) {
+ arr[resolution.formatLen] = (SimpleDateFormat)resolution.format.clone();
+ }
+ return arr;
+ }
+ };
+
// cannot create, the class has static methods only
private DateTools() {}
@@ -70,7 +81,7 @@
* @param resolution the desired resolution, see
* {@link #round(Date, DateTools.Resolution)}
* @return a string in format <code>yyyyMMddHHmmssSSS</code> or shorter,
- * depeding on <code>resolution</code>; using GMT as timezone
+ * depending on <code>resolution</code>; using GMT as timezone
*/
public static String dateToString(Date date, Resolution resolution) {
return timeToString(date.getTime(), resolution);
@@ -83,49 +94,11 @@
* @param resolution the desired resolution, see
* {@link #round(long, DateTools.Resolution)}
* @return a string in format <code>yyyyMMddHHmmssSSS</code> or shorter,
- * depeding on <code>resolution</code>; using GMT as timezone
+ * depending on <code>resolution</code>; using GMT as timezone
*/
public static String timeToString(long time, Resolution resolution) {
- Calendar cal = Calendar.getInstance(GMT);
-
- //protected in JDK's prior to 1.4
- //cal.setTimeInMillis(round(time, resolution));
-
- cal.setTime(new Date(round(time, resolution)));
-
- String result;
- if (resolution == Resolution.YEAR) {
- synchronized (YEAR_FORMAT) {
- result = YEAR_FORMAT.format(cal.getTime());
- }
- } else if (resolution == Resolution.MONTH) {
- synchronized (MONTH_FORMAT) {
- result = MONTH_FORMAT.format(cal.getTime());
- }
- } else if (resolution == Resolution.DAY) {
- synchronized (DAY_FORMAT) {
- result = DAY_FORMAT.format(cal.getTime());
- }
- } else if (resolution == Resolution.HOUR) {
- synchronized (HOUR_FORMAT) {
- result = HOUR_FORMAT.format(cal.getTime());
- }
- } else if (resolution == Resolution.MINUTE) {
- synchronized (MINUTE_FORMAT) {
- result = MINUTE_FORMAT.format(cal.getTime());
- }
- } else if (resolution == Resolution.SECOND) {
- synchronized (SECOND_FORMAT) {
- result = SECOND_FORMAT.format(cal.getTime());
- }
- } else if (resolution == Resolution.MILLISECOND) {
- synchronized (MILLISECOND_FORMAT) {
- result = MILLISECOND_FORMAT.format(cal.getTime());
- }
- } else {
- throw new IllegalArgumentException("unknown resolution " + resolution);
- }
- return result;
+ final Date date = new Date(round(time, resolution));
+ return TL_FORMATS.get()[resolution.formatLen].format(date);
}
/**
@@ -153,39 +126,11 @@
* expected format
*/
public static Date stringToDate(String dateString) throws ParseException {
- Date date;
- if (dateString.length() == 4) {
- synchronized (YEAR_FORMAT) {
- date = YEAR_FORMAT.parse(dateString);
- }
- } else if (dateString.length() == 6) {
- synchronized (MONTH_FORMAT) {
- date = MONTH_FORMAT.parse(dateString);
- }
- } else if (dateString.length() == 8) {
- synchronized (DAY_FORMAT) {
- date = DAY_FORMAT.parse(dateString);
- }
- } else if (dateString.length() == 10) {
- synchronized (HOUR_FORMAT) {
- date = HOUR_FORMAT.parse(dateString);
- }
- } else if (dateString.length() == 12) {
- synchronized (MINUTE_FORMAT) {
- date = MINUTE_FORMAT.parse(dateString);
- }
- } else if (dateString.length() == 14) {
- synchronized (SECOND_FORMAT) {
- date = SECOND_FORMAT.parse(dateString);
- }
- } else if (dateString.length() == 17) {
- synchronized (MILLISECOND_FORMAT) {
- date = MILLISECOND_FORMAT.parse(dateString);
- }
- } else {
- throw new ParseException("Input is not valid date string: " + dateString, 0);
+ try {
+ return TL_FORMATS.get()[dateString.length()].parse(dateString);
+ } catch (Exception e) {
+ throw new ParseException("Input is not a valid date string: " + dateString, 0);
}
- return date;
}
/**
@@ -211,71 +156,68 @@
* @return the date with all values more precise than resolution
* set to 0 or 1, expressed as milliseconds since January 1, 1970, 00:00:00 GMT
*/
+ @SuppressWarnings("fallthrough")
public static long round(long time, Resolution resolution) {
- Calendar cal = Calendar.getInstance(GMT);
-
- // protected in JDK's prior to 1.4
- //cal.setTimeInMillis(time);
+ final Calendar calInstance = TL_CAL.get();
+ calInstance.setTimeInMillis(time);
- cal.setTime(new Date(time));
-
- if (resolution == Resolution.YEAR) {
- cal.set(Calendar.MONTH, 0);
- cal.set(Calendar.DAY_OF_MONTH, 1);
- cal.set(Calendar.HOUR_OF_DAY, 0);
- cal.set(Calendar.MINUTE, 0);
- cal.set(Calendar.SECOND, 0);
- cal.set(Calendar.MILLISECOND, 0);
- } else if (resolution == Resolution.MONTH) {
- cal.set(Calendar.DAY_OF_MONTH, 1);
- cal.set(Calendar.HOUR_OF_DAY, 0);
- cal.set(Calendar.MINUTE, 0);
- cal.set(Calendar.SECOND, 0);
- cal.set(Calendar.MILLISECOND, 0);
- } else if (resolution == Resolution.DAY) {
- cal.set(Calendar.HOUR_OF_DAY, 0);
- cal.set(Calendar.MINUTE, 0);
- cal.set(Calendar.SECOND, 0);
- cal.set(Calendar.MILLISECOND, 0);
- } else if (resolution == Resolution.HOUR) {
- cal.set(Calendar.MINUTE, 0);
- cal.set(Calendar.SECOND, 0);
- cal.set(Calendar.MILLISECOND, 0);
- } else if (resolution == Resolution.MINUTE) {
- cal.set(Calendar.SECOND, 0);
- cal.set(Calendar.MILLISECOND, 0);
- } else if (resolution == Resolution.SECOND) {
- cal.set(Calendar.MILLISECOND, 0);
- } else if (resolution == Resolution.MILLISECOND) {
- // don't cut off anything
- } else {
- throw new IllegalArgumentException("unknown resolution " + resolution);
+ switch (resolution) {
+ //NOTE: switch statement fall-through is deliberate
+ case YEAR:
+ calInstance.set(Calendar.MONTH, 0);
+ case MONTH:
+ calInstance.set(Calendar.DAY_OF_MONTH, 1);
+ case DAY:
+ calInstance.set(Calendar.HOUR_OF_DAY, 0);
+ case HOUR:
+ calInstance.set(Calendar.MINUTE, 0);
+ case MINUTE:
+ calInstance.set(Calendar.SECOND, 0);
+ case SECOND:
+ calInstance.set(Calendar.MILLISECOND, 0);
+ case MILLISECOND:
+ // don't cut off anything
+ break;
+ default:
+ throw new IllegalArgumentException("unknown resolution " + resolution);
}
- return cal.getTime().getTime();
+ return calInstance.getTimeInMillis();
}
/** Specifies the time granularity. */
- public static class Resolution {
+ public static enum Resolution {
- public static final Resolution YEAR = new Resolution("year");
- public static final Resolution MONTH = new Resolution("month");
- public static final Resolution DAY = new Resolution("day");
- public static final Resolution HOUR = new Resolution("hour");
- public static final Resolution MINUTE = new Resolution("minute");
- public static final Resolution SECOND = new Resolution("second");
- public static final Resolution MILLISECOND = new Resolution("millisecond");
+ /** Limit a date's resolution to year granularity. */
+ YEAR(4),
+ /** Limit a date's resolution to month granularity. */
+ MONTH(6),
+ /** Limit a date's resolution to day granularity. */
+ DAY(8),
+ /** Limit a date's resolution to hour granularity. */
+ HOUR(10),
+ /** Limit a date's resolution to minute granularity. */
+ MINUTE(12),
+ /** Limit a date's resolution to second granularity. */
+ SECOND(14),
+ /** Limit a date's resolution to millisecond granularity. */
+ MILLISECOND(17);
- private String resolution;
+ final int formatLen;
+ final SimpleDateFormat format;//should be cloned before use, since it's not threadsafe
- private Resolution() {
+ Resolution(int formatLen) {
+ this.formatLen = formatLen;
+ // formatLen 10's place: 11111111
+ // formatLen 1's place: 12345678901234567
+ this.format = new SimpleDateFormat("yyyyMMddHHmmssSSS".substring(0,formatLen),Locale.ROOT);
+ this.format.setTimeZone(GMT);
}
-
- private Resolution(String resolution) {
- this.resolution = resolution;
- }
-
+
+ /** this method returns the name of the resolution
+ * in lowercase (for backwards compatibility) */
+ @Override
public String toString() {
- return resolution;
+ return super.toString().toLowerCase(Locale.ROOT);
}
}
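The rewritten `round` above leans on deliberate `switch` fall-through: each coarser resolution zeroes one calendar field and then falls into every finer case below it. A standalone sketch of that technique using only `java.util.Calendar` (the `RoundSketch` class name is illustrative, not part of Lucene):

```java
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

// Standalone sketch of the DateTools.round() fall-through pattern:
// each case zeroes one field, then falls into the finer cases below it.
public class RoundSketch {
  enum Res { YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND }

  @SuppressWarnings("fallthrough")
  static long round(long time, Res res) {
    Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("GMT"), Locale.ROOT);
    cal.setTimeInMillis(time);
    switch (res) {
      // NOTE: fall-through is deliberate
      case YEAR:        cal.set(Calendar.MONTH, 0);
      case MONTH:       cal.set(Calendar.DAY_OF_MONTH, 1);
      case DAY:         cal.set(Calendar.HOUR_OF_DAY, 0);
      case HOUR:        cal.set(Calendar.MINUTE, 0);
      case MINUTE:      cal.set(Calendar.SECOND, 0);
      case SECOND:      cal.set(Calendar.MILLISECOND, 0);
      case MILLISECOND: break; // keep full precision
    }
    return cal.getTimeInMillis();
  }

  public static void main(String[] args) {
    long t = 1234567890123L;                  // 2009-02-13T23:31:30.123Z
    System.out.println(round(t, Res.SECOND)); // 1234567890000
    System.out.println(round(t, Res.DAY));    // 1234483200000
  }
}
```

Calendar arithmetic is timezone-sensitive, which is why both the patch and this sketch pin the calendar to GMT and `Locale.ROOT` so rounded values do not depend on where the index is built.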
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/DerefBytesDocValuesField.java'.
Index: 3rdParty_sources/lucene/org/apache/lucene/document/Document.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/document/Document.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/document/Document.java 17 Aug 2012 14:54:53 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/document/Document.java 16 Dec 2014 11:31:58 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.document;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -17,64 +17,39 @@
* limitations under the License.
*/
-import java.util.*; // for javadoc
-import org.apache.lucene.search.ScoreDoc; // for javadoc
-import org.apache.lucene.search.Searcher; // for javadoc
+import java.util.*;
+
import org.apache.lucene.index.IndexReader; // for javadoc
+import org.apache.lucene.index.IndexableField;
+import org.apache.lucene.search.IndexSearcher; // for javadoc
+import org.apache.lucene.search.ScoreDoc; // for javadoc
+import org.apache.lucene.util.BytesRef;
/** Documents are the unit of indexing and search.
*
* A Document is a set of fields. Each field has a name and a textual value.
- * A field may be {@link Fieldable#isStored() stored} with the document, in which
+ * A field may be {@link org.apache.lucene.index.IndexableFieldType#stored() stored} with the document, in which
* case it is returned with search hits on the document. Thus each document
* should typically contain one or more stored fields which uniquely identify
* it.
*
- * Note that fields which are not {@link Fieldable#isStored() stored} are
+ * <p>Note that fields which are not {@link org.apache.lucene.index.IndexableFieldType#stored() stored} are
* not available in documents retrieved from the index, e.g. with {@link
- * ScoreDoc#doc}, {@link Searcher#doc(int)} or {@link
- * IndexReader#document(int)}.
+ * ScoreDoc#doc} or {@link IndexReader#document(int)}.
*/
-public final class Document implements java.io.Serializable {
- List fields = new ArrayList();
- private float boost = 1.0f;
+public final class Document implements Iterable<IndexableField> {
+ private final List<IndexableField> fields = new ArrayList<>();
+
/** Constructs a new document with no fields. */
public Document() {}
-
- /** Sets a boost factor for hits on any field of this document. This value
- * will be multiplied into the score of all hits on this document.
- *
- * The default value is 1.0.
- *
- * <p>Values are multiplied into the value of {@link Fieldable#getBoost()} of
- * each field in this document. Thus, this method in effect sets a default
- * boost for the fields of this document.
- *
- * @see Fieldable#setBoost(float)
- */
- public void setBoost(float boost) {
- this.boost = boost;
+ @Override
+ public Iterator<IndexableField> iterator() {
+ return fields.iterator();
}
- /** Returns, at indexing time, the boost factor as set by {@link #setBoost(float)}.
- *
- * Note that once a document is indexed this value is no longer available
- * from the index. At search time, for retrieved documents, this method always
- * returns 1. This however does not mean that the boost value set at indexing
- * time was ignored - it was just combined with other indexing time factors and
- * stored elsewhere, for better indexing and search performance. (For more
- * information see the "norm(t,d)" part of the scoring formula in
- * {@link org.apache.lucene.search.Similarity Similarity}.)
- *
- * @see #setBoost(float)
- */
- public float getBoost() {
- return boost;
- }
-
/**
* <p>Adds a field to a document. Several fields may be added with
* the same name. In this case, if the fields are indexed, their text is
@@ -85,7 +60,7 @@
* a document has to be deleted from an index and a new changed version of that
* document has to be added.
*/
- public final void add(Fieldable field) {
+ public final void add(IndexableField field) {
fields.add(field);
}
@@ -100,9 +75,9 @@
* document has to be added.
*/
public final void removeField(String name) {
- Iterator it = fields.iterator();
+ Iterator<IndexableField> it = fields.iterator();
while (it.hasNext()) {
- Fieldable field = (Fieldable)it.next();
+ IndexableField field = it.next();
if (field.name().equals(name)) {
it.remove();
return;
@@ -120,210 +95,157 @@
* document has to be added.
*/
public final void removeFields(String name) {
- Iterator it = fields.iterator();
+ Iterator<IndexableField> it = fields.iterator();
while (it.hasNext()) {
- Fieldable field = (Fieldable)it.next();
+ IndexableField field = it.next();
if (field.name().equals(name)) {
it.remove();
}
}
}
- /** Returns a field with the given name if any exist in this document, or
- * null. If multiple fields exists with this name, this method returns the
- * first value added.
- * Do not use this method with lazy loaded fields.
- */
- public final Field getField(String name) {
- for (int i = 0; i < fields.size(); i++) {
- Field field = (Field)fields.get(i);
- if (field.name().equals(name))
- return field;
+
+ /**
+ * Returns an array of byte arrays for of the fields that have the name specified
+ * as the method parameter. This method returns an empty
+ * array when there are no matching fields. It never
+ * returns null.
+ *
+ * @param name the name of the field
+ * @return a <code>BytesRef[]</code> of binary field values
+ */
+ public final BytesRef[] getBinaryValues(String name) {
+ final List<BytesRef> result = new ArrayList<>();
+ for (IndexableField field : fields) {
+ if (field.name().equals(name)) {
+ final BytesRef bytes = field.binaryValue();
+ if (bytes != null) {
+ result.add(bytes);
+ }
+ }
}
+
+ return result.toArray(new BytesRef[result.size()]);
+ }
+
+ /**
+ * Returns an array of bytes for the first (or only) field that has the name
+ * specified as the method parameter. This method will return null
+ * if no binary fields with the specified name are available.
+ * There may be non-binary fields with the same name.
+ *
+ * @param name the name of the field.
+ * @return a <code>BytesRef</code> containing the binary field value or null
+ */
+ public final BytesRef getBinaryValue(String name) {
+ for (IndexableField field : fields) {
+ if (field.name().equals(name)) {
+ final BytesRef bytes = field.binaryValue();
+ if (bytes != null) {
+ return bytes;
+ }
+ }
+ }
return null;
}
-
- /** Returns a field with the given name if any exist in this document, or
+ /** Returns a field with the given name if any exist in this document, or
* null. If multiple fields exists with this name, this method returns the
* first value added.
*/
- public Fieldable getFieldable(String name) {
- for (int i = 0; i < fields.size(); i++) {
- Fieldable field = (Fieldable)fields.get(i);
- if (field.name().equals(name))
- return field;
- }
- return null;
- }
-
- /** Returns the string value of the field with the given name if any exist in
- * this document, or null. If multiple fields exist with this name, this
- * method returns the first value added. If only binary fields with this name
- * exist, returns null.
- */
- public final String get(String name) {
- for (int i = 0; i < fields.size(); i++) {
- Fieldable field = (Fieldable)fields.get(i);
- if (field.name().equals(name) && (!field.isBinary()))
- return field.stringValue();
+ public final IndexableField getField(String name) {
+ for (IndexableField field : fields) {
+ if (field.name().equals(name)) {
+ return field;
+ }
}
return null;
}
- /** Returns an Enumeration of all the fields in a document.
- * @deprecated use {@link #getFields()} instead
+ /**
+ * Returns an array of {@link IndexableField}s with the given name.
+ * This method returns an empty array when there are no
+ * matching fields. It never returns null.
+ *
+ * @param name the name of the field
+ * @return a <code>IndexableField[]</code> array
*/
- public final Enumeration fields() {
- return new Enumeration() {
- final Iterator iter = fields.iterator();
- public boolean hasMoreElements() {
- return iter.hasNext();
+ public IndexableField[] getFields(String name) {
+ List<IndexableField> result = new ArrayList<>();
+ for (IndexableField field : fields) {
+ if (field.name().equals(name)) {
+ result.add(field);
}
- public Object nextElement() {
- return iter.next();
- }
- };
- }
+ }
+ return result.toArray(new IndexableField[result.size()]);
+ }
+
/** Returns a List of all the fields in a document.
- * Note that fields which are not {@link Fieldable#isStored() stored} are
+ * <p>Note that fields which are not stored are
* not available in documents retrieved from the
- * index, e.g. {@link Searcher#doc(int)} or {@link
+ * index, e.g. {@link IndexSearcher#doc(int)} or {@link
* IndexReader#document(int)}.
*/
- public final List getFields() {
+ public final List<IndexableField> getFields() {
return fields;
}
-
- private final static Field[] NO_FIELDS = new Field[0];
- /**
- * Returns an array of {@link Field}s with the given name.
- * Do not use with lazy loaded fields.
- * This method returns an empty array when there are no
- * matching fields. It never returns null.
- *
- * @param name the name of the field
- * @return a <code>Field[]</code> array
- */
- public final Field[] getFields(String name) {
- List result = new ArrayList();
- for (int i = 0; i < fields.size(); i++) {
- Field field = (Field)fields.get(i);
- if (field.name().equals(name)) {
- result.add(field);
- }
- }
-
- if (result.size() == 0)
- return NO_FIELDS;
-
- return (Field[])result.toArray(new Field[result.size()]);
- }
-
-
- private final static Fieldable[] NO_FIELDABLES = new Fieldable[0];
-
- /**
- * Returns an array of {@link Fieldable}s with the given name.
- * This method returns an empty array when there are no
- * matching fields. It never returns null.
- *
- * @param name the name of the field
- * @return a <code>Fieldable[]</code> array
- */
- public Fieldable[] getFieldables(String name) {
- List result = new ArrayList();
- for (int i = 0; i < fields.size(); i++) {
- Fieldable field = (Fieldable)fields.get(i);
- if (field.name().equals(name)) {
- result.add(field);
- }
- }
-
- if (result.size() == 0)
- return NO_FIELDABLES;
-
- return (Fieldable[])result.toArray(new Fieldable[result.size()]);
- }
-
-
private final static String[] NO_STRINGS = new String[0];
/**
* Returns an array of values of the field specified as the method parameter.
* This method returns an empty array when there are no
* matching fields. It never returns null.
+ * For {@link IntField}, {@link LongField}, {@link
+ * FloatField} and {@link DoubleField} it returns the string value of the number. If you want
+ * the actual numeric field instances back, use {@link #getFields}.
* @param name the name of the field
* @return a <code>String[]</code> of field values
*/
public final String[] getValues(String name) {
- List result = new ArrayList();
- for (int i = 0; i < fields.size(); i++) {
- Fieldable field = (Fieldable)fields.get(i);
- if (field.name().equals(name) && (!field.isBinary()))
+ List<String> result = new ArrayList<>();
+ for (IndexableField field : fields) {
+ if (field.name().equals(name) && field.stringValue() != null) {
result.add(field.stringValue());
+ }
}
- if (result.size() == 0)
+ if (result.size() == 0) {
return NO_STRINGS;
+ }
- return (String[])result.toArray(new String[result.size()]);
+ return result.toArray(new String[result.size()]);
}
- private final static byte[][] NO_BYTES = new byte[0][];
-
- /**
- * Returns an array of byte arrays for of the fields that have the name specified
- * as the method parameter. This method returns an empty
- * array when there are no matching fields. It never
- * returns null.
- *
- * @param name the name of the field
- * @return a <code>byte[][]</code> of binary field values
- */
- public final byte[][] getBinaryValues(String name) {
- List result = new ArrayList();
- for (int i = 0; i < fields.size(); i++) {
- Fieldable field = (Fieldable)fields.get(i);
- if (field.name().equals(name) && (field.isBinary()))
- result.add(field.binaryValue());
+ /** Returns the string value of the field with the given name if any exist in
+ * this document, or null. If multiple fields exist with this name, this
+ * method returns the first value added. If only binary fields with this name
+ * exist, returns null.
+ * For {@link IntField}, {@link LongField}, {@link
+ * FloatField} and {@link DoubleField} it returns the string value of the number. If you want
+ * the actual numeric field instance back, use {@link #getField}.
+ */
+ public final String get(String name) {
+ for (IndexableField field : fields) {
+ if (field.name().equals(name) && field.stringValue() != null) {
+ return field.stringValue();
+ }
}
-
- if (result.size() == 0)
- return NO_BYTES;
-
- return (byte[][])result.toArray(new byte[result.size()][]);
- }
-
- /**
- * Returns an array of bytes for the first (or only) field that has the name
- * specified as the method parameter. This method will return null
- * if no binary fields with the specified name are available.
- * There may be non-binary fields with the same name.
- *
- * @param name the name of the field.
- * @return a <code>byte[]</code> containing the binary field value or null
- */
- public final byte[] getBinaryValue(String name) {
- for (int i=0; i < fields.size(); i++) {
- Fieldable field = (Fieldable)fields.get(i);
- if (field.name().equals(name) && (field.isBinary()))
- return field.binaryValue();
- }
return null;
}
/** Prints the fields of a document for human consumption. */
+ @Override
public final String toString() {
- StringBuffer buffer = new StringBuffer();
+ StringBuilder buffer = new StringBuilder();
buffer.append("Document<");
for (int i = 0; i < fields.size(); i++) {
- Fieldable field = (Fieldable)fields.get(i);
+ IndexableField field = fields.get(i);
buffer.append(field.toString());
- if (i != fields.size()-1)
+ if (i != fields.size()-1) {
buffer.append(" ");
+ }
}
buffer.append(">");
return buffer.toString();
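The reworked Document API above is, at heart, a name-filtered scan over an insertion-ordered field list: `get` returns the first match, `getValues` all matches, `removeField` removes only the first. A minimal stdlib-only sketch of that access pattern (`SimpleDoc`/`SimpleField` are hypothetical names; the real class stores `IndexableField`s and also exposes binary values):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Standalone sketch of the Document lookup pattern: fields live in
// insertion order in a List; lookups filter linearly by field name.
public class SimpleDoc {
  public static final class SimpleField {
    final String name; final String value;
    public SimpleField(String name, String value) { this.name = name; this.value = value; }
  }

  private final List<SimpleField> fields = new ArrayList<>();

  public void add(SimpleField f) { fields.add(f); }

  /** First value added under this name, or null (mirrors Document.get). */
  public String get(String name) {
    for (SimpleField f : fields) {
      if (f.name.equals(name)) return f.value;
    }
    return null;
  }

  /** All values under this name, never null (mirrors Document.getValues). */
  public String[] getValues(String name) {
    List<String> out = new ArrayList<>();
    for (SimpleField f : fields) {
      if (f.name.equals(name)) out.add(f.value);
    }
    return out.toArray(new String[0]);
  }

  /** Removes only the first field with this name (mirrors removeField). */
  public void removeField(String name) {
    Iterator<SimpleField> it = fields.iterator();
    while (it.hasNext()) {
      if (it.next().name.equals(name)) { it.remove(); return; }
    }
  }
}
```

As in the patched `removeField`, a single call leaves any later same-named fields in place, which is why the javadoc warns that removing a whole document requires deleting it from the index rather than stripping fields.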
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/DocumentStoredFieldVisitor.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/DoubleDocValuesField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/DoubleField.java'.
Index: 3rdParty_sources/lucene/org/apache/lucene/document/Field.java
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/document/Field.java,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/document/Field.java 17 Aug 2012 14:54:53 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/document/Field.java 16 Dec 2014 11:31:59 -0000 1.1.2.1
@@ -1,6 +1,6 @@
package org.apache.lucene.document;
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -17,79 +17,660 @@
* limitations under the License.
*/
-import org.apache.lucene.analysis.TokenStream;
-import org.apache.lucene.index.IndexWriter; // for javadoc
-import org.apache.lucene.util.Parameter;
-
+import java.io.IOException;
import java.io.Reader;
-import java.io.Serializable;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.NumericTokenStream;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
+import org.apache.lucene.document.FieldType.NumericType;
+import org.apache.lucene.index.IndexWriter; // javadocs
+import org.apache.lucene.index.IndexableField;
+import org.apache.lucene.index.IndexableFieldType;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.index.FieldInvertState; // javadocs
+
/**
- A field is a section of a Document. Each field has two parts, a name and a
- value. Values may be free text, provided as a String or as a Reader, or they
- may be atomic keywords, which are not further processed. Such keywords may
- be used to represent dates, urls, etc. Fields are optionally stored in the
- index, so that they may be returned with hits on the document.
- */
+ * Expert: directly create a field for a document. Most
+ * users should use one of the sugar subclasses: {@link
+ * IntField}, {@link LongField}, {@link FloatField}, {@link
+ * DoubleField}, {@link BinaryDocValuesField}, {@link
+ * NumericDocValuesField}, {@link SortedDocValuesField}, {@link
+ * StringField}, {@link TextField}, {@link StoredField}.
+ *
+ * <p>A field is a section of a Document. Each field has three
+ * parts: name, type and value. Values may be text
+ * (String, Reader or pre-analyzed TokenStream), binary
+ * (byte[]), or numeric (a Number). Fields are optionally stored in the
+ * index, so that they may be returned with hits on the document.
+ *
+ *
+ * NOTE: the field type is an {@link IndexableFieldType}. Making changes
+ * to the state of the IndexableFieldType will impact any
+ * Field it is used in. It is strongly recommended that no
+ * changes be made after Field instantiation.
+ */
+public class Field implements IndexableField {
-public final class Field extends AbstractField implements Fieldable, Serializable {
+ /**
+ * Field's type
+ */
+ protected final FieldType type;
+ /**
+ * Field's name
+ */
+ protected final String name;
+
+ /** Field's value */
+ protected Object fieldsData;
+
+ /** Pre-analyzed tokenStream for indexed fields; this is
+ * separate from fieldsData because you are allowed to
+ * have both; eg maybe field has a String value but you
+ * customize how it's tokenized */
+ protected TokenStream tokenStream;
+
+ /**
+ * Field's boost
+ * @see #boost()
+ */
+ protected float boost = 1.0f;
+
+ /**
+ * Expert: creates a field with no initial value.
+ * Intended only for custom Field subclasses.
+ * @param name field name
+ * @param type field type
+ * @throws IllegalArgumentException if either the name or type
+ * is null.
+ */
+ protected Field(String name, FieldType type) {
+ if (name == null) {
+ throw new IllegalArgumentException("name cannot be null");
+ }
+ this.name = name;
+ if (type == null) {
+ throw new IllegalArgumentException("type cannot be null");
+ }
+ this.type = type;
+ }
+
+ /**
+ * Create field with Reader value.
+ * @param name field name
+ * @param reader reader value
+ * @param type field type
+ * @throws IllegalArgumentException if either the name or type
+ * is null, or if the field's type is stored(), or
+ * if tokenized() is false.
+ * @throws NullPointerException if the reader is null
+ */
+ public Field(String name, Reader reader, FieldType type) {
+ if (name == null) {
+ throw new IllegalArgumentException("name cannot be null");
+ }
+ if (type == null) {
+ throw new IllegalArgumentException("type cannot be null");
+ }
+ if (reader == null) {
+ throw new NullPointerException("reader cannot be null");
+ }
+ if (type.stored()) {
+ throw new IllegalArgumentException("fields with a Reader value cannot be stored");
+ }
+ if (type.indexed() && !type.tokenized()) {
+ throw new IllegalArgumentException("non-tokenized fields must use String values");
+ }
+
+ this.name = name;
+ this.fieldsData = reader;
+ this.type = type;
+ }
+
+ /**
+ * Create field with TokenStream value.
+ * @param name field name
+ * @param tokenStream TokenStream value
+ * @param type field type
+ * @throws IllegalArgumentException if either the name or type
+ * is null, or if the field's type is stored(), or
+ * if tokenized() is false, or if indexed() is false.
+ * @throws NullPointerException if the tokenStream is null
+ */
+ public Field(String name, TokenStream tokenStream, FieldType type) {
+ if (name == null) {
+ throw new IllegalArgumentException("name cannot be null");
+ }
+ if (tokenStream == null) {
+ throw new NullPointerException("tokenStream cannot be null");
+ }
+ if (!type.indexed() || !type.tokenized()) {
+ throw new IllegalArgumentException("TokenStream fields must be indexed and tokenized");
+ }
+ if (type.stored()) {
+ throw new IllegalArgumentException("TokenStream fields cannot be stored");
+ }
+
+ this.name = name;
+ this.fieldsData = null;
+ this.tokenStream = tokenStream;
+ this.type = type;
+ }
- /** Specifies whether and how a field should be stored. */
- public static final class Store extends Parameter implements Serializable {
+ /**
+ * Create field with binary value.
+ *
+ * NOTE: the provided byte[] is not copied so be sure
+ * not to change it until you're done with this field.
+ * @param name field name
+ * @param value byte array pointing to binary content (not copied)
+ * @param type field type
+ * @throws IllegalArgumentException if the field name is null,
+ * or the field's type is indexed()
+ * @throws NullPointerException if the type is null
+ */
+ public Field(String name, byte[] value, FieldType type) {
+ this(name, value, 0, value.length, type);
+ }
- private Store(String name) {
- super(name);
+ /**
+ * Create field with binary value.
+ *
+ * NOTE: the provided byte[] is not copied so be sure
+ * not to change it until you're done with this field.
+ * @param name field name
+ * @param value byte array pointing to binary content (not copied)
+ * @param offset starting position of the byte array
+ * @param length valid length of the byte array
+ * @param type field type
+ * @throws IllegalArgumentException if the field name is null,
+ * or the field's type is indexed()
+ * @throws NullPointerException if the type is null
+ */
+ public Field(String name, byte[] value, int offset, int length, FieldType type) {
+ this(name, new BytesRef(value, offset, length), type);
+ }
+
+ /**
+ * Create field with binary value.
+ *
+ * NOTE: the provided BytesRef is not copied so be sure
+ * not to change it until you're done with this field.
+ * @param name field name
+ * @param bytes BytesRef pointing to binary content (not copied)
+ * @param type field type
+ * @throws IllegalArgumentException if the field name is null,
+ * or the field's type is indexed()
+ * @throws NullPointerException if the type is null
+ */
+ public Field(String name, BytesRef bytes, FieldType type) {
+ if (name == null) {
+ throw new IllegalArgumentException("name cannot be null");
}
+ if (bytes == null) {
+ throw new IllegalArgumentException("bytes cannot be null");
+ }
+ if (type.indexed()) {
+ throw new IllegalArgumentException("Fields with BytesRef values cannot be indexed");
+ }
+ this.fieldsData = bytes;
+ this.type = type;
+ this.name = name;
+ }
- /** Store the original field value in the index in a compressed form. This is
- * useful for long documents and for binary valued fields.
+ // TODO: allow direct construction of int, long, float, double value too..?
+
+ /**
+ * Create field with String value.
+ * @param name field name
+ * @param value string value
+ * @param type field type
+ * @throws IllegalArgumentException if either the name or value
+ * is null, or if the field's type is neither indexed() nor stored(),
+ * or if indexed() is false but storeTermVectors() is true.
+ * @throws NullPointerException if the type is null
+ */
+ public Field(String name, String value, FieldType type) {
+ if (name == null) {
+ throw new IllegalArgumentException("name cannot be null");
+ }
+ if (value == null) {
+ throw new IllegalArgumentException("value cannot be null");
+ }
+ if (!type.stored() && !type.indexed()) {
+ throw new IllegalArgumentException("it doesn't make sense to have a field that "
+ + "is neither indexed nor stored");
+ }
+ this.type = type;
+ this.name = name;
+ this.fieldsData = value;
+ }
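The constructors above replace the pre-4.0 Store/Index flags with an explicit FieldType. A minimal usage sketch of that style, assuming the Lucene 4.x `document` API (the field name and value are illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;

public class FieldTypeExample {
  public static void main(String[] args) {
    // Sketch: a stored, analyzed text field built with the FieldType API.
    FieldType ft = new FieldType();
    ft.setStored(true);    // keep the original value for retrieval
    ft.setIndexed(true);   // make the field searchable
    ft.setTokenized(true); // analyze the value into tokens
    ft.freeze();           // reject further changes, as indexing expects

    Document doc = new Document();
    doc.add(new Field("title", "Lucene in Action", ft));
  }
}
```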
+
+ /**
+ * The value of the field as a String, or null. If null, the Reader value or
+ * binary value is used. Exactly one of stringValue(), readerValue(), and
+ * getBinaryValue() must be set.
+ */
+ @Override
+ public String stringValue() {
+ if (fieldsData instanceof String || fieldsData instanceof Number) {
+ return fieldsData.toString();
+ } else {
+ return null;
+ }
+ }
+
+ /**
+ * The value of the field as a Reader, or null. If null, the String value or
+ * binary value is used. Exactly one of stringValue(), readerValue(), and
+ * getBinaryValue() must be set.
+ */
+ @Override
+ public Reader readerValue() {
+ return fieldsData instanceof Reader ? (Reader) fieldsData : null;
+ }
+
+ /**
+ * The TokenStream for this field to be used when indexing, or null. If null,
+ * the Reader value or String value is analyzed to produce the indexed tokens.
+ */
+ public TokenStream tokenStreamValue() {
+ return tokenStream;
+ }
+
+ /**
+ *
+ * Expert: change the value of this field. This can be used during indexing to
+ * re-use a single Field instance to improve indexing speed by avoiding GC
+ * cost of new'ing and reclaiming Field instances. Typically a single
+ * {@link Document} instance is re-used as well. This helps most on small
+ * documents.
+ *
+ * Each Field instance should only be used once within a single
+ * {@link Document} instance. See ImproveIndexingSpeed for details.
+ *
+ */
+ public void setStringValue(String value) {
+ if (!(fieldsData instanceof String)) {
+ throw new IllegalArgumentException("cannot change value type from " + fieldsData.getClass().getSimpleName() + " to String");
+ }
+ if (value == null) {
+ throw new IllegalArgumentException("value cannot be null");
+ }
+ fieldsData = value;
+ }
+
+ /**
+ * Expert: change the value of this field. See
+ * {@link #setStringValue(String)}.
+ */
+ public void setReaderValue(Reader value) {
+ if (!(fieldsData instanceof Reader)) {
+ throw new IllegalArgumentException("cannot change value type from " + fieldsData.getClass().getSimpleName() + " to Reader");
+ }
+ fieldsData = value;
+ }
+
+ /**
+ * Expert: change the value of this field. See
+ * {@link #setStringValue(String)}.
+ */
+ public void setBytesValue(byte[] value) {
+ setBytesValue(new BytesRef(value));
+ }
+
+ /**
+ * Expert: change the value of this field. See
+ * {@link #setStringValue(String)}.
+ *
+ * NOTE: the provided BytesRef is not copied so be sure
+ * not to change it until you're done with this field.
+ */
+ public void setBytesValue(BytesRef value) {
+ if (!(fieldsData instanceof BytesRef)) {
+ throw new IllegalArgumentException("cannot change value type from " + fieldsData.getClass().getSimpleName() + " to BytesRef");
+ }
+ if (type.indexed()) {
+ throw new IllegalArgumentException("cannot set a BytesRef value on an indexed field");
+ }
+ if (value == null) {
+ throw new IllegalArgumentException("value cannot be null");
+ }
+ fieldsData = value;
+ }
+
+ /**
+ * Expert: change the value of this field. See
+ * {@link #setStringValue(String)}.
+ */
+ public void setByteValue(byte value) {
+ if (!(fieldsData instanceof Byte)) {
+ throw new IllegalArgumentException("cannot change value type from " + fieldsData.getClass().getSimpleName() + " to Byte");
+ }
+ fieldsData = Byte.valueOf(value);
+ }
+
+ /**
+ * Expert: change the value of this field. See
+ * {@link #setStringValue(String)}.
+ */
+ public void setShortValue(short value) {
+ if (!(fieldsData instanceof Short)) {
+ throw new IllegalArgumentException("cannot change value type from " + fieldsData.getClass().getSimpleName() + " to Short");
+ }
+ fieldsData = Short.valueOf(value);
+ }
+
+ /**
+ * Expert: change the value of this field. See
+ * {@link #setStringValue(String)}.
+ */
+ public void setIntValue(int value) {
+ if (!(fieldsData instanceof Integer)) {
+ throw new IllegalArgumentException("cannot change value type from " + fieldsData.getClass().getSimpleName() + " to Integer");
+ }
+ fieldsData = Integer.valueOf(value);
+ }
+
+ /**
+ * Expert: change the value of this field. See
+ * {@link #setStringValue(String)}.
+ */
+ public void setLongValue(long value) {
+ if (!(fieldsData instanceof Long)) {
+ throw new IllegalArgumentException("cannot change value type from " + fieldsData.getClass().getSimpleName() + " to Long");
+ }
+ fieldsData = Long.valueOf(value);
+ }
+
+ /**
+ * Expert: change the value of this field. See
+ * {@link #setStringValue(String)}.
+ */
+ public void setFloatValue(float value) {
+ if (!(fieldsData instanceof Float)) {
+ throw new IllegalArgumentException("cannot change value type from " + fieldsData.getClass().getSimpleName() + " to Float");
+ }
+ fieldsData = Float.valueOf(value);
+ }
+
+ /**
+ * Expert: change the value of this field. See
+ * {@link #setStringValue(String)}.
+ */
+ public void setDoubleValue(double value) {
+ if (!(fieldsData instanceof Double)) {
+ throw new IllegalArgumentException("cannot change value type from " + fieldsData.getClass().getSimpleName() + " to Double");
+ }
+ fieldsData = Double.valueOf(value);
+ }
+
+ /**
+ * Expert: sets the token stream to be used for indexing and causes
+ * isIndexed() and isTokenized() to return true. May be combined with stored
+ * values from stringValue() or getBinaryValue()
+ */
+ public void setTokenStream(TokenStream tokenStream) {
+ if (!type.indexed() || !type.tokenized()) {
+ throw new IllegalArgumentException("TokenStream fields must be indexed and tokenized");
+ }
+ if (type.numericType() != null) {
+ throw new IllegalArgumentException("cannot set private TokenStream on numeric fields");
+ }
+ this.tokenStream = tokenStream;
+ }
+
+ @Override
+ public String name() {
+ return name;
+ }
+
+ /**
+ * {@inheritDoc}
+ *
+ * The default value is 1.0f (no boost).
+ * @see #setBoost(float)
+ */
+ @Override
+ public float boost() {
+ return boost;
+ }
+
+ /**
+ * Sets the boost factor on this field.
+ * @throws IllegalArgumentException if this field is not indexed,
+ * or if it omits norms.
+ * @see #boost()
+ */
+ public void setBoost(float boost) {
+ if (boost != 1.0f) {
+ if (type.indexed() == false || type.omitNorms()) {
+ throw new IllegalArgumentException("You cannot set an index-time boost on an unindexed field, or one that omits norms");
+ }
+ }
+ this.boost = boost;
+ }
+
+ @Override
+ public Number numericValue() {
+ if (fieldsData instanceof Number) {
+ return (Number) fieldsData;
+ } else {
+ return null;
+ }
+ }
+
+ @Override
+ public BytesRef binaryValue() {
+ if (fieldsData instanceof BytesRef) {
+ return (BytesRef) fieldsData;
+ } else {
+ return null;
+ }
+ }
+
+ /** Prints a Field for human consumption. */
+ @Override
+ public String toString() {
+ StringBuilder result = new StringBuilder();
+ result.append(type.toString());
+ result.append('<');
+ result.append(name);
+ result.append(':');
+
+ if (fieldsData != null) {
+ result.append(fieldsData);
+ }
+
+ result.append('>');
+ return result.toString();
+ }
+
+ /** Returns the {@link FieldType} for this field. */
+ @Override
+ public FieldType fieldType() {
+ return type;
+ }
+
+ @Override
+ public TokenStream tokenStream(Analyzer analyzer, TokenStream reuse) throws IOException {
+ if (!fieldType().indexed()) {
+ return null;
+ }
+
+ final NumericType numericType = fieldType().numericType();
+ if (numericType != null) {
+ if (!(reuse instanceof NumericTokenStream && ((NumericTokenStream)reuse).getPrecisionStep() == type.numericPrecisionStep())) {
+ // lazy init the TokenStream as it is heavy to instantiate
+ // (attributes,...) if not needed (stored field loading)
+ reuse = new NumericTokenStream(type.numericPrecisionStep());
+ }
+ final NumericTokenStream nts = (NumericTokenStream) reuse;
+ // initialize value in TokenStream
+ final Number val = (Number) fieldsData;
+ switch (numericType) {
+ case INT:
+ nts.setIntValue(val.intValue());
+ break;
+ case LONG:
+ nts.setLongValue(val.longValue());
+ break;
+ case FLOAT:
+ nts.setFloatValue(val.floatValue());
+ break;
+ case DOUBLE:
+ nts.setDoubleValue(val.doubleValue());
+ break;
+ default:
+ throw new AssertionError("Should never get here");
+ }
+ return reuse;
+ }
+
+ if (!fieldType().tokenized()) {
+ if (stringValue() == null) {
+ throw new IllegalArgumentException("Non-Tokenized Fields must have a String value");
+ }
+ if (!(reuse instanceof StringTokenStream)) {
+ // lazy init the TokenStream as it is heavy to instantiate
+ // (attributes,...) if not needed (stored field loading)
+ reuse = new StringTokenStream();
+ }
+ ((StringTokenStream) reuse).setValue(stringValue());
+ return reuse;
+ }
+
+ if (tokenStream != null) {
+ return tokenStream;
+ } else if (readerValue() != null) {
+ return analyzer.tokenStream(name(), readerValue());
+ } else if (stringValue() != null) {
+ return analyzer.tokenStream(name(), stringValue());
+ }
+
+ throw new IllegalArgumentException("Field must have either TokenStream, String, Reader or Number value; got " + this);
+ }
+
+ static final class StringTokenStream extends TokenStream {
+ private final CharTermAttribute termAttribute = addAttribute(CharTermAttribute.class);
+ private final OffsetAttribute offsetAttribute = addAttribute(OffsetAttribute.class);
+ private boolean used = false;
+ private String value = null;
+
+ /** Creates a new TokenStream that returns a String as single token.
+ * Warning: Does not initialize the value, you must call
+ * {@link #setValue(String)} afterwards!
*/
- public static final Store COMPRESS = new Store("COMPRESS");
+ StringTokenStream() {
+ }
+
+ /** Sets the string value. */
+ void setValue(String value) {
+ this.value = value;
+ }
+ @Override
+ public boolean incrementToken() {
+ if (used) {
+ return false;
+ }
+ clearAttributes();
+ termAttribute.append(value);
+ offsetAttribute.setOffset(0, value.length());
+ used = true;
+ return true;
+ }
+
+ @Override
+ public void end() throws IOException {
+ super.end();
+ final int finalOffset = value.length();
+ offsetAttribute.setOffset(finalOffset, finalOffset);
+ }
+
+ @Override
+ public void reset() {
+ used = false;
+ }
+
+ @Override
+ public void close() {
+ value = null;
+ }
+ }
+
+ /** Specifies whether and how a field should be stored. */
+ public static enum Store {
+
/** Store the original field value in the index. This is useful for short texts
* like a document's title which should be displayed with the results. The
* value is stored in its original form, i.e. no analyzer is used before it is
* stored.
*/
- public static final Store YES = new Store("YES");
+ YES,
- /** Do not store the field value in the index. */
- public static final Store NO = new Store("NO");
+ /** Do not store the field's value in the index. */
+ NO
}
- /** Specifies whether and how a field should be indexed. */
- public static final class Index extends Parameter implements Serializable {
+ //
+ // Deprecated transition API below:
+ //
- private Index(String name) {
- super(name);
- }
+ /** Specifies whether and how a field should be indexed.
+ *
+ * @deprecated This is here only to ease transition from
+ * the pre-4.0 APIs. */
+ @Deprecated
+ public static enum Index {
/** Do not index the field value. This field can thus not be searched,
* but one can still access its contents provided it is
* {@link Field.Store stored}. */
- public static final Index NO = new Index("NO");
+ NO {
+ @Override
+ public boolean isIndexed() { return false; }
+ @Override
+ public boolean isAnalyzed() { return false; }
+ @Override
+ public boolean omitNorms() { return true; }
+ },
/** Index the tokens produced by running the field's
* value through an Analyzer. This is useful for
* common text. */
- public static final Index ANALYZED = new Index("ANALYZED");
+ ANALYZED {
+ @Override
+ public boolean isIndexed() { return true; }
+ @Override
+ public boolean isAnalyzed() { return true; }
+ @Override
+ public boolean omitNorms() { return false; }
+ },
- /** @deprecated this has been renamed to {@link #ANALYZED} */
- public static final Index TOKENIZED = ANALYZED;
-
/** Index the field's value without using an Analyzer, so it can be searched.
* As no analyzer is used the value will be stored as a single term. This is
* useful for unique Ids like product numbers.
*/
- public static final Index NOT_ANALYZED = new Index("NOT_ANALYZED");
+ NOT_ANALYZED {
+ @Override
+ public boolean isIndexed() { return true; }
+ @Override
+ public boolean isAnalyzed() { return false; }
+ @Override
+ public boolean omitNorms() { return false; }
+ },
- /** @deprecated This has been renamed to {@link #NOT_ANALYZED} */
- public static final Index UN_TOKENIZED = NOT_ANALYZED;
-
/** Expert: Index the field's value without an Analyzer,
- * and also disable the storing of norms. Note that you
+ * and also disable the indexing of norms. Note that you
* can also separately enable/disable norms by calling
- * {@link #setOmitNorms}. No norms means that
+ * {@link FieldType#setOmitNorms}. No norms means that
* index-time field and document boosting and field
* length normalization are disabled. The benefit is
* less memory usage as norms take up one byte of RAM
@@ -100,48 +681,118 @@
* above described effect on a field, all instances of
* that field must be indexed with NOT_ANALYZED_NO_NORMS
* from the beginning. */
- public static final Index NOT_ANALYZED_NO_NORMS = new Index("NOT_ANALYZED_NO_NORMS");
+ NOT_ANALYZED_NO_NORMS {
+ @Override
+ public boolean isIndexed() { return true; }
+ @Override
+ public boolean isAnalyzed() { return false; }
+ @Override
+ public boolean omitNorms() { return true; }
+ },
- /** @deprecated This has been renamed to
- * {@link #NOT_ANALYZED_NO_NORMS} */
- public static final Index NO_NORMS = NOT_ANALYZED_NO_NORMS;
-
/** Expert: Index the tokens produced by running the
* field's value through an Analyzer, and also
* separately disable the storing of norms. See
* {@link #NOT_ANALYZED_NO_NORMS} for what norms are
* and why you may want to disable them. */
- public static final Index ANALYZED_NO_NORMS = new Index("ANALYZED_NO_NORMS");
+ ANALYZED_NO_NORMS {
+ @Override
+ public boolean isIndexed() { return true; }
+ @Override
+ public boolean isAnalyzed() { return true; }
+ @Override
+ public boolean omitNorms() { return true; }
+ };
+
+ /** Get the best representation of the index given the flags. */
+ public static Index toIndex(boolean indexed, boolean analyzed) {
+ return toIndex(indexed, analyzed, false);
+ }
+
+ /** Expert: Get the best representation of the index given the flags. */
+ public static Index toIndex(boolean indexed, boolean analyzed, boolean omitNorms) {
+
+ // If it is not indexed nothing else matters
+ if (!indexed) {
+ return Index.NO;
+ }
+
+ // typical, non-expert
+ if (!omitNorms) {
+ if (analyzed) {
+ return Index.ANALYZED;
+ }
+ return Index.NOT_ANALYZED;
+ }
+
+ // Expert: Norms omitted
+ if (analyzed) {
+ return Index.ANALYZED_NO_NORMS;
+ }
+ return Index.NOT_ANALYZED_NO_NORMS;
+ }
+
+ public abstract boolean isIndexed();
+ public abstract boolean isAnalyzed();
+ public abstract boolean omitNorms();
}
- /** Specifies whether and how a field should have term vectors. */
- public static final class TermVector extends Parameter implements Serializable {
+ /** Specifies whether and how a field should have term vectors.
+ *
+ * @deprecated This is here only to ease transition from
+ * the pre-4.0 APIs. */
+ @Deprecated
+ public static enum TermVector {
- private TermVector(String name) {
- super(name);
- }
-
/** Do not store term vectors.
*/
- public static final TermVector NO = new TermVector("NO");
+ NO {
+ @Override
+ public boolean isStored() { return false; }
+ @Override
+ public boolean withPositions() { return false; }
+ @Override
+ public boolean withOffsets() { return false; }
+ },
/** Store the term vectors of each document. A term vector is a list
- * of the document's terms and their number of occurences in that document. */
- public static final TermVector YES = new TermVector("YES");
+ * of the document's terms and their number of occurrences in that document. */
+ YES {
+ @Override
+ public boolean isStored() { return true; }
+ @Override
+ public boolean withPositions() { return false; }
+ @Override
+ public boolean withOffsets() { return false; }
+ },
/**
* Store the term vector + token position information
*
* @see #YES
*/
- public static final TermVector WITH_POSITIONS = new TermVector("WITH_POSITIONS");
+ WITH_POSITIONS {
+ @Override
+ public boolean isStored() { return true; }
+ @Override
+ public boolean withPositions() { return true; }
+ @Override
+ public boolean withOffsets() { return false; }
+ },
/**
* Store the term vector + Token offset information
*
* @see #YES
*/
- public static final TermVector WITH_OFFSETS = new TermVector("WITH_OFFSETS");
+ WITH_OFFSETS {
+ @Override
+ public boolean isStored() { return true; }
+ @Override
+ public boolean withPositions() { return false; }
+ @Override
+ public boolean withOffsets() { return true; }
+ },
/**
* Store the term vector + Token position and offset information
@@ -150,90 +801,100 @@
* @see #WITH_POSITIONS
* @see #WITH_OFFSETS
*/
- public static final TermVector WITH_POSITIONS_OFFSETS = new TermVector("WITH_POSITIONS_OFFSETS");
- }
-
-
- /** The value of the field as a String, or null. If null, the Reader value,
- * binary value, or TokenStream value is used. Exactly one of stringValue(),
- * readerValue(), getBinaryValue(), and tokenStreamValue() must be set. */
- public String stringValue() { return fieldsData instanceof String ? (String)fieldsData : null; }
-
- /** The value of the field as a Reader, or null. If null, the String value,
- * binary value, or TokenStream value is used. Exactly one of stringValue(),
- * readerValue(), getBinaryValue(), and tokenStreamValue() must be set. */
- public Reader readerValue() { return fieldsData instanceof Reader ? (Reader)fieldsData : null; }
-
- /** The value of the field in Binary, or null. If null, the Reader value,
- * String value, or TokenStream value is used. Exactly one of stringValue(),
- * readerValue(), getBinaryValue(), and tokenStreamValue() must be set.
- * @deprecated This method must allocate a new byte[] if
- * the {@link AbstractField#getBinaryOffset()} is non-zero
- * or {@link AbstractField#getBinaryLength()} is not the
- * full length of the byte[]. Please use {@link
- * AbstractField#getBinaryValue()} instead, which simply
- * returns the byte[].
- */
- public byte[] binaryValue() {
- if (!isBinary)
- return null;
- final byte[] data = (byte[]) fieldsData;
- if (binaryOffset == 0 && data.length == binaryLength)
- return data; //Optimization
-
- final byte[] ret = new byte[binaryLength];
- System.arraycopy(data, binaryOffset, ret, 0, binaryLength);
- return ret;
- }
-
- /** The value of the field as a TokesStream, or null. If null, the Reader value,
- * String value, or binary value is used. Exactly one of stringValue(),
- * readerValue(), getBinaryValue(), and tokenStreamValue() must be set. */
- public TokenStream tokenStreamValue() { return fieldsData instanceof TokenStream ? (TokenStream)fieldsData : null; }
-
+ WITH_POSITIONS_OFFSETS {
+ @Override
+ public boolean isStored() { return true; }
+ @Override
+ public boolean withPositions() { return true; }
+ @Override
+ public boolean withOffsets() { return true; }
+ };
- /** Expert: change the value of this field. This can
- * be used during indexing to re-use a single Field
- * instance to improve indexing speed by avoiding GC cost
- * of new'ing and reclaiming Field instances. Typically
- * a single {@link Document} instance is re-used as
- * well. This helps most on small documents.
- *
- * Note that you should only use this method after the
- * Field has been consumed (ie, the {@link Document}
- * containing this Field has been added to the index).
- * Also, each Field instance should only be used once
- * within a single {@link Document} instance. See ImproveIndexingSpeed
- * for details.
*/
- public void setValue(String value) {
- fieldsData = value;
- }
+ /** Get the best representation of a TermVector given the flags. */
+ public static TermVector toTermVector(boolean stored, boolean withOffsets, boolean withPositions) {
- /** Expert: change the value of this field. See setValue(String) . */
- public void setValue(Reader value) {
- fieldsData = value;
- }
+ // If it is not stored, nothing else matters.
+ if (!stored) {
+ return TermVector.NO;
+ }
- /** Expert: change the value of this field. See setValue(String) . */
- public void setValue(byte[] value) {
- fieldsData = value;
- binaryLength = value.length;
- binaryOffset = 0;
+ if (withOffsets) {
+ if (withPositions) {
+ return Field.TermVector.WITH_POSITIONS_OFFSETS;
+ }
+ return Field.TermVector.WITH_OFFSETS;
+ }
+
+ if (withPositions) {
+ return Field.TermVector.WITH_POSITIONS;
+ }
+ return Field.TermVector.YES;
+ }
+
+ public abstract boolean isStored();
+ public abstract boolean withPositions();
+ public abstract boolean withOffsets();
}
- /** Expert: change the value of this field. See setValue(String) . */
- public void setValue(byte[] value, int offset, int length) {
- fieldsData = value;
- binaryLength = length;
- binaryOffset = offset;
+ /** Translates the pre-4.0 enums for specifying how a
+ * field should be indexed into the 4.0 {@link FieldType}
+ * approach.
+ *
+ * @deprecated This is here only to ease transition from
+ * the pre-4.0 APIs.
+ */
+ @Deprecated
+ public static final FieldType translateFieldType(Store store, Index index, TermVector termVector) {
+ final FieldType ft = new FieldType();
+
+ ft.setStored(store == Store.YES);
+
+ switch(index) {
+ case ANALYZED:
+ ft.setIndexed(true);
+ ft.setTokenized(true);
+ break;
+ case ANALYZED_NO_NORMS:
+ ft.setIndexed(true);
+ ft.setTokenized(true);
+ ft.setOmitNorms(true);
+ break;
+ case NOT_ANALYZED:
+ ft.setIndexed(true);
+ ft.setTokenized(false);
+ break;
+ case NOT_ANALYZED_NO_NORMS:
+ ft.setIndexed(true);
+ ft.setTokenized(false);
+ ft.setOmitNorms(true);
+ break;
+ case NO:
+ break;
+ }
+
+ switch(termVector) {
+ case NO:
+ break;
+ case YES:
+ ft.setStoreTermVectors(true);
+ break;
+ case WITH_POSITIONS:
+ ft.setStoreTermVectors(true);
+ ft.setStoreTermVectorPositions(true);
+ break;
+ case WITH_OFFSETS:
+ ft.setStoreTermVectors(true);
+ ft.setStoreTermVectorOffsets(true);
+ break;
+ case WITH_POSITIONS_OFFSETS:
+ ft.setStoreTermVectors(true);
+ ft.setStoreTermVectorPositions(true);
+ ft.setStoreTermVectorOffsets(true);
+ break;
+ }
+ ft.freeze();
+ return ft;
}
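As a sanity check on the translation above, the legacy triple (Store.YES, Index.NOT_ANALYZED_NO_NORMS, TermVector.NO) should map to a stored, indexed, untokenized, norms-omitting FieldType. A sketch, assuming the Lucene 4.x API and JVM assertions enabled with `-ea`:

```java
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;

public class TranslateExample {
  public static void main(String[] args) {
    // Sketch: the deprecated flags map onto an equivalent frozen FieldType.
    FieldType ft = Field.translateFieldType(
        Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS, Field.TermVector.NO);
    assert ft.stored();      // Store.YES
    assert ft.indexed();     // NOT_ANALYZED_NO_NORMS is indexed
    assert !ft.tokenized();  // ...but not analyzed
    assert ft.omitNorms();   // ...and omits norms
  }
}
```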
-
-
- /** Expert: change the value of this field. See setValue(String) . */
- public void setValue(TokenStream value) {
- fieldsData = value;
- }
/**
* Create a field by specifying its name, value and how it will
@@ -246,11 +907,13 @@
* be tokenized before indexing
* @throws NullPointerException if name or value is null
* @throws IllegalArgumentException if the field is neither stored nor indexed
- */
+ *
+ * @deprecated Use {@link StringField}, {@link TextField} instead. */
+ @Deprecated
public Field(String name, String value, Store store, Index index) {
- this(name, value, store, index, TermVector.NO);
+ this(name, value, translateFieldType(store, index, TermVector.NO));
}
-
+
/**
* Create a field by specifying its name, value and how it will
* be saved in the index.
@@ -267,168 +930,96 @@
* the field is neither stored nor indexed
* the field is not indexed but termVector is TermVector.YES
*
- */
- public Field(String name, String value, Store store, Index index, TermVector termVector) {
- if (name == null)
- throw new NullPointerException("name cannot be null");
- if (value == null)
- throw new NullPointerException("value cannot be null");
- if (name.length() == 0 && value.length() == 0)
- throw new IllegalArgumentException("name and value cannot both be empty");
- if (index == Index.NO && store == Store.NO)
- throw new IllegalArgumentException("it doesn't make sense to have a field that "
- + "is neither indexed nor stored");
- if (index == Index.NO && termVector != TermVector.NO)
- throw new IllegalArgumentException("cannot store term vector information "
- + "for a field that is not indexed");
-
- this.name = name.intern(); // field names are interned
- this.fieldsData = value;
-
- if (store == Store.YES){
- this.isStored = true;
- this.isCompressed = false;
- }
- else if (store == Store.COMPRESS) {
- this.isStored = true;
- this.isCompressed = true;
- }
- else if (store == Store.NO){
- this.isStored = false;
- this.isCompressed = false;
- }
- else
- throw new IllegalArgumentException("unknown store parameter " + store);
-
- if (index == Index.NO) {
- this.isIndexed = false;
- this.isTokenized = false;
- } else if (index == Index.ANALYZED) {
- this.isIndexed = true;
- this.isTokenized = true;
- } else if (index == Index.NOT_ANALYZED) {
- this.isIndexed = true;
- this.isTokenized = false;
- } else if (index == Index.NOT_ANALYZED_NO_NORMS) {
- this.isIndexed = true;
- this.isTokenized = false;
- this.omitNorms = true;
- } else if (index == Index.ANALYZED_NO_NORMS) {
- this.isIndexed = true;
- this.isTokenized = true;
- this.omitNorms = true;
- } else {
- throw new IllegalArgumentException("unknown index parameter " + index);
- }
-
- this.isBinary = false;
-
- setStoreTermVector(termVector);
+ *
+ * @deprecated Use {@link StringField}, {@link TextField} instead. */
+ @Deprecated
+ public Field(String name, String value, Store store, Index index, TermVector termVector) {
+ this(name, value, translateFieldType(store, index, termVector));
}
/**
* Create a tokenized and indexed field that is not stored. Term vectors will
* not be stored. The Reader is read only when the Document is added to the index,
- * i.e. you may not close the Reader until {@link IndexWriter#addDocument(Document)}
+ * i.e. you may not close the Reader until {@link IndexWriter#addDocument}
* has been called.
*
* @param name The name of the field
* @param reader The reader with the content
* @throws NullPointerException if name or reader is null
+ *
+ * @deprecated Use {@link TextField} instead.
*/
+ @Deprecated
public Field(String name, Reader reader) {
this(name, reader, TermVector.NO);
}
/**
* Create a tokenized and indexed field that is not stored, optionally with
* storing term vectors. The Reader is read only when the Document is added to the index,
- * i.e. you may not close the Reader until {@link IndexWriter#addDocument(Document)}
+ * i.e. you may not close the Reader until {@link IndexWriter#addDocument}
* has been called.
*
* @param name The name of the field
* @param reader The reader with the content
* @param termVector Whether term vector should be stored
* @throws NullPointerException if name or reader is null
+ *
+ * @deprecated Use {@link TextField} instead.
*/
+ @Deprecated
public Field(String name, Reader reader, TermVector termVector) {
- if (name == null)
- throw new NullPointerException("name cannot be null");
- if (reader == null)
- throw new NullPointerException("reader cannot be null");
-
- this.name = name.intern(); // field names are interned
- this.fieldsData = reader;
-
- this.isStored = false;
- this.isCompressed = false;
-
- this.isIndexed = true;
- this.isTokenized = true;
-
- this.isBinary = false;
-
- setStoreTermVector(termVector);
+ this(name, reader, translateFieldType(Store.NO, Index.ANALYZED, termVector));
}
/**
* Create a tokenized and indexed field that is not stored. Term vectors will
* not be stored. This is useful for pre-analyzed fields.
* The TokenStream is read only when the Document is added to the index,
- * i.e. you may not close the TokenStream until {@link IndexWriter#addDocument(Document)}
+ * i.e. you may not close the TokenStream until {@link IndexWriter#addDocument}
* has been called.
*
* @param name The name of the field
* @param tokenStream The TokenStream with the content
* @throws NullPointerException if name or tokenStream is null
+ *
+ * @deprecated Use {@link TextField} instead
*/
+ @Deprecated
public Field(String name, TokenStream tokenStream) {
this(name, tokenStream, TermVector.NO);
}
-
+
/**
* Create a tokenized and indexed field that is not stored, optionally with
* storing term vectors. This is useful for pre-analyzed fields.
* The TokenStream is read only when the Document is added to the index,
- * i.e. you may not close the TokenStream until {@link IndexWriter#addDocument(Document)}
+ * i.e. you may not close the TokenStream until {@link IndexWriter#addDocument}
* has been called.
*
* @param name The name of the field
* @param tokenStream The TokenStream with the content
* @param termVector Whether term vector should be stored
* @throws NullPointerException if name or tokenStream is null
+ *
+ * @deprecated Use {@link TextField} instead
*/
+ @Deprecated
public Field(String name, TokenStream tokenStream, TermVector termVector) {
- if (name == null)
- throw new NullPointerException("name cannot be null");
- if (tokenStream == null)
- throw new NullPointerException("tokenStream cannot be null");
-
- this.name = name.intern(); // field names are interned
- this.fieldsData = tokenStream;
-
- this.isStored = false;
- this.isCompressed = false;
-
- this.isIndexed = true;
- this.isTokenized = true;
-
- this.isBinary = false;
-
- setStoreTermVector(termVector);
+ this(name, tokenStream, translateFieldType(Store.NO, Index.ANALYZED, termVector));
}
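For the pre-analyzed case, TextField likewise accepts a TokenStream. A hedged sketch, assuming a Lucene 4.x StandardAnalyzer (field name and sample text are made up); as the javadoc above warns, the stream must stay open until IndexWriter.addDocument has consumed it:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.util.Version;

public class TokenStreamFieldExample {
  public static void main(String[] args) throws IOException {
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);

    // Produce the token stream up front, e.g. to inspect or filter it
    // before handing it to the field.
    TokenStream stream =
        analyzer.tokenStream("body", new StringReader("pre-analyzed text"));

    // Deprecated form: new Field("body", stream, Field.TermVector.NO)
    Field body = new TextField("body", stream);

    Document doc = new Document();
    doc.add(body);
    // Do not close the stream here: IndexWriter.addDocument(doc) reads it.
  }
}
```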
-
/**
* Create a stored field with binary value. Optionally the value may be compressed.
*
* @param name The name of the field
* @param value The binary value
- * @param store How value should be stored (compressed or not)
- * @throws IllegalArgumentException if store is Store.NO
+ *
+ * @deprecated Use {@link StoredField} instead.
*/
- public Field(String name, byte[] value, Store store) {
- this(name, value, 0, value.length, store);
+ @Deprecated
+ public Field(String name, byte[] value) {
+ this(name, value, translateFieldType(Store.YES, Index.NO, TermVector.NO));
}
/**
@@ -438,39 +1029,11 @@
* @param value The binary value
* @param offset Starting offset in value where this Field's bytes are
* @param length Number of bytes to use for this Field, starting at offset
- * @param store How value should be stored (compressed or not)
- * @throws IllegalArgumentException if store is Store.NO
+ *
+ * @deprecated Use {@link StoredField} instead.
*/
- public Field(String name, byte[] value, int offset, int length, Store store) {
-
- if (name == null)
- throw new IllegalArgumentException("name cannot be null");
- if (value == null)
- throw new IllegalArgumentException("value cannot be null");
-
- this.name = name.intern();
- fieldsData = value;
-
- if (store == Store.YES) {
- isStored = true;
- isCompressed = false;
- }
- else if (store == Store.COMPRESS) {
- isStored = true;
- isCompressed = true;
- }
- else if (store == Store.NO)
- throw new IllegalArgumentException("binary values can't be unstored");
- else
- throw new IllegalArgumentException("unknown store parameter " + store);
-
- isIndexed = false;
- isTokenized = false;
-
- isBinary = true;
- binaryLength = length;
- binaryOffset = offset;
-
- setStoreTermVector(TermVector.NO);
+ @Deprecated
+ public Field(String name, byte[] value, int offset, int length) {
+ this(name, value, offset, length, translateFieldType(Store.YES, Index.NO, TermVector.NO));
}
}
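The deleted binary constructors map onto StoredField, which (in the Lucene 4.x API) offers both whole-array and offset/length variants. A minimal sketch of the migration (field names are illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;

public class BinaryFieldExample {
  public static void main(String[] args) {
    byte[] payload = {0x10, 0x20, 0x30, 0x40};

    Document doc = new Document();

    // Deprecated form: new Field("blob", payload) -- stored, not indexed.
    doc.add(new StoredField("blob", payload));

    // Deprecated form: new Field("slice", payload, 1, 2)
    // Stores two bytes of the array, starting at offset 1.
    doc.add(new StoredField("slice", payload, 1, 2));

    // Note: the old Store.COMPRESS option is gone; callers compress
    // values themselves before storing, if needed.
  }
}
```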
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/FieldSelector.java'.
Fisheye: No comparison available. Pass `N' to diff?
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/FieldSelectorResult.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/FieldType.java'.
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/Fieldable.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/FloatDocValuesField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/FloatField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/IntDocValuesField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/IntField.java'.
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/LoadFirstFieldSelector.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/LongDocValuesField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/LongField.java'.
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/MapFieldSelector.java'.
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/NumberTools.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/NumericDocValuesField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/PackedLongDocValuesField.java'.
Fisheye: Tag 1.1.2.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/SetBasedFieldSelector.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/ShortDocValuesField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/SortedBytesDocValuesField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/SortedDocValuesField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/SortedNumericDocValuesField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/SortedSetDocValuesField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/StoredField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/StraightBytesDocValuesField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/StringField.java'.
Fisheye: Tag 1.1 refers to a dead (removed) revision in file `3rdParty_sources/lucene/org/apache/lucene/document/TextField.java'.
Index: 3rdParty_sources/lucene/org/apache/lucene/document/package.html
===================================================================
RCS file: /usr/local/cvsroot/3rdParty_sources/lucene/org/apache/lucene/document/package.html,v
diff -u -r1.1 -r1.1.2.1
--- 3rdParty_sources/lucene/org/apache/lucene/document/package.html 17 Aug 2012 14:54:53 -0000 1.1
+++ 3rdParty_sources/lucene/org/apache/lucene/document/package.html 16 Dec 2014 11:31:58 -0000 1.1.2.1
@@ -22,33 +22,26 @@
The logical representation of a {@link org.apache.lucene.document.Document} for indexing and searching.
The document package provides the user level logical representation of content to be indexed and searched. The
-package also provides utilities for working with {@link org.apache.lucene.document.Document}s and {@link org.apache.lucene.document.Fieldable}s.
-Document and Fieldable
-A {@link org.apache.lucene.document.Document} is a collection of {@link org.apache.lucene.document.Fieldable}s. A
- {@link org.apache.lucene.document.Fieldable} is a logical representation of a user's content that needs to be indexed or stored.
- {@link org.apache.lucene.document.Fieldable}s have a number of properties that tell Lucene how to treat the content (like indexed, tokenized,
- stored, etc.) See the {@link org.apache.lucene.document.Field} implementation of {@link org.apache.lucene.document.Fieldable}
+package also provides utilities for working with {@link org.apache.lucene.document.Document}s and {@link org.apache.lucene.index.IndexableField}s.
+Document and IndexableField
+A {@link org.apache.lucene.document.Document} is a collection of {@link org.apache.lucene.index.IndexableField}s. A
+ {@link org.apache.lucene.index.IndexableField} is a logical representation of a user's content that needs to be indexed or stored.
+ {@link org.apache.lucene.index.IndexableField}s have a number of properties that tell Lucene how to treat the content (like indexed, tokenized,
+ stored, etc.) See the {@link org.apache.lucene.document.Field} implementation of {@link org.apache.lucene.index.IndexableField}
for specifics on these properties.
Note: it is common to refer to {@link org.apache.lucene.document.Document}s having {@link org.apache.lucene.document.Field}s, even though technically they have
-{@link org.apache.lucene.document.Fieldable}s.
+{@link org.apache.lucene.index.IndexableField}s.
Working with Documents
First and foremost, a {@link org.apache.lucene.document.Document} is something created by the user application. It is your job
to create Documents based on the content of the files you are working with in your application (Word, txt, PDF, Excel or any other format.)
How this is done is completely up to you. That being said, there are many tools available in other projects that can make
- the process of taking a file and converting it into a Lucene {@link org.apache.lucene.document.Document}. To see an example of this,
- take a look at the Lucene demo and the associated source code
- for extracting content from HTML.
+ the process of taking a file and converting it into a Lucene {@link org.apache.lucene.document.Document}.
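Assembling a Document from IndexableFields, as described above, can be sketched as follows; this assumes the Lucene 4.x field classes, and the field names and values are invented for illustration:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class DocumentExample {
  public static void main(String[] args) {
    Document doc = new Document();

    // Atomic, non-tokenized identifier: indexed as a single term.
    doc.add(new StringField("id", "doc-42", Field.Store.YES));

    // Free text: tokenized and indexed, stored for display in results.
    doc.add(new TextField("title", "Working with Documents", Field.Store.YES));

    // Large original content: stored only, never searched directly.
    doc.add(new StoredField("raw", "...original file contents..."));
  }
}
```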
-The {@link org.apache.lucene.document.DateTools} and {@link org.apache.lucene.document.NumberTools} classes are utility
-classes to make dates, times and longs searchable (remember, Lucene only searches text).
-The {@link org.apache.lucene.document.FieldSelector} class provides a mechanism to tell Lucene how to load Documents from
-storage. If no FieldSelector is used, all Fieldables on a Document will be loaded. As an example of the FieldSelector usage, consider
- the common use case of
-displaying search results on a web page and then having users click through to see the full document. In this scenario, it is often
- the case that there are many small fields and one or two large fields (containing the contents of the original file). Before the FieldSelector,
-the full Document had to be loaded, including the large fields, in order to display the results. Now, using the FieldSelector, one
-can {@link org.apache.lucene.document.FieldSelectorResult#LAZY_LOAD} the large fields, thus only loading the large fields
-when a user clicks on the actual link to view the original content.
+The {@link org.apache.lucene.document.DateTools} is a utility class to make dates and times searchable
+(remember, Lucene only searches text). {@link org.apache.lucene.document.IntField}, {@link org.apache.lucene.document.LongField},
+{@link org.apache.lucene.document.FloatField} and {@link org.apache.lucene.document.DoubleField} are special helper classes
+to simplify indexing of numeric values (and also dates) for fast range queries with {@link org.apache.lucene.search.NumericRangeQuery}
+(using a special sortable string representation of numeric values).
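The numeric helper classes pair with NumericRangeQuery as described above; a hedged sketch, assuming the Lucene 4.x API (field name and bounds are illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

public class NumericFieldExample {
  public static void main(String[] args) {
    // Index side: IntField encodes the value in the sortable trie
    // representation that NumericRangeQuery expects.
    Document doc = new Document();
    doc.add(new IntField("year", 1999, Field.Store.YES));

    // Search side: match documents whose "year" lies in [1990, 2000],
    // both endpoints inclusive.
    Query q = NumericRangeQuery.newIntRange("year", 1990, 2000, true, true);
  }
}
```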