Class CollationIterator

java.lang.Object
com.ibm.icu.impl.coll.CollationIterator
Direct Known Subclasses:
CollationDataBuilder.DataBuilderCollationIterator, IterCollationIterator, UTF16CollationIterator

public abstract class CollationIterator extends Object
Collation element iterator and abstract character iterator. When a method returns a code point value, the value must be in the range 0..10FFFF, or negative when it serves as a sentinel value.
  • Field Details

  • Constructor Details

    • CollationIterator

      public CollationIterator(CollationData d)
      Partially constructs the iterator. In Java, we cache partially constructed iterators and finish their setup when starting to work on text (via reset(boolean) and the setText(numeric, ...) methods of subclasses). This avoids memory allocations for iterators that remain unused.

      In C++, there is only one constructor, and iterators are stack-allocated as needed.
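      The cache-then-finish pattern described above can be sketched as follows. This is a minimal illustration only: the class ReusableScanner, its buffer size, and its method bodies are hypothetical and are not taken from the ICU sources. It shows a constructor that allocates nothing and a reset(boolean) step that completes the setup when text is first supplied.

```java
// Hypothetical illustration of partial construction; not ICU code.
final class ReusableScanner {
    private long[] buffer;      // allocated lazily, on first real use
    private boolean numeric;
    private CharSequence text = "";
    private int pos;

    /** Partial construction: cheap, allocates no working buffer. */
    ReusableScanner() {}

    /** Finishes the setup (if needed) and starts work on new text. */
    void setText(boolean numeric, CharSequence text) {
        reset(numeric);
        this.text = text;
        this.pos = 0;
    }

    /** Resets the state and the numeric setting, completing initialization. */
    private void reset(boolean numeric) {
        if (buffer == null) {
            buffer = new long[40];  // assumed size, for illustration only
        }
        this.numeric = numeric;
    }

    boolean isInitialized() { return buffer != null; }
    boolean isNumeric() { return numeric; }
    int remaining() { return text.length() - pos; }

    public static void main(String[] args) {
        ReusableScanner s = new ReusableScanner();
        System.out.println("initialized before use: " + s.isInitialized());
        s.setText(true, "abc");
        System.out.println("initialized after setText: " + s.isInitialized());
    }
}
```

      An instance that is cached but never used costs only the object header and a few fields; the buffer is allocated once and reused across later setText calls.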

    • CollationIterator

      public CollationIterator(CollationData d, boolean numeric)
  • Method Details

    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • resetToOffset

      public abstract void resetToOffset(int newOffset)
      Resets the iterator state and sets the position to the specified offset. Subclasses must implement this method, and the implementation must call the parent class method or CollationIterator.reset().
    • getOffset

      public abstract int getOffset()
    • nextCE

      public final long nextCE()
      Returns the next collation element.
    • fetchCEs

      public final int fetchCEs()
      Fetches all CEs.
      Returns:
      getCEsLength()
    • setCurrentCE

      final void setCurrentCE(long ce)
      Overwrites the current CE (the last one returned by nextCE()).
    • previousCE

      public final long previousCE(UVector32 offsets)
      Returns the previous collation element.
    • getCEsLength

      public final int getCEsLength()
    • getCE

      public final long getCE(int i)
    • getCEs

      public final long[] getCEs()
    • clearCEs

      final void clearCEs()
    • clearCEsIfNoneRemaining

      public final void clearCEsIfNoneRemaining()
    • nextCodePoint

      public abstract int nextCodePoint()
      Returns the next code point (with post-increment). Public for identical-level comparison and for testing.
    • previousCodePoint

      public abstract int previousCodePoint()
      Returns the previous code point (with pre-decrement). Public for identical-level comparison and for testing.
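      To illustrate the post-increment and pre-decrement contracts together with the negative sentinel, here is a minimal sketch over a CharSequence. The class CodePointCursor is hypothetical and is not one of the ICU subclasses; it only uses standard java.lang.Character methods.

```java
// Hypothetical illustration class; not part of ICU.
final class CodePointCursor {
    private final CharSequence text;
    private int pos;

    CodePointCursor(CharSequence text, int start) {
        this.text = text;
        this.pos = start;
    }

    /** Returns the code point at the current position and moves past it, or -1 at the end. */
    int nextCodePoint() {
        if (pos == text.length()) {
            return -1;  // negative sentinel: no more text
        }
        int c = Character.codePointAt(text, pos);
        pos += Character.charCount(c);  // post-increment by 1 or 2 code units
        return c;
    }

    /** Moves back over one code point and returns it, or -1 at the start. */
    int previousCodePoint() {
        if (pos == 0) {
            return -1;  // negative sentinel: already at the start
        }
        int c = Character.codePointBefore(text, pos);
        pos -= Character.charCount(c);  // pre-decrement by 1 or 2 code units
        return c;
    }

    int getOffset() { return pos; }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which is a surrogate pair in UTF-16.
        CodePointCursor cur = new CodePointCursor("a\uD83D\uDE00", 0);
        int c;
        while ((c = cur.nextCodePoint()) >= 0) {
            System.out.println(Integer.toHexString(c));
        }
    }
}
```

      Calling nextCodePoint() and then previousCodePoint() returns the same code point twice and leaves the offset unchanged, which is what makes the pair usable for backing up after look-ahead.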
    • reset

      protected final void reset()
    • reset

      protected final void reset(boolean numeric)
      Resets the state as well as the numeric setting, and completes the initialization. Only exists in Java where we reset cached CollationIterator instances rather than stack-allocating temporary ones. (See also the constructor comments.)
    • handleNextCE32

      protected long handleNextCE32()
      Returns the next code point and its local CE32 value. Returns Collation.FALLBACK_CE32 at the end of the text (c<0) or when c's CE32 value is to be looked up in the base data (fallback). The code point is used for fallbacks, context and implicit weights. It is ignored when the returned CE32 is not special (e.g., FFFD_CE32). Returns the code point in bits 63..32 (signed) and the CE32 in bits 31..0.
    • makeCodePointAndCE32Pair

      protected long makeCodePointAndCE32Pair(int c, int ce32)
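      The bit layout that handleNextCE32() documents (code point in bits 63..32, signed; CE32 in bits 31..0) can be sketched as below. The class CodePointCE32Pair and its accessor names are hypothetical helpers for illustration; the CE32 values used are arbitrary bit patterns, not real collation data.

```java
// Hypothetical helper illustrating the documented 64-bit layout; not ICU code.
final class CodePointCE32Pair {
    /** Packs a (possibly negative) code point and a CE32 into one long. */
    static long makePair(int c, int ce32) {
        // Mask the CE32 so that its sign bit does not smear into the upper half.
        return ((long) c << 32) | (ce32 & 0xffffffffL);
    }

    /** Extracts the code point; the arithmetic shift keeps a negative sentinel negative. */
    static int codePoint(long pair) {
        return (int) (pair >> 32);
    }

    /** Extracts the CE32 from the low 32 bits. */
    static int ce32(long pair) {
        return (int) pair;
    }

    public static void main(String[] args) {
        long pair = makePair(-1, 0x12345678);  // -1 as an end-of-text sentinel
        System.out.println(codePoint(pair));
        System.out.println(Integer.toHexString(ce32(pair)));
    }
}
```

      The mask on the CE32 matters: without it, a CE32 with its top bit set would be sign-extended during the widening conversion and overwrite the code point half.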
    • handleGetTrailSurrogate

      protected char handleGetTrailSurrogate()
      Called when handleNextCE32() returns a LEAD_SURROGATE_TAG for a lead surrogate code unit. Returns the trail surrogate in that case and advances past it, if a trail surrogate follows the lead surrogate. Otherwise returns any other code unit and does not advance.
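      The advance-only-on-a-trail-surrogate behavior can be sketched for UTF-16 text as follows. TrailSurrogateReader is a hypothetical class, and returning U+0000 at the end of the text is an assumption of this sketch, since the description above does not state what is returned when no code unit follows.

```java
// Hypothetical illustration for UTF-16 text; not ICU code.
final class TrailSurrogateReader {
    private final CharSequence text;
    private int pos;  // index of the code unit right after the lead surrogate

    TrailSurrogateReader(CharSequence text, int pos) {
        this.text = text;
        this.pos = pos;
    }

    /**
     * Returns the following code unit. Advances past it only if it is a
     * trail (low) surrogate; otherwise the position is left unchanged.
     */
    char getTrailSurrogate() {
        if (pos == text.length()) {
            return 0;  // assumption: U+0000 signals that nothing follows
        }
        char trail = text.charAt(pos);
        if (Character.isLowSurrogate(trail)) {
            ++pos;
        }
        return trail;
    }

    int position() { return pos; }

    public static void main(String[] args) {
        TrailSurrogateReader paired = new TrailSurrogateReader("\uD83D\uDE00x", 1);
        System.out.println(Integer.toHexString(paired.getTrailSurrogate()));
        System.out.println(paired.position());
    }
}
```

      Leaving the position untouched for a non-surrogate code unit means an unpaired lead surrogate does not swallow the character that follows it.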
    • forbidSurrogateCodePoints

      protected boolean forbidSurrogateCodePoints()
      Returns:
      false if surrogate code points U+D800..U+DFFF map to their own implicit primary weights (for UTF-16), or true if they map to CE(U+FFFD) (for UTF-8)
    • forwardNumCodePoints

      protected abstract void forwardNumCodePoints(int num)
    • backwardNumCodePoints

      protected abstract void backwardNumCodePoints(int num)
    • getDataCE32

      protected int getDataCE32(int c)
      Returns the CE32 from the data trie. Normally the same as data.getCE32(), but overridden in the builder. Call this only when the faster data.getCE32() cannot be used.
    • getCE32FromBuilderData

      protected int getCE32FromBuilderData(int ce32)
    • appendCEsFromCE32

      protected final void appendCEsFromCE32(CollationData d, int c, int ce32, boolean forward)
    • isSurrogate

      private static final boolean isSurrogate(int c)
    • isLeadSurrogate

      protected static final boolean isLeadSurrogate(int c)
    • isTrailSurrogate

      protected static final boolean isTrailSurrogate(int c)
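      One common way to implement the three surrogate tests on an int code point is with bit masks, sketched below. The class SurrogateChecks is hypothetical, and the masks are one possible implementation rather than a statement of what this class does internally. Surrogates occupy U+D800..U+DFFF, with lead surrogates in U+D800..U+DBFF and trail surrogates in U+DC00..U+DFFF.

```java
// Hypothetical illustration using bit masks; not ICU code.
final class SurrogateChecks {
    /** True for U+D800..U+DFFF: all bits above the low 11 must equal 0xD800. */
    static boolean isSurrogate(int c) {
        return (c & 0xfffff800) == 0xd800;
    }

    /** True for U+D800..U+DBFF: all bits above the low 10 must equal 0xD800. */
    static boolean isLeadSurrogate(int c) {
        return (c & 0xfffffc00) == 0xd800;
    }

    /** True for U+DC00..U+DFFF: all bits above the low 10 must equal 0xDC00. */
    static boolean isTrailSurrogate(int c) {
        return (c & 0xfffffc00) == 0xdc00;
    }

    public static void main(String[] args) {
        System.out.println(isLeadSurrogate(0xD83D));
        System.out.println(isTrailSurrogate(0xDE00));
        System.out.println(isSurrogate(-1));
    }
}
```

      Because the masks cover the full 32 bits, negative sentinel values and supplementary code points such as U+1D800 are rejected without a separate range check.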
    • nextCEFromCE32

      private final long nextCEFromCE32(CollationData d, int c, int ce32)
    • getCE32FromPrefix

      private final int getCE32FromPrefix(CollationData d, int ce32)
    • nextSkippedCodePoint

      private final int nextSkippedCodePoint()
    • backwardNumSkipped

      private final void backwardNumSkipped(int n)
    • nextCE32FromContraction

      private final int nextCE32FromContraction(CollationData d, int contractionCE32, CharSequence trieChars, int trieOffset, int ce32, int c)
    • nextCE32FromDiscontiguousContraction

      private final int nextCE32FromDiscontiguousContraction(CollationData d, CharsTrie suffixes, int ce32, int lookAhead, int c)
    • previousCEUnsafe

      private final long previousCEUnsafe(int c, UVector32 offsets)
      Returns the previous CE when data.isUnsafeBackward(c, isNumeric) is true.
    • appendNumericCEs

      private final void appendNumericCEs(int ce32, boolean forward)
      Turns a string of digits (bytes 0..9) into a sequence of CEs that will sort in numeric order. Starts from this ce32's digit value and consumes the following/preceding digits. The digit string must not be empty and must not have leading zeros.
    • appendNumericSegmentCEs

      private final void appendNumericSegmentCEs(CharSequence digits)
      Turns 1..254 digits into a sequence of CEs. Called by appendNumericCEs() for each segment of at most 254 digits.
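      The segmentation step, in which a long digit string is handed on in pieces of at most 254 digits, can be sketched as below. This is a simplified illustration under stated assumptions: NumericSegments and split are hypothetical names, digits are shown as ASCII characters rather than byte values 0..9, and any special handling the real method applies (for example to zeros at the start of later segments) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified illustration of the segment split; not ICU code.
final class NumericSegments {
    static final int MAX_SEGMENT_LENGTH = 254;

    /** Splits a non-empty digit string into consecutive segments of 1..254 digits. */
    static List<String> split(String digits) {
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("digits must not be empty");
        }
        List<String> segments = new ArrayList<>();
        for (int start = 0; start < digits.length(); start += MAX_SEGMENT_LENGTH) {
            int end = Math.min(start + MAX_SEGMENT_LENGTH, digits.length());
            segments.add(digits.substring(start, end));
        }
        return segments;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 600; i++) {
            sb.append('7');
        }
        for (String segment : split(sb.toString())) {
            System.out.println(segment.length());
        }
    }
}
```

      Each segment would then be passed to a per-segment encoder, playing the role that appendNumericSegmentCEs() plays for appendNumericCEs().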