fi.pelam.csv.table

DetectingTableReader

final class DetectingTableReader[RT, CT, M <: LocaleTableMetadata[M]] extends AnyRef

This CSV format detection heuristic tries to read the input CSV with varying parameters until a Table is produced with no errors or all combinations are exhausted. In the latter case the Table with least errors is returned.

For simpler usage you can skip initialMetadata in the constructor by using the apply method defined in the companion object.

The interface of this class is similar to the one in TableReader, but this class creates multiple TableReader instances under the hood.

Example on detection of CSV format

This example detects and parses a weird CSV format in which the separator is the one used at least in the finnish locale, but numeric data is formatted in english style. The column types are defined on the first row and the row type is defined by the first column.

import fi.pelam.csv.table._
import fi.pelam.csv.cell._
import TableReaderConfig._

val validColTypes = Set("header", "model", "price")

// Setup a DetectingTableReader which will try combinations of CSV formatting types
// to understand the data.
val reader = DetectingTableReader[String, String](

  tableReaderMaker = { (metadata) => new TableReader(

    // An implicit from the object TableReaderConfig converts the string
    // to a function providing streams.
    openStream =
      "header;model;price\n" +
      "data;300D;1,234.0\n" +
      "data;SLS AMG;234,567.89",

    // Make correct metadata end up in the final Table
    tableMetadata = metadata,

    // First column specifies row types
    rowTyper = makeRowTyper({
      case (CellKey(_, 0), rowType) => rowType
    }),

    // Column type is specified by the first row.
    // Type names are checked and error is generated for unknown
    // column types by errorOnUndefinedCol.
    // This strictness is what enables the correct detection of CSV format.
    colTyper = errorOnUndefinedCol(makeColTyper({
      case (CellKey(0, _), colType) if validColTypes.contains(colType) => colType
    })),

    cellUpgrader = makeCellUpgrader({
      case CellType("data", "price") => DoubleCell.parserForLocale(metadata.dataLocale)
    }))
  }
)

val table = reader.readOrThrow()

// Get values from cells in column with type "name" on rows with type "data."
table.getSingleCol("data", "model").map(_.value).toList
// Will give List("300D", "SLS AMG")

// Get values from cells in column with type "number" on rows with type "data."
table.getSingleCol("number", "price").map(_.value).toList)
// Will give List(1234, 234567.89)
RT

The client specific row type.

CT

The client specific column type.

M

The type of the metadata parameter. Must be a sub type of LocaleTableMetadata. This is used to manage the character set, separator, cellTypeLocale and dataLocale combinations when attempting to read the CSV data from the input stream.

Source
DetectingTableReader.scala
Linear Supertypes
AnyRef, Any
Ordering
  1. Alphabetic
  2. By inheritance
Inherited
  1. DetectingTableReader
  2. AnyRef
  3. Any
  1. Hide All
  2. Show all
Learn more about member selection
Visibility
  1. Public
  2. All

Instance Constructors

  1. new DetectingTableReader(initialMetadata: M, tableReaderMaker: (M) ⇒ TableReader[RT, CT, M], locales: Seq[Locale] = CsvConstants.commonLocales, charsets: Seq[Charset] = CsvConstants.commonCharsets, separators: Seq[Char] = CsvConstants.commonSeparators)

    initialMetadata

    base metadata instance. Copies with different format parameters will be created from this using LocaleTableMetadata.withFormatParameters. Idea is that you client have custom metadata subclass and use this parameter to set initial values for custom fields in the metadata.

    tableReaderMaker

    user provided method that constructs a TableReader using locales, separator and charset specified by LocaleTableMetadata parameter.

    locales

    List of locales to try for cellTypeLocale and dataLocale. The default is CsvConstants.commonLocales.

    charsets

    List of charsets to try. Default is CsvConstants.commonCharsets.

    separators

    List of separators to try. Default is CsvConstants.commonSeparators.

Type Members

  1. type ResultTable = Table[RT, CT, M]

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. val charsets: Seq[Charset]

    List of charsets to try.

    List of charsets to try. Default is CsvConstants.commonCharsets.

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  11. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  12. val initialMetadata: M

    base metadata instance.

    base metadata instance. Copies with different format parameters will be created from this using LocaleTableMetadata.withFormatParameters. Idea is that you client have custom metadata subclass and use this parameter to set initial values for custom fields in the metadata.

  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. val locales: Seq[Locale]

    List of locales to try for cellTypeLocale and dataLocale.

    List of locales to try for cellTypeLocale and dataLocale. The default is CsvConstants.commonLocales.

  15. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. final def notify(): Unit

    Definition Classes
    AnyRef
  17. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  18. def read(): (ResultTable, TableReadingErrors)

    The main method in this class.

    The main method in this class. Can be called several times. The input stream is may be opened and closed many times per each call.

    If there are no errors TableReadingErrors.noErrors is true.

    returns

    a pair with a .TableReader and TableReadingErrors.

  19. def readOrThrow(): ResultTable

    This method extends the basic read method with exception based error handling, which may be useful in smaller applications that don't expect or handle errors in input.

    This method extends the basic read method with exception based error handling, which may be useful in smaller applications that don't expect or handle errors in input.

    A RuntimeException will be thrown when error is encountered.

  20. val separators: Seq[Char]

    List of separators to try.

    List of separators to try. Default is CsvConstants.commonSeparators.

  21. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  22. val tableReaderMaker: (M) ⇒ TableReader[RT, CT, M]

    user provided method that constructs a TableReader using locales, separator and charset specified by LocaleTableMetadata parameter.

  23. def toString(): String

    Definition Classes
    AnyRef → Any
  24. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from AnyRef

Inherited from Any

Ungrouped