fi.pelam.csv.table

TableReader

class TableReader[RT, CT, M <: TableMetadata] extends AnyRef

This class is part of the higher level API for reading, writing and processing CSV data.

The simpler stream based API is enough for many scenarios, but if several different sets of data will be pulled from the same CSV file and the structure of the CSV file is not rigid, this API may be a better fit.

The result of calling the read method on this class is an instance of the Table class. A Table is an immutable data structure for holding and processing data in a spreadsheet-like format.

TableWriter is the counterpart of this class for writing a Table out, for example to disk.

Code Example

This example parses a small bit of CSV data in which column types are defined on the first row.

import fi.pelam.csv.table._
import fi.pelam.csv.cell._
import TableReaderConfig._

// Create a TableReader that parses a small bit of CSV data in which the
// column types are defined on the first row.
val reader = new TableReader[String, String, SimpleMetadata](

  // An implicit from the object TableReaderConfig converts the string
  // to a function providing streams.
  openStream =
    "product,price,number\n" +
    "apple,0.99,3\n" +
    "orange,1.25,2\n" +
    "banana,0.80,4\n",

  // The first row is the header, the rest are data.
  rowTyper = makeRowTyper({
    case (CellKey(0, _), _) => "header"
    case _ => "data"
  }),

  // First row defines column types.
  colTyper = makeColTyper({
    case (CellKey(0, _), colType) => colType
  }),

  // Convert cells on the "data" rows in the "number" column to integer cells.
  // Convert cells on the "data" rows in the "price" column to decimal cells.
  cellUpgrader = makeCellUpgrader({
    case CellType("data", "number") => IntegerCell.defaultParser
    case CellType("data", "price") => DoubleCell.defaultParser
  }))

// Read the table. readOrThrow throws an exception if an error is encountered.
val table = reader.readOrThrow()

// Get values from cells in column with type "product" on rows with type "data".
table.getSingleCol("data", "product").map(_.value).toList
// Will give List("apple", "orange", "banana")

// Get values from cells in column with type "price" on rows with type "data".
table.getSingleCol("data", "price").map(_.value).toList
// Will give List(0.99, 1.25, 0.8)

CSV format detection heuristics

One simple detection heuristic is implemented in DetectingTableReader.

Without extra knowledge it is impossible to deduce whether the correct parameters, such as the character set, were used when reading a CSV file. This class therefore lets client code implement a custom format detection algorithm.

The table reading is split into stages so that a format detection heuristic can lock some variables during the earlier stages and then proceed to the later stages. Unfortunately there is currently no example or implementation of this idea.

Locking some variables and then proceeding results in a more efficient algorithm than an exhaustive search of the full set of combinations (character set, locale, separator, etc.).

The actual detection heuristic is handled outside this class. The idea is that the detection heuristic class uses this class repeatedly with varying parameters until some criterion is met. The criterion for ending detection could be that zero errors are detected. If no combination of parameters gives zero errors, the heuristic could pick the solution that failed in the latest stage and, among those, the one with the fewest errors.
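The selection rule described above can be sketched without the library. This is only an illustration under assumptions: Params, Attempt, detect and isBetter are hypothetical names that do not exist in this API, and the tryRead function stands in for constructing a TableReader with the given parameters and summarizing the result of read.

```scala
object DetectionSketch {
  // Hypothetical stand-ins, not part of the library.
  final case class Params(charset: String, separator: Char)

  // stage: index of the last stage reached (higher is later).
  // errorCount: number of errors reported for this attempt.
  final case class Attempt(params: Params, stage: Int, errorCount: Int)

  // Zero errors wins; otherwise a later stage wins; otherwise fewer errors win.
  def isBetter(a: Attempt, b: Attempt): Boolean = {
    if (a.errorCount == 0) {
      b.errorCount != 0
    } else if (b.errorCount == 0) {
      false
    } else if (a.stage != b.stage) {
      a.stage > b.stage
    } else {
      a.errorCount < b.errorCount
    }
  }

  // Try candidates in order and stop as soon as one attempt has zero errors.
  // Returns None only when there are no candidates.
  def detect(candidates: Seq[Params], tryRead: Params => Attempt): Option[Attempt] = {
    var best: Option[Attempt] = None
    val it = candidates.iterator
    while (it.hasNext && !best.exists(_.errorCount == 0)) {
      val attempt = tryRead(it.next())
      if (best.forall(b => isBetter(attempt, b))) {
        best = Some(attempt)
      }
    }
    best
  }
}
```

Stopping at the first error-free attempt keeps the number of reads small, while the fallback ordering still produces a best guess when every candidate reports errors.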

Stages

RT

The client specific row type.

CT

The client specific column type.

M

The type of the metadata parameter. Must be a subtype of TableMetadata. The metadata specifies the character set and separator to use when reading the CSV data from the input stream.

Source
TableReader.scala
Note

This is about the internal structure of TableReader processing.

The table reading is split into four stages.

The table reading process may fail and terminate at any stage. In that case an incomplete Table object is returned together with the errors detected so far.

The table reading is split into stages to allow implementing format detection heuristics in a structured manner.

  • csvReadingStage: Parse CSV byte data to cells. Depends on the charset and separator provided via the metadata parameter.
  • rowTypeDetectionStage: Detect row types (hard coded or based on cell contents). The rowTyper parameter is used in this stage.
  • colTypeDetectionStage: Detect column types (hard coded or based on row types and cell contents). The colTyper parameter is used in this stage.
  • cellUpgradeStage: Upgrade cells based on cell types, which are combined from row and column types. The cellUpgrader parameter is used in this stage.
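The way a staged read can stop early and still hand back a partial result may be easier to see in a small model. The following is a sketch under assumptions, not the internals of TableReader: State, Stage and run are hypothetical, only the first two of the four stages are modelled, and the real Table and TableReadingErrors classes carry more information.

```scala
object StagedReadSketch {
  // Hypothetical, simplified reading state.
  final case class State(
      cells: Vector[Vector[String]] = Vector.empty,
      rowTypes: Vector[String] = Vector.empty,
      errors: List[String] = Nil,
      completedStages: Int = 0)

  type Stage = State => State

  // Stage 1: split the input into rows and cells.
  def csvReadingStage(input: String, separator: Char): Stage = state => {
    val rows = input.split('\n').toVector.filter(_.nonEmpty)
    if (rows.isEmpty) {
      state.copy(errors = List("Input is empty"))
    } else {
      state.copy(cells = rows.map(_.split(separator).toVector))
    }
  }

  // Stage 2: the first row is "header", the rest are "data".
  // An empty cell on the header row is reported as an error.
  val rowTypeDetectionStage: Stage = state => {
    if (state.cells.headOption.exists(_.exists(_.isEmpty))) {
      state.copy(errors = List("Header row has an empty cell"))
    } else {
      val types = state.cells.indices.toVector.map(i => if (i == 0) "header" else "data")
      state.copy(rowTypes = types)
    }
  }

  // Run stages in order. After the first stage that reports errors, later
  // stages are skipped and the partial state is returned as it is.
  def run(stages: List[Stage]): State = {
    stages.foldLeft(State()) { (state, stage) =>
      if (state.errors.nonEmpty) {
        state
      } else {
        val next = stage(state)
        if (next.errors.isEmpty) {
          next.copy(completedStages = state.completedStages + 1)
        } else {
          next
        }
      }
    }
  }
}
```

In this model a failure in the second stage still leaves the parsed cells in the returned state, which corresponds to the incomplete Table mentioned above. The completedStages counter is the kind of information a detection heuristic could use to prefer attempts that got further.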
Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new TableReader(openStream: StreamOpener, tableMetadata: M = SimpleMetadata(), rowTyper: RowTyper[RT] = PartialFunction.empty, colTyper: ColTyper[RT, CT] = PartialFunction.empty, cellUpgrader: CellUpgrader[RT, CT] = PartialFunction.empty)

    rowTyper

    Partial function used in rowTypeDetectionStage

    colTyper

    Partial function used in colTypeDetectionStage

    cellUpgrader

    Partial function used in cellUpgradeStage
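Because the three typing parameters are partial functions that default to PartialFunction.empty, standard PartialFunction operations apply to them. The sketch below shows the general technique with a deliberately simplified input shape. SimpleRowTyper and its (row index, first cell) input are assumptions made for illustration and are not the library's RowTyper type. How the library treats rows or cells for which a function is not defined is not covered here.

```scala
object TyperSketch {
  // Hypothetical simplified input: (row index, content of the first cell).
  type SimpleRowTyper = PartialFunction[(Int, String), String]

  // Each rule only covers the rows it knows about.
  val headerTyper: SimpleRowTyper = { case (0, _) => "header" }
  val commentTyper: SimpleRowTyper = { case (_, text) if text.startsWith("#") => "comment" }
  val dataTyper: SimpleRowTyper = { case (_, text) if text.nonEmpty => "data" }

  // orElse chains the rules; the first rule defined for an input wins.
  val rowTyper: SimpleRowTyper = headerTyper orElse commentTyper orElse dataTyper

  // lift turns "not defined for this input" into None.
  def typeOf(row: Int, firstCell: String): Option[String] =
    rowTyper.lift((row, firstCell))

  // An empty partial function is defined for no input at all.
  val noTyper: SimpleRowTyper = PartialFunction.empty
}
```

Keeping each rule in its own small partial function makes it possible to reuse the same header or comment rule across several readers.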

Type Members

  1. type ResultTable = Table[RT, CT, M]

    The Table type returned by read.

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. val cellUpgrader: CellUpgrader[RT, CT]

    Partial function used in cellUpgradeStage

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. val colTyper: ColTyper[RT, CT]

    Partial function used in colTypeDetectionStage

  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. final def notify(): Unit

    Definition Classes
    AnyRef
  16. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  17. val openStream: StreamOpener

  18. def read(): (ResultTable, TableReadingErrors)

    The main method in this class. Can be called several times. The input stream is opened and closed once per call.

    If there are no errors, TableReadingErrors.noErrors is true.

    returns

    a pair with a ResultTable and TableReadingErrors.

  19. def readOrThrow(): ResultTable

    This method extends the basic read method with exception based error handling, which may be useful in smaller applications that don't expect or handle errors in input.

    A RuntimeException will be thrown when an error is encountered.

  20. val rowTyper: RowTyper[RT]

    Partial function used in rowTypeDetectionStage

  21. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  22. val tableMetadata: M

  23. def toString(): String

    Definition Classes
    TableReader → AnyRef → Any
  24. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
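The difference between the two reading methods listed above, read and readOrThrow, can be modelled in a few lines. This is a sketch with stand-in types: FakeTable, FakeErrors and the parsing logic are hypothetical and exist only to show the two error handling styles. Only the shape of the return values and the use of RuntimeException follow the member descriptions.

```scala
object ErrorHandlingSketch {
  // Hypothetical stand-ins for Table and TableReadingErrors.
  final case class FakeTable(rows: Vector[Vector[String]])
  final case class FakeErrors(messages: List[String]) {
    def noErrors: Boolean = messages.isEmpty
  }

  // Same shape as read(): always returns a pair, even when errors were found.
  def read(input: String): (FakeTable, FakeErrors) = {
    val rows = input.split('\n').toVector.filter(_.nonEmpty).map(_.split(',').toVector)
    val widths = rows.map(_.size).distinct
    val messages =
      if (widths.size > 1) List("Rows have differing numbers of cells") else Nil
    (FakeTable(rows), FakeErrors(messages))
  }

  // Same shape as readOrThrow(): returns only the table, throws on errors.
  def readOrThrow(input: String): FakeTable = {
    val (table, errors) = read(input)
    if (errors.noErrors) {
      table
    } else {
      throw new RuntimeException(errors.messages.mkString("; "))
    }
  }
}
```

The pair returning style suits callers that want to inspect a partial result or report errors to a user, while the throwing style keeps small scripts short.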
