Available options for spark.read.option()

Annoyingly, the documentation for the option method lives in the docs for the json method. Those docs describe the options as follows (key — value — description); two usage sketches follow the list:

  • primitivesAsString — true/false (default false) — infers all primitive values as a string type

  • prefersDecimal — true/false (default false) — infers all floating-point values as a decimal type. If the values do not fit in decimal, then it infers them as doubles.

  • allowComments — true/false (default false) — ignores Java/C++ style comments in JSON records

  • allowUnquotedFieldNames — true/false (default false) — allows unquoted JSON field names

  • allowSingleQuotes — true/false (default true) — allows single quotes in addition to double quotes

  • allowNumericLeadingZeros — true/false (default false) — allows leading zeros in numbers (e.g. 00012)

  • allowBackslashEscapingAnyCharacter — true/false (default false) — accepts backslash escaping of any character, not only the standard JSON escape sequences

  • allowUnquotedControlChars — true/false (default false) — allows JSON strings to contain unquoted control characters (ASCII characters with a value less than 32, including tab and line feed)

  • mode — PERMISSIVE/DROPMALFORMED/FAILFAST (default PERMISSIVE) — sets the mode for dealing with corrupt records during parsing.

    • PERMISSIVE: when it meets a corrupted record, it puts the malformed
      string into a field configured by columnNameOfCorruptRecord and sets
      the other fields to null. To keep corrupt records, a user can set a
      string-type field named by columnNameOfCorruptRecord in a user-defined
      schema; if the schema does not have that field, corrupt records are
      dropped during parsing. When inferring a schema, it implicitly adds a
      columnNameOfCorruptRecord field to the output schema.
    • DROPMALFORMED: drops corrupted records entirely.
    • FAILFAST: throws an exception when it meets a corrupted record.
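To tie these together, here is a minimal sketch of passing a few of the options above when reading JSON. The SparkSession setup and the events.json path are illustrative assumptions, not from the docs:

    import org.apache.spark.sql.SparkSession

    object ReadJsonOptions {
      def main(args: Array[String]): Unit = {
        // Hypothetical local session; adjust master/appName for your environment.
        val spark = SparkSession.builder()
          .appName("json-read-options")
          .master("local[*]")
          .getOrCreate()

        // Each option is a string key/value pair; the keys are the ones listed above.
        val df = spark.read
          .option("allowComments", "true")       // tolerate // and /* */ comments
          .option("allowSingleQuotes", "true")   // accept 'field' as well as "field"
          .option("primitivesAsString", "true")  // infer every primitive as a string
          .json("events.json")                   // hypothetical input path

        df.printSchema()
        spark.stop()
      }
    }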
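And here is a sketch of the PERMISSIVE/columnNameOfCorruptRecord behavior described above: with a string field of that name in a user-defined schema, malformed records are kept rather than dropped. The schema, the _corrupt_record column name, and the path are illustrative assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

    object CorruptRecordDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("corrupt-record-demo")
          .master("local[*]")
          .getOrCreate()

        // User-defined schema containing a string field whose name matches
        // columnNameOfCorruptRecord, so PERMISSIVE mode keeps malformed records.
        val schema = StructType(Seq(
          StructField("id", LongType, nullable = true),
          StructField("name", StringType, nullable = true),
          StructField("_corrupt_record", StringType, nullable = true)
        ))

        val df = spark.read
          .schema(schema)
          .option("mode", "PERMISSIVE")
          .option("columnNameOfCorruptRecord", "_corrupt_record")
          .json("events.json")  // hypothetical input path

        // Well-formed rows populate id/name; malformed rows have nulls there
        // and carry the raw JSON text in _corrupt_record.
        df.show(truncate = false)
        spark.stop()
      }
    }

Switching the mode option to DROPMALFORMED or FAILFAST changes only that one option call.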
