Weird problem with validating sum types from JSON

I’m slowly adding validations to a legacy Ruby project. That has mostly gone well, except for one piece of external data we are ingesting. I’ve narrowed the problematic part down to this:

{
  "response": [
    {
      "type": "DataSegment",
      "data": "SFRUUC8xLjEgMjAwIE9LDQpEYXRlOiBXZWQsIDA0IEF1ZyAyMDIxIDEyOjQwOjUzIEdNVA0KU2VydmVyOiBBcGFjaGUvMi40LjcgKFVidW50dSkNClgtUG93ZXJlZC1CeTogUEhQLzUuNC41DQpFeHBpcmVzOiBXZWQsIDA0IEF1ZyAyMDIxIDEyOjQwOjUzICswMDAwDQpDYWNoZS1Db250cm9sOiBuby1zdG9yZSwgbm8tY2FjaGUsIG11c3QtcmV2YWxpZGF0ZSwgcHJlLWNoZWNrPTAsIHBvc3QtY2hlY2s9MCwgbWF4LWFnZT0wDQpMYXN0LU1vZGlmaWVkOiBXZWQsIDA0IEF1ZyAyMDIxIDEyOjQwOjUzICswMDAwDQpYLUZyYW1lLU9wdGlvbnM6IFNBTUVPUklHSU4NClgtQ29udGVudC1TZWN1cml0eS1Qb2xpY3k6IGFsbG93ICdzZWxmJyA7IG9wdGlvbnMgaW5saW5lLXNjcmlwdCBldmFsLXNjcmlwdDsgZnJhbWUtYW5jZXN0b3JzICdzZWxmJzsgaW1nLXNyYyAnc2VsZicgZGF0YToNClgtV2ViS2l0LUNTUDogZGVmYXVsdC1zcmMgJ3NlbGYnOyBzY3JpcHQtc3JjICdzZWxmJyAndW5zYWZlLWlubGluZScgJ3Vuc2FmZS1ldmFsJzsgc3R5bGUtc3JjICdzZWxmJyAndW5zYWZlLWlubGluZScNClByYWdtYTogbm8tY2FjaGUNClZhcnk6IEFjY2VwdC1FbmNvZGluZw0KQ29udGVudC1MZW5ndGg6IDI3MjM3DQpDb25uZWN0aW9uOiBjbG9zZQ0KQ29udGVudC1UeXBlOiB0ZXh0L2h0bWw7IGNoYXJzZXQ9dXRmLTgNCg0KPCFET0NUWVBFIGh0bWw+CjxodG1sIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hodG1sIj4KPGhlYWQ+CjxtZXRhIGh0dHAtZXF1aXY9IkNvbnRlbnQtVHlwZSIgY29udGVudD0idGV4dC9odG1sOyBjaGFyc2V0PVVURi04IiAvPgo8dGl0bGU+cGhwTXlBZG1pbiBzZXR1cDwvdGl0bGU+CjxsaW5rIGhyZWY9Ii4uL2Zhdmljb24uaWNvIiByZWw=",
      "length": 884
    },
    {
      "type": "SnipSegment",
      "length": 394
    },
    {
      "type": "HighlightSegment",
      "data": "PHNjcmlwdCB0eXBlPSJ0ZXh0L2phdmFzY3JpcHQiIHNyYz0iLi4vanMvY29uZmlnLmpzIj4=",
      "length": 53
    }
  ]
}

I wanted to validate each segment type individually, because only the SnipSegment can be without data.
After going through the docs and this forum, I came up with a solution that I thought should work:

DataSchema = Dry::Schema.JSON do
  required(:type).filled(:string, eql?: 'DataSegment')
  required(:data).filled(:string)
  required(:length).filled(:integer)
end

SnipSchema = Dry::Schema.JSON do
  required(:type).filled(:string, eql?: 'SnipSegment')
  required(:length).filled(:integer)
end

HighlightSchema = Dry::Schema.JSON do
  required(:type).filled(:string, eql?: 'HighlightSegment')
  required(:data).filled(:string)
  required(:length).filled(:integer)
end

class MySchema < Dry::Validation::Contract
  json do
    required(:response).value(:array).each do
      schema(DataSchema | SnipSchema | HighlightSchema)
    end
  end
end

But I get a very strange error hash back:

{:response=>
  {0=>{:or=>[{:type=>["type is missing"], :data=>["data is missing"], :length=>["length is missing"]}, {:type=>["type is missing"], :length=>["length is missing"]}, {:type=>["type is missing"], :data=>["data is missing"], :length=>["length is missing"]}]},
   1=>{:or=>[{:type=>["type is missing"], :data=>["data is missing"], :length=>["length is missing"]}, {:type=>["type is missing"], :length=>["length is missing"]}, {:type=>["type is missing"], :data=>["data is missing"], :length=>["length is missing"]}]},
   2=>{:or=>[{:type=>["type is missing"], :data=>["data is missing"], :length=>["length is missing"]}, {:type=>["type is missing"], :length=>["length is missing"]}, {:type=>["type is missing"], :data=>["data is missing"], :length=>["length is missing"]}]}}}

The schema(DataSchema | SnipSchema | HighlightSchema) part receives the hash with string keys, even though I am specifying JSON everywhere.
Do I need to explicitly tell the segment schemata to symbolize keys, even though they’re of type Dry::Schema.JSON? What’s the canonical way to handle such situations?

Please try the latest dry-schema and dry-validation. I couldn’t reproduce this with the latest versions.

I was able to reproduce this with the newest gems. The repo with the code is here: GitHub - rickenharp/dry-validation-problem

I also found an existing bug report describing this behavior. If I change the data to use symbolized keys, it suddenly works.
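
For anyone following along, here is a minimal sketch of what "symbolized keys" means here. It uses only Ruby's standard json library, not dry-schema, and the payload is a trimmed-down stand-in for the one in my first post. It only illustrates why a lookup by symbol reports a key as missing when the hash still has string keys; it is not a claim about how dry-schema works internally.

```ruby
require 'json'

# Trimmed-down stand-in for the payload in the first post.
raw = '{"response":[{"type":"SnipSegment","length":394}]}'

# JSON.parse returns string keys by default.
string_keyed = JSON.parse(raw)
# symbolize_names: true converts every key, including nested ones, to a symbol.
symbol_keyed = JSON.parse(raw, symbolize_names: true)

string_segment = string_keyed['response'].first
symbol_segment = symbol_keyed[:response].first

# A symbol lookup on a string-keyed hash finds nothing, which is what
# "type is missing" amounts to.
puts string_segment.key?(:type)
puts symbol_segment.key?(:type)
```

Passing the symbolized hash to the contract is the variant that works for me, but it feels like a workaround rather than the intended usage.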

Ah yeah, this makes sense: you used schema, which does not imply hash coercion. This is kind of a gotcha. Try this:

class MySchema < Dry::Validation::Contract
  json do
    required(:response).array(:hash, DataSchema | SnipSchema | HighlightSchema)
  end
end

Great, this works! Thanks for the quick help!
