Catch unexpected Hash keys in dry-validation

Hi,

This topic is related to https://github.com/dry-rb/dry-validation/issues/179 on trying to get an error on unexpected Hash keys during validation with a dry-validation schema.

It would be nice if the expected keys are automatically derived from the required & optional keys mentioned in the schema. It would even be better if this would work on nested schemes as well.

The proposed ‘input’ predicate solution introduces a problem

  • the unexpected hash keys are either already removed from the data param presented to the input predicate (weak/symbolized/… Dry::Type hash schema)
  • or the raised Error is catched and ignored due to the ‘Safe’ class of dry-types ( form.hash/json.hash with strict Dry::Type hash schema --> the Safe class catches the error)
  • or an error is raised and further error/hint processing is stopped (with strict Dry::Type hash schema)

If I understand it correctly :

So only the last option could work, however:

  • the UnknownKeysError is not catched so it will stop all further processing including any other errors/hints which could already be detected
  • there is no way to combine for example symbolize keys with detection of unexpected keys, so additional combinations of Dry:Types Hash schemes should be added

Did I get this correctly?

If so, I tried an alternative approach in this POC commit https://github.com/pvmeerbe/dry-types/commit/e2ac4c07eec4eff692ee1b0eaf13566c80bf08c5

The idea is to always collect unexpected keys during parsing withDry:Types Hash schemes.
The current POC gathers all these keys (even from nested schemes) and returns it as a separate Hash structure

Some thoughts on this:

  • The storage under the __ignored_data key feels clumsy, alternative could be to return an additional hash structure as second parameter with this info or even construct some kind of result object which contains the parsed data/unexpected keys/success-failure status ? However I’ve got no idea where this Schema#try method is used and whether changing this behavior would affect other code. for sure the dry-validation pipeline handling of Input processing should change
  • I guess always collecting this data would have an impact on performance? I guess you don’t need this ignored_data info in many cases, so this seems overhead

Perhaps someone can give alternative approaches/improvements/comments on this topic?

Looks like you did get it right :slight_smile:

This feature is on the todo list for 1.0.0. We have the infrastructure in place for applying a predicate that will check the whole input for unexpected keys, the problem that must be solved is how to represent errors for this type of failure. Messages structure is based on the hash keys, so this doesn’t really work in case of unexpected keys, as this is the type of failure that’s associated with the whole input. This means we probably want to extend Result API with a method that returns failures associated with the whole input (and really, this doesn’t have to be limited to unexpected keys, ie imagine an array as the input that must have at least 4 elements, but the input has 3 elements).

Does this make sense?

As far as I can see

So using the following custom input predicate & the POC above I was able to get the error (although non-very useful message since I cannot pass in which keys are not supported as I cannot pass in variables into the message ) :

class BaseValidator < Dry::Validation.Schema

    def self.messages
         super.merge(en: {
             errors: {
                 only_defined_keys_present?: "non-supported key is provided",
                 changes_requested?: "at least one property to update must be defined",
          }
      })
    end

    def changes_requested?(input)
      input.keys.present?
    end

    def only_defined_keys_present?(input)
      input[:__ignored_data].nil?
    end           
    
    define! do
            # top level validation on being a hash and reporting non-supported keys
            input(:hash?, :only_defined_keys_present?, :changes_requested?)
     end
end

In fact I only used the existing infrastructure, but had to use the ugly trick with the __ignore_data key to get the ignored_data into the ‘input’ rule.

So I don’t really understand why the Result API needs to be extended…

Or do you mean implement the unexpected Hash keys check in a separate executor step (aka ‘unexpected keys’ step) & build new error items in this process? If so, somehow the unexpected Hash keys still need to be kept separately as they are already removed during the ProcessInput executor step.
Doing this ‘unexpected keys’ step before the ProcessInput step seems difficult as the input data structure is not verified/coerced yet…

I think the problem lies in the executor steps. What if these are modified to:

  1. execute the ProcessInput step with input hash & which returns coerced data hash + a structure of ignored_data it encountered during parsing
  2. execute the ApplyRules step with coerced data hash + ignored_data
  3. execute the ApplyChecks step with coerced data hash

This way one can choose to use the ignored_data hash inside a global input rule + the rest of the infrastructure is not affected.

The Dry::Types::Safe#call & Dry::Types::Hash#try should also change since we need a way to get this ignored_data structure. This may have bigger impacts… Always returning an additional hash structure which contains ignored keys is not so easy as the #try routines are shared over multiple Dry::Types, for example the Array::Member…

Just realized my proposed approach won’t work for normal Dry::Validation.Schema schema’s as they will by default use the NOOP Input processor (https://github.com/dry-rb/dry-validation/blob/master/lib/dry/validation/schema/class_interface.rb#L11). I guess a more generic solution is needed.

Did I correctly understand : when using the default Dry::Validation.Schema input keys not specified as ‘required’ or ‘optional’ will be ignored by validation and still be present in the output ?

I.e. when using default Dry::Validation.Schema:

  • InputProcessing step : the given input data will simply be returned unaltered by the NOOP processor
  • InputRule step : the unaltered input data is provided to do custom handling
  • Rule step: only the specified ‘required’ and ‘optional’ key validation will be handled, thus ignoring unspecified keys
  • calling output on the result of the validation still contains the unspecified keys

If so , this seems strange as I expected the output to contain only the ‘trusted’ data based on the specified validation rules which is the case when using Form or JSON validation schemas (it actually feels like a bug to me…).

I was able to get ‘trusted’ output by adding the following to my schema :

configure do |config|
      config.input_processor = :sanitizer
end

However this solution has 2 disadvantages :

  • I cannot act upon unspecified keys, similar to the case with Form & JSON + I cannot reuse the same patch as this time different Dry::Types are used…
  • specifying uncoercable input data is not safely handled as with JSON/Form , for example provide a string as input data instead of a Hash will raise
    NoMethodError: undefined method `keys’ for “something”:String

This last point seems actually a bug in the :sanitizer input processor? Let me now if I should create an issue ticket for this…

IMHO it seems dry-validation is missing a more generic system to catch unspecified keys and act upon them.

Perhaps an additional step can be added to the pipeline for this, however this means the complete input structure has to be processed again which would most-likely affect the performance…

Perhaps altering the ‘InputProcessing’ pipeline step to somehow store the unspecified keys (independently from the JSON/Form/Schema selection) and present them to the ‘Input’ pipeline step makes more sense.