Catch unexpected Hash keys in dry-validation

pvmeerbe · December 16, 2016, 9:29am

Hi,

This topic is related to https://github.com/dry-rb/dry-validation/issues/179 on trying to get an error on unexpected Hash keys during validation with a dry-validation schema.

It would be nice if the expected keys are automatically derived from the required & optional keys mentioned in the schema. It would even be better if this would work on nested schemes as well.

The proposed ‘input’ predicate solution introduces a problem

the unexpected hash keys are either already removed from the data param presented to the input predicate (weak/symbolized/… Dry::Type hash schema)
or the raised Error is catched and ignored due to the ‘Safe’ class of dry-types ( form.hash/json.hash with strict Dry::Type hash schema --> the Safe class catches the error)
or an error is raised and further error/hint processing is stopped (with strict Dry::Type hash schema)

If I understand it correctly :

validation is handled using a pipeline starting with InputProcessing : https://github.com/dry-rb/dry-validation/blob/master/lib/dry/validation/schema.rb#L57
input processor is selected based on Schema/Form/JSON schema selection or as an alternative the config.input_procesor param can be set
- json/form/sanitizer : https://github.com/dry-rb/dry-validation/blob/master/lib/dry/validation/schema/class_interface.rb#L26
- default NOOP : https://github.com/dry-rb/dry-validation/blob/master/lib/dry/validation/schema/class_interface.rb#L11
the output of the ‘input processor’ pipeline step is used as input for the ‘input rule’ pipeline step
- json/form : here the ‘Safe’ Hash class is used which will catch any error thown for unexpected keys, so none of the Dry::types Hash schemes can be used to detect unexpected keys
- NOOP : just returns the input without any checks so cannot be used
- sanitizer : here the strict Dry::Types Hash scheme can be used which will raise an UnknownKeysError (https://github.com/dry-rb/dry-types/blob/master/lib/dry/types/hash/schema.rb#L140)

So only the last option could work, however:

the UnknownKeysError is not catched so it will stop all further processing including any other errors/hints which could already be detected
there is no way to combine for example symbolize keys with detection of unexpected keys, so additional combinations of Dry:Types Hash schemes should be added

Did I get this correctly?

If so, I tried an alternative approach in this POC commit https://github.com/pvmeerbe/dry-types/commit/e2ac4c07eec4eff692ee1b0eaf13566c80bf08c5

The idea is to always collect unexpected keys during parsing withDry:Types Hash schemes.
The current POC gathers all these keys (even from nested schemes) and returns it as a separate Hash structure

Some thoughts on this:

The storage under the __ignored_data key feels clumsy, alternative could be to return an additional hash structure as second parameter with this info or even construct some kind of result object which contains the parsed data/unexpected keys/success-failure status ? However I’ve got no idea where this Schema#try method is used and whether changing this behavior would affect other code. for sure the dry-validation pipeline handling of Input processing should change
I guess always collecting this data would have an impact on performance? I guess you don’t need this ignored_data info in many cases, so this seems overhead

Perhaps someone can give alternative approaches/improvements/comments on this topic?

solnic · December 20, 2016, 11:35am

Looks like you did get it right

This feature is on the todo list for 1.0.0. We have the infrastructure in place for applying a predicate that will check the whole input for unexpected keys, the problem that must be solved is how to represent errors for this type of failure. Messages structure is based on the hash keys, so this doesn’t really work in case of unexpected keys, as this is the type of failure that’s associated with the whole input. This means we probably want to extend Result API with a method that returns failures associated with the whole input (and really, this doesn’t have to be limited to unexpected keys, ie imagine an array as the input that must have at least 4 elements, but the input has 3 elements).

Does this make sense?

pvmeerbe · December 20, 2016, 2:26pm

As far as I can see

Dry::Logic::Result doesn’t care whether it is a global (aka ‘input’ rule ) or field level result (aka ‘normal’ rule).
dry-validation executor uses the ‘nil’ key to store the Dry::Logic::Result for the ‘input’ rule inside an Dry::Validation::Result (https://github.com/dry-rb/dry-validation/blob/master/lib/dry/validation/executor.rb#L24 & https://github.com/dry-rb/dry-validation/blob/master/lib/dry/validation/schema.rb#L73)

So using the following custom input predicate & the POC above I was able to get the error (although non-very useful message since I cannot pass in which keys are not supported as I cannot pass in variables into the message ) :

class BaseValidator < Dry::Validation.Schema

    def self.messages
         super.merge(en: {
             errors: {
                 only_defined_keys_present?: "non-supported key is provided",
                 changes_requested?: "at least one property to update must be defined",
          }
      })
    end

    def changes_requested?(input)
      input.keys.present?
    end

    def only_defined_keys_present?(input)
      input[:__ignored_data].nil?
    end           
    
    define! do
            # top level validation on being a hash and reporting non-supported keys
            input(:hash?, :only_defined_keys_present?, :changes_requested?)
     end
end

In fact I only used the existing infrastructure, but had to use the ugly trick with the __ignore_data key to get the ignored_data into the ‘input’ rule.

So I don’t really understand why the Result API needs to be extended…

Or do you mean implement the unexpected Hash keys check in a separate executor step (aka ‘unexpected keys’ step) & build new error items in this process? If so, somehow the unexpected Hash keys still need to be kept separately as they are already removed during the ProcessInput executor step.
Doing this ‘unexpected keys’ step before the ProcessInput step seems difficult as the input data structure is not verified/coerced yet…

I think the problem lies in the executor steps. What if these are modified to:

execute the ProcessInput step with input hash & which returns coerced data hash + a structure of ignored_data it encountered during parsing
execute the ApplyRules step with coerced data hash + ignored_data
execute the ApplyChecks step with coerced data hash

This way one can choose to use the ignored_data hash inside a global input rule + the rest of the infrastructure is not affected.

The Dry::Types::Safe#call & Dry::Types::Hash#try should also change since we need a way to get this ignored_data structure. This may have bigger impacts… Always returning an additional hash structure which contains ignored keys is not so easy as the #try routines are shared over multiple Dry::Types, for example the Array::Member…

pvmeerbe · February 14, 2017, 11:05am

Just realized my proposed approach won’t work for normal Dry::Validation.Schema schema’s as they will by default use the NOOP Input processor (https://github.com/dry-rb/dry-validation/blob/master/lib/dry/validation/schema/class_interface.rb#L11). I guess a more generic solution is needed.

Did I correctly understand : when using the default Dry::Validation.Schema input keys not specified as ‘required’ or ‘optional’ will be ignored by validation and still be present in the output ?

I.e. when using default Dry::Validation.Schema:

InputProcessing step : the given input data will simply be returned unaltered by the NOOP processor
InputRule step : the unaltered input data is provided to do custom handling
Rule step: only the specified ‘required’ and ‘optional’ key validation will be handled, thus ignoring unspecified keys
calling output on the result of the validation still contains the unspecified keys

If so , this seems strange as I expected the output to contain only the ‘trusted’ data based on the specified validation rules which is the case when using Form or JSON validation schemas (it actually feels like a bug to me…).

I was able to get ‘trusted’ output by adding the following to my schema :

configure do |config|
      config.input_processor = :sanitizer
end

However this solution has 2 disadvantages :

I cannot act upon unspecified keys, similar to the case with Form & JSON + I cannot reuse the same patch as this time different Dry::Types are used…
specifying uncoercable input data is not safely handled as with JSON/Form , for example provide a string as input data instead of a Hash will raise
NoMethodError: undefined method `keys’ for “something”:String

This last point seems actually a bug in the :sanitizer input processor? Let me now if I should create an issue ticket for this…

IMHO it seems dry-validation is missing a more generic system to catch unspecified keys and act upon them.

Perhaps an additional step can be added to the pipeline for this, however this means the complete input structure has to be processed again which would most-likely affect the performance…

Perhaps altering the ‘InputProcessing’ pipeline step to somehow store the unspecified keys (independently from the JSON/Form/Schema selection) and present them to the ‘Input’ pipeline step makes more sense.

Topic		Replies	Views
Dry-validation (v1.0.0rc3) - dynamic keys dry-validation	4	1745	May 20, 2019
Ignore nested key validation dry-validation	1	525	September 23, 2022
Unexpected behaviour with validate_keys	2	567	February 19, 2021
Excluding invalid key value pairs in dry-validation dry-validation	3	1021	November 8, 2019
[dry-schema] Unexpected results when key ends with question mark Support	1	509	March 7, 2020

Catch unexpected Hash keys in dry-validation

Related topics