It would be nice if the expected keys are automatically derived from the required & optional keys mentioned in the schema. It would even be better if this would work on nested schemes as well.
The proposed ‘input’ predicate solution introduces a problem
the unexpected hash keys are either already removed from the data param presented to the input predicate (weak/symbolized/… Dry::Type hash schema)
or the raised Error is catched and ignored due to the ‘Safe’ class of dry-types ( form.hash/json.hash with strict Dry::Type hash schema --> the Safe class catches the error)
or an error is raised and further error/hint processing is stopped (with strict Dry::Type hash schema)
the output of the ‘input processor’ pipeline step is used as input for the ‘input rule’ pipeline step
json/form : here the ‘Safe’ Hash class is used which will catch any error thown for unexpected keys, so none of the Dry::types Hash schemes can be used to detect unexpected keys
NOOP : just returns the input without any checks so cannot be used
the UnknownKeysError is not catched so it will stop all further processing including any other errors/hints which could already be detected
there is no way to combine for example symbolize keys with detection of unexpected keys, so additional combinations of Dry:Types Hash schemes should be added
The idea is to always collect unexpected keys during parsing withDry:Types Hash schemes.
The current POC gathers all these keys (even from nested schemes) and returns it as a separate Hash structure
Some thoughts on this:
The storage under the __ignored_data key feels clumsy, alternative could be to return an additional hash structure as second parameter with this info or even construct some kind of result object which contains the parsed data/unexpected keys/success-failure status ? However I’ve got no idea where this Schema#try method is used and whether changing this behavior would affect other code. for sure the dry-validation pipeline handling of Input processing should change
I guess always collecting this data would have an impact on performance? I guess you don’t need this ignored_data info in many cases, so this seems overhead
Perhaps someone can give alternative approaches/improvements/comments on this topic?
This feature is on the todo list for 1.0.0. We have the infrastructure in place for applying a predicate that will check the whole input for unexpected keys, the problem that must be solved is how to represent errors for this type of failure. Messages structure is based on the hash keys, so this doesn’t really work in case of unexpected keys, as this is the type of failure that’s associated with the whole input. This means we probably want to extend Result API with a method that returns failures associated with the whole input (and really, this doesn’t have to be limited to unexpected keys, ie imagine an array as the input that must have at least 4 elements, but the input has 3 elements).
So using the following custom input predicate & the POC above I was able to get the error (although non-very useful message since I cannot pass in which keys are not supported as I cannot pass in variables into the message ) :
class BaseValidator < Dry::Validation.Schema
def self.messages
super.merge(en: {
errors: {
only_defined_keys_present?: "non-supported key is provided",
changes_requested?: "at least one property to update must be defined",
}
})
end
def changes_requested?(input)
input.keys.present?
end
def only_defined_keys_present?(input)
input[:__ignored_data].nil?
end
define! do
# top level validation on being a hash and reporting non-supported keys
input(:hash?, :only_defined_keys_present?, :changes_requested?)
end
end
In fact I only used the existing infrastructure, but had to use the ugly trick with the __ignore_data key to get the ignored_data into the ‘input’ rule.
So I don’t really understand why the Result API needs to be extended…
Or do you mean implement the unexpected Hash keys check in a separate executor step (aka ‘unexpected keys’ step) & build new error items in this process? If so, somehow the unexpected Hash keys still need to be kept separately as they are already removed during the ProcessInput executor step.
Doing this ‘unexpected keys’ step before the ProcessInput step seems difficult as the input data structure is not verified/coerced yet…
I think the problem lies in the executor steps. What if these are modified to:
execute the ProcessInput step with input hash & which returns coerced data hash + a structure of ignored_data it encountered during parsing
execute the ApplyRules step with coerced data hash + ignored_data
execute the ApplyChecks step with coerced data hash
This way one can choose to use the ignored_data hash inside a global input rule + the rest of the infrastructure is not affected.
The Dry::Types::Safe#call & Dry::Types::Hash#try should also change since we need a way to get this ignored_data structure. This may have bigger impacts… Always returning an additional hash structure which contains ignored keys is not so easy as the #try routines are shared over multiple Dry::Types, for example the Array::Member…
Did I correctly understand : when using the default Dry::Validation.Schema input keys not specified as ‘required’ or ‘optional’ will be ignored by validation and still be present in the output ?
I.e. when using default Dry::Validation.Schema:
InputProcessing step : the given input data will simply be returned unaltered by the NOOP processor
InputRule step : the unaltered input data is provided to do custom handling
Rule step: only the specified ‘required’ and ‘optional’ key validation will be handled, thus ignoring unspecified keys
calling output on the result of the validation still contains the unspecified keys
If so , this seems strange as I expected the output to contain only the ‘trusted’ data based on the specified validation rules which is the case when using Form or JSON validation schemas (it actually feels like a bug to me…).
I was able to get ‘trusted’ output by adding the following to my schema :
configure do |config|
config.input_processor = :sanitizer
end
However this solution has 2 disadvantages :
I cannot act upon unspecified keys, similar to the case with Form & JSON + I cannot reuse the same patch as this time different Dry::Types are used…
specifying uncoercable input data is not safely handled as with JSON/Form , for example provide a string as input data instead of a Hash will raise
NoMethodError: undefined method `keys’ for “something”:String
This last point seems actually a bug in the :sanitizer input processor? Let me now if I should create an issue ticket for this…
IMHO it seems dry-validation is missing a more generic system to catch unspecified keys and act upon them.
Perhaps an additional step can be added to the pipeline for this, however this means the complete input structure has to be processed again which would most-likely affect the performance…
Perhaps altering the ‘InputProcessing’ pipeline step to somehow store the unspecified keys (independently from the JSON/Form/Schema selection) and present them to the ‘Input’ pipeline step makes more sense.