If the HTML or image is embedded in an XML element, I would expect it to be encoded, for example as Base64. You can then use an XPath assertion to extract the encoded string and decode it accordingly.
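To illustrate the extract-then-decode idea outside of any particular test tool, here is a minimal Python sketch. The element name `content`, the sample payload, and the helper `extract_encoded_payload` are assumptions made up for this example; your real response will have its own structure and XPath.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical response body: <content> holds Base64-encoded HTML.
sample_html = "<p>Hello</p>"
encoded = base64.b64encode(sample_html.encode("utf-8")).decode("ascii")
xml_body = f"<response><content>{encoded}</content></response>"


def extract_encoded_payload(xml_text: str, path: str) -> bytes:
    """Locate the element at `path` and Base64-decode its text."""
    root = ET.fromstring(xml_text)
    node = root.find(path)  # ElementTree supports a limited XPath subset
    if node is None or node.text is None:
        raise ValueError(f"no text found at {path!r}")
    return base64.b64decode(node.text.strip())


decoded = extract_encoded_payload(xml_body, "./content")
print(decoded.decode("utf-8"))
```

For an image, the decoded bytes would be written to a file or compared against a known checksum instead of being decoded as UTF-8 text.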
Otherwise, the response should return that content in MIME multipart format. In that case, you can use ${response.parts.x} to get the binary data and ${response.parts.x.body} to get the text data.
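If you want to see what the individual parts of such a response look like, the following Python sketch splits a multipart body using only the standard library. It is not how the ${response.parts.x} syntax is implemented; the boundary string, the sample body, and the helper `split_parts` are assumptions for illustration, and the image part is Base64-encoded here to keep the sample printable.

```python
from email import message_from_bytes
from email.message import Message

# Hypothetical multipart response: one HTML part and one (tiny) PNG part.
content_type = 'multipart/related; boundary="demo-boundary"'
body = (
    b"--demo-boundary\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n\r\n"
    b"<p>Hello</p>\r\n"
    b"--demo-boundary\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n\r\n"
    b"iVBORw0KGgo=\r\n"
    b"--demo-boundary--\r\n"
)


def split_parts(content_type: str, body: bytes) -> list[Message]:
    """Parse a multipart body into its parts using the email parser."""
    # The parser reads the boundary from a Content-Type header, so prepend one.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    msg = message_from_bytes(raw)
    if not msg.is_multipart():
        raise ValueError("response is not multipart")
    return list(msg.get_payload())


parts = split_parts(content_type, body)
text = parts[0].get_payload(decode=True).decode("utf-8")  # text data
binary = parts[1].get_payload(decode=True)                # binary data
print(parts[0].get_content_type(), text)
print(parts[1].get_content_type(), len(binary), "bytes")
```

Passing decode=True makes the parser undo the part's Content-Transfer-Encoding, so the second part comes back as raw bytes rather than as a Base64 string.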