Unless you want loads of weird characters showing up in a PHP DOMDocument object, it is very important to set the encoding and to decode the original string before loading it.

E.g. this handles a UTF-8 string:

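$xml = new DOMDocument();
$xml->encoding = 'utf-8';
// loadHTML() treats markup with no declared charset as ISO-8859-1, so convert the UTF-8 input first.
// utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8') does the same job.
@$xml->loadHTML(utf8_decode($string));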

Bye bye weird characters. (DOMDocument objects in PHP tend to change the character encoding of what you load, so this avoids the problem.)
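
One caveat: utf8_decode() can only represent characters that exist in ISO-8859-1, so anything else (the euro sign, Polish or Cyrillic letters, and so on) ends up as "?". A possible alternative, sketched below, is to turn every non-ASCII character into a numeric entity before loading. This assumes the mbstring extension is available; loadUtf8Html() is a made-up helper name for illustration, not part of the DOM extension.

```php
<?php
// Hypothetical helper (not from the original note): load UTF-8 HTML
// without losing characters that fall outside ISO-8859-1.
function loadUtf8Html(string $html): DOMDocument
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);

    // Replace every character from U+0080 upwards with a numeric entity
    // (e.g. "é" becomes "&#233;"), so the parser only ever sees ASCII bytes.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    $doc->loadHTML($encoded);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

$doc  = loadUtf8Html('<p>Zażółć gęślą jaźń €</p>');
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
```

Strings read back from the DOM (textContent, nodeValue) come out as UTF-8, so $text should match the original paragraph text, including the characters that utf8_decode() would have replaced.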