PHP: Do not use fopen() to read XML files into xml_parse()

I was using the following code to open an XML URL from the web, create a handle, read the content, and then feed it to xml_parse():

    if (!($fp = fopen($xml_file, 'r'))) {
        die('Cannot open XML data file: ' . $xml_file);
    }

    $bytes_to_parse = 1024;

    while ($data = fread($fp, $bytes_to_parse)) {
        $parse = xml_parse($this->xml, $data, feof($fp));
        if (!$parse) {
            // Build the message before freeing the parser, then stop.
            $error = sprintf('XML error: %s at line %d',
                xml_error_string(xml_get_error_code($this->xml)),
                xml_get_current_line_number($this->xml));
            xml_parser_free($this->xml);
            die($error);
        }
    }

The problem with doing this is that each pass of the while loop reads at most 1024 bytes, so a chunk boundary can land in the middle of a tag or a text value and the content ends up cut off.
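To see where such cuts can land, here is a small sketch that splits a string at fixed byte offsets, the way successive fixed-length reads would. The sample XML and the 10-byte chunk size are made up for illustration; my real code read 1024 bytes at a time from a live feed.

```php
<?php
// Hypothetical sample; the real feed is much longer.
$xml = '<item><duration>25:51</duration><title>Tech Talk</title></item>';

// str_split() cuts at fixed byte offsets, much like repeated fread()
// calls with a fixed length do on a local file.
$chunks = str_split($xml, 10);

// Print each chunk on its own line to show where tags get cut.
foreach ($chunks as $i => $chunk) {
    echo $i, ': ', $chunk, "\n";
}
```

Neither str_split() nor fread() knows anything about XML, so nothing keeps a boundary from falling inside a tag name or a value.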

For example, I was capturing the following xml file:

http://www.nytimes.com/services/xml/rss/nyt/podcasts/techtalk.xml

And this is what I was getting:

    2006-09-14 15:37:15,699 DEBUG> /dump/include/class.xmlparser.php:40 parse - This is $data:
    29:49
    http://podcasts.nytimes.com/podcasts/2006/09/13/14techtalk.mp3
    This week: Tech news, new products from Apple, LCD versus Plasma, and reader questions.
    Wed, 13 Sep 2006 05:25:00 EDT
    Tom Holcomb and J.D. Biersdorfer of The New York Times

    2006-09-14 14:32:24,424 DEBUG> /dump/include/class.xmlparser.php:38 parse - This is $data: uration>25:51
    This week: Tech news, online file storage and reader questions.
    Wed, 06 Sep 2006 04:43:00 EDT
    Tom Holcomb and J.D. Biersdorfer of The New York Times
    27:41
    This week: Tech news, online movie downloads, and reader questions.
    Wed, 30 Aug 2006 04:00:00 EDT

    2006-09-14 14:32:24,426 DEBUG> /dump/include/class.xmlparser.php:38 parse - This is $data: closure length="26584293" url="http://podcasts.nytimes.com/podcasts/2006/08/30/31techtalk.mp3" type="audio/mpeg"/>

Notice how the XML file was being cut into several parts: the beginning of some tags was included in one chunk, while the rest of the same tag was included in the next.
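One possible contributor, and this is an assumption because my handler code is not shown here, is that the character-data handler can be called more than once for a single text value when the input arrives in pieces. If the handler overwrites instead of appending, only a fragment survives. A minimal sketch of a handler that appends, using a made-up sample document and a 10-byte chunk size:

```php
<?php
// Hypothetical sample data and chunk size, for illustration only.
$xml = '<item><duration>25:51</duration></item>';

$text = '';   // collects character data for the current element
$values = []; // finished element values, keyed by tag name

$parser = xml_parser_create();
// Keep tag names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$text) {
        $text = ''; // new element: start with an empty buffer
    },
    function ($p, $name) use (&$text, &$values) {
        $values[$name] = $text; // element closed: the buffer is complete
        $text = '';
    }
);

xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
    $text .= $data; // append, because one value may arrive in pieces
});

// Feed the document in small pieces; flag only the last one as final.
$chunks = str_split($xml, 10);
$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    if (!xml_parse($parser, $chunk, $i === $last)) {
        die(xml_error_string(xml_get_error_code($parser)));
    }
}

echo $values['duration'], "\n";
```

This is only a sketch of one way to cope with chunked input. Reading the whole document at once avoids the question entirely.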

Switching to this code solved the problem:

    // file_get_contents() returns the whole document as a string (not a handle).
    $file_handle = file_get_contents($_POST['rss_url']);
    $myFile = new XMLParser($file_handle);
    $arr = $myFile->ParseChannelFromXML();

    function parse($xml_file)
    {
        // $xml_file holds the XML text; true marks it as the final piece of input.
        $parse = xml_parse($this->xml, $xml_file, true);
        if (!$parse) {
            $error = sprintf('XML error: %s at line %d',
                xml_error_string(xml_get_error_code($this->xml)),
                xml_get_current_line_number($this->xml));
            xml_parser_free($this->xml);
            die($error);
        }
        return true;
    }

Now the entire XML file is read in one go, instead of in several smaller parts, so nothing gets cut off.
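The snippet above passes the result of file_get_contents() straight to the parser. Since that function returns false when the file or URL cannot be read, a slightly more defensive variant might look like the sketch below. The function name parse_whole_document is hypothetical, and it only reports whether the document parsed; real code would register handlers before calling xml_parse().

```php
<?php
// Hypothetical helper: reads a whole document and parses it in one call.
function parse_whole_document(string $source): bool
{
    $contents = file_get_contents($source);
    if ($contents === false) {
        return false; // the file or URL could not be read
    }

    $parser = xml_parser_create();

    // Third argument true: this string is the complete document.
    $ok = xml_parse($parser, $contents, true) === 1;
    if (!$ok) {
        error_log(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }

    return $ok;
}
```

Returning false instead of calling die() leaves it to the caller to decide what to do with a bad feed.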
