[in-commerce] Product import fails, when fields have line endings in them [5.2.1-RC1]

When importing a CSV file with products in which one or more fields contain line endings, PHP's built-in fgetcsv function fails: it treats a line ending inside a field value (even when the value is wrapped in double quotes) as the end of the record.

I think we should switch to an alternative CSV parser so that we can parse any kind of CSV file without problems. Googling the subject, I found several unhappy users:

and a link to a decent CSV parser: https://code.google.com/p/parsecsv-for-php/. The parser itself hasn't changed since 2008, but that doesn't matter much, since the CSV parsing rules haven't changed either (smile).

I'm open to suggestions, especially if there is an intelligent CSV parser out there that follows ALL the rules (see http://en.wikipedia.org/wiki/Comma-separated_values#Toward_standardization). PHP's built-in CSV parser surely doesn't follow all of them, which is bad.
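
Before swapping parsers, it may be worth checking how fgetcsv behaves on a file that is known to be well-formed. The following is a minimal sketch, not project code: it writes one record with an embedded line break using PHP's own fputcsv and reads it back. The sample record and the use of a php://memory stream are assumptions made purely for illustration.

```php
<?php
// Hypothetical sample record; the second field contains an embedded line break.
$record = array('SKU-1', "First line\r\nSecond line", '19.99');

// In-memory stream, so the check needs no file on disk.
$stream = fopen('php://memory', 'w+');

// Write the record with PHP's built-in writer (escape character passed explicitly).
fputcsv($stream, $record, ',', '"', '\\');
rewind($stream);

// Read it back: one call should return the whole record, the next one should hit EOF.
$parsed = fgetcsv($stream, 0, ',', '"', '\\');
$next = fgetcsv($stream, 0, ',', '"', '\\');
fclose($stream);

var_dump($parsed === $record); // does the record survive the round trip?
var_dump($next === false);     // is there really only one record?
```

If the round trip succeeds while the real import still fails, the code that generated the file is a more likely suspect than the parser.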

Related Tasks

INP-1329

2 Comments

  1. After a three-day debugging marathon, the problem turned out to be in the code that generated the CSV file in the first place. We were using a custom-coded fputcsv implementation that wasn't escaping numbers. It was probably a quick fix for older Excel versions, but surely we don't need it anymore after all these years.

    Here is the fixed version of the function:

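    function fputcsv2($filePointer, $data, $delimiter = ',', $enclosure = '"', $recordSeparator = "\r\n")
    {
    	// let PHP do the quoting; by default fputcsv ends the record with "\n"
    	fputcsv($filePointer, $data, $delimiter, $enclosure);
    
    	// step back over that "\n" and overwrite it with the requested record separator
    	if ( $recordSeparator != "\n" && fseek($filePointer, -1, SEEK_CUR) === 0 ) {
    		fwrite($filePointer, $recordSeparator);
    	}
    }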

    This was the original version of the function:

    function fputcsv2($filePointer, $data, $delimiter = ',', $enclosure = '"', $recordSeparator = "\r\n")
    {
    	foreach($data as $field_index => $field_value) {
    		// replaces an enclosure with two enclosures
    		$data[$field_index] = str_replace($enclosure, $enclosure.$enclosure, $field_value);
    	}
    
    	$line = $enclosure.implode($enclosure.$delimiter.$enclosure, $data).$enclosure.$recordSeparator;
    	$line = preg_replace('/'.preg_quote($enclosure, '/').'([0-9\.]+)'.preg_quote($enclosure, '/').'/', '$1', $line);
    	fwrite($filePointer, $line);
    }
  2. After deploying to the live server, we discovered that the newly introduced fseek call revealed another bug in the kCatDBItemExportHelper::openFile method: the UTF-8 BOM signature was being checked during export as well, which resulted in the second and later export steps overwriting the products written to the CSV file in the first step. Here is the fixed version of the method:

    function openFile(&$event)
    {
    	$file_helper =& $this->Application->recallObject('FileHelper');
    	/* @var $file_helper FileHelper */
    
    	$file_helper->CheckFolder(EXPORT_PATH);
    
    	if ($event->Special == 'export') {
    		$first_step = $this->exportOptions['start_from'] == 0;
    		$this->filePointer = fopen($this->getExportFilename(), $first_step ? 'w' : 'r+');
    
    		if ( !$first_step ) {
    			fseek($this->filePointer, 0, SEEK_END);
    		}
    	}
    	else {
    		$this->filePointer = fopen($this->getImportFilename(), 'r');
    
    		// skip UTF-8 BOM Modifier
    		$first_chars = fread($this->filePointer, 3);
    
    		if (bin2hex($first_chars) != 'efbbbf') {
    			fseek($this->filePointer, 0);
    		}
    	}
    }
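
The interaction between the file mode and the fseek-based record separator fix can be tried out in isolation. The sketch below is not the project's code: writeExportStep, the temporary file, and the sample rows are hypothetical. It only mirrors the logic of openFile ('w' on the first step, 'r+' plus a seek to the end on later steps) and combines it with the fputcsv2 from the first comment.

```php
<?php
// fputcsv2 as in the first comment, with the escape character passed explicitly.
function fputcsv2($filePointer, $data, $delimiter = ',', $enclosure = '"', $recordSeparator = "\r\n")
{
    fputcsv($filePointer, $data, $delimiter, $enclosure, '\\');

    // Overwrite the trailing "\n" written by fputcsv with the requested separator.
    if ($recordSeparator != "\n" && fseek($filePointer, -1, SEEK_CUR) === 0) {
        fwrite($filePointer, $recordSeparator);
    }
}

// Hypothetical helper: writes the rows of one export step to $filename.
function writeExportStep($filename, array $rows, $firstStep)
{
    // First step truncates the file; later steps reopen it for update.
    $filePointer = fopen($filename, $firstStep ? 'w' : 'r+');

    if (!$firstStep) {
        // Continue after the rows written by the previous steps.
        fseek($filePointer, 0, SEEK_END);
    }

    foreach ($rows as $row) {
        fputcsv2($filePointer, $row);
    }

    fclose($filePointer);
}

// Two export steps against the same temporary file.
$filename = tempnam(sys_get_temp_dir(), 'csv');
writeExportStep($filename, array(array('1', 'Product A')), true);
writeExportStep($filename, array(array('2', 'Product B')), false);

$contents = file_get_contents($filename);
unlink($filename);

var_dump($contents); // both rows should be present, each ending in "\r\n"
```

Dropping the fseek($filePointer, 0, SEEK_END) call in this sketch makes the second step start writing at the beginning of the file, which is the kind of overwrite described in the second comment.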