Friday, June 30, 2006
Importing A Variable Number Of Fields
One of our clients is working on a cool astrophysics project and wants to import large quantities of observations. Fortunately, the data is in a simple percent delimited format:
SEE%5%8 XYZ%6%8 ZZZ%7
The fields represent the time, the instrument, the type of observation (solar flare, brightness measurement), the co-ordinates and the result. There are two types of problem with this format. Firstly, data is not normalized - its just a flatfile. Secondly, many of the fields have a context which depends on other fields. Formats with these properties are quite common in the wild but how to handle them?
The following proof of concept code allows percent delimited data to be read directly by MySQL Server and splits one type of data to a separate table. This is achieved with a case statement in a MySQL Server 5.0 stored procedure, called from a trigger:
DELIMITER //
CREATE TABLE observation (
  id INT PRIMARY KEY AUTO_INCREMENT,
  instrument CHAR(20) NOT NULL,
  x INT,
  y INT
) //
CREATE TABLE observation_xyz (
  id INT PRIMARY KEY AUTO_INCREMENT,
  x INT,
  y INT
) //
CREATE PROCEDURE classify (IN id INT, IN instrument CHAR(20), IN x INT, IN y INT)
BEGIN
  CASE instrument
    WHEN 'XYZ' THEN
      BEGIN
        INSERT INTO observation_xyz (id,x,y) VALUES (id,x,y);
      END;
    ELSE BEGIN END;
  END CASE;
END //
CREATE TRIGGER observation_classify_after AFTER INSERT ON observation
  FOR EACH ROW CALL classify (NEW.id, NEW.instrument, NEW.x, NEW.y) //
LOAD DATA INFILE 'observations.txt' REPLACE INTO TABLE observation FIELDS TERMINATED BY '%' (@ins,@x,@y) SET instrument=@ins, x=@x, y=@y //
SELECT * FROM observation //
SELECT * FROM observation_xyz //Import and export files are visible to the server, not necessarily the client. So, a small amount of PHP would allow form upload and generate a query which references the uploaded file on the server.
Finally, for deployment, the import table could use the Blackhole storage engine. This would eliminate tidying tasks.


