Need Time-Efficient Method of Importing Large CSV File Via PHP Into Multiple MySQL Tables

Jai*_*man 6 php mysql csv import codeigniter

Okay, I'm having some serious issues here. I'm new to this site, and new to dealing with importing CSV data via PHP, but I'm not new to programming.

Currently, I'm working on building a Customer Relationship Manager. I need to create a script to import a file that will populate the database with leads. The main issue here is that the Lead data consists of Companies and Employees of said Company. Also, a few other tables are split off, such as billing information, from the main tables.

I have a working script that will allow users to map the imported data to specific rows and columns.

function mapData($file) {
    // Open the Text File
    $fd = fopen($file, "r");

    // Return FALSE if file not found
    if(!$fd) {
        return FALSE;
    }

    // Get the First Two Lines
    $first = 0;
    $data = array();
    while(!feof($fd)) {
        if($first == 0) {
            $cols = fgetcsv($fd, 4096);
            $data['cols'] = array();
            if(is_array($cols) && count($cols)) {
                foreach($cols as $col) {
                    if(!$col) {
                        continue;
                    }
                    $data['cols'][] = $col;
                }
            }
            if(empty($data['cols'])) {
                return array();
            }
            $first++;
            continue;
        }
        else {
            $data['first'] = fgetcsv($fd, 4096);
            break;
        }
    }
    fclose($fd);

    // Return Data
    return $data;
}
Run Code Online (Sandbox Code Playgroud)

上述脚本仅在CodeIgniter将文件移动到工作目录后激活.到目前为止我已经知道文件名是什么了.该文件进入并返回列列表和第一行.任何空列都将被忽略.

在此之后,进程将传递给映射脚本.完成映射并按下"导入"后,将加载此代码.

function importLeads($file, $map) {
    // Open the Text File
    if(!file_exists($file)) {
        return false;
    }
    error_reporting(E_ALL);
    set_time_limit(240);
    ini_set("memory_limit", "512M");
    $fd = fopen($file, "r");

    // Return FALSE if file not found
    if(!$fd) {
        return FALSE;
    }

    // Traverse Each Line of the File
    $true = false;
    $first = 0;
    while(!feof($fd)) {
        if($first == 0) {
            $cols = fgetcsv($fd);
            $first++;
            continue;
        }

        // Get the columns of each line
        $row = fgetcsv($fd);

        // Traverse columns
        $group = array();
        $lead_status = array();
        $lead_type = array();
        $lead_source = array();
        $user = array();
        $user_cstm = array();
        $user_prof = array();
        $acct = array();
        $acct_cstm = array();
        $acct_prof = array();
        $acct_group = array();
        if(!$row) {
            continue;
        }
        foreach($row as $num => $val) {
            if(empty($map[$num])) {
                continue;
            }
            $val = str_replace('"', """, $val);
            $val = str_replace("'", "'", $val);
            switch($map[$num]) {
            // Company Account
            case "company_name":
                $acct['company_name'] = $val;
                break;
            case "lead_type":
                $lead_type['name'] = $val;
                break;
            case "lead_source":
                $lead_source['name'] = $val;
                break;
            case "lead_source_description":
                $lead_source['name'] = $val;
                break;
            case "campaign":
                $campaign['name'] = $val;
                break;
            case "mcn":
                $acct['mcn'] = $val;
                break;
            case "usdot":
                $acct['usdot'] = $val;
                break;
            case "sic_codes":
                $acct_cstm['sic_codes'] = $val;
                break;
            case "naics_codes":
                $acct_cstm['naics_codes'] = $val;
                break;
            case "agent_assigned":
                $acct_cstm['agent_assigned'] = $val;
                break;
            case "group_assigned":
                $group['name'] = $val;
                break;
            case "rating":
                $acct_cstm['rating'] = $val;
                break;
            case "main_phone":
                $acct['phone'] = $val;
                break;
            case "billing_phone":
                $acct_cstm['billing_phone'] = $val;
                break;
            case "company_fax":
                $acct['fax'] = $val;
                break;
            case "company_email":
                $acct['email2'] = $val;
                break;

            // Company Location
            case "primary_address":
                $acct['address'] = $val;
                break;
            case "primary_address2":
                $acct['address2'] = $val;
                break;
            case "primary_city":
                $acct['city'] = $val;
                break;
            case "primary_state":
                $acct['state'] = $val;
                break;
            case "primary_zip":
                $acct['zip'] = $val;
                break;
            case "primary_country":
                $acct['country'] = $val;
                break;
            case "billing_address":
                $billing['address'] = $val;
                break;
            case "billing_address2":
                $billing['address2'] = $val;
                break;
            case "billing_city":
                $billing['city'] = $val;
                break;
            case "billing_state":
                $billing['state'] = $val;
                break;
            case "billing_zip":
                $billing['zip'] = $val;
                break;
            case "billing_country":
                $billing['country'] = $val;
                break;
            case "company_website":
                $acct_cstm['website'] = $val;
                break;
            case "company_revenue":
                $acct_cstm['revenue'] = $val;
                break;
            case "company_about":
                $acct_prof['aboutus'] = $val;
                break;

            // Misc. Company Data
            case "bols_per_mo":
                $acct_cstm['approx_bols_per_mo'] = $val;
                break;
            case "no_employees":
                $acct_cstm['no_employees'] = $val;
                break;
            case "no_drivers":
                $acct_prof['drivers'] = $val;
                break;
            case "no_trucks":
                $acct_prof['power_units'] = $val;
                break;
            case "no_trailers":
                $acct_cstm['no_trailers'] = $acct_prof['trailers'] = $val;
                break;
            case "no_parcels_day":
                $acct_cstm['no_parcels_day'] = $val;
                break;
            case "no_shipping_locations":
                $acct_cstm['no_shipping_locations'] = $val;
                break;
            case "approves_inbound":
                $acct_cstm['approves_inbound'] = $val;
                break;
            case "what_erp_used":
                $acct_cstm['what_erp_used'] = $val;
                break;
            case "birddog":
                $acct_cstm['birddog_referral'] = $val;
                break;
            case "status_notes":
                $acct_cstm['status_notes'] = $val;
                break;
            case "notes":
                $acct_cstm['notes'] = $val;
                break;
            case "internal_notes":
                $acct_cstm['notes_internal'] = $val;
                break;

            // User Data
            case "salutation":
                $user_cstm['salutation'] = $val;
                break;
            case "first_name":
                $user['first_name'] = $billing['first_name'] = $val;
                break;
            case "last_name":
                $user['last_name'] = $billing['last_name'] = $val;
                break;
            case "user_title":
                $user_prof['title'] = $val;
                break;
            case "user_about":
                $user_prof['about'] = $val;
                break;
            case "user_email":
                $user['email'] = $val;
                break;
            case "home_phone":
                $user_prof['phone'] = $val;
                break;
            case "mobile_phone":
                $user_cstm['mobile_phone'] = $val;
                break;
            case "direct_phone":
                $user_cstm['direct_phone'] = $val;
                break;
            case "user_fax":
                $user_prof['fax'] = $val;
                break;
            case "user_locale":
                $user['location'] = $val;
                break;
            case "user_website":
                $user_prof['website_url'] = $val;
                break;
            case "user_facebook":
                $user_prof['fb_url'] = $val;
                break;
            case "user_twitter":
                $user_prof['twitter_url'] = $val;
                break;
            case "user_linkedin":
                $user_prof['linkedin_url'] = $val;
                break;
            }
        }
        if(empty($acct['company_name']) || empty($user['first_name']) || empty($user['last_name'])) {
            continue;
        }
        $this->db = $this->load->database('crm_db', TRUE);
        if(isset($lead_type['name']) && ($name = $lead_type['name'])) {
            $count = $this->db->count_all("lead_types");
            $check = $this->db->get_where("lead_types", array("name" => $name));
            if($check->num_rows() < 1) {
                $this->db->insert("lead_types", array("name" => $name, "order" => $count));
                $ltype = $this->db->insert_id();
                $acct_cstm['lead_type'] = $acct['account_type'] = $user['company_type'] = $ltype;
            }
        }
        if(isset($lead_source['name']) && ($name = $lead_source['name'])) {
            $count = $this->db->count_all("lead_sources");
            $check = $this->db->get_where("lead_sources", array("name" => $name));
            if($check->num_rows() < 1) {
                $this->db->insert("lead_sources", array("name" => $name, "order" => $count));
                $acct_cstm['lead_source'] = $this->db->insert_id();
            }
        }
        if(isset($campaign['name']) && ($name = $campaign['name'])) {
            $check = $this->db->get_where("campaigns", array("name" => $name));
            if($check->num_rows() < 1) {
                $campaign['id'] = $accounts_cstm['campaign'] = $this->Secure_m->generate_sugar_id();
                $campaign['date_entered'] = time();
                $campaign['date_modified'] = time();
                $campaign['modified_user_id'] = $this->session->userdata('id');
                $campaign['created_by'] = $this->session->userdata('id');
                $this->db->insert("campaigns", $campaign);
            }
        }
        if(isset($group['name']) && ($name = $group['name'])) {
            $order = $this->db->count_all("groups");
            $check = $this->db->get_where("groups", array("name" => $name));
            if($check->num_rows() < 1) {
                $this->db->insert("groups", array("name" => $name, "order" => $order));
                $acct_group['id'] = $this->db->insert_id();
            }
        }
        $mem = new stdclass;
        $uid = 0;
        if(is_array($user) && count($user)) {
            $where = "";
            if(!empty($user['phone'])) {
                $where .= "prof.phone = '{$user['phone']}' OR ";
                $where .= "cstm.mobile_phone = '{$user['phone']}' OR ";
                $where .= "cstm.direct_phone = '{$user['phone']}'";
            }
            if(!empty($user['mobile_phone'])) {
                if($where) {
                    $where .= " OR ";
                }
                $where .= "prof.phone = '{$user['mobile_phone']}' OR ";
                $where .= "cstm.mobile_phone = '{$user['mobile_phone']}' OR ";
                $where .= "cstm.direct_phone = '{$user['mobile_phone']}'";
            }
            if(!empty($user['direct_phone'])) {
                if($where) {
                    $where .= " OR ";
                }
                $where .= "prof.phone = '{$user['direct_phone']}' OR ";
                $where .= "cstm.mobile_phone = '{$user['direct_phone']}' OR ";
                $where .= "cstm.direct_phone = '{$user['direct_phone']}'";
            }
            $query = $this->db->query($this->Account_m->userQuery($where));
            $mem = reset($query->result());
            if($where && !empty($mem->id)) {
                $uid = $mem->id;
                $new = array();
                foreach($user as $k => $v) {
                    if(!empty($mem->$k)) {
                        $new[$k] = $mem->$k;
                        unset($user[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("leads", $user, array("id" => $uid));
                $user = $new;
            }
            else {
                $user['uxtime'] = time();
                $user['isclient'] = 0;
                $user['flag'] = 0;
                $user['activation_code'] = $this->Secure_m->generate_activate_id();
                $uid = $this->Secure_m->generate_activate_id(10);
                $query = $this->db->get_where("leads", array("id" => $uid), 1);
                $data = reset($query->result());
                while(!empty($data->id)) {
                    $uid = $this->Secure_m->generate_activate_id(10);
                    $query = $this->db->get_where("leads", array("id" => $uid), 1);
                    $data = reset($query->result());
                }
                $user['id'] = $uid;
                $this->db->insert("leads", $user);
            }
        }
        if($uid && is_array($user_prof) && count($user_prof)) {
            if(!empty($mem->uid)) {
                $new = array();
                foreach($user_prof as $k => $v) {
                    if(!empty($mem->$k)) {
                        $new[$k] = $mem->$k;
                        unset($user_prof[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("mprofiles", $user_prof, array("uid" => $uid));
                $user_prof = $new;
            }
            else {
                $user_prof['uid'] = $uid;
                $user_prof['flag'] = 0;
                $this->db->insert("ldetails", $user_prof);
            }
        }
        if($uid && is_array($user_cstm) && count($user_cstm)) {
            $query = $this->db->get_where("leads_cstm", array("crm_id" => $cid), 1);
            $data = reset($query->result());
            if(!empty($data->crm_id)) {
                $new = array();
                foreach($user_cstm as $k => $v) {
                    if(!empty($mem->$k)) {
                        $new[$k] = $mem->$k;
                        unset($user_cstm[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("leads_cstm", $acct_prof, array("fa_user_id" => $cid));
                $user_cstm = $new;
            }
            else {
                $user_cstm['crm_id'] = $uid;
                $user_cstm['date_entered'] = time();
                $user_cstm['date_modified'] = time();
                $user_cstm['created_by'] = $this->session->userdata('id');
                $user_cstm['modified_user_id'] = $this->session->userdata('id');
                $this->db->insert("leads_cstm", $user_cstm);
            }
        }
        $cmp = new stdclass;
        $cid = 0;
        if(is_array($acct) && count($acct)) {
            $acct['uid'] = $uid;
            $acct['main_contact'] = "{$user['first_name']} {$user['last_name']}";
            if(!empty($user['email'])) {
                $acct['email'] = $user['email'];
            }
            $acct['isprospect'] = 0;
            $acct['flag'] = 0;
            if(!empty($acct['mcn'])) {
                $where .= "fms.mcn = '{$acct['mcn']}'";
            }
            if(!empty($acct['phone'])) {
                if($where) {
                    $where .= " OR ";
                }
                $where .= "fms.phone = '{$acct['phone']}' OR ";
                $where .= "crm.billing_phone = '{$acct['phone']}'";
            }
            if(!empty($acct['billing_phone'])) {
                if($where) {
                    $where .= " OR ";
                }
                $where .= "fms.phone = '{$acct['billing_phone']}' OR ";
                $where .= "crm.billing_phone = '{$acct['billing_phone']}'";
            }
            if(!empty($acct['company_name'])) {
                if($where) {
                    $where .= " OR ";
                }
                $where .= "fms.company_name = '{$acct['company_name']}'";
            }
            $query = $this->db->query($this->Account_m->acctQuery($where));
            $cmp = reset($query->result());
            if($where && !empty($cmp->id)) {
                $cid = $cmp->id;
                $new = array();
                foreach($acct as $k => $v) {
                    if(!empty($cmp->$k)) {
                        $new[$k] = $cmp->$k;
                        unset($acct[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("accounts", $billing, array("cid" => $cid));
                $acct = $new;
            }
            else {
                $cid = $this->Secure_m->generate_activate_id(10);
                $query = $this->db->get_where("leads", array("id" => $uid), 1);
                $data = reset($query->result());
                while(!empty($data->id)) {
                    $cid = $this->Secure_m->generate_activate_id(10);
                    $query = $this->db->get_where("accounts", array("id" => $cid), 1);
                    $data = reset($query->result());
                }
                $acct['id'] = $cid;
                $this->db->insert("accounts", $acct);
            }
        }
        if($cid && is_array($acct_group) && count($acct_group)) {
            $grp = $this->db->get_where("accounts_groups", array("cid" => $cid, "gid" => $acct_group['id']));
            if(empty($cmp->id)) {
                $acct_group['cid'] = $cid;
                $this->db->insert("accounts_groups", $acct_group);
            }
        }
        if($cid && is_array($acct_prof) && count($acct_prof)) {
            if(!empty($cmp->id)) {
                $new = array();
                foreach($acct_prof as $k => $v) {
                    if(!empty($cmp->$k)) {
                        $new[$k] = $cmp->$k;
                        unset($acct_prof[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("cprofiles", $acct_prof, array("cid" => $cid));
                $acct_prof = $new;
            }
            else {
                $acct_prof['cid'] = $cid;
                $acct_prof['flag'] = 0;
                $this->db->insert("adetails", $acct_prof);
            }
        }
        if($cid && is_array($billing) && count($billing)) {
            $bill = $this->db->get_where("accounts_billing", array("cid" => $cid));
            if(!empty($bill->id)) {
                $new = array();
                foreach($acct_prof as $k => $v) {
                    if(!empty($cmp->$k)) {
                        $new[$k] = $cmp->$k;
                        unset($acct_prof[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("accounts_billing", $billing, array("cid" => $cid));
            }
            else {
                $billing['cid'] = $cid;
                $billing['flag'] = 0;
                $this->db->insert("accounts_billing", $billing);
            }
        }
        if($cid && $uid) {
            $this->db->update("leads", array("cid" => $cid), array("id" => $uid));
        }
        if($cid && is_array($acct_cstm) && count($acct_cstm)) {
            $query = $this->db->get_where("accounts_cstm", array("crm_id" => $cid), 1);
            $data = reset($query->result());
            if(!empty($data->crm_id)) {
                $new = array();
                foreach($acct_cstm as $k => $v) {
                    if(!empty($cmp->$k)) {
                        $new[$k] = $cmp->$k;
                        unset($acct_cstm[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("accounts_cstm", $acct_cstm, array("crm_id" => $cid));
                $acct_cstm = $new;
            }
            else {
                $acct_cstm['crm_id'] = $cid;
                $acct_cstm['date_entered'] = time();
                $acct_cstm['date_modified'] = time();
                $acct_cstm['created_by'] = $this->session->userdata('id');
                $acct_cstm['modified_user_id'] = $this->session->userdata('id');
                if(empty($acct_cstm['rating'])) {
                    $acct_cstm['rating'] = 1;
                }
                $this->db->insert("accounts_cstm", $acct_cstm);
            }
        }
        $true = TRUE;
    }
    fclose($fd);

    return $true;
}
Run Code Online (Sandbox Code Playgroud)

现在,据我所知,脚本运行得非常好.实际的代码本身并没有错.问题是,在大约400-500行之后,脚本就会停止.我没有收到错误,但没有处理更多代码.

我知道这是因为我之后有代码应该通过AJAX返回重定向页面.但是,我在importLeads函数中的循环之后没有任何内容加载.

I'm not sure how to make this script more efficient... I'm positive it is timing out, but I don't know how to make it run more efficiently. I NEED this script to process all the information above separately. I have a variety of separate tables that all link together, and this import script has to set everything up in different ways.

I've talked with my client about this project. This script works when I drop it to around 400 rows. He has some a lot of these CSV files that are around 75,000 rows. The one I am importing is a smaller one, only about 1,200 rows.

I've tried looking into alternate methods, such as MySQL's import script, but I can't do that because this script must import data into separate tables, and it must check for existing data first. I'm also supposed to have all empty fields update with imported information, but that will make this even worse.

If anyone knows of a more efficient method it would be much appreciated. I tried to be as detailed as I could. Of note, I will mention that I'm using CodeIgniter, but if there's a more efficient way that doesn't use CodeIgniter I'll take it (I can still put it into a CI model, though).

Bil*_*win 6

我编写了PHP脚本来批量加载Stack Overflow数据转储发布的数据.我导入了数百万行,并且不需要那么长时间.

以下是一些提示:

  • 不要依赖自动提交. 为每一行启动和提交事务的开销是巨大的.使用显式事务,并在每1000行(或更多行)之后提交.

  • 使用预备陈述. 由于您基本上执行相同的插入数千次,因此可以在开始循环之前准备每个插入,然后在循环期间执行,将值作为参数传递.我不知道如何使用CodeIgniter的数据库库,你必须弄明白.

  • 调整MySQL导入. 增加缓存缓冲区等.有关更多信息,请参阅INSERT语句的速度.

  • 使用LOAD DATA INFILE. 如果可能的话.它比使用INSERT逐行加载数据快20倍.我明白,如果你不能,因为你需要获得最后一个插入ID等等.但在大多数情况下,即使您读取CSV文件,重新排列并将其写入多个临时CSV文件,数据加载仍然比使用INSERT更快.

  • 离线做吧. 不要在Web请求期间运行长时间运行的任务.PHP请求的时间限制将终止作业,如果不是今天,那么下周二当作业延长10%时.而是将Web请求队列作为作业,然后将控制权返回给用户.您应该将数据导入作为服务器进程运行,并定期允许用户瞥见进度.例如,一种廉价的方法是让你的导入脚本输出"." 到临时文件,然后用户可以请求查看临时文件并继续在其浏览器中重新加载.如果你想获得幻想,可以使用Ajax做点什么.