Short intro
Continued development of the Ajax Crawler importer ran into data import problems. Oracle SQL Loader was used in all previous versions of my crawler, but this time both the datafile and the database have a multi-table structure. I spent almost two weeks on this simple subject, and after someone pointed out the solution I could not find more than two references online, so hopefully this is the third one for you: "POSITION(1)". You must reset the loader position if you are importing into multiple tables, even though this directive looks like a fixed-length argument. The table and datafile examples are shortened, just enough to explain the control file.
Tables
I'm importing invoices into three tables: inv_invoices_imp, inv_invoice_lines_imp and inv_invoice_comments_imp. inv_invoices_imp contains the invoice header, inv_invoice_lines_imp contains accounting and invoice line data, and inv_invoice_comments_imp contains user comments.
Datafile
The datafile is an HTML file containing 6 different tables/blocks: some text, header table, some text, lines table, comments table, some text. One datafile contains the data of one invoice.
SQL Loader config
Oracle SQL Loader can read, parse and load almost any type of data: any separation (fixed-width or delimiter-separated), single or multiple sources and destinations. In our case we have a multi-structured datafile and three different destination tables. Options used: truncate table, skip rows, conditional rows, fillers, sequences, foreign keys. If you are stuck with SQL Loader loading only empty rows with no errors in the log files, here is a checklist: column names, data types, missing separators, encoding. The only full list I was able to find is here, except for the POSITION(1) part.
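For the "missing separators" item on the checklist, a quick look at the datafile before loading can help. The sketch below is my own suggestion, not part of the original setup: the function name field_counts and the sample rows are made up. It prints how many semicolon-separated fields each record type has, so a row with a missing separator stands out.

```shell
# Print "<record type> <field count> <number of lines>" for every combination
# found in a semicolon-separated datafile (file name passed as $1).
field_counts() {
  awk -F';' '{ seen[$1 " " NF]++ }
       END { for (k in seen) print k, seen[k] }' "$1" | sort
}

# Made-up sample: the third L row is missing one separator
sample=$(mktemp)
printf 'L;7135;4500;220\nL;7135;4500;4844\nL;7135;4500\nC;user;saved;ok\n' > "$sample"
counts=$(field_counts "$sample")
rm -f "$sample"
printf '%s\n' "$counts"
```

A record type that shows up with more than one field count is worth checking by hand before blaming the control file.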
Full loader.ctl file
OPTIONS (SKIP=1)
LOAD DATA
CHARACTERSET UTF8
INTO TABLE inv_invoices_imp
TRUNCATE
--APPEND
WHEN (1:1) = 'H'
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' trailing nullcols
(dummy0 FILLER, VENDOR_NUM, ACCOUNT_NUM, VENDOR_NAME, VENDOR_ORG, INVOICE_NUM, INVOICE_DATE "to_date(:INVOICE_DATE,'MM/DD/YYYY')",
DUE_DATE "to_date(:DUE_DATE,'MM/DD/YYYY')", VALUTA, AMOUNT "to_number(:AMOUNT,'99999999999.9999')", VALUTA_EX "to_number(:VALUTA_EX,'99999999999.9999')",
AMOUNT_NOK "to_number(:AMOUNT_NOK,'99999999999.9999')", KID, BILAGSNR, dummy1 FILLER, dummy2 FILLER, TAX "to_number(:TAX,'99999999999.9999')",
dummy3 FILLER, dummy4 FILLER, dummy5 FILLER, dummy6 FILLER, dummy7 FILLER, dummy8 FILLER, dummy9 FILLER, DERESREF, dummy10 FILLER,
dummy11 FILLER, dummy12 FILLER, INVOICE_ID EXPRESSION "INV_INVOICES_IMP_SEQ.nextval"
)
INTO TABLE inv_invoice_lines_imp
TRUNCATE
WHEN (1:1) = 'L'
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' trailing nullcols
(dummy0 FILLER POSITION(1), S1, S1_NAME, S2, S3, BELOP "to_number(:BELOP, '999999999999.9999')", DESCRIPTION, VAT_ID,
VAT_AMOUNT "to_number(:VAT_AMOUNT, '999999999999.9999')", BELOP_NOK "to_number(:BELOP_NOK, '999999999999.9999')",
S4, S5, S6, S7, dummy1 FILLER, FAKTURAID EXPRESSION "INV_INVOICES_IMP_SEQ.currval", ID EXPRESSION "INV_INVOICE_LINES_IMP_SEQ.nextval"
)
INTO TABLE inv_invoice_comments_imp
TRUNCATE
WHEN (1:1) = 'C'
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' trailing nullcols
(dummy FILLER POSITION(1), CUSER, ACTION, CDATE, DESCR, FAKTURAID EXPRESSION "INV_INVOICES_IMP_SEQ.currval",
ID EXPRESSION "INV_INVOICE_COMMENTS_IMP_SEQ.nextval"
)
Previous problems and config explanation
UTF8 - the character set specification. I suggest you specify it even if your file and database are both Unicode.
FILLER - a useful keyword. A field marked FILLER does not have to exist as a column in the database table. Within one table's field list the names must be unique, so name them dummy1, dummy2, etc. In case you don't know it yet: the field list specifies the order of the data in your datafile using the destination table's column names; put FILLER on the data columns you want to skip.
to_date, to_number - a must if your destination column is a date or a number. I suggest importing everything as VARCHAR2 at first, then converting to the desired datatype and checking the columns one by one.
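EXPRESSION - the column value comes from an SQL expression instead of the datafile. Here it fills the primary keys from sequences (nextval) and the foreign keys to the invoice header (currval).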
POSITION(1) - hopefully this is the directive you are here for. It is used twice, once in each of the two related tables, and is placed right after the first field name in the field list. With delimited fields, SQL Loader continues scanning a record from where the previous INTO TABLE clause stopped, so when loading into more than one table the position has to be reset for each table after the first one by putting POSITION(1) on its first field, even though it looks like a fixed-length directive. If you miss this directive you will end up with nice empty table rows containing only sequences and foreign keys, and no errors in the log file. With some luck you might see an "all fields were null" message, but you must be very lucky; it usually shows up because of some other related error.
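To see why the rows come out empty, here is a simplified model in shell. It is not SQL Loader itself, and read_fields is a made-up helper: it only assumes that, without POSITION(1), the field scan for the next table continues after the 28 delimited fields consumed by the header field list, so it would start at field 29.

```shell
# Simplified model, NOT SQL Loader itself: read n delimited fields from a
# record, starting at the 1-based field index where the scan currently is.
read_fields() {
  printf '%s\n' "$1" | awk -F';' -v start="$2" -v n="$3" '{
    out = ""; sep = ""
    for (i = start; i < start + n; i++) { out = out sep $i; sep = "|" }
    print out
  }'
}

record='L;7135;Reisekostnader;4500;1400;220'
with_reset=$(read_fields "$record" 1 3)      # scan restarted at field 1
without_reset=$(read_fields "$record" 29 3)  # scan continued after the header fields
printf 'with POSITION(1):    %s\n' "$with_reset"
printf 'without POSITION(1): %s\n' "$without_reset"
```

In this model the second call finds nothing but empty fields, which together with trailing nullcols would give rows where only the EXPRESSION columns are filled.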
Sample datafile (chopped)
<...>
H;Leverandørnr;Bankkontonr;Leverandørnavn;Organisasjonsnr;Fakturanr;Fakturadato;Forfallsdato;Valuta;Fakturabeløp;Valutakurs;FakturabeløpNOK;KID;Bilagsnr;Scannebatch;Duplikat;Mvabeløp;Nettobeløp;Fakturatype;Val.dok;Selskapskode;Selskap;Refusjon postnr sted;refusjon Land;Deres Ref;Refusjon navn;Refusjon adresse;
H;40013;62190581506;TUR-RETUR AS - NO 870 989 587;870989587;105358;2/19/2015;3/1/2015;NOK;5064;1;5064;103071053583;80746991;;N;403;4661;1;;FT;GatoFly AS;;;;;;
L;Konto;Kontonavn;Avdeling;Prosjekt;Beløp;Bilagstekst;MVA-kode;MVA beløp;Beløp NOK;Anlegg;Produkt;Salgssted;Kanal;Sats
L;7135;Reisekostnader;4500;1400;220;Nye FT. opphold H.Hernes 26-28.2/1-3.3;0;0;220;;;;;0
L;7135;Reisekostnader;4500;1400;4844;Nye FT. opphold H.Hernes 26-28.2/1-3.3;1D;358.81;4844;;;;;8
L;Fakturahistorikk
C;Bruker;Handling;Dato;Kommentarer
C;BTIP Connector ;Lagret av BTIPC ;2/24/2015 11:12:12 AM ;E-invoice saved by BTIPC
C;brigde ;Kommentar lagt til ;2/24/2015 11:28:04 AM ;Autosirk- referanse blank
C;brigde ;Grunnlagsdata endret ;2/24/2015 11:28:04 AM ;fakturatype-1
C;BTHANDLER ;Kommentar lagt til ;2/24/2015 11:28:04 AM ;matchSupplierAccount. match på konto.40013
C;BTHANDLER ;Kommentar lagt til ;2/24/2015 11:28:04 AM ;Endret flytstatus
C;BTHANDLER ;Kommentar lagt til ;2/24/2015 11:28:04 AM ;setCompName OK.
<...>
As you can see, the data is semicolon-separated and the first column is the destination identifier; the date and number formats are visible as well. The data has some junk text lines, but that does not matter here. No external IDs or references are used. Invoice line rows ("L") can be anywhere in the file; it does not matter that they currently sit between the header and the comments. The first column and some others are marked as FILLER in the control file.
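The routing done by the three WHEN (1:1) clauses can be pictured with a small shell function. This is only an illustration of the idea, assuming nothing beyond the control file above; route_record is a made-up name.

```shell
# Mimic WHEN (1:1) = 'H' / 'L' / 'C': only the first character of the record
# decides the destination table; anything else matches no WHEN clause.
route_record() {
  case "$1" in
    H*) echo inv_invoices_imp ;;
    L*) echo inv_invoice_lines_imp ;;
    C*) echo inv_invoice_comments_imp ;;
    *)  echo discarded ;;
  esac
}

route_record 'L;7135;Reisekostnader;4500;1400;220'
route_record 'C;brigde ;Kommentar lagt til '
route_record '<...>'
```

Junk text lines fall into the last branch, which is why they do no harm during the load.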
HTML to datafile
I will reveal a few more cards for you. The data file was an old HTML file: it was missing end tags, used a couple of self-invented tags, and the data formatting was not very handy either. Here is the full source of the bash script I used to prepare the file for loading.
#!/bin/bash
# the path of the HTML file is passed as the first argument
file="$1"
echo "processing $file"
echo "converting to unicode"
iconv -f utf-16 -t utf-8 "$file" > "$file.out"
echo done
echo "HTML cleanup"
less "$file".out | tr ',' '.' | sed 's/ //g' | sed 's/\cM//g' | sed 's/\cW//g' | sed 's/<\/TR>/<\/TR> /g' | sed 's| sed 's/ / \n/g' | sed ':a;N;$!ba;s|\n
sed 's/ / \n/g' > "$file".clean
echo cleanup complete
echo header and lines separation
./filter -t 2 -c 2 -f "$file".clean > "$file".tmp
./filter -t 3 -f "$file".clean > "$file".lines
./filter -t 4 -f "$file".clean > "$file".comments
echo done separating
echo transposing headers
cols=2
for ((i = 1; i <= cols; i++)); do
  # print column $i of every row on one line, each value followed by ';'
  awk -F ";" -v col="$i" 'BEGIN { ORS = ";" } { print $col }' "$file.tmp" | tr '\n' ' '
  echo
done > "$file.header"
echo transposed
echo cleanup
rm -f "$file.out"
rm -f "$file.clean"
rm -f "$file.tmp"
echo cleaned up
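The "transposing headers" step is the least obvious part of the script, so here is a standalone sketch of it. It assumes the header table comes out of the filter as one "name;value" pair per row, which is my guess at the intermediate format; the function name transpose and the sample rows are made up.

```shell
# Turn the columns of a semicolon-separated file into rows:
# $1 = input file, $2 = number of columns to transpose.
transpose() {
  i=1
  while [ "$i" -le "$2" ]; do
    # print column $i of every row on one line, each value followed by ';'
    awk -F';' -v col="$i" 'BEGIN { ORS = ";" } { print $col }' "$1"
    echo
    i=$((i + 1))
  done
}

# Made-up sample of a two-column header table
tmp=$(mktemp)
printf 'Fakturanr;105358\nValuta;NOK\n' > "$tmp"
result=$(transpose "$tmp" 2)
rm -f "$tmp"
printf '%s\n' "$result"
```

The first output row holds the field names and the second the values, which is the shape of the two "H" rows in the sample datafile.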