Dovolujeme si Vas informovat, ze byly zverejneny 4 nove korpusy:
- korpus neformalni mluvene cestiny *ORAL2013* o celkove velikosti 2,79 mil. slov, ktery koncepcne navazuje na predchozi mluvene korpusy, ovsem s nekolika podstatnymi vylepsenimi: zejmena jde o pokryti uzemi cele CR (vcetne Moravy a Slezska) a propojeni prepisu se zvukovou stopou, ktere umoznuje prehravat realizaci kazdeho vyrazu na strance s konkordancemi;
- korpus psane publicistiky *SYN2013PUB* o celkove velikosti 935 mil. slov obsahujici 44 ruznych publicistickych titulu z let 2005-2009;
- srovnatelny korpus *JEROME* sestaveny pro zkoumani prekladove cestiny v porovnani s cestinou neprekladovou;
- korpus *lEstRepublicain* slozeny ze 3 rocniku francouzskeho regionalniho deniku L'Est Republicain o celkove velikosti 120 milionu slov.
Podrobnejsi informace o vsech zverejnenych korpusech najdete na adrese
http://korpus.cz/struktura.php
V teto souvislosti bychom chteli uzivatele znovu upozornit, aby zacali namisto puvodniho Bonita 1 pouzivat *webove rozhrani* na adrese
http://korpus.cz/corpora .
Webove rozhrani nabizi radu moznosti a funkci, ktere v puvodnim Bonitu chybeji, v nejblizsi dobe se navic objevi jeho dalsi vylepseni. Naproti tomu v Bonitu neni mozna prace se zvukem v korpusu ORAL2013 a rada korpusu v nem z technickych duvodu neni vubec pristupna (napr. vsechny korpusy paralelni nebo novy korpus lEstRepublicain).
Dalsi upozorneni se tyka nereferencniho korpusu *SYN*: brzy po Novem roce bude zverejnena jeho nova *verze 3*, ktera bude zpracovana nejnovejsi lemmatizaci a morfologickym znackovanim a ktera v sobe bude zahrnovat take data korpusu SYN2013PUB (celkova velikost korpusu SYN tak prekroci 2,2 mld. slov). Tato nova verze se objevi *namisto* stavajici verze 2; pokud byste meli zajem o zachovani pristupu k verzi 2 i v budoucnu, napiste prosim na nize uvedeny e-mail.
Prijemne proziti Vanoc a vsechno nejlepsi do Noveho roku Vam za cele UCNK preje
Michal Kren
--------------------
Dear all,
we would like to inform you that 4 new corpora have been made publicly available today:
- corpus of informal spoken Czech *ORAL2013* containing 2.79 million words; its design follows the previous spoken corpora, with several significant improvements: notably, it covers the whole of the Czech Republic (including Moravia and Silesia) and aligns the transcriptions with the audio track, so that the actual realization of every expression can be played back on the concordance page;
- corpus of written journalistic texts *SYN2013PUB* containing 935 million words from 44 different titles published in 2005-2009;
- comparable corpus *JEROME* designed for studying translated Czech in comparison with non-translated Czech;
- corpus *lEstRepublicain* consisting of 3 annual volumes of the French regional daily L'Est Republicain, containing 120 million words in total.
Detailed information about all published corpora can be found at
http://korpus.cz/english/struktura.php
Once again, we would like to encourage our users to switch from the original Bonito 1 to the *web interface* at
http://korpus.cz/corpora .
The web interface offers a number of options and functions that are missing from the original Bonito, and further improvements will appear soon. Bonito, by contrast, does not support audio playback in ORAL2013, and a number of corpora are not available in it at all for technical reasons (e.g. all parallel corpora or the new lEstRepublicain corpus).
Another notice concerns the non-reference corpus *SYN*: its new *version 3* will be published shortly after the New Year. It will be processed with the latest lemmatisation and morphological tagging, and it will also include the data of SYN2013PUB (the total size of SYN will thus exceed 2.2 billion words). Version 3 will *replace* the currently available version 2; if you would like to retain access to version 2 in the future, please send an e-mail to the address below.
Wishing you a Merry Christmas and a Happy New Year,
Michal Kren, on behalf of the ICNC