DbUnit Export corrupting UTF-8

DbUnit Export corrupting UTF-8

golfingal72
I am trying to use the DbUnit export Ant task to extract data from a MySQL database, but the UTF-8 encoded data does not come out correctly.

A hex dump snippet of the data in the database follows. It contains decomposed Unicode (combining characters); note the 'ef b8 a0' sequences:

 4LV-QM2A-FDE - modifyBibMetadata title
Biblioteki i assot︠
42 69 62 6c 69 6f 74 65 6b 69 20 69 20 61 73 73 6f 74 ef b8 a0
s︡iat︠s︡ii v me
73 ef b8 a1 69 61 74 ef b8 a0 73 ef b8 a1 69 69 20 76 20 6d 65
ni︠a︡i︠u︡shch
6e 69 ef b8 a0 61 ef b8 a1 69 ef b8 a0 75 ef b8 a1 73 68 63 68
emsi︠a︡ mire : no
65 6d 73 69 ef b8 a0 61 ef b8 a1 20 6d 69 72 65 20 3a 20 6e 6f
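
For reference, the non-ASCII sequences in the dump above can be decoded to confirm what they are. This is a minimal Python sketch, assuming the dump is plain UTF-8; the sample bytes are copied from the dump, and `describe` is a hypothetical helper name:

```python
import unicodedata

def describe(hex_bytes):
    """Decode a space-separated hex dump as UTF-8 and name each non-ASCII code point."""
    text = bytes.fromhex(hex_bytes).decode("utf-8")
    return [("U+%04X" % ord(ch), unicodedata.name(ch)) for ch in text if ord(ch) > 0x7F]

# Bytes taken from the database dump: "t", ef b8 a0, "s", ef b8 a1
marks = describe("74 ef b8 a0 73 ef b8 a1")
for code_point, name in marks:
    print(code_point, name)
```

The two sequences decode to the combining ligature left and right halves, so the data in the database appears to be well-formed UTF-8.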

The Ant build file:
<?xml version="1.0" encoding="UTF-8"?>
<project name="extract-data"
        xmlns:dbunit="antlib:org.dbunit">
        <target name="extract-circ">
           
            <dbunit:dbunit driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://${dbhostandport}/${dbschema}?useUnicode=true&amp;characterEncoding=UTF8&amp;autoReconnect=true&amp;serverTimezone=UTC&amp;useLegacyDatetimeCode=false"
                userid="${username}"
                password="${password}">
                <dbconfig>
                    <property name="datatypeFactory">org.dbunit.ext.mysql.MySqlDataTypeFactory</property>
                </dbconfig>
                <export dest="utfbib.dtd" format="dtd" encoding="UTF-8"/>
                <export dest="utfbib.xml" format="flat" doctype="utfbib.dtd" encoding="UTF-8">
                   <query name="FOO" sql="SELECT * from bib_summary where oclc_number = 38214897"/>
                 </export>

            </dbunit:dbunit>
        </target>
</project>

A hexdump snippet of utfbib.xml follows. Note that the 'ef b8 a0' data above is now 'c3 af c2 b8 c2 a0':

000000f0  39 37 22 20 74 69 74 6c  65 3d 22 42 69 62 6c 69  |97" title="Bibli|
00000100  6f 74 65 6b 69 20 69 20  61 73 73 6f 74 c3 af c2  |oteki i assot...|
00000110  b8 c2 a0 73 c3 af c2 b8  c2 a1 69 61 74 c3 af c2  |...s......iat...|
00000120  b8 c2 a0 73 c3 af c2 b8  c2 a1 69 69 20 76 20 6d  |...s......ii v m|
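
The pattern in this dump is consistent with double encoding: each UTF-8 byte of the original data is read as a single Latin-1 (ISO-8859-1, or a similar single-byte charset) character, and the result is then written out again as UTF-8. The Python sketch below only reproduces that hypothesis on the bytes from the dumps. It is not a trace of what DbUnit or the JDBC driver does, and it does not establish where in the chain the misread happens:

```python
# U+FE20 as stored in the database, copied from the first dump
original = bytes.fromhex("ef b8 a0")

# Hypothesis: somewhere the UTF-8 bytes are decoded as Latin-1,
# yielding three characters (U+00EF, U+00B8, U+00A0) ...
misread = original.decode("latin-1")

# ... and that text is then encoded as UTF-8 when the file is written.
exported = misread.encode("utf-8")

print(" ".join("%02x" % b for b in exported))

# Reversing the two steps recovers the original bytes.
recovered = exported.decode("utf-8").encode("latin-1")
```

If the hypothesis holds, the printed bytes match the sequence in the exported file, and `recovered` equals `original`.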

any ideas?

thanks so much...


DbUnit Export corrupting UTF-8

golfingal72
Retrying after subscribing to the list!

-----Original Message-----
From: Cox,Lisa
Sent: Wednesday, June 16, 2010 8:32 AM
To: [hidden email]
Subject: [dbunit-user] DbUnit Export corrupting UTF-8





--
View this message in context: http://old.nabble.com/DbUnit-Export-corrupting-UTF-8-tp28902248p28902248.html
Sent from the DBUnit - Users mailing list archive at Nabble.com.




_______________________________________________
dbunit-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/dbunit-user
Re: DbUnit Export corrupting UTF-8

Matthias Gommeringer
Hi Lisa,
OK, I'll give it a try, though I'm just guessing.
You're using the FlatXmlDataSet for the export, so I assume your text is written into XML attributes. It might be worth trying the default XmlDataSet with a CDATA section instead: http://www.dbunit.org/components.html
The other thing I'm not sure about is whether your DB URL is correct. Looking at http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html, I see that "UTF-8" is used (note the hyphen) rather than "UTF8". I'm not sure whether this has any effect.

Sorry for not having a concise answer, but maybe this helps...

rgds,
matthias
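
One way to tell whether either suggestion changes anything is to test the exported bytes directly rather than reading a hex dump by eye. The following is a heuristic sketch in Python. `looks_double_encoded` is a hypothetical helper, and the Latin-1 assumption comes from the byte pattern in the posted dump, not from anything confirmed about DbUnit or Connector/J:

```python
def looks_double_encoded(data):
    """Heuristic: True if `data` is UTF-8 text that, re-encoded as Latin-1,
    is itself valid UTF-8 containing non-ASCII bytes."""
    try:
        inner = data.decode("utf-8").encode("latin-1")
        inner.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return any(b > 0x7F for b in inner)

# Sample bytes copied from the two dumps in the original post
from_database = bytes.fromhex("74 ef b8 a0 73 ef b8 a1")
from_export = bytes.fromhex("74 c3 af c2 b8 c2 a0 73 c3 af c2 b8 c2 a1")

print(looks_double_encoded(from_database))
print(looks_double_encoded(from_export))
```

To check a real export, the same function could be applied to the contents of utfbib.xml read in binary mode. Being a heuristic, it can misjudge text that legitimately contains sequences such as "Ã¯".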

-----Original Message-----
From: "Cox,Lisa" <[hidden email]>
Sent: 16.06.2010 14:55:34
To: [hidden email]
Subject: [dbunit-user] DbUnit Export corrupting UTF-8
