Valid UTF-8 data (hex:) followed by invalid UTF-8 sequence

Posted on 09 April 2007 by Erwin

OK, this one is a bit geeked out again, but it’s relevant to China. If you’re an American, you could probably go your entire life without ever bumping into codepages, but if your life crosses paths with Asia, you almost certainly will…

While developing a new website and doing our Subversion (version control system) check-in, I started bumping into a very unusual error.

e@116843:/spike/public/news/app/webroot/redv1.0/img/menu$ sudo svn up
svn: Valid UTF-8 data (hex:) followed by invalid UTF-8 sequence (hex: b8 b4 bc fe)

Unfortunately, Google didn’t come up with much. The best hit was an October 10th post on the Subversion users mailing list. Basically, the answer is that there’s no answer.

Well, I did an svn up in each child directory of the one causing the problem and eventually tracked the error down through my project’s directory tree. It looks like one of the guys using a Windows system copied a JPEG with a Chinese GBK-encoded filename onto the server. Those GBK bytes are not valid UTF-8, which is what svn was choking on. Everything is best kept in UTF-8.

Once you’ve found the right file, you have to figure out how to delete a file with a name that can’t be typed…
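The hunt through the directory tree can also be scripted. The following is a minimal Python sketch, not the method used above: it walks a tree using raw byte paths and collects every name that is not valid UTF-8. The function names `is_valid_utf8` and `find_non_utf8_names` are made up for illustration, and the sketch assumes a filesystem that hands filenames back as raw bytes, as Linux does.

```python
import os
from typing import List


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8_names(root: str) -> List[bytes]:
    """Walk `root` and return the byte paths of entries whose names are not UTF-8."""
    offenders = []
    # Passing bytes to os.walk makes it yield raw, undecoded names.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                offenders.append(os.path.join(dirpath, name))
    return offenders


# The four bytes svn complained about are not valid UTF-8 on their own.
suspect = bytes.fromhex("b8 b4 bc fe")
print(is_valid_utf8(suspect))  # False
```

Calling `find_non_utf8_names` on the project root would list the offending paths as bytes, and `suspect.decode("gbk")` is a quick way to test the guess that the name came from a GBK system.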

e@116843:/spike/public/news/app/webroot/redv1.0/img/menu$ ls
logo02.jpg       ???? logo.jpg  menu_acc_down.jpg      menu_home_down.jpg  menu_work_down.jpg
logo03.jpg       logo.jpg       menu_acc.jpg           menu_home.jpg       menu_work.jpg
logo04.jpg       logo_top1.jpg  menu_cameras_down.jpg  menu_len_down.jpg
logo05.jpg       logo_top2.jpg  menu_cameras.jpg       menu_len.jpg
logo06.jpg       logo_top3.jpg  menu_gall_down.jpg     menu_tech_down.jpg
logo_bottom.jpg  logo_top.jpg   menu_gall.jpg          menu_tech.jpg

In this case, I just used: rm *\ logo.jpg since there was only one file matching this pattern (the backslash escapes the space, so the glob matches any name ending in “ logo.jpg”)… Next, I could commit again!
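If the file is worth keeping, renaming it is an alternative to deleting it. Below is a rough Python sketch of that idea, under the assumption that the bad names really are GBK. `to_utf8_name` and `rename_non_utf8` are hypothetical helper names, and the sketch makes no attempt to tell svn about the rename.

```python
import os
from typing import List, Tuple


def to_utf8_name(raw: bytes, source_encoding: str = "gbk") -> bytes:
    """Re-encode a raw filename from `source_encoding` into UTF-8 bytes.

    Raises UnicodeDecodeError if the guess about the source encoding is wrong.
    """
    return raw.decode(source_encoding).encode("utf-8")


def rename_non_utf8(directory: str, source_encoding: str = "gbk") -> List[Tuple[bytes, bytes]]:
    """Rename every entry in `directory` whose name is not valid UTF-8.

    Returns the (old, new) byte names of the entries that were renamed.
    """
    renamed = []
    base = os.fsencode(directory)
    for raw in os.listdir(base):  # bytes in, bytes out: names stay undecoded
        try:
            raw.decode("utf-8")
            continue  # already valid UTF-8, leave it alone
        except UnicodeDecodeError:
            pass
        new = to_utf8_name(raw, source_encoding)
        os.rename(os.path.join(base, raw), os.path.join(base, new))
        renamed.append((raw, new))
    return renamed
```

Note that `os.rename` silently replaces an existing file of the same name on Unix, so it is worth checking the directory listing first.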

e@116843:/spike$ sudo svn up
D    public/.htaccess
Updated to revision 38.

38 Comments For This Post

  1. Barry Hunter Says:

    Thanks, that was just my problem!

    (no other results suggested filename issues :( )

    I wrote a tiny script to enumerate through the directory, outputting each path and then running ‘svn status’ on it to find the culprit. (I found no way to get svn to print which folder it was about to try, so the script shows the path before the ‘helpful’ error message.)
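A script along those lines might look like the Python sketch below. It is a guess at the approach, not Barry's actual script: it prints each directory, runs `svn status` on it without recursing, and remembers the directories where svn reports the UTF-8 error. The names `svn_status_ok` and `find_failing_dirs` are invented here, and the check assumes the error text contains "invalid UTF-8 sequence" as in the post.

```python
import os
import subprocess
from typing import Callable, List


def svn_status_ok(path: str) -> bool:
    """Run `svn status` on one directory; False if svn reports the UTF-8 error."""
    result = subprocess.run(
        ["svn", "status", "--depth", "immediates", path],
        capture_output=True, text=True, errors="replace",
    )
    return "invalid UTF-8 sequence" not in result.stderr


def find_failing_dirs(root: str, check: Callable[[str], bool] = svn_status_ok) -> List[str]:
    """Check every directory under `root` and return the ones where `check` fails."""
    failing = []
    for dirpath, dirnames, _ in os.walk(root):
        # Skip Subversion's own administrative directories.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        # ascii() keeps odd filename bytes from breaking the print itself.
        print("checking", ascii(dirpath))
        if not check(dirpath):
            failing.append(dirpath)
    return failing
```

The `check` parameter is only there so the walk can be tried out without a working copy, for example with a stub that always returns True.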

  2. Carl-Erik Says:

    Thanks! I also ran into this problem, and could not see anyone coming up with a solution. Actually thought the problem lay within the files – thus deleting the ones making trouble would fix the problem. Good thing I spotted your blog first :-)

  3. Tom L Says:

    Thanks for this post. You saved me quite a bit of time.

  4. Owen Says:

    Thanks, you saved me a bit of time. You can run into problems importing invalid UTF-8 too; I was importing a WordPress sitemap plugin that gave me problems. The last filename printed before the error was the folder containing the invalid files.

    Hope this helps someone.

  5. Aurore Says:

    I’ve just had this problem. I forced UTF-8 encoding on all the files I had changed last, and it works!

  6. Brian Mark Says:

    Too bad you’ve got a lot of spam on here, but the answer here was perfect. Thanks.

  7. ryan Says:

    “Too bad you’ve got a lot of spam on here, but the answer here was perfect. Thanks.”

    I’ve just gone through and cleaned up some of the remaining spam, but I must admit that I have no clue where some of it comes from… To submit a comment you should be required to fill out the reCAPTCHA field, but it seems that some spammers have found a way around this. Since installing reCAPTCHA the spam has slowed down dramatically, but it does still arrive and the rate does seem to be increasing again.

    Maybe there’s a backdoor in reCAPTCHA?

  8. evdsande Says:

    I ran into the same error, but I cannot find the file causing it. I now even get the same error on all my repositories, including ones that were not affected before. I still don’t have a clue how to recover my repositories. I already removed the malicious project from the affected repository using svndumpfilter, with no effect. Is there a way to rebuild the repository from the dump and identify the malicious directory or file?

    Regards Eric

  9. ryan Says:

    @Eric-

    Your best bet is to go through each of the subdirectories of your project and do an “svn up” on the directory one at a time until you find the subdirectory(s) containing the file(s) that have an incompatible encoding.

    Another useful tool is the Unix “file” command. Just run “file *” in the directory that you locate with the incompatible encoding and search for the file that comes back to you with an encoding type other than UTF-8.

    Best of luck-

    -R

  10. evdsande Says:

    Hi Erwin,

    I did the exercise as you suggested, but I don’t see any UTF-8 encoded types. The file * command gives me “ASCII C++ program text, with CRLF line terminators”, “XML document text” etc… but no UTF-8. Besides, when I ran into this trouble I removed the latest directory that created this error from my repository; my current repository contains only those parts that were there before the problem popped up, yet the problem still exists. It even pops up on repositories that were never touched. I switched to another svn client, tried the command line and even created a new repository to test; all of them give me the same error, and now I’m totally puzzled and stuck with a messed-up Subversion installation.

    This is the part of the logging from TortoiseSVN where the error popped up for the first time.

    Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc
    Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\IBugTraqProvider.idl
    Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\IBugTraqProvider_i.c
    Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\IBugTraqProvider_h.h
    Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\Interop.BugTraqProvider.dll
    Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\issue-tracker-plugins.txt
    Error : Valid UTF-8 data
    Error : (hex:)
    Error : followed by invalid UTF-8 sequence
    Error : (hex: c0 a4 01)
    Finished! : 52 kBytes transferred in 0 minute(s) and 2 second(s)

    I removed the directories and files added here using svnadmin dump and svndumpfilter; however, in the new repository the error still persists (I removed most of the revisions that handled the above actions, but not all of them could be removed). What I cannot understand is how this can affect a different repository.

  11. ryan Says:

    @Eric-

    I understand your frustration buddy. Just trying to help. You’re not looking for files that ARE UTF-8. You’re looking for files that ARE NOT UTF-8.

    In your case, just run:

    cd C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin
    svn up inc
    svn up …

    (where “…” is another folder under TortoiseRedminePlugin)

    You’ll notice that this will run successfully for most of your files, but there will be some that don’t complete successfully. When you locate the folder where the error starts, then you run “svn up” one file at a time until you find the individual file(s) that are causing the problem. Delete those files from the repository and you’ll be all fixed up.

    As for the UTF-8 issue… Again – the key is to find files that are in encodings OTHER THAN UTF-8. You’ve got such a file – it’s just a matter of finding it.

  12. evdsande Says:

    Hi Erwin,

    Sorry if I sounded offensive. It’s Subversion that’s frustrating me, giving me no decent clue where to look. I appreciate your help very much. Thanks a lot! I will go through my files again, but what about the project I removed from the repository? Should I add it to the repository again to go through all the files in there?

  13. evdsande Says:

    Finally I rebuilt Subversion with a patch that showed me the directory that was causing the trouble (found the patch at: http://www.nabble.com/-PATCH–Issue–2748:-non-UTF-8-filenames-in-the-repository-td19531299.html), however it hasn’t helped me yet. The error I get now is:

    Adding D:\Projects\t\TextDocument.txt
    Error Error converting entry in directory
    Error ‘/dtm/home/svn/svn/t/db/transactions/0-0.txn’ to UTF8
    Error Valid UTF-8 data
    Error (hex:)
    Error followed by an invalid UTF-8 sequence
    Error (hex: c0 a4 01)

    Now there are at least 2 things I don’t get:

    1: The directory it points to is a Subversion repository transaction log file, not the project file I’m trying to import.
    2: This is a new repository; I deleted all my old repositories and did a fresh import of a single text file edited with vi.

    And I still get the same error ???? I’m totally puzzled. If you have any clue,

    Please….

    Thankz Eric

  14. Agris Says:

    Yes, filenames with non-latin chars cause the problem. Deleting them fixes the problem.

  15. cooper Says:

    strace svn status will give you the name of the offending file. Unfortunately, svn cares about the names of files that are in one of its directories, even if they’re not under revision control.

  16. ryan Says:

    Evdsande-

    For some reason I never got notified about this comment.

    The issue as reported above applies certainly to Mac OS X, Linux and other Unix systems, but I haven’t used Windows for more than a few minutes in the entire 21st Century, so I’m of limited help here.

    The error does look the same though, so it is most likely one of the files under your “TortoiseRedminePlugin” folder. I would go through the sub-folders one at a time (first to “inc”) and then to the other sub-folders and do the commit one by one.

    One of the files should have the encoding issue as described. Delete that file and the commit should proceed properly.

    Best-

    -R

  17. zombat Says:

    Great post, I found it via Google. My problem was almost identical. An image got copied into a checked out tree, and svn update began failing due to a wacky character in the image name.

    Thanks for taking the time to post this! It’s always a great feeling to find a solution to strange issues that actually works.

  18. jo Says:

    also happens for files that are not part of SVN!

    my program spits out a log file each run — an error in the filename string made a bunch of log files with garbled file names. i got the same error as others (valid UTF-8 followed by invalid UTF-8), even though none of these log files were ever checked into the repo!

  19. Dom Says:

    Just joining in with the thanks!

    Top result on Google for svn “followed by invalid UTF-8 sequence”, you should be proud :)

  20. Carlos Barros Says:

    Thanks for this post, this saved me some time. I just want to add my two cents. Sometimes finding the exact file that’s causing the problem is tough. I have an images directory with 2k files and one of them has this problem.

    [code]
    svn: Error converting entry in directory 'images/thumbnails' to UTF-8
    svn: Valid UTF-8 data (hex: 4e 6f 6b 69 61 2d 35 35 33 30 2d) followed by invalid UTF-8 sequence (hex: 96 2d 58 70)
    [/code]

    So this told me the directory was images/thumbnails. To find which file, I did:

    [code]
    $ printf "\x4e\x6f\x6b\x69\x61\x2d\x35\x35\x33\x30\x2d\n"
    Nokia-5530-
    [/code]

    So this told me the filename starts with Nokia-5530- :)

    Hope this helps

    Carlos
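The same decoding trick can be wrapped in a few lines of Python. This is only a sketch: `split_svn_utf8_error` is a made-up helper, and it assumes the message contains exactly two "(hex: ...)" groups in the format quoted in this thread.

```python
import re

# Matches the hex dump inside each "(hex: ...)" group of the svn message.
HEX_RUN = re.compile(r"\(hex:([0-9a-fA-F ]*)\)")


def split_svn_utf8_error(message: str):
    """Return (valid_prefix, invalid_bytes) parsed from svn's UTF-8 error text."""
    runs = HEX_RUN.findall(message)
    if len(runs) != 2:
        raise ValueError("expected two '(hex: ...)' groups in the message")
    valid, invalid = (bytes.fromhex(run.strip()) for run in runs)
    # The first group is valid UTF-8 by svn's own account, so decoding is safe.
    return valid.decode("utf-8"), invalid


msg = ("svn: Valid UTF-8 data (hex: 4e 6f 6b 69 61 2d 35 35 33 30 2d) "
       "followed by invalid UTF-8 sequence (hex: 96 2d 58 70)")
prefix, bad_bytes = split_svn_utf8_error(msg)
print(prefix)  # Nokia-5530-
```

When the first group is empty, as in the original post, the prefix comes back as an empty string, which means the very first byte of the filename is the problem.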

  21. Joni Says:

    thanks, solved my problem :)

  22. Arjuna Says:

    If this can help somebody I wrote a quick article on how to convert the files without need to delete them: http://arjuna.deltoso.net/articles/subversion-messy-encoding-valid-utf-8-data-followed-by-invalid-utf-8-sequence/ in any case thanks Erwin for the time you spent writing your article that helped me to partly solve the problem. Arjuna

  23. GloW Says:

    thx, solved my problem

  24. vadlib Says:

    All I did was find the revision that was giving me the error. Once I found it I saw in the comments that there were some unreadable characters like some genius cut and pasted from a word doc into the file. I replaced them with the correct characters and then every thing worked.

  25. Ninad Says:

    Gr8!! That solved my problem. Thanks for the post!!

  38. Stan Says:

    Thank you man! You helped me and saved a lot of time! Really appreciate it!

    Best wishes to you!!!
