OK, this one is a bit geeked out again, but it’s relevant to China. If you’re an American, you could probably go your entire life without ever bumping into code pages, but if your life crosses paths with Asia, you almost certainly will…
While developing a new website and doing our Subversion (version control system) check-in, I started bumping into a very unusual error.
e@116843:/spike/public/news/app/webroot/redv1.0/img/menu$ sudo svn up
svn: Valid UTF-8 data (hex:) followed by invalid UTF-8 sequence (hex: b8 b4 bc fe)
Unfortunately, Google didn’t come up with much. The best hit was an Oct 10th post on the Subversion users mailing list. Basically, the answer is that there’s no answer.
Well, I did an svn up in each child directory of the one causing the problem and eventually tracked the error down through my project’s directory tree. It looks like one of the guys using a Windows system copied a JPEG with a Chinese GBK-encoded filename onto the server. Subversion has to convert every filename it sees to UTF-8, so everything is best kept in UTF-8.
Once you’ve found the right file, you have to figure out how to delete a file with a name that can’t be typed…
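Walking the tree by hand gets tedious on a large project. As a rough sketch of the same hunt (not the method used above), the Python below walks a directory using raw byte filenames and collects every name that fails to decode as UTF-8. The function name `find_non_utf8_names` is made up for illustration, and the sketch assumes a Unix-like filesystem that stores names as plain bytes.

```python
import os


def find_non_utf8_names(root):
    """Return paths (as bytes) under root whose own name is not valid UTF-8."""
    offenders = []
    # A bytes path makes os.walk yield raw bytes names, so nothing is
    # decoded or escaped before we get to inspect it.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                offenders.append(os.path.join(dirpath, name))
    return offenders
```

Calling `find_non_utf8_names(".")` from the top of a working copy and printing each result with `repr()` shows the offending bytes as `\x..` escapes, which can be compared with the hex in the svn error.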
e@116843:/spike/public/news/app/webroot/redv1.0/img/menu$ ls
logo02.jpg       ???? logo.jpg  menu_acc_down.jpg      menu_home_down.jpg  menu_work_down.jpg
logo03.jpg       logo.jpg       menu_acc.jpg           menu_home.jpg       menu_work.jpg
logo04.jpg       logo_top1.jpg  menu_cameras_down.jpg  menu_len_down.jpg
logo05.jpg       logo_top2.jpg  menu_cameras.jpg       menu_len.jpg
logo06.jpg       logo_top3.jpg  menu_gall_down.jpg     menu_tech_down.jpg
logo_bottom.jpg  logo_top.jpg   menu_gall.jpg          menu_tech.jpg
In this case, I just used:
rm *\ logo.jpg
since there was only one file matching this pattern (any name ending in a space followed by logo.jpg)… Next, I could commit again!
e@116843:/spike$ sudo svn up
D    public/.htaccess
Updated to revision 38.


May 4th, 2007 at 1:24 am
Thanks, that was just my problem!
(No other results suggested filename issues.)
I wrote a tiny script to enumerate through the directories, outputting each path and then running ‘svn status’ on it to find the culprit. (I found no way to get svn to print which folder it was about to try before trying it, so that the path would show before the ‘helpful’ error message.)
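A script along those lines might look like the sketch below. It is not the commenter's actual script: the function name `find_failing_dirs` is hypothetical, and the default command assumes Subversion 1.5 or later, where `svn status` accepts `--depth`. It prints each directory before running the command there, so the path appears before any error, and it assumes the command exits with a non-zero status when it hits the bad filename.

```python
import os
import subprocess


def find_failing_dirs(root, command=("svn", "status", "--depth", "immediates")):
    """Run `command` inside each directory under root; return those where it fails."""
    failing = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Do not descend into Subversion's own administrative directories.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        print("checking", dirpath, flush=True)  # shown before the command runs
        result = subprocess.run(list(command), cwd=dirpath, capture_output=True)
        if result.returncode != 0:
            # Show whatever the command complained about, then remember the directory.
            print(result.stderr.decode("utf-8", errors="replace"), end="")
            failing.append(dirpath)
    return failing
```

The `command` parameter is there so a different check (for example `svn up`) can be swapped in without touching the loop.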
May 4th, 2007 at 2:59 pm
Thanks! I also ran into this problem, and could not see anyone coming up with a solution. I actually thought the problem lay within the files’ contents, so that deleting the ones making trouble would fix it. Good thing I spotted your blog first.
May 9th, 2007 at 2:27 pm
Thanks for this post. You saved me quite a bit of time.
June 29th, 2008 at 10:50 am
Thanks, you saved me a bit of time. You can run into problems importing invalid UTF-8 too. I was importing a WordPress sitemap plugin that gave me problems. The last filename printed before the error was the folder containing the invalid files.
Hope this helps someone.
October 1st, 2008 at 6:45 am
I’ve just had this problem. I forced UTF-8 encoding on all the files I had changed last, and it works!
November 3rd, 2008 at 11:11 am
Too bad you’ve got a lot of spam on here, but the answer here was perfect. Thanks.
November 3rd, 2008 at 11:33 am
I’ve just gone through and cleaned up some of the remaining spam, but I must admit that I have no clue where some of it comes from… To submit a comment you should be required to fill out the reCAPTCHA field, but it seems that some spammers have found a way around this. Since installing reCAPTCHA the spam has slowed down dramatically, but it does still arrive and the rate does seem to be increasing again.
Maybe there’s a backdoor in reCAPTCHA?
November 11th, 2008 at 5:21 am
I ran into the same error; however, I cannot find the file causing it. I now even get the same error on all my repositories, including the ones that were not affected before. I still don’t have a clue how to recover my repositories. I already removed the malicious project from the affected repository using svnfilter, with no effect. Is there a way to rebuild the repository from the dump and identify the malicious directory or file?
Regards Eric
November 11th, 2008 at 5:26 am
@Eric-
Your best bet is to go through each of the subdirectories of your project and do an “svn up” on the directory one at a time until you find the subdirectory(s) containing the file(s) that have an incompatible encoding.
Another useful tool is the Unix “file” command. Just run “file *” in the directory that you locate with the incompatible encoding and search for the file that comes back to you with an encoding type other than UTF-8.
Best of luck-
-R
November 11th, 2008 at 6:28 am
Hi Erwin,
I did the exercise as you suggested, but I don’t see any UTF-8 encoded types. The file * command gives me “ASCII C++ program text, with CRLF line terminators”, “XML document text” etc… but no UTF-8. Besides, when I ran into this trouble I removed the latest directory that created this error from my repository. My current repository contains only those parts that were there before the problem popped up, yet the problem still exists. The problem even pops up on repositories that were never touched! I switched to another svn client, tried the command line and even created a new repository to test. All of them give me the same error, and now I’m totally puzzled and stuck with a messed-up Subversion installation.
This is the part of the log from TortoiseSVN where the error popped up for the first time:
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\IBugTraqProvider.idl
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\IBugTraqProvider_i.c
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\IBugTraqProvider_h.h
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\Interop.BugTraqProvider.dll
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\issue-tracker-plugins.txt
Error : Valid UTF-8 data
Error : (hex:)
Error : followed by invalid UTF-8 sequence
Error : (hex: c0 a4 01)
Finished! : 52 kBytes transferred in 0 minute(s) and 2 second(s)
I removed the directories and files added here using svnadmin dump and svnfilter, however in the new repository the error still persists (I removed most of the revisions that handled the above actions, but not all of them could be removed). What I cannot understand is, how this can affect a different repository?
November 11th, 2008 at 7:05 am
@Eric-
I understand your frustration buddy. Just trying to help. You’re not looking for files that ARE UTF-8. You’re looking for files that ARE NOT UTF-8.
In your case, just run:
cd C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin
svn up inc
svn up …
(where “…” is another folder under TortoiseRedminePlugin)
You’ll notice that this will run successfully for most of your files, but there will be some that don’t complete successfully. When you locate the folder where the error starts, then you run “svn up” one file at a time until you find the individual file(s) that are causing the problem. Delete those files from the repository and you’ll be all fixed up.
As for the UTF-8 issue… Again – the key is to find files that are in encodings OTHER THAN UTF-8. You’ve got such a file – it’s just a matter of finding it.
November 11th, 2008 at 7:41 am
Hi Erwin,
Sorry if I sounded offensive. It’s Subversion that’s frustrating me, giving me no decent clue where to look. I appreciate your help very much. Thanks a lot! I will go through my files again, but how about the project I removed from the repository? Should I add it to the repository again to go through all the files in there?
November 11th, 2008 at 12:30 pm
Finally I rebuilt Subversion with a patch that showed me the directory that was causing the trouble (found the patch at: http://www.nabble.com/-PATCH–Issue–2748:-non-UTF-8-filenames-in-the-repository-td19531299.html), however it didn’t help me yet. The error I get now is:
Adding D:\Projects\t\TextDocument.txt
Error Error converting entry in directory
Error ‘/dtm/home/svn/svn/t/db/transactions/0-0.txn’ to UTF8
Error Valid UTF-8 data
Error (hex:)
Error followed by an invalid UTF-8 sequence
Error (hex: c0 a4 01)
Now there are at least 2 things I don’t get:
1: The directory it points to is a Subversion repository transaction log file, not the project file I’m trying to import.
2: This is a new repository. I deleted all my old repositories and did a fresh import of a single text file edited with vi.
And I still get the same error. I’m totally puzzled. If you have any clue, please…
Thanks, Eric
February 11th, 2009 at 1:20 am
Yes, filenames with non-latin chars cause the problem. Deleting them fixes the problem.
March 7th, 2009 at 2:42 pm
Running “strace svn status” will give you the name of the offending file. Unfortunately, svn cares about the names of all files that are in one of its directories, even those that are not under revision control.
June 23rd, 2009 at 9:23 pm
Evdsande-
For some reason I never got notified about this comment.
The issue as reported above applies certainly to Mac OS X, Linux and other Unix systems, but I haven’t used Windows for more than a few minutes in the entire 21st Century, so I’m of limited help here.
The error does look the same though, so it is most likely one of the files under your “TortoiseRedminePlugin” folder. I would go through the sub-folders one at a time (first to “inc”) and then to the other sub-folders and do the commit one by one.
One of the files should have the encoding issue as described. Delete that file and the commit should proceed properly.
Best-
-R
August 2nd, 2009 at 11:57 am
Great post, I found it via Google. My problem was almost identical. An image got copied into a checked out tree, and svn update began failing due to a wacky character in the image name.
Thanks for taking the time to post this! It’s always a great feeling to find a solution to a strange issue that actually works.
October 8th, 2009 at 12:43 pm
also happens for files that are not part of SVN!
my program spits out a log file each run — an error in the filename string made a bunch of log files with garbled file names. i got the same error as others (valid UTF-8 followed by invalid UTF-8), even though none of these log files were ever checked into the repo!
October 28th, 2009 at 1:50 am
Just joining in with the thanks!
Top result on Google for svn “followed by invalid UTF-8 sequence”, you should be proud
December 1st, 2009 at 4:20 am
Thanks for this post, this saved me some time. I just want to add my two cents. Sometimes finding the exact file that’s causing the problem is tough. I have an images directory with 2k files, and one of them has this problem.
[code]
svn: Error converting entry in directory 'images/thumbnails' to UTF-8
svn: Valid UTF-8 data (hex: 4e 6f 6b 69 61 2d 35 35 33 30 2d) followed by invalid UTF-8 sequence (hex: 96 2d 58 70)
[/code]
So this told me the directory was images/thumbnails. To find which file, I did:
[code]
$ printf "\x4e\x6f\x6b\x69\x61\x2d\x35\x35\x33\x30\x2d\n"
Nokia-5530-
[/code]
So this told me the filename starts with Nokia-5530-
Hope this helps
Carlos
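The same decoding can be done without retyping every byte as an escape. The Python sketch below takes the hex exactly as svn prints it. The helper name `decode_svn_hex` is invented for this example, and the encodings tried on the invalid parts (Windows-1252 for this comment's bytes, GBK for the bytes in the original post) are guesses that the error message itself does not confirm.

```python
def decode_svn_hex(hex_bytes, encoding="utf-8"):
    """Turn the space-separated hex from an svn error message back into text."""
    # bytes.fromhex skips the spaces between the byte pairs.
    return bytes.fromhex(hex_bytes).decode(encoding)


# The valid prefix reported in the error above.
print(decode_svn_hex("4e 6f 6b 69 61 2d 35 35 33 30 2d"))  # Nokia-5530-

# The invalid tail only decodes once the original encoding is guessed.
print(decode_svn_hex("96 2d 58 70", "cp1252"))

# The bytes from the original post, decoded as GBK as the post suggests.
print(decode_svn_hex("b8 b4 bc fe", "gbk"))  # 复件
```

If the guessed encoding is wrong, `decode` either raises `UnicodeDecodeError` or returns nonsense, so it is worth checking the result against the names actually in the directory.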
May 26th, 2010 at 9:37 pm
thanks, solved my problem
June 30th, 2010 at 7:19 am
In case this helps somebody, I wrote a quick article on how to convert the files without needing to delete them: http://arjuna.deltoso.net/articles/subversion-messy-encoding-valid-utf-8-data-followed-by-invalid-utf-8-sequence/ In any case, thanks Erwin for the time you spent writing your article, which helped me partly solve the problem. Arjuna
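For readers who want to try renaming instead of deleting, here is a minimal sketch of the idea. It is not the method from the linked article: the function name `rename_to_utf8` is hypothetical, and the sketch assumes that every non-UTF-8 name in the directory was written in a single known encoding (GBK by default, as in the original post).

```python
import os


def rename_to_utf8(directory, source_encoding="gbk"):
    """Re-encode non-UTF-8 filenames in `directory` from source_encoding to UTF-8.

    Returns a list of (old_name, new_name) pairs, both as bytes.
    """
    renamed = []
    directory = os.fsencode(directory)
    for name in os.listdir(directory):  # bytes path in, bytes names out
        try:
            name.decode("utf-8")
            continue  # already valid UTF-8, leave it alone
        except UnicodeDecodeError:
            pass
        # Assumption: the name really was written in source_encoding. If not,
        # this either raises UnicodeDecodeError or produces a wrong name.
        new_name = name.decode(source_encoding).encode("utf-8")
        target = os.path.join(directory, new_name)
        if os.path.exists(target):
            raise FileExistsError(target)  # never overwrite an existing file
        os.rename(os.path.join(directory, name), target)
        renamed.append((name, new_name))
    return renamed
```

It only touches the one directory it is given, so it is best run on the folder already identified as the culprit, ideally on a copy first.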
July 13th, 2010 at 9:06 pm
thx, solved my problem
August 20th, 2010 at 11:35 pm
All I did was find the revision that was giving me the error. Once I found it, I saw that the comments contained some unreadable characters, as if some genius had cut and pasted from a Word doc into the file. I replaced them with the correct characters and then everything worked.
December 21st, 2010 at 11:45 am
Gr8!! That solved my problem. Thanks for the post!!
February 6th, 2011 at 3:33 pm
Thank You for sharing! It’s a pity that it’s short:)
February 2nd, 2012 at 1:12 am
Thank you, man! You helped me and saved me a lot of time! Really appreciate it!
Best wishes to you!!!