Many websites now use large numbers of pictures. Pictures make up most of the data transferred for a web page and are a major factor in website performance. For this reason, many websites separate picture storage from the site itself: one or more dedicated servers store the pictures and expose them through a virtual directory, and the pictures on each web page are referenced by URLs that point to those servers. This noticeably improves site performance, and it is where the concept of the picture server (ImageServer) comes from.
1.1 Advantages of a Picture Server
1. It offloads I/O from the web server: moving the resource-hungry picture service elsewhere improves the performance and stability of the web server.
2. It allows dedicated optimization: a caching scheme tailored to the picture service reduces bandwidth cost and improves access speed.
3. It improves the scalability of the website: adding picture servers raises image throughput.
1.2 Considerations for a Picture Server
1. Choose suitable physical media and a suitable file system for picture storage.
2. Use physically independent servers.
3. If there is more than one picture server, plan how pictures are synchronized between the servers.
4. Use a separate domain name.
5. Define a sensible cache strategy.
6. Use an image processing module to reprocess the pictures uploaded by users.
1.3 Picture Server Architecture
Pictures are an essential part of a website. As a site grows, the way it handles pictures has to evolve with the rising number of visits and pictures. When a site first launches, everything is kept simple, and the pictures usually live in an Images folder under the site.
As traffic and the load on IIS increase, we split the picture folder out into a separate site, such as http://images.*.com/ (possibly several picture sites, depending on the business environment). This spreads the load of a single IIS application pool across two or more pools and relieves the access bottleneck considerably.
As traffic grows further and the server can no longer cope, the picture site needs to move to a server of its own.
While serving pictures, we may also need the same picture in several sizes. Early on, we usually generate and save every size we need when the picture is saved. But as more sizes are required, more and more pictures have to be stored. How do we deal with this?
The other problem is concurrent access: as the number of users keeps growing, a single picture server is no longer enough. How can we scale further?
As shown in the figure above, both problems can be solved together by adding a Squid cache server at the front end and one or more dynamic image-resizing (image-cutting) servers behind it.
A Squid or Nginx proxy cache server greatly increases the concurrency the picture system can handle and lets it break through its existing limits.
The dynamic image-resizing server fetches the original image when a particular size is requested, generates the requested size on the fly, and returns it.
The original images can be stored alongside the picture service or on a separate server.
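A minimal sketch of the request-handling side of such a resizing server, under stated assumptions: the URL convention (a `_WIDTHxHEIGHT` suffix before the extension) and both function names are hypothetical and do not come from the architecture described here. The sketch only parses the request and computes the target dimensions; the actual pixel work would be done by an image library.

```python
import re
from typing import Optional, Tuple

# Hypothetical URL convention: /a/b/abcde_200x150.jpg asks for abcde.jpg
# scaled to fit inside a 200x150 box.
_SIZED = re.compile(r"^(?P<stem>.+)_(?P<w>\d+)x(?P<h>\d+)(?P<ext>\.\w+)$")


def parse_resize_request(path: str) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Split a request path into the original image path and the requested box."""
    m = _SIZED.match(path)
    if m is None:
        return path, None  # plain request for the original image
    return m["stem"] + m["ext"], (int(m["w"]), int(m["h"]))


def fit_within(width: int, height: int, box: Tuple[int, int]) -> Tuple[int, int]:
    """Scale (width, height) down to fit inside box, keeping the aspect ratio."""
    scale = min(box[0] / width, box[1] / height, 1.0)  # never upscale
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With this convention only the original file needs to be stored; every other size is derived on request and can then be held by the proxy cache in front.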
In this structure, the maximum concurrency is bounded by Squid or whichever proxy server is used. When the load on the resizing service increases, only more resizing servers need to be added, and growth of the image store can be handled by adding hard disks or servers.
If site traffic keeps growing and is about to exceed what a single Squid can handle, what should we do?
As shown in the figure above, we run multiple Squid or Nginx servers and add F5 or LVS load balancing in front of them (caching can be enabled there at the same time). Concurrency capacity rises greatly, and servers can be added at any time as the situation requires.
There is one flaw: the same picture may end up cached on several Squids, because a request may be routed to squid1 the first time and, after the F5 session expires, to squid2 or another server the next time. Compared with solving the concurrency problem, this small amount of redundancy is entirely acceptable.
After all this work, if conditions allow, putting a CDN in front of the picture servers will greatly improve picture access quality for your site.
1.4 Picture Storage Architecture
1.4.1 Necessity of Deploying an Independent Picture Server
For Apache or IIS, serving pictures is always among the most resource-intensive work. If the picture service and the application service share a server, the high I/O load from pictures can easily bring the application server down. For large website projects it is therefore necessary to separate the picture server from the application server.
Deploying independent picture servers (or even a server cluster) is the most basic solution for picture storage on large sites, because it allows performance tuning aimed specifically at serving pictures. On the hardware side, picture servers can be fitted with high-end hard disks, for example 15000 rpm drives in place of 7200 rpm ones, while an ordinary CPU is sufficient.
On the software side, a special file system can be configured to handle picture I/O, such as Taobao's TFS, which solves the I/O nightmare caused by huge numbers of small picture files. Nginx or Squid can also be used to proxy picture requests.
1.4.2 Independent Domain Name
Note that this means an independent domain name, not a subdomain. For example, yahoo.com serves its pictures from yimg.com rather than from the subdomain img.yahoo.com. Why? I think the main reasons are as follows:
1. Browsers limit the number of concurrent connections to the same domain name, generally to between 2 and 6. The figure below lists the limits for each browser (for reference only).
If we give the picture server its own domain name, pictures no longer share that limit with the rest of the page. In theory, adding one independent domain name doubles the number of concurrent connections.
2. Cookies hurt caching.
For example, a request for the picture http://www.test.com/img/xx.gif carries the cookies set for www.test.com. Since most web caches only cache requests without cookies, every picture request misses the cache and has to go back to the origin server, which makes the picture cache pointless. It is therefore better to give pictures a separate domain name. The same idea applies to CSS and JS files as well.
3. It makes CDN synchronization more convenient.
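The connection-limit argument in point 1 can be put into numbers with a deliberately simplified model. The sketch below assumes every picture takes one "round" to download and that the browser fills all allowed connections each round; the function name and this model are illustrative assumptions, not measurements.

```python
import math


def download_rounds(num_images: int, per_host_limit: int, num_hosts: int = 1) -> int:
    """Rounds of parallel requests needed if every image takes one round."""
    return math.ceil(num_images / (per_host_limit * num_hosts))


# 24 pictures, 2 connections per host name
one_domain = download_rounds(24, per_host_limit=2, num_hosts=1)
two_domains = download_rounds(24, per_host_limit=2, num_hosts=2)
```

Under this model a second host name halves the number of rounds, which is the "doubling" effect described above.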
1.4.3 How to upload and synchronize pictures after the picture server is separated
Of course, everything has two sides. Separating the picture server improves picture access efficiency and greatly relieves the server I/O bottleneck caused by pictures, but uploading and synchronizing pictures then becomes a major problem. Here are a few solutions, based on my personal thinking.
1. NFS sharing
NFS sharing is the simplest and most practical approach if you do not want to synchronize every picture to each picture server. NFS is a distributed client/server file system whose essence is sharing files between computers: a user can connect to the sharing machine and access its files as if they were on the local hard disk.
The implementation idea is as follows. The web server mounts the directories exported by the picture servers over NFS. A user uploads a picture to the web server, and the program then copies it into the mounted directory, so the picture servers can serve the newly uploaded picture as well (note that the pictures are only shared; they are not copied to each picture server). Then bind separate domain names to the picture servers so that browsers access the pictures through those domain names. This method has essentially no synchronization delay, but it depends on NFS: if NFS goes down, the web server is affected. See the figure below.
As for how to configure NFS, search the web or see this article on configuring NFS under Linux: http://blog.csdn.net/lixinso/article/details/6639643
2. FTP synchronization
Unlike the NFS approach above, here the application uses FTP to push each picture to every picture server after the user uploads it. PHP, Java, and ASP.NET can all work with FTP. Each picture server then keeps its own copy of the picture, which also serves as a backup. The drawback is that pushing pictures over FTP takes time, and if the push is done asynchronously there is a delay before the picture is available everywhere. For ordinary small picture files, however, this is acceptable.
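The FTP approach can be sketched with Python's standard `ftplib`. This is only an illustration under assumptions: the function name, the host list, and the shared credentials are hypothetical, and a real deployment would add timeouts, retries, and probably a background queue so that the upload request does not wait for every server.

```python
import ftplib
from typing import Callable, Iterable, List


def push_to_picture_servers(
    local_path: str,
    remote_name: str,
    hosts: Iterable[str],
    user: str,
    password: str,
    connect: Callable = ftplib.FTP,  # injectable so the function can be tested
) -> List[str]:
    """Upload one file to every picture server; return the hosts that failed."""
    failed = []
    for host in hosts:
        try:
            # ftplib.FTP connects on construction and quits on leaving the block
            with connect(host) as ftp, open(local_path, "rb") as fh:
                ftp.login(user, password)
                ftp.storbinary(f"STOR {remote_name}", fh)
        except ftplib.all_errors:
            failed.append(host)  # candidates for a later retry
    return failed
```

Returning the failed hosts instead of raising lets the caller decide whether a partially synchronized picture should be retried or reported.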
2 Analysis of the URL Hash Architecture of Picture Servers
2.1 What Is the URL Hash Architecture
The URL hash architecture applies a hash algorithm to the URL and uses the result to find the corresponding server. Because a given URL always hashes to the same value, in theory that URL is permanently assigned to one fixed server. In addition, the hash algorithm distributes URLs very evenly, so traffic is balanced at the same time.
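A minimal sketch of the idea, assuming MD5 as the hash and a simple modulo over a fixed server list. Both choices, along with the function and server names, are illustrative assumptions rather than the scheme any particular proxy uses.

```python
import hashlib
from typing import Sequence


def pick_server(url: str, servers: Sequence[str]) -> str:
    """Map a URL to one server; the same URL always maps to the same server."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["cache1", "cache2", "cache3"]  # hypothetical host names
chosen = pick_server("/a/b/abcde.jpg", servers)
```

Because the mapping depends only on the URL and the server list, any front-end node can compute it without shared state.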
2.2 Why Use the URL Hash Architecture
1. A picture server is characterized by heavy traffic and large capacity. Simple load balancing solves the traffic problem but does nothing for capacity, and this leads to the disaster tolerance problem.
2. The disaster tolerance problem: when the data accessed within a given period far exceeds the capacity of the smallest single machine in the cache cluster, that cache can no longer hold what is being requested. Large numbers of requests then pass straight through the cache and directly affect I/O performance on the back end.
3. Increasing cache capacity can relieve the problem, but memory is always limited, and fitting every machine with extra-large memory is expensive. Configuring a very large disk cache in Squid is not appropriate either, because Squid's hash table then becomes very large and performance suffers badly.
4. With the hash architecture, the memory of the whole cache cluster is fully used, and the disaster tolerance limit is no longer set by the smallest single machine in the cluster but by the sum of the capacities of all its machines.
2.3 Types of URL Hash Architecture
1) DNS-based hash architecture.
2) Automatic hash architecture based on nginx.
3) Manual hash architecture based on nginx.
2.3.1 DNS-based Hash Architecture
DNS hash architecture diagram
DNS hash architecture description
This architecture suits picture systems driven by user uploads, such as pictures uploaded to forums, photo albums, and blogs, because only in such systems can file names be guaranteed to follow a consistent convention.
The architecture in the diagram is split across 36 domain names. Each picture's file name begins with an MD5 value, and one character taken from that value indicates which domain name the picture lives under; each domain name corresponds to a machine. Uploads are distributed according to the same character.
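The character-to-domain lookup can be sketched as follows. The domain pattern `img<char>.example.com`, the choice of the first character, and the function name are all hypothetical. One more assumption is worth stating: a hexadecimal MD5 string only contains 16 distinct characters (0-9, a-f), so filling all 36 domains implies a file-name encoding that covers 0-9 and a-z, a detail not spelled out here.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols: 0-9 and a-z


def domain_for(file_name: str, position: int = 0) -> str:
    """Pick the picture domain from one character of the file name."""
    key = file_name[position].lower()
    if key not in ALPHABET:
        raise ValueError(f"unexpected character {key!r} in {file_name!r}")
    return f"img{key}.example.com"  # hypothetical naming pattern


domain = domain_for("c4ca4238a0b923820dcc509a6f75849b.jpg")
```

The upload path and the page templates must use the same function, so that a picture is written to the machine that its URL will later point to.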
Advantages and Disadvantages of the DNS Hash Architecture
Advantages:
1) Traffic is split by DNS, which is cheap, performs well, and needs no maintenance.
2) It gets around IE's default limit of 2 connections per host.
Disadvantages:
1) Availability: if a machine goes down, requests for the pictures on that machine cannot be served.
2) Traffic splitting: the only option is full synchronization, which is relatively costly.
3) It only suits user-facing systems.
2.3.2 Automatic and Manual Hash Architectures Based on nginx
Automatic hash architecture diagram of nginx
Automatic hash architecture description of nginx
1. This is a new cache architecture, with nginx as the front-most proxy in front of the cache machines.
2. Behind nginx is the cache group; nginx hashes the URL and dispatches each request to a cache machine accordingly.
3. A pure Squid cache setup is easy to upgrade to this architecture, since nginx can be installed on the Squid machines themselves.
4. nginx can also cache, so some heavily requested links, such as favicon.ico and the site logo, can be cached directly on nginx without going through one more proxy request.
Advantages and Disadvantages of nginx's Automatic Hash Architecture
Advantages:
1) High performance.
2) Easy to use; it matters little what the back end is.
3) High availability.
4) As a cache architecture, it is easy to split.
5) Selected links can be served directly from the proxy cache in nginx.
Disadvantages:
1) URL distribution is hard to control. Adding or removing cache machines causes URLs to be reassigned, which means the existing caches become invalid.
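The reassignment effect is easy to demonstrate with a plain modulo hash. The sketch below assumes `hash(url) % number_of_machines` as the placement rule; that rule is an illustrative assumption, since the exact algorithm used by the nginx module is not given here, and the URLs are synthetic.

```python
import hashlib
from typing import Iterable


def bucket(url: str, machines: int) -> int:
    """Index of the cache machine for a URL under a plain modulo hash."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % machines


def moved_fraction(urls: Iterable[str], before: int, after: int) -> float:
    """Share of URLs that land on a different machine after resizing the pool."""
    urls = list(urls)
    moved = sum(1 for u in urls if bucket(u, before) != bucket(u, after))
    return moved / len(urls)


urls = [f"/img/{i}.jpg" for i in range(10000)]  # synthetic sample
fraction = moved_fraction(urls, before=4, after=5)
```

In theory a URL stays put only when its hash gives the same remainder modulo 4 and modulo 5, which happens for 4 out of every 20 hash values, so about four fifths of the URLs move when a fifth machine is added. Under this placement rule most cached objects become unreachable at their old location.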
Manual hash architecture description of nginx
1. The architecture diagram is the same as for the automatic hash; the only difference is the hash algorithm. The automatic hash uses the algorithm provided by the nginx upstream hash module, whereas the manual architecture uses an algorithm we design ourselves.
2. The idea is to take one character from the URL as the basis for distribution, for example the penultimate character of the link, chosen so that requests are spread evenly.
3. The manual architecture avoids the cache invalidation that adding or removing machines causes in the automatic architecture, and it also makes it possible to know exactly which cache holds a given link.
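The manual scheme described in the points above can be sketched as an explicit lookup table. Everything named here is hypothetical: the cache host names, the table contents, and the decision to take the penultimate character of the file-name stem rather than of the raw URL (so that a fixed extension such as .jpg does not decide the result).

```python
import posixpath

# Hypothetical hand-maintained table: character -> cache machine.
ROUTES = {
    **dict.fromkeys("0123", "cache1"),
    **dict.fromkeys("4567", "cache2"),
    **dict.fromkeys("89ab", "cache3"),
    **dict.fromkeys("cdef", "cache4"),
}
DEFAULT_CACHE = "cache1"  # fallback for characters not in the table


def cache_for(url: str) -> str:
    """Route by the penultimate character of the file-name stem."""
    stem = posixpath.splitext(posixpath.basename(url))[0]
    if len(stem) < 2:
        return DEFAULT_CACHE
    return ROUTES.get(stem[-2].lower(), DEFAULT_CACHE)
```

Adding a machine then means reassigning only a few characters in the table, so only the links behind those characters lose their cached copies. The table also answers the question "which cache holds this link" directly.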
Advantages and Disadvantages of nginx's Manual Hash Architecture
Advantages:
1) It inherits essentially all the advantages of the automatic architecture.
2) It avoids the cache invalidation caused by adding or removing machines.
3) You know exactly which cache a link is stored on.
Disadvantages:
1) Configuration is relatively complex, and an even distribution is not easy to achieve.
Optimizing a BBS Architecture with the Hash Architecture
1. The BBS architecture mentioned earlier uses LVS + Squid as the front end. With that setup, squidclient has to purge every Squid whenever cached content is updated, which is very inefficient. With the hash architecture, squidclient only needs to purge one Squid each time, which is far more efficient.
2. The nginx manual hash architecture is recommended here: because you know exactly which machine a link lives on, you can configure a precise backup machine for it.
3 Architecture Scheme for an Nginx Picture Server
A picture service usually holds a large amount of data and is accessed frequently. This gives it two problems: storage and traffic.
The storage problem comes down to hard disk capacity. Spending money on disks sounds simple, but it is also the most painful part. From experience so far, the best approach is this: whenever disk space runs short, buy a disk and plug it in, change at most a little configuration, and use it immediately. The disks should also be fully utilized, otherwise the volume of pictures plus backups becomes frightening; ideally every disk is filled to 100%.
Traffic is the other big problem. If the service has no hotlink protection, traffic causes bandwidth and server load problems. If you have the money, hand it straight to a CDN; if you have no money, or a lot more money, build it yourself. Following the unchanging rule that "the older the picture, the fewer the visits", the service is split into two parts, one handling the latest pictures and the other handling old pictures. The latest pictures get many visits but take little storage; old pictures get little traffic but take a lot of storage.
3.1 Define a Storage Directory Rule
Add a date level above the existing hash directories, so that a path like /a/b/abcde.jpg becomes /200810/16/a/b/abcde.jpg or /2008/10/16/a/b/abcde.jpg. Once the directory rule is based on date, machines can be split by year.
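A small sketch of the second layout (/2008/10/16/a/b/abcde.jpg). It assumes, as the example path suggests, that the two hash directories are simply the first two characters of the file name; the function name is hypothetical.

```python
import posixpath
from datetime import date


def storage_path(file_name: str, uploaded: date) -> str:
    """Build /YYYY/MM/DD/<c1>/<c2>/<file_name> for a picture."""
    if len(file_name) < 2:
        raise ValueError("file name too short to derive hash directories")
    return posixpath.join(
        "/",
        f"{uploaded:%Y}", f"{uploaded:%m}", f"{uploaded:%d}",
        file_name[0], file_name[1],  # assumed: hash dirs = first two characters
        file_name,
    )


path = storage_path("abcde.jpg", date(2008, 10, 16))
```

Because the year is the first path component, a whole year of pictures can later be moved to another machine by relocating one directory tree.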
3.2 Separate Machines, Separate Hard Disks
Following the plan above, the servers are divided into two groups. One group, load-balanced with LVS, serves the new pictures. The other group handles access to old pictures and backups. The new-picture machines should be a few better servers with SCSI disks. The old-picture machines do not need much: ordinary PCs with enough hard disk space will do. 1 TB IDE hard disks are not too expensive now, and it is better to build a RAID array to save trouble. The most important thing is to have plenty of these machines. See the figure below:
1. The picture service is accessed through LVS, so processing capacity is assured.
2. nginx serves requests directly, without Squid.
3. The red lines in the figure show the main nginx proxying requests for /2006 and /2007 to the two archive servers respectively. If CPU usage on the main nginx turns out to be high, consider using nginx's proxy_store to keep copies of those pictures on the main server and clean them up regularly.
4. The storage distribution server in the figure is the single entry point for updating pictures. When pictures are added or modified, this server is responsible for placing them on the correct server.
5. The old-picture servers are currently divided by year: each year, two servers (or two hard disks) are added. Do not rely on RAID alone; there must be two machines, preferably located in two different cities.
6. Since the data for 2006 and 2007 hardly changes any more, the two years can be merged onto one disk if it is large enough.
7. With careful tuning, the hard disks of the old-picture servers can be filled close to 100%: the old data essentially does not grow, so only 1-2 GB of free space needs to be kept.
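The year-based routing in point 3 can be sketched as a lookup on the first path component. The back-end host names and the "local" marker below are hypothetical, and the function only models the routing decision, not the proxying itself.

```python
import re

# Hypothetical archive back ends, one per archived year.
ARCHIVE_BACKENDS = {
    "2006": "archive-2006.internal",
    "2007": "archive-2007.internal",
}
LOCAL = "local"  # served by the new-picture machines themselves

_YEAR = re.compile(r"^/(\d{4})/")


def backend_for(path: str) -> str:
    """Choose where a picture request is served, based on its year directory."""
    m = _YEAR.match(path)
    if m and m.group(1) in ARCHIVE_BACKENDS:
        return ARCHIVE_BACKENDS[m.group(1)]
    return LOCAL
```

When a year is archived, adding one entry to the table is enough to redirect all of its requests to the corresponding old-picture server.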